Experimental physics is about investigating the world in a quantitative manner, exploiting our technology to carefully map the wealth of phenomena that make planets turn around stars, atoms stick together, and hearts beat. All of that can be understood by creating models of the underlying physical processes. These models need to be fed with input parameters, which we must measure.

Measurement is thus at the very basis of our understanding of the world. The more accurate our estimates of fundamental physics parameters become, the more knowledge we can extract from a comparison of the tuned models to the observed phenomena. As the accuracy of experimental measurements is affected by three different classes of effects, we need to understand those effects well, and reduce them where possible.

The first effect is statistical uncertainty: if you measure your body weight on a precise scale 10 times, for instance, you are likely to obtain 10 different readings, whose variability is caused by random variations of the conditions, to which you have no access. What we can do is take the average of those ten readings, obtaining a measurement which is more robust to those effects than any of the individual readings. The larger our base of independent measurements (which statisticians would call "independent, identically distributed"), the more precise the average will be. The statistical uncertainty of our measurements can thus be tamed by the painstaking accumulation of more data.
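You can see this taming at work in a minimal simulation sketch; the 70 kg "true" weight and the 0.5 kg reading noise below are made-up numbers, chosen only to illustrate the effect:

```python
import random
import statistics

random.seed(42)

TRUE_WEIGHT = 70.0   # hypothetical true body weight, in kg
NOISE_SD = 0.5       # assumed random fluctuation of a single reading, in kg

def reading():
    """One scale reading: the true weight plus random noise."""
    return random.gauss(TRUE_WEIGHT, NOISE_SD)

def average_of(n):
    """The average of n independent readings."""
    return statistics.mean(reading() for _ in range(n))

# The spread of the average over many repetitions shrinks like 1/sqrt(n):
spread_10 = statistics.stdev(average_of(10) for _ in range(2000))
spread_1000 = statistics.stdev(average_of(1000) for _ in range(2000))

print(spread_10, spread_1000)  # the second is roughly 10 times smaller
```

Averaging 100 times more readings buys you a factor of 10 in precision, which is exactly why accumulating data is painstaking: the gain only grows with the square root of the effort.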

There are two other classes of uncertainties affecting our measurements, though. As here I want to concentrate on the one which is traditionally listed as the second, systematic uncertainty, I will first briefly mention the third one, to be done with it. The so-called "error of the third kind" is subtle and usually overlooked, but almost omnipresent in practice; fortunately, its size is typically small in most circumstances. The error of the third kind consists in the ill-defined nature of the quantity you want to measure: your body weight, for instance, is not a constant defined to arbitrary precision, as it changes by hundreds of grams over the course of a day as you eat, drink, and breathe.

(Note, incidentally, that it would make little sense to repeat the weighing procedure 10000 times in order to gain on the statistical uncertainty of the average: the average of those 10000 readings would indeed have a statistical uncertainty 100 times smaller than the scale division (so, 1 gram), but that precise average would not be picturing a quantity which is itself defined with such precision.)
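The arithmetic behind that factor of 100 is the standard error of the mean; taking the 100-gram scale division implied by the numbers above,

$$\sigma_{\text{avg}} = \frac{\sigma_{\text{single}}}{\sqrt{N}} \approx \frac{100\ \text{g}}{\sqrt{10000}} = 1\ \text{g},$$

which is far below the hundreds of grams by which body weight itself fluctuates.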

So, let us instead discuss systematic uncertainties. These are effects that are liable to bias your measurement, even by large amounts. If, for example, the scale you are using is miscalibrated, it can report weights that are off by one or even several kilograms! The early analog models of scales had a dial one could use to center the reading at 0.0 when no weight was placed on them, to recalibrate them; new scales can also be calibrated by different procedures, but we tend to skip that step, potentially losing precision in our measurements.

Still sticking to body weight measurements, there are a number of other systematic uncertainties you should consider. Even a good scale might report weights accurately in a certain range, but lose the "linearity" of its response outside of it. This is a subtler effect than the overall offset of a miscalibration: one may calibrate the scale at 0 kg, and even provide a further calibration point at a different reading, but the response for other weights may not lie perfectly on the line through those points.
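A small sketch makes the point concrete. The quadratic response below is entirely hypothetical, invented just to give the scale a mild nonlinearity; a two-point calibration fixes the readings at the calibration weights exactly, yet a bias survives in between:

```python
def scale_raw(weight_kg):
    """Hypothetical raw response of a slightly nonlinear scale."""
    return 1.02 * weight_kg + 0.3 + 0.001 * weight_kg ** 2

# Two-point calibration: a straight line through the readings at 0 and 80 kg.
w1, w2 = 0.0, 80.0
r1, r2 = scale_raw(w1), scale_raw(w2)
slope = (w2 - w1) / (r2 - r1)

def calibrated(weight_kg):
    """Reading after applying the two-point linear calibration."""
    return (scale_raw(weight_kg) - r1) * slope

# Perfect at the calibration points...
print(calibrated(0.0), calibrated(80.0))   # 0.0 and 80.0
# ...but biased in between, because the response is not a straight line:
print(calibrated(40.0) - 40.0)             # a residual of about -1.5 kg
```

No number of repeated readings would cure that residual: it is not random, so it does not average away, which is precisely what makes systematic uncertainties harder to tame than statistical ones.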

In more complex instruments, such as the particle detectors recording the result of collisions between proton beams at the Large Hadron Collider, systematic uncertainties in our typical measurements come from dozens of different sources. Often they are due, again, to miscalibration of our instruments; but they may also arise from imprecise knowledge of some fundamental parameter on which we base the conversion from detector readings to final measurement results. Or, quite often, they come from the imprecision of the model we compare to the observed data in order to extract our estimate of the physical quantity of interest.

The matter is so complex, and the typical techniques to tame systematic uncertainties so refined, that in fact I believe the worth of a particle physicist can be measured by how well he or she is capable of estimating systematic uncertainties, or better, of designing experiments such that those effects are minimized.

I should now mention that today we make extensive use of machine learning techniques in our data analysis in particle physics, to improve our measurement capabilities. Typically, we use neural networks to help us with the difficult task of distinguishing some physical process of interest that we want to study and measure (the signal) from all the competing processes that look like it (the background).

This is thus a signal-to-background discrimination task, which neural networks (and other algorithms) are much better than us at carrying out, as the problems are really complex: we measure dozens, if not hundreds, of different characteristics of what we call an "event" (a particle collision), and we then want to summarize all that information into a decision of whether that event is signal or background.
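To fix ideas, here is a deliberately toy version of the task, nothing like real LHC analysis code: two made-up features per event, Gaussian signal and background populations, and a logistic regression (the simplest possible "network") trained by gradient descent to output a signal probability:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "events": two features each (think of an invariant mass and an angular
# variable). Signal and background differ only in the means of the features.
n = 2000
signal = rng.normal(loc=[1.0, 0.5], scale=1.0, size=(n, 2))
background = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
X = np.vstack([signal, background])
y = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = signal, 0 = background

# A minimal logistic-regression classifier trained by gradient descent.
w = np.zeros(2)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted signal probability
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

accuracy = np.mean((p > 0.5) == y)
print(accuracy)   # clearly better than the 0.5 of random guessing
```

A real analysis replaces the two features with dozens or hundreds, and the logistic regression with a deep network, but the structure is the same: the machine compresses everything it knows about an event into one number between 0 and 1.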

However, while these machine-learning algorithms have become very good at distinguishing signal from background (a task called "supervised classification": the algorithms are first trained by showing them examples of signal and background, so they are "supervised" in the sense that they are provided with the information needed for the task they are charged with), they will not consider the potential effect of systematic uncertainties, which affect the subsequent use of the classified signal to measure what we want to measure.

Of course: the algorithms have not been informed of what we are going to do with the output they provide (a number which says how likely it is that a given event belongs to the signal class). Because of that, the optimization of the discrimination task is not guaranteed to yield, at the end of the day, the smallest uncertainty on the measurement we carry out with the selected signal. There is a misalignment between the objective of the classification task and the objective of our measurement.
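The misalignment can be shown with a made-up counting experiment. Everything below is invented for illustration (the event yields, the efficiency curves, and the choice of s/sqrt(s+b) as a rough proxy for measurement precision): the threshold that maximizes the fraction of correctly classified events is not the one that maximizes the precision proxy.

```python
import math

# Hypothetical analysis: cutting on the classifier output at threshold t keeps
# fractions eff_s(t) of signal and eff_b(t) of background (made-up curves).
S_TOT, B_TOT = 100.0, 10000.0   # assumed expected signal and background yields

def eff_s(t):
    return 1.0 - t          # signal efficiency falls linearly with the cut

def eff_b(t):
    return (1.0 - t) ** 4   # background falls much faster

def significance(t):
    """s / sqrt(s + b): a common rough proxy for measurement precision."""
    s, b = S_TOT * eff_s(t), B_TOT * eff_b(t)
    return s / math.sqrt(s + b)

def accuracy(t):
    """Fraction of all events classified correctly at threshold t."""
    s_kept, b_kept = S_TOT * eff_s(t), B_TOT * eff_b(t)
    return (s_kept + (B_TOT - b_kept)) / (S_TOT + B_TOT)

grid = [i / 1000 for i in range(1000)]
best_precision_cut = max(grid, key=significance)
best_accuracy_cut = max(grid, key=accuracy)
print(best_precision_cut, best_accuracy_cut)   # the two optima differ
```

A classifier tuned for raw classification performance therefore hands the physicist a working point that is, in general, not the one the physics measurement actually wants, which is exactly the misalignment discussed above.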

Last Monday I had the opportunity to discuss methods that address this misalignment, and try to make better use of machine learning tools in physics measurements by informing them of our final measurement goals. The occasion was the Summer School "Machine Learning for High-Energy Physics", organized by the Higher School of Economics, a university funded by YANDEX, the Russian analogue of Google. The school has become a must for any Ph.D. student interested in furthering their abilities with machine learning tools.

This year I was invited to lecture on the topic of "Reducing the impact of systematic uncertainties in physics measurement with machine learning". My lecture lasted 1.5 hours and was a broad overview of all the techniques (and there are many!) which have been deployed for that complex task. If you are interested I can share the slides of my lecture with you, but it is better to go directly and download the article which I published on Monday on the Cornell arXiv with my colleague Pablo de Castro. The article contains all that I lectured about, and it is easier to follow than an unsupervised browsing of the slides.

The bottom line is that the reduction of systematic uncertainties is a crucial part of any measurement, and that nowadays there are (arguably complex, but workable) techniques to strongly reduce their impact in supervised classification, thereby considerably improving the precision of our experimental measurements!