Experimental physics is about investigating the world in a quantitative manner, exploiting our technology to carefully map the wealth of phenomena that make planets turn around stars, atoms stick together, and hearts beat. All of that can be understood by creating models of the underlying physics processes. These models need to be fed with input parameters, which we must measure.

Measurement is thus at the very basis of our understanding of the world. The more accurate our estimates of fundamental physics parameters become, the more knowledge we can extract from a comparison of the tuned models to the observed phenomena. As the accuracy of experimental measurements is affected by three different classes of effects, we need to understand these effects well, and reduce them where possible.

The first effect is statistical uncertainty: if you measure your body weight on a precise scale 10 times, for instance, you are likely to obtain 10 different readings, whose variability is caused by random variations of the conditions, to which you have no access. What we can do is take the average of those ten readings, obtaining a measurement which is more robust to those effects than any of the individual readings. The larger our base of independent measurements (which statisticians would call "independent and identically distributed"), the more precise the average will be. The statistical uncertainty of our measurements can thus be tamed by the painstaking accumulation of more data.
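To make the shrinking of the uncertainty of the average concrete, here is a minimal simulation sketch in Python; the true weight and the spread of a single reading are numbers I just invented for the example, with the spread taken to be roughly one scale division:

```python
import numpy as np

rng = np.random.default_rng(42)

true_weight = 70_000.0  # grams (invented for the example)
sigma_single = 100.0    # spread of one reading, in grams (assumed to be
                        # roughly one scale division)

def spread_of_average(n_readings, n_sessions=1000):
    """Simulate many weighing sessions of n_readings each and return the
    standard deviation of the session averages."""
    readings = rng.normal(true_weight, sigma_single,
                          size=(n_sessions, n_readings))
    return readings.mean(axis=1).std()

for n in (1, 10, 100, 10_000):
    print(f"{n:>6d} readings -> spread of the average ~ {spread_of_average(n):6.1f} g"
          f" (expected sigma/sqrt(N) = {sigma_single / np.sqrt(n):6.1f} g)")
```

The simulated spread of the average follows the familiar sigma/sqrt(N) law: ten readings gain you roughly a factor of three in precision, ten thousand a factor of one hundred.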

There are two other classes of uncertainties affecting our measurements, though. Since here I want to concentrate on the one traditionally listed second (systematic uncertainty), I will first briefly mention the third one, to be done with it. The so-called "error of the third kind" is subtle and usually overlooked, yet almost omnipresent in practice; fortunately, its size is small in most circumstances. The error of the third kind lies in the ill-defined nature of the quantity you want to measure.

In the example of your body weight, such an error comes from the fact that your body weight is not a constant of nature: it changes continuously. Even if you take off all your clothes and remove all humidity from your skin, the weight you deposit on the scale is different every time you step on it, because every breath you take adds or subtracts atoms to and from what you call "your body"; similarly, evaporation off your skin causes a constant, small loss of weight; and if you scratch your nose, you remove a few cells from your skin. So the quantity called "body weight" is really not very well defined. However, we may argue that this error amounts to at most a few grams in any short weighing session, so it can be ignored when the objective of your measurement is to determine your weight in, say, the hundred-gram divisions provided by your scale.

(Note, incidentally, that it would make little sense to repeat the procedure 10,000 times in order to reduce the statistical uncertainty of the average: the average of those 10,000 readings does have a statistical uncertainty which is 100 times smaller than that of a single reading, so of the order of one gram; but that precise average would not correspond to a quantity that is itself defined with such precision.)

So, let us instead discuss systematic uncertainties. These are effects that are liable to bias your measurement, even by large amounts. If, for example, the scale you are using is miscalibrated, it can report weights that are wrong by one or even several kilograms! The early analogue models of scales had a dial one could use to recalibrate them by centering the reading at 0.0 when no weight was placed on them; new scales can also be calibrated by various procedures, but we tend to ignore that feature, potentially losing precision in our measurements.

Still sticking to body weight measurements, there are a number of other systematic uncertainties you should consider. Even a good scale might report weights accurately within a certain range, but lose "linearity" of its response outside of it. This is a subtler effect than the overall offset of a miscalibration: one may calibrate the scale at 0 kg, and even provide a further calibration point at a different reading, but the response for other weights may not be perfectly in line.
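To illustrate the difference, here is a small Python sketch of a toy scale with a slightly nonlinear response; the offset, gain, and quadratic term are all invented for the example. A two-point calibration (at 0 kg and at a reference weight) removes the offset, yet a residual bias remains away from the calibration points:

```python
# Toy scale whose raw response has a small nonlinear (quadratic) term.
# All numbers below are invented purely for illustration.
def raw_reading(true_kg):
    return 0.3 + 0.98 * true_kg + 0.0003 * true_kg**2

# Two-point calibration: anchor the response at 0 kg and at a reference weight.
ref = 80.0                         # kg, assumed calibration point
offset = raw_reading(0.0)
gain = (raw_reading(ref) - offset) / ref

def calibrated_reading(true_kg):
    return (raw_reading(true_kg) - offset) / gain

for w in (20.0, 50.0, 80.0, 120.0):
    bias = calibrated_reading(w) - w
    print(f"true {w:5.1f} kg -> reads {calibrated_reading(w):6.2f} kg "
          f"(residual bias {bias:+.2f} kg)")
```

The reading is exact at the two calibration points by construction, but weights far from them are still reported with a bias of up to a kilogram or so in this toy example.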

In more complex instruments, such as the particle detectors recording the results of collisions between proton beams at the Large Hadron Collider, the systematic uncertainties in our typical measurements come from dozens of different sources. Often they, too, are due to the miscalibration of our instruments; but they may instead arise from the imprecise knowledge of some fundamental parameter on which we base our conversion from detector readings to final measurement results. Or, quite often, they come from the imprecision of the model we compare to the observed data in order to extract our estimates of the desired physical quantity.

The matter is so complex, and the typical techniques to tame systematic uncertainties so refined, that I believe the worth of a particle physicist can be measured by how well he or she is capable of estimating systematic uncertainties or, better, of designing experiments such that those effects are minimized.

I should now mention that today we make extensive use of machine learning techniques in our data analysis in particle physics, to improve our measurement capabilities. Typically, we use neural networks to help us with the difficult task of distinguishing some physical process of interest we want to study and measure (the signal) from all the other competing processes that look like it (the background).

This is thus a signal-to-background discrimination task, which neural networks (and other algorithms) are much better than us at carrying out, as the problems are really complex: we measure dozens, if not hundreds, of different characteristics of what we call an "event" (a particle collision), and we then want to summarize all that information into a decision on whether that event is signal or background.
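As a toy illustration of such a classifier (not real analysis code, just a sketch with a few invented features, using scikit-learn), one could do something like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy "events": three per-event features; a real analysis would use dozens
# or hundreds. Signal and background differ slightly in their feature
# distributions (all numbers are invented for illustration).
n = 20_000
signal = rng.normal(loc=[1.0, 0.5, 0.0], scale=1.0, size=(n, 3))
background = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.2, size=(n, 3))

X = np.vstack([signal, background])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = signal, 0 = background

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small neural network trained on labelled examples: this is the
# "supervised classification" step described in the text.
clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

scores = clf.predict_proba(X_test)[:, 1]  # per-event "signal-ness" score
print("ROC AUC:", round(roc_auc_score(y_test, scores), 3))
```

The network compresses all the event features into a single score that says how signal-like each event looks, and a selection cut on that score defines the sample used for the measurement.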

However, while these machine-learning algorithms have become very good at the task of distinguishing signal from background (a task called "supervised classification", as the algorithms are first trained by showing them labelled examples of signal and background; they are "supervised" in the sense that they are provided with the information needed for the task they are charged with), they do not consider the potential effect of the systematic uncertainties which affect the subsequent use of the classified signal to measure what we want to measure.

Of course: the algorithms have not been informed of what we are going to do with the output they provide (a number which says how likely it is that a given event belongs to the signal class). Because of that, the optimization of the discrimination task is not guaranteed to yield, at the end of the day, the smallest uncertainty on the measurement we carry out with the selected signal. There is a misalignment between the objective of the classification task and the objective of our measurement.
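A toy counting-experiment sketch can make the misalignment tangible (this is not the method of the paper, just an illustration with invented yields and score shapes): once a systematic uncertainty on the background is included, the score cut that maximizes a classification-style figure of merit is no longer the one that minimizes the uncertainty of the measured signal yield.

```python
import numpy as np
from scipy.stats import norm

# Invented expected yields and classifier-score distributions:
S_tot, B_tot = 200.0, 10_000.0   # expected signal and background events
sig_scores = norm(0.70, 0.15)    # signal scores peak near 1
bkg_scores = norm(0.30, 0.15)    # background scores peak near 0
delta_b = 0.20                   # assumed 20% systematic on the background yield

cuts = np.linspace(0.0, 0.95, 200)
S = S_tot * sig_scores.sf(cuts)  # signal surviving the cut score > c
B = B_tot * bkg_scores.sf(cuts)  # background surviving the same cut

# Classification-style figure of merit, blind to the systematic:
significance = S / np.sqrt(S + B)

# Relative uncertainty on the measured signal yield in a simple counting
# experiment, once the background systematic is folded in:
rel_uncertainty = np.sqrt(S + B + (delta_b * B) ** 2) / S

print("cut maximizing S/sqrt(S+B):          %.2f" % cuts[np.argmax(significance)])
print("cut minimizing the measurement unc.: %.2f" % cuts[np.argmin(rel_uncertainty)])
```

With these made-up numbers the two optimal working points differ visibly; techniques that inform the training of the final measurement goal aim to optimize the second kind of quantity (or its full-likelihood analogue) directly.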

Last Monday I had the opportunity to discuss methods that address this misalignment and try to make better use of machine learning tools in physics measurements by informing them of our final measurement goals. The occasion was the Summer School "Machine Learning for High-Energy Physics", organized by the Higher School of Economics, a university funded by YANDEX, the Russian analogue of Google. The school has become a must for Ph.D. students interested in furthering their abilities with machine learning tools.

This year I was invited to lecture on the topic of "Reducing the impact of systematic uncertainties in physics measurement with machine learning". My lecture lasted 1.5 hours and was a broad overview of all the techniques (and there are many!) which have been deployed for that complex task. If you are interested I can share the slides of my lecture with you, but you are better off directly downloading the article which I published on Monday on the Cornell arXiv with my colleague Pablo de Castro. The article contains all that I lectured about, and it is easier to follow than browsing the slides unsupervised.

The bottom line is that the reduction of systematic uncertainties is a crucial part of any measurement, and that nowadays there are (arguably complex, but workable) techniques to strongly reduce their impact in supervised classification, thereby considerably improving the precision of our experimental measurements!

---

Tommaso Dorigo is an experimental particle physicist who works for the INFN at the University of Padova, and collaborates with the CMS experiment at the CERN LHC. He coordinates the European network AMVA4NewPhysics as well as research in accelerator-based physics for INFN-Padova, and is an editor of the journal Reviews in Physics. In 2016 Dorigo published the book “Anomaly! Collider physics and the quest for new phenomena at Fermilab”. You can get a copy of the book on Amazon.