5 Sigma: Writeup For Festivaletteratura
    By Tommaso Dorigo | September 5th 2012 09:34 AM
    On Friday evening I will be talking in the wonderful Piazza Mantegna, in downtown Mantova (see picture below). The event is organized by Festivaletteratura (a literature festival); I will be armed with a blackboard and chalk, plus a microphone, and I will explain how the discovery of a new particle comes about.

    On Saturday I will instead be at the aula magna of the Mantova University, where, together with Gian Francesco Giudice (a CERN theorist), I will discuss the Higgs boson discovery and its aftermath. That is a more "serious" event, and we will be discussing in front of a paying audience. I hear the event is already sold out, so it will (should) be interesting!

    Friday's event is called "Five Sigma". I have 30 minutes to explain to laypeople what five sigma actually means. I have drafted something and I offer it below, in the hope that you can tell me what should be changed and what should definitely be removed, in the interest of allowing everybody to understand the things I try to explain.

    Please bear in mind that this is a very first draft. Indeed, I need your input! Also, it is a quick-and-dirty translation of the Italian text I originally drafted, so excuse the occasional lapse of syntax.

    Five Sigma

    Five standard deviations: the signal of the Higgs boson had to reach this level in order for a discovery to be announced. But what does this mean?

    [In a brief introduction, I will describe the LHC, the experimental facilities, and the physics of proton-proton collisions. That should take me five to ten minutes, oh yeah ;-) ]

    As if we were looking for the classic needle in a haystack, in order to search for a new particle we need to concentrate on a particular characteristic capable of distinguishing it from the background noise. For a needle, this could be its shiny appearance, its shape, or its colour. For an elementary particle, the distinctive feature is clearly its mass: if we can measure the mass of the particle from its decay products (what we actually record in our detectors), we have a powerful handle for our search. Backgrounds have mass values that are randomly distributed, so when histogrammed they produce a flat or flattish shape, without evident deformations; the sought particle, instead, will give an excess of "events" always at the same mass value.

    (show a graph with a mass distribution and a peaking structure on the blackboard).

    This, in short, is what was done to discover the Higgs boson last July: identify, amidst trillions of collisions produced by the accelerator, the few thousand that "look like" what we expected to see if a Higgs boson were produced; compute, from the detected signals, the mass to which each event corresponds; and finally, put these data in a graph.

    By doing the above, one could end up seeing a characteristic "hump" at some point in the graph. Why a hump, and not a spike at a very well-defined value? Because our mass measurement suffers from some imprecision: the experimental resolution is not perfect, so we expect a Gaussian-shaped hump with some non-zero width.

    And now, what allows us to conclude that we have found a new particle? First of all, we need to construct a model of how the background behaves. If we have done a good job, we will find a model which successfully fits our data points everywhere except where the hump is, as in the graph.

    (Pass a smooth curve through the data histogram in the graph).

    This curve represents our "null hypothesis": if there is no Higgs boson in the data, the events distribute, on average, according to this shape. Note, however, that the data exhibit statistical fluctuations: if, for instance, in a single point of the histogram (a "bin") we expect to see 100 events according to our "null hypothesis", that is only the average value of our expectation, and we will not necessarily count exactly 100 events there: we might see 95, or 112, or 103. The size of these statistical fluctuations is governed by something we call "Poisson statistics", which in short says that if you expect 100 events, then in 68% of cases you will count something between 90 and 110. What Poisson statistics tells us is that the "sigma", the standard deviation of 100 events, is equal to 10, the square root of 100.

    As a side note, 68% is the number which corresponds to the area of a Gaussian distribution taken between minus one sigma and plus one sigma: sigma, in fact, is the parameter of a Gaussian curve which says how wide the curve is. If I widen the Gaussian, I am increasing its sigma, so that between -1 and +1, in sigma units, there is always 68% of the total area of the curve.

    (Graph showing different Gauss curves, driving home the point).
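    Again just for the readers here: the 68% figure can be verified in one line, since the area of a Gaussian within n sigma of its mean is erf(n/sqrt(2)), independently of the curve's width.

```python
import math

# Area under a Gaussian between mu - n*sigma and mu + n*sigma:
# it depends only on n, not on mu or sigma.
for n in (1, 2, 3, 5):
    print(f"within +-{n} sigma: {math.erf(n / math.sqrt(2)):.7f}")
```

    The n=1 line gives 0.6826895, the 68% quoted above.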

    Now, for simplicity, let us imagine that we count all the events in our "hump" together. In the real analysis we do something much more complicated, an "unbinned likelihood fit", but let's ignore this detail. The signal distributes over four adjacent bins, and our "null hypothesis" predicts that we should count about 400 events there, with a standard deviation equal to the square root of 400, that is 20. What can we then say if we see 400 events? 420? 450?

    If we expect 400 events and we see exactly 400, our "null hypothesis" certainly passes the test: we have no reason to suspect that other processes, such as a particle with a mass corresponding to the interval we picked, are contributing additional events to those four bins.

    If we expect 400 events and we see 420, we should remain unimpressed: in about 16 cases out of 100 one sees an excess at least as large as that in the data (16% is half of 100% minus 68%). Those 20 extra events are an excess of "one sigma", one standard deviation.

    If we see 450, we are instead at the level of two and a half sigma: an excess of 50 events, with an uncertainty of ±20.

    (show in the graph where we are in the Gaussian tail).
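    For the readers: the sigma levels quoted above can be checked with a few lines of Python, computing the number of standard deviations and the one-sided Gaussian tail probability for each observed count (the function name excess_significance is mine, for illustration only).

```python
import math

expected = 400.0
sigma = math.sqrt(expected)          # Poisson sigma: 20 events

def excess_significance(observed):
    # Number of sigmas of the excess, and the one-sided probability
    # of a fluctuation at least this large above the mean,
    # in the Gaussian approximation.
    z = (observed - expected) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

for obs in (400, 420, 450):
    z, p = excess_significance(obs)
    print(f"{obs} events: {z:+.1f} sigma, tail probability {p:.1%}")
```

    420 events comes out as one sigma with a ~16% tail probability, and 450 as 2.5 sigma with a tail probability of about 0.6%, i.e. of order 1%.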

    In that case we could start wondering whether those 50 excess events might really be due to 50 Higgs events in the data, sitting on top of the 400 background events constituting our "null hypothesis". But, since we are talking about a probability of about 1% (the area of a Gaussian from +2.5 sigma to infinity), we can only hope that, by collecting more data, the excess becomes statistically stronger. As it stands, it does not allow us to announce a discovery! In fact, in particle physics a departure from expectations at the 1% level is not enough for us to claim we have seen something new. I will say more at the end about why we are so much stricter.

    Let us now imagine we multiply our data sample by four, by collecting data four times longer than we did at the beginning. The 400 events of our null hypothesis have now become 1600 in those four adjacent bins. What is one standard deviation in that case? It is always the square root, which for 1600 events is 40. Not 80! The absolute error has doubled (from 20 to 40), rather than becoming four times larger: increasing the statistics by a factor of four has decreased the relative error by a factor of sqrt(4), or 2. So our background expectation, according to the null hypothesis, is 1600±40.
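    The square-root scaling is quick to verify: quadrupling the dataset doubles the absolute Poisson error but halves the relative one.

```python
import math

# Absolute and relative Poisson errors before and after
# quadrupling the data sample.
for n in (400, 1600):
    print(f"N = {n}: sigma = {math.sqrt(n):.0f}, relative = {math.sqrt(n) / n:.1%}")
```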

    Now, if those 50 excess events we saw in the first fourth of the data had been due not to a fluctuation of the background (which would most likely get washed away by the added data) but to a true contribution of a new particle, we would expect to see four times more of it: 200 events sitting on top of 1600, for a total of 1600+200=1800 events.

    If we see 1800 events and we expect 1600 from our null hypothesis, and the error on 1600 is the standard deviation, that is sqrt(1600)=40 events, we have an excess that is five times larger than the sigma: 200/40=5. [Note that the actual estimate of the standard deviation might be the square root of 1800, if we base our background prediction on the data actually observed in those bins; or still sqrt(1600)=40 events, if the prediction comes from a subsidiary dataset. This is a detail and I will not stress it in the presentation.]

    The probability that a statistical fluctuation of those 1600 events departs from the average by five standard deviations is about one in three million: rare enough to allow us to claim a discovery!
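    For the readers, the five-sigma tail probability is a one-liner; the one-sided Gaussian tail area works out to roughly one chance in three and a half million.

```python
import math

# One-sided Gaussian tail probability of a five-sigma upward fluctuation.
p = 0.5 * math.erfc(5 / math.sqrt(2))
print(f"p = {p:.3g}  (about 1 in {1 / p:,.0f})")
```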

    So, "five sigma" is the minimum requirement to announce a discovery in particle physics. This is what happened at CERN: first experimental evidence of Higgs events was seen in the data collected until December 2011, but the excess, amounting to about three standard deviations, was not sufficient to announce a discovery. In July 2012, however, the added data allowed CMS and ATLAS to independently claim 5-sigma excesses in their datasets.

    Five sigma is a very tough requirement. If you compared it to the levels at which statistical evidence is claimed in medicine, let alone the social sciences, you would conclude that physicists are masochists. However, there is a good reason for this. We search for new particles and effects in many places, all the time; a two- or even three-sigma excess due to nothing but a statistical fluctuation is bound to occur here or there: small probabilities add up, making it virtually certain that some discrepancy will be seen somewhere! Only the strict "five-sigma" criterion protects us from claiming false discoveries.
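    This "look-elsewhere" effect is easy to illustrate numerically: if we inspect many independent places, the chance that at least one of them shows a three-sigma fluctuation grows rapidly (a simplified sketch, assuming fully independent bins).

```python
import math

# One-sided probability of a >= 3 sigma fluctuation in a single place.
p_single = 0.5 * math.erfc(3 / math.sqrt(2))     # about 0.13%
for n_places in (1, 100, 1000):
    # Probability that at least one of n_places shows such a fluctuation.
    p_any = 1.0 - (1.0 - p_single) ** n_places
    print(f"{n_places:5d} places: P(at least one 3 sigma) = {p_any:.1%}")
```

    With a thousand places to look, a three-sigma "discovery" somewhere becomes more likely than not.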



    The LHC now has almost double the data it had for the July 4th announcement. Are there any rumors on what happens to the bump? Is it still there, or did it get smaller?


    I assume it's just a translating error:

    "The relative error has doubled (from 20 to 40), rather than becoming four times larger."

    Should be the absolute error.

    Thanks, corrected.
    A nice clear summation. Since the audience will mostly consist of liberal arts majors, my recommendation is to expand the description of the Gaussian distribution. You might explain that a Gaussian distribution is defined by only two parameters, sigma and the average, and show on the chalkboard how those values affect the shape of the curve.

    Yes, that is what I intend to do when I draw Gaussian curves on the board. Thanks!
    Looks like it'll be a good talk. I did a quick sweep for jargon. In a public talk you have to choose which jargon to teach and which not to teach. That is up to you... But providing a little "background" on some of these would help the audience not to get tripped up if they miss some critical definition or just don't realize that a lot of these words are used metaphorically. Also, re-iterating the definitions now and then -- for example when you come back to the null hypothesis, remind the audience what that null hypothesis is.....

    "background noise" (note you use "noise" -- a sound metaphor, followed by the example of the shininess of a needle).
    "histogrammed" - perhaps describe in a few words how a histogram is constructed from "events" at a very basic level,and motivate the use of the term "bin".
    "signals" - Not sure this word is as confusing to non-scientists in Italian as it is in English.
    "construct a model of how the background behaves" -- I think you could pick a different word than model, unless you want to explain what a "model" is and why you need a "model" to calculate the number of events that would occur from what we know about previously observed particles. Wouldn't you just use equations for the known particles?
    "standard deviation" -- well you do explain this one, but you could elaborate on the use of "two-sigma", "2.5 sigma" "three-sigma" etc., when you show your graph. Also it helps to point out what a "tail" is.

    You talk about Poisson statistics and Gaussian curves in nearly the same breath, without any connection, and you have Gaussian shaped humps and (ultimately) Gaussian event counts (but I thought they were Poisson!).

    Most of these just take a few extra words to illuminate. Good luck!

    Thanks for the quite useful screening Joe! I will try to
    explain these jargon terms but it requires discipline during
    a talk.

    Amir D. Aczel
    Hi Tommaso,
    Gavin Salam of Princeton and CERN (and also CNRS) gave a talk at MIT last night, in which he presented the latest findings on the Higgs. According to a chart he showed on the screen, ATLAS has by now reached 6-sigma, while CMS is at slightly more than 5-sigma for all accumulated data. The chart showed 6-sigma as equivalent to an error probability of 10^-9 while the somewhat more than 5-sigma result showed an error probability of between 10^-6 and 10^-7. If you combine them in some kind of meta-analysis (which I haven't done) my guess is an overall probability of better than 10^-7 that the Higgs is a statistical fluke, which of course will satisfy anyone (in the "real world" outside of particle physics, a 3-sigma result is considered of overwhelming weight, as many of us know).
    Your article, by matching 68% with 1-sigma, etc., implies to me that the tests that physicists carry out for such a discovery are two-tailed tests. In an argument with Dennis Overbye of the New York Times--who writes much about physics--he claimed to me that the tests are "one tailed because they are looking for excess of events." I'd trust you more than I would trust Overbye. But would you, for the sake of completeness, answer this for me once and for all: Are the implicit statistical tests in particle research two-tailed or one-tailed, and why? This is a key question to answer for all reporting of the "sigmas" in the popular press. Thanks, Tommaso!
    P.S. Another question: Gavin said last night that the Higgs, experimentally, can have only spins of 0 or 2 because it decays (in one mode) into gamma-gamma and any particle that decays into gamma-gamma cannot have spin 1. What's the reason for that? According to SM, he added, the spin is zero. Thanks!
    Amir D. Aczel
    Dennis is right, and so am I. The test is one-tailed; the 68% is introduced in the talk above just to explain what sigma is. About significances: it is peculiar that new Higgs results are shown by theorists when there is nothing new approved by the experiments. One should not add significances by collecting pieces here and there. The two experiments will have a new comprehensive update for the winter conferences; before that, they are not updating their results every other day as new data comes in. Most of all, however, the game of combined significances is over now; it is simply not interesting any more. The interesting part is to see each individual channel with over 5σ.
    Finally, yes, no spin one: a spin-one particle can't decay into two spin-one particles, as it would violate conservation rules.

    Amir D. Aczel
    Thanks, QDS! That was very useful. Could I indulge you for just a little more clarification:
    2=1+1 work conservationally; 0=1+1 works in Z2 but I don't think you particle physicists do modular arithmetic. So my guess is a quantum-mechanical superposition of spin +1 and -1, am I right?
    0=(+1)+(-1) or (-1)+(+1), or is it the superposition of both of these states multiplied by 
    Thanks for explaining!

    Amir D. Aczel