Higgs Live Blogging 2 - Observation Versus Discovery: Myths And Truths

This is the second in a series of posts that I am writing this morning, July 4th, to describe the ongoing happenings at CERN, where at 9 AM the observation of the Higgs boson will be announced by the ATLAS and CMS collaborations. Please reload this page at 10-minute intervals if you want to hear the latest news, or see the previous entries.

Entry 1 - Where's the queue ?

6.30 AM Reading around in blogs (such as Peter's), I see the issue of "observation" versus "discovery" being raised. It is being argued that an "observation" of a new particle implies a weaker statistical significance than a "discovery-level" one. I do not concur with this interpretation of the words, although I admit that many perceive them that way. But I wish to try and clarify here something about the matter, which appears not so well understood even among insiders.

So, a significance level of three standard deviations is generally considered enough to call it "evidence" for an anomalous effect in the data. Three standard deviations correspond to a p-value of the "null hypothesis" (the one according to which there is nothing anomalous in the data) of about 1.35 per mille in the usual one-sided Gaussian convention, so the effect is already quite strong by itself. However, what these three "sigma", as we usually call them, are worth strongly depends on the actual effect that one is studying.
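For readers who want to check the conversion between "sigma" and p-values themselves, here is a minimal sketch. It assumes the one-sided Gaussian tail convention commonly used in particle physics; the function name is mine, chosen for illustration.

```python
import math

def p_value_from_sigma(z):
    """One-sided Gaussian tail probability for a z-sigma excess."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

for z in (3.0, 5.0):
    print(f"{z:.0f} sigma -> p = {p_value_from_sigma(z):.3g}")
```

Three sigma comes out at about 1.35 x 10^-3 and five sigma at about 2.87 x 10^-7. Under a two-sided convention both numbers would double.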

Example 1 - For instance, if our experiment is all geared up to observe tau neutrinos appearing in our underground detector, we are doing one simple experiment: counting events of a very specific kind over backgrounds. This means that we have one, and only one, chance to get a statistical fluctuation. There is, in other words, no "look-elsewhere effect" at work, and the three sigma we get if we count, say, 9 events when we expect 1.9 ± 0.9 from backgrounds are genuine. They are a strong effect, from a statistical standpoint, and the issue is then whether there are any systematic effects that have not been included in the background uncertainty.
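The counting experiment of Example 1 can be checked numerically. The sketch below is one possible treatment, not necessarily the one a real collaboration would use: it assumes the background uncertainty is Gaussian, truncated at zero, and averages the Poisson tail probability over it with a simple midpoint sum. The function names and the integration settings are my own choices.

```python
import math

def poisson_tail(n_obs, mean):
    """P(N >= n_obs) for a Poisson variable with the given mean (> 0)."""
    term = math.exp(-mean)       # P(N = 0)
    below = 0.0
    for k in range(n_obs):
        below += term            # accumulate P(N = k)
        term *= mean / (k + 1)   # step to P(N = k + 1)
    return 1.0 - below

def smeared_p_value(n_obs, b, sigma_b, steps=4000):
    """Average the Poisson tail over a Gaussian background prior cut at zero."""
    lo = max(0.0, b - 6.0 * sigma_b)
    hi = b + 6.0 * sigma_b
    h = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps):
        mu = lo + (i + 0.5) * h  # midpoint of the i-th slice
        w = math.exp(-0.5 * ((mu - b) / sigma_b) ** 2)
        num += w * poisson_tail(n_obs, mu)
        den += w
    return num / den

p_fixed = poisson_tail(9, 1.9)             # background known exactly
p_smeared = smeared_p_value(9, 1.9, 0.9)   # background 1.9 +- 0.9
print(f"no uncertainty:   p = {p_fixed:.3g}")
print(f"with uncertainty: p = {p_smeared:.3g}")
```

With the background fixed at exactly 1.9 the tail probability is about 1.6 x 10^-4; once the 0.9 uncertainty is folded in, it rises to the per-mille level, that is, to the neighbourhood of three sigma. The uncertainty on the background matters as much as the raw count.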

Example 2 - Now imagine you are studying the Drell-Yan production of muon pairs associated with hadronic jets for a QCD measurement, and casually decide to compute the invariant mass of the system of two muons and the highest-energy jet in the data. You examine the distribution, and it looks funny to you: there seems to be a shoulder at 200 GeV in the otherwise smoothly falling distribution. You then decide to select the data with a cut that "cleans up" the sample: you require that the invariant mass of the two muons be in the range 85-105 GeV, a selection which should contain basically only on-shell Z boson decays. The three-body mass now seems to have a slightly better-defined hump at 200 GeV. You grow excited, and decide that you should refine your search by placing an additional selection on the jets: if a central object is being produced, surely requiring the jets to be central in the detector will enhance a real new particle signal! You thus select events where the leading jet has pseudorapidity in the range |η|<2.0. The bump grows stronger, although it has moved a bit from the original place (200 GeV) to 190 GeV or so. Fine, you say, let's now compute the statistical significance of that thing. You compute the likelihood L(b) of a background-only fit to the smoothly falling three-body distribution, and then the likelihood L(s+b) of a fit with a Gaussian added on top of it. You constrain the width and the mass, and let the amplitude of the signal float; then you take the value of -2 ln [L(b)/L(s+b)] = 2 [ln L(s+b) - ln L(b)] and compute the corresponding probability. You get 1.7 x 10^-3. Roughly three-sigma evidence of a new particle ?
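To make the last step of Example 2 concrete, here is a toy version of the likelihood-ratio calculation. Everything in it is hypothetical: the binning, the event counts, and the background shape are invented, and for brevity the background is taken as known rather than fitted. The signal is a Gaussian of fixed mass and width whose yield is scanned over non-negative values, and the conversion from the test statistic q to a significance uses the asymptotic relation Z = sqrt(q), valid for one free parameter bounded at zero.

```python
import math

BIN_WIDTH = 10.0  # GeV, hypothetical binning

def log_likelihood(observed, expected):
    """Binned Poisson log-likelihood, dropping the constant ln(n!) terms."""
    return sum(n * math.log(mu) - mu for n, mu in zip(observed, expected))

def signal_template(centers, mass, width):
    """Approximate signal fraction per bin: Gaussian density times bin width."""
    norm = BIN_WIDTH / (width * math.sqrt(2.0 * math.pi))
    return [norm * math.exp(-0.5 * ((m - mass) / width) ** 2) for m in centers]

def local_significance(observed, background, template, s_max=200.0, step=0.1):
    """Scan the signal yield s >= 0; return (q, Z, p) for the best fit."""
    ll_b = log_likelihood(observed, background)
    best = ll_b  # s = 0 is always an allowed solution
    for i in range(1, int(s_max / step) + 1):
        s = i * step
        expected = [b + s * t for b, t in zip(background, template)]
        best = max(best, log_likelihood(observed, expected))
    q = 2.0 * (best - ll_b)                  # 2 [ln L(s+b) - ln L(b)]
    z = math.sqrt(q)                         # asymptotic significance
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # one-sided p-value
    return q, z, p

# Invented toy spectrum: smoothly falling background plus a bump near 190 GeV.
centers = [150.0 + BIN_WIDTH * i for i in range(10)]
background = [100.0 * math.exp(-(m - 150.0) / 50.0) for m in centers]
observed = [98, 85, 66, 63, 60, 42, 29, 26, 19, 17]
template = signal_template(centers, mass=190.0, width=10.0)

q, z, p = local_significance(observed, background, template)
print(f"q = {q:.2f}, Z = {z:.2f} sigma, local p = {p:.3g}")
```

Note what the number does not know: nothing in this calculation records how the histogram, the cuts, or the peak position were chosen. It is a purely local significance.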

As you see, the two examples above describe entirely different experiments. In the first case the three-standard-deviation effect is genuine from the statistical standpoint, and one would need to take it quite seriously as an indication that real tau-neutrino events are being observed in the detector, barring the possibility of unknown systematic errors. In the second case it is all messed up: the bump appears by chance in a distribution which the experimenter had no particular reason to examine, and this already indicates that one should not pay much attention to "departures" from expected smooth distributions. Further, the experimenter had no prior idea of where a bump might appear. Finally, the experimenter tuned the cuts arbitrarily in order to obtain a better peak. He will not admit it, but he has played with several possible selection cuts in order to arrive at the histogram he is finally looking at. In this case, the statistical significance of the peak is meaningless. It is still nominally a 3-sigma effect, but one which has absolutely no value.

What the above teaches us is that the number of sigmas tells us NOTHING unless we know the details of the experiment which ended up producing the result. Or just take the superluminal neutrino claim by OPERA: a six-standard-deviation effect. People wrongly tried to question the statistical validity of that assessment, while it was obvious that the problem was systematic (and I correctly, back in September, indicated that the measurement of the time delay of that cable, later found to be loose, was suspiciously precise, and I did not believe for a second that it had a 3-nanosecond accuracy).

In the end, therefore, 3-sigma may be strong evidence of a new effect, or may be something to laugh about, depending on the circumstances. That is why calling a 3-sigma effect by default "evidence" of a new physics effect is ridiculous.

And what about 5-sigma ? Well, it should not take you long to understand that the same reasoning fully applies here. Now, 5-sigma are taken as a threshold for a discovery claim in common usage by experimental particle physicists because it somehow protects one from the look-elsewhere effect and the possible effect of tuning cuts or other data massaging that experimenters sometimes unknowingly apply. This is fair, but it is still not clear enough.

Experiments in the last few years have in fact tried to quantify the "global" significance by estimating the trials factor due to the look-elsewhere effect, coming up with a number which is smaller in terms of standard deviations. Now the question is, what do we do with that latter number ? Do we still need to ask it to be larger than five sigma ?
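As a rough illustration of how a trials factor dilutes a local significance, the sketch below assumes N independent places where a fluctuation could have shown up, so that the global p-value is 1 - (1 - p_local)^N. This is a simplification (real analyses estimate the effective number of trials with more elaborate methods), and the value N = 20 is purely hypothetical.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()  # standard Gaussian

def local_p(z_local):
    """One-sided p-value of a z-sigma local excess."""
    return _NORMAL.cdf(-z_local)

def global_p(p_local, n_trials):
    """Chance that at least one of n independent trials fluctuates this far."""
    # Equivalent to 1 - (1 - p)**n, written to stay accurate for tiny p.
    return -math.expm1(n_trials * math.log1p(-p_local))

def sigma_from_p(p):
    """Convert a one-sided p-value back to Gaussian standard deviations."""
    return -_NORMAL.inv_cdf(p)

N_TRIALS = 20  # hypothetical trials factor
for z_local in (3.0, 5.0):
    z_global = sigma_from_p(global_p(local_p(z_local), N_TRIALS))
    print(f"local {z_local:.1f} sigma -> global {z_global:.2f} sigma")
```

With twenty independent trials a local 3 sigma shrinks to a bit under 2 sigma globally, while a local 5 sigma only drops to about 4.4: the larger the local significance, the less the trials factor hurts in units of sigma.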

In my opinion, no. It all depends on the circumstances. Last fall, 6 sigma were not enough for physicists to be convinced that neutrinos were superluminal. Today, 5 sigma will be plenty to convince everybody that CMS and ATLAS have independently discovered the Higgs boson (whether it is a standard model Higgs or something different remains an open question, but let us defer that discussion for now). Note that the 5-sigma figures that the experiments will be quoting (I do not have the exact numbers yet, but we will be discussing the real figures later this morning) are local significances. Experiments will also report global significances, but these will anyway be rather ill-defined: who decides which mass range was meaningful to search in, now that we know that the Higgs has a mass likely to lie in a narrow range ?

So, in a nutshell: the significance of the LHC data is quite sufficient to claim the discovery of the Higgs boson, and many commenters out there are right in pointing out that no matter what CERN says, the world today will know that the Higgs boson has been discovered.
