Physicists have a strict criterion for claiming discovery of a new effect or signal: since the top quark discovery (obtained in 1995 by the CDF and DZERO experiments, following "evidence" claimed by CDF the previous year), the rule of thumb is that one should claim to have observed a new effect only if the statistical significance of the find exceeds the pre-defined, and certainly arbitrary, level of 5 standard deviations.
I have explained in painstaking detail the pros and cons of the criterion, how it arose, what it protects us from, et cetera, in a series of posts last year, so the interested reader is welcome to peruse that material (start here). Here, however, I would like to go through a list of true and false signals of new physics that have appeared in the course of the last 20 years or so, i.e. from the time when the 5-sigma criterion became an established practice, at least in high-energy physics.
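For readers who want to turn "N standard deviations" into a probability, here is a minimal sketch. It assumes the one-sided Gaussian tail convention that is customary in high-energy physics; the function name `p_value_from_sigma` is my own and purely illustrative.

```python
import math


def p_value_from_sigma(z: float) -> float:
    """One-sided Gaussian tail probability for a z-sigma excess.

    Assumption: the significance is quoted as a one-sided tail of a
    unit Gaussian, i.e. p = P(X > z) for X ~ N(0, 1).
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Print the tail probability for the significance levels discussed in this post.
for z in (3, 4, 5, 6):
    p = p_value_from_sigma(z)
    print(f"{z} sigma -> p = {p:.3g} (about 1 in {1 / p:,.0f})")
```

With this convention a 5-sigma excess corresponds to a p-value of roughly 2.9 in ten million, and a 3-sigma one to about 1.3 in a thousand.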
The reason for doing that is that I have had a brilliant intuition after observing a recurrent pattern in the searches and discoveries of the past twenty years. I am going to share this with you, but only after you too, dear reader, have been exposed to the same evidence. So here we go.
1) First of all, the top quark. That particle was first hinted at in 1994 in a paper by the CDF collaboration, which described an excess of events amounting to about 3-sigma significance. The excess coincided with striking kinematical properties of the events, which really smelled of top quarks, and indeed with a subset of 7 event candidates CDF could fit a mass for the top quark at 174 GeV, a value which stands up even nowadays when the uncertainty on that Standard Model parameter has shrunk below 1 GeV. On the right you can see the first mass signal obtained by CDF.
2) In 1995 both CDF and DZERO produced 5-sigma significances for the top quark in their data, based on event excesses. The particle was in the bag, and a detailed set of studies of its properties kept Tevatron physicists busy for the better part of the following decade.
3) In 1996 CDF saw an intriguing bump in events containing two b-quark jets and an energetic photon. The bump had a significance that was hard to estimate, but probably in the 4-sigma ballpark. It was not published, awaiting more data, which never confirmed it. Above you can see the background-subtracted mass distribution, which shows the excess of 110-GeV dijet pairs.
4) In 1996 (a prolific year for Higgs lookalikes) ALEPH saw a similar 4-sigma excess of 4-jet events in electron-positron collisions at energies of 130 and 136 GeV. The bump was studied in detail and the collaboration was serious about the events, as they published a 30-page document detailing their properties. You can see the mass distribution of the event candidates on the left. The signal never reappeared in more data or in other experiments' datasets.
5) In 1998 CDF found a weird set of 13 "superjet" events in the data sample used for top quark measurements. The events, which had a W signal plus two or three hadronic jets, one of which contained both a secondary vertex b-tag and a soft lepton, constituted a mild excess over the background prediction of 4.3 events, but had kinematical characteristics so odd that a statistical analysis found the probability of the data under the background-only hypothesis to be at the 6-sigma level of significance. On the right you can see the p-value of several kinematical distributions for the superjets (top) and for a control sample devoid of the putative signal. The observation generated a huge fight within the collaboration, and it took three years to converge on a paper. In the new data collected from 2001 on, no similar effect was observed.
6) Fast-forward to 2004: The H1 collaboration at the HERA collider in Hamburg found a 6-sigma signal of a pentaquark state. The signal was really nice and prominent, as shown in the figure on the left, but was later disproven.
7) In 2009 the DZERO and CDF collaborations claimed observation of electroweak production of single top quarks in proton-antiproton collisions. The results were both based on a 5-sigma significance excess.
8) In 2011 the OPERA collaboration claimed to have measured the speed of neutrinos sent to Gran Sasso from the CERN beam, finding a value exceeding the speed of light, at 6-sigma level. The effect was later understood to be due to a loose cable providing a 60-nanosecond offset in the timing distribution of neutrino arrivals.
9) Again in 2011, a claim of evidence for a 145-GeV Higgs lookalike decaying to jet pairs was put forth by the CDF experiment, analyzing W+2jet events. The signal had a significance of 4 standard deviations, but was later proven to be due to ill-understood backgrounds.
10) In December 2011 ATLAS and CMS presented the results of their searches for the Higgs boson. They both had 3-sigma evidence for the particle, but they did not take the step to claim an observation of the particle yet, waiting for more data to analyze.
11) On July 4th 2012 the Higgs boson was announced by CMS and ATLAS, which both claimed to have observed the particle with a significance of 5 standard deviations. From then on, we have learned that the signal is indeed behaving exactly as we expect for a Standard Model Higgs.
Do you see a pattern? I was considering the above data and several discussions on the pros and cons of the 5-sigma criterion, to try and summarize the status of things for a conference I am going to attend in Crete next week, ICNFP 2014 (I will be talking of "Extraordinary Claims: the 0.0000027% Solution" - nice title, huh?). At some point it struck me. Of course. The five-sigma criterion has a hell of a lot of reasons to be there. Just look at the table below if you did not get it yet.
Do you see it now? Only odd-significance claims turned out to be true, while all even-significance ones ended up in the dustbin!! I guess that does not bode very well for the 6-sigma BICEP observation of tensor modes in the cosmic microwave background...
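For those who like to check such things, the eleven items listed above can be tallied in a few lines. This is only a sketch: it uses one row per numbered item (so a joint CDF/DZERO or ATLAS/CMS result counts once), takes the significances as rounded in the text (the 1996 CDF bump is only "in the 4-sigma ballpark"), and the names `CLAIMS` and `tally_by_parity` are made up for the occasion.

```python
# (year, claim, quoted significance in sigma, later confirmed?)
# One row per numbered item in the list above; significances as quoted there.
CLAIMS = [
    (1994, "CDF top quark evidence", 3, True),
    (1995, "CDF and DZERO top quark observation", 5, True),
    (1996, "CDF bump in two b-jets plus photon", 4, False),
    (1996, "ALEPH 4-jet excess", 4, False),
    (1998, "CDF superjet events", 6, False),
    (2004, "H1 pentaquark", 6, False),
    (2009, "DZERO and CDF single top observation", 5, True),
    (2011, "OPERA superluminal neutrinos", 6, False),
    (2011, "CDF W+2jet bump at 145 GeV", 4, False),
    (2011, "ATLAS and CMS Higgs evidence", 3, True),
    (2012, "CMS and ATLAS Higgs observation", 5, True),
]


def tally_by_parity(claims):
    """Count confirmed and refuted claims, split by odd or even significance."""
    counts = {
        "odd": {"confirmed": 0, "refuted": 0},
        "even": {"confirmed": 0, "refuted": 0},
    }
    for _year, _name, sigma, confirmed in claims:
        parity = "odd" if sigma % 2 else "even"
        outcome = "confirmed" if confirmed else "refuted"
        counts[parity][outcome] += 1
    return counts


for parity, outcome in tally_by_parity(CLAIMS).items():
    print(parity, outcome)
```

Eleven hand-picked entries are of course no basis for a statistical statement, which is rather the point.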
UPDATE: I feel compelled to add here, one day after posting the above article, that of course I am joking! There is nothing magical in odd significances.
... Or maybe there is. For consider:
- 3-sigma effects are only published when there is a strong belief that they represent a real find, and as such they describe evidence for effects that are quite likely to be true. 3-sigma signals of improbable new entities or surprisingly strange effects are much harder to publish...
- 4-sigma effects are the kind of thing that triggered Arthur Rosenfeld, in his 1968 paper "Are there any far-out mesons and baryons?" (discussed in detail in my post on statistical significance here, but go to part 2 for the quote), to suggest waiting until 5-sigma effects are observed before publishing. A 4-sigma effect is very exciting to an experimental physicist, especially if unexpected. Unfortunately, most unexpected signals turn out to be false...
- 5-sigma effects are those that get published once they reach the threshold for a discovery claim. The reason why 5-sigma effects are claimed is that the experimenters waited (or designed the experiment in such a way) for that level of evidence before producing a publication or an announcement. As such, these signals are often genuine, as they correspond to effects that are believed to be true to begin with. Examples are, in fact, the top discovery, single top observation, and the Higgs.
- 6-sigma effects are too good to be true. These are so unexpected that they do not get published when the significance crosses the 5-sigma mark, as nobody is looking! The presence of so many 6-sigma effects that are later disproven, by the way, is a great example of the fact that systematic uncertainties (often at the root of the surprising find) do not follow the Gaussian distribution as much as we would like: large errors are sometimes possible, as shown by the loose cable of the OPERA experiment, which by itself caused the entire 60-ns timing shift.
The above is of course added confusion to an already confusing post... Of course the 5-sigma criterion is a valid one, but it should not be taken as a law carved in marble. We do have a brain, and thus we should use it all of the time when judging the genuine nature of a signal!