Basic Education For Particle Hunters: Significance And Rate Error
By Tommaso Dorigo | June 27th 2012 10:49 AM | 10 comments

I am endlessly amazed by observing, time and again, that even experienced colleagues fall into the simplest statistical traps. Mind you, I do not claim to be any better - sorry, let me rephrase: to have been any better in the early days of my career as an experimentalist. But then I started to appreciate that to really understand physics results I needed to become at least familiar with a small set of notions in basic Statistics.

So I insist that my colleagues should pay more attention to a few basic concepts. In this blog I have erratically tried to educate my readers on Statistics topics; but the matter is not as exciting as real particle hunts or discoveries, so I know that I cannot expect a large audience when I get down to formulas and hard math concepts. Because Statistics, see, is indeed tough.

Nevertheless, let me at least try today to explain something you should understand if you are to correctly interpret some of the results you are often exposed to when following particle physics. Let us imagine that we are looking for a signal of a new particle, and we actually see one, with a large significance. I have two questions for you.

1) Say we observe an excess of event counts due to the signal we are searching for, and with it we measure the cross section to be 10 ± 2 nanobarns, where the quoted uncertainty (±2) is statistical only. What can we say about the significance of the observation on which the measurement is based?

1A) It is equal to 5 standard deviations;
1B) It is equal to or larger than 5 standard deviations;
1C) None of the above is necessarily true.

2) Say we measure the cross section to be 9 ± 3 nanobarns, where the uncertainty is the combination of statistical and systematic effects. What can we say about the significance of the observation?

2A) It is equal to 3 standard deviations;
2B) It is equal to or larger than 3 standard deviations;
2C) None of the above is necessarily true.

I am sure you are looking at these answers and wondering what the hell my point is. Let us take the first question, then. For sure, 10 is "five sigma" away from zero, since the statistical uncertainty is 2 and we are assuming Gaussian distributions for the uncertainties. The problem is that significance is a measure of the incompatibility of the observation with the null hypothesis, and the null hypothesis is that the cross section of the tentative new signal is zero: measuring 10 ± 2 does not tell us much about the compatibility of the data with the background-only hypothesis, because that depends on the background! Let me give you three examples.

- I expect 9500 events from background sources, known with high precision, and I see 10000. This is a five-sigma effect, and indeed, upon subtracting backgrounds, I have 500 ± 100 events of signal (10000 has a Poisson uncertainty equal to sqrt(10000) = 100), so the cross section has (at least) a 20% uncertainty.

- I expect 1 event from background sources, known with high precision, and I see 26. This again allows me to measure a cross section with a 20% statistical uncertainty (25 ± 5.1 events of signal, since 5.1 is the square root of 26, the Poisson uncertainty on the event count). However, the significance of observing 26 events when I expect 1 is a very, very large number - much larger than 10 standard deviations!

- I expect 100 events from background sources, with a systematic uncertainty of ±50%. I see 169 events. The excess is 69 ± 13 (13 is the square root of 169), but this is not a significant observation at all: the systematic uncertainty on the background tells us that the background-only hypothesis is perfectly acceptable, the observed counts being just 1.4 standard deviations away from it (69 divided by the ±50 background uncertainty combined with the ±13 statistical error).
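The three cases above can be checked numerically. The sketch below (my own illustration, not from any analysis code) takes the background uncertainties as exactly zero where the text says "high precision", and uses the common rough significance formula z = s / sqrt(b + σ_b²); note that this Gaussian approximation badly understates the true Poisson significance in the second case, as discussed above.

```python
import math

# The three scenarios above: (label, expected background b, absolute
# uncertainty on b, observed events n). Zero uncertainty encodes the
# "known with high precision" assumption of the first two cases.
scenarios = [
    ("b = 9500, precise", 9500.0,  0.0, 10000),
    ("b = 1, precise",       1.0,  0.0,    26),
    ("b = 100 +- 50%",     100.0, 50.0,   169),
]

results = []
for label, b, sigma_b, n in scenarios:
    signal = n - b                    # background-subtracted signal yield
    stat_unc = math.sqrt(n)           # Poisson uncertainty on the count
    rel_unc = stat_unc / signal       # relative statistical uncertainty
    # rough Gaussian significance: excess divided by the combined
    # fluctuation of the background (sqrt(b) statistical, sigma_b systematic)
    z = signal / math.sqrt(b + sigma_b ** 2)
    results.append((signal, rel_unc, z))
    print(f"{label:>18}: signal {signal:.0f} +- {stat_unc:.1f} "
          f"({100 * rel_unc:.0f}% stat.), ~{z:.1f} sigma")
```

The first and third scenarios come out at about 5.1 and 1.4 sigma respectively, matching the text, while the relative statistical precision of the cross section is close to 20% in all three.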

What we learn from these examples is that we cannot obtain information on the validity of the background-only hypothesis by just looking at the fitted signal cross section - especially if the uncertainty we are given is statistical only! That hypothesis is what we refer to when we talk about "significance": significance always comes with a "with respect to the null hypothesis", whether or not we spell it out.

Now let us examine question 2. Here we have a complication: we are invoking "systematic effects" without qualifying them further. Systematic effects may be due to the uncertainty in the background prediction, as in the third example above, or in the signal acceptance, or in the luminosity of the data... You name it. Some of these will affect our estimate of the significance against the background-only hypothesis; others have nothing to do with it.

For instance, if I measure a cross section of 9 ± 3 nanobarns, the uncertainty (±3) may be due to a 33% systematic uncertainty in the luminosity of the data: if that is true, then the cross section comes from a measurement of a signal with very high precision, and the uncertainty has nothing whatsoever to do with the size of the signal, but rather with the derivation of the cross section from the observed signal, through the formula N = σ L (N events observed, σ cross section, L luminosity corresponding to the studied data). The significance of the signal N may be as large as we want, but we still have "only" a 33% cross section determination.
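As a sketch of this decoupling, here are hypothetical numbers (900 signal events over negligible background, 100 nb⁻¹ of luminosity - my own invented choices, not from any real analysis) that reproduce a 9 ± 3 nb measurement driven almost entirely by a 33% luminosity uncertainty:

```python
import math

# Invented numbers, chosen to reproduce a 9 +- 3 nb result:
n_sig = 900.0         # cleanly observed signal events (assumption)
lumi = 100.0          # integrated luminosity in nb^-1 (assumption)
lumi_rel = 0.33       # 33% systematic uncertainty on the luminosity

xsec = n_sig / lumi                    # sigma = N / L (unit efficiency)
stat_rel = math.sqrt(n_sig) / n_sig    # ~3% Poisson uncertainty on N
tot_rel = math.sqrt(stat_rel ** 2 + lumi_rel ** 2)

print(f"cross section = {xsec:.1f} +- {xsec * tot_rel:.1f} nb "
      f"({100 * tot_rel:.0f}% total)")
# The 33% error says nothing about how significant the 900-event signal
# is against its background; that is a separate question entirely.
```

The print shows 9.0 ± 3.0 nb: the uncertainty is dominated by the luminosity term, while the signal itself is established with a ~3% statistical precision.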

I hope this clarifies that if you see a report of the observation of a new particle, with a cross section measured with 50% accuracy, that does not mean that the observation is on shaky ground. The precision of the measurement answers a different question!

What if you expect 5 events from background sources and see 15, but the adjoining bins (one above and one below) are empty? Do you spread out the 15 among the three bins to get 5 in each bin, which is background?

What if you are looking for bumps in a digamma Higgs search and you see a histogram with some peaks, but for each peak there is a corresponding valley, such that if you smoothed the histogram (by filling in the valleys with the adjoining peaks) you would get a histogram very close to the expected background?

Is there some sort of photon "frequency-shifted" effect (sort of like what causes dirt/gravel roads to look like washboards) that could cause such a peak/valley circumstance? (quote from a 26 June comment by wl59 on Phil Gibbs's blog)

Tony

Hello Tony,
you have no right to concentrate on the central, upward-fluctuating bin - especially after having looked at the data! The confusion here is in part due to mixing up two different concepts, much like the examples in the post: testing a goodness-of-fit hypothesis versus estimating a signal. If I see 0:15:0 and expect to see 5:5:5, what is the likelihood of my observation? It does not take much to realize that this is a very ill-defined question, to which there is no clear answer... And we are not even discussing any signal here! For instance, I could use a Kolmogorov test and get a p-value in the whereabouts of 0.1%; or I could use the integrated counts and get a p-value of 100%; or I could use a chi-square test and get a p-value of 1/10000.

Even insisting that the observed distribution is highly significant, you would then be facing the fact that a 10-event signal predicted to spread as 2:6:2 in those three bins, say, would not improve the global p-value by a whole lot, computed by whatever metric you fancy the most (a posteriori!). You would, e.g., be looking at 0:15:0 when you expected 7:11:7...

So the take-home lessons are:
1- don't get enamoured with flukes post-data;
2- don't try to stick to partial interpretations of your data;
3- always have a well-defined physics model as an alternative to your background-only model if you want to compare two alternative interpretations of your data.
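A toy Monte Carlo makes the ambiguity concrete. The sketch below (mine, purely illustrative) compares two of the metrics mentioned above on the 0:15:0 vs 5:5:5 example: the integrated count, where the observed total exactly matches the prediction so the two-sided p-value is trivially 100%, and the per-bin chi-square, where the p-value is tiny. The exact numbers depend on the chosen conventions; the figures quoted above were rough.

```python
import math
import random

random.seed(1)

expected = [5.0, 5.0, 5.0]   # predicted background in the three bins
observed = [0, 15, 0]        # the counts in the question above

def chi2(obs, exp):
    # Pearson chi-squared statistic against the fixed prediction
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def poisson(mean):
    # one Poisson-distributed draw (Knuth's method, fine for small means)
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

n_toys = 50_000
chi2_obs = chi2(observed, expected)            # (0-5)^2/5 * 2 + (15-5)^2/5 = 30
disc_obs = abs(sum(observed) - sum(expected))  # 0: the total matches exactly

worse_chi2 = worse_total = 0
for _ in range(n_toys):
    toy = [poisson(e) for e in expected]
    worse_chi2 += chi2(toy, expected) >= chi2_obs
    worse_total += abs(sum(toy) - sum(expected)) >= disc_obs  # always true

print(f"p-value, per-bin chi2:     {worse_chi2 / n_toys:.5f}")   # tiny
print(f"p-value, integrated count: {worse_total / n_toys:.3f}")  # 1.000
```

Same data, same background model, wildly different p-values: which test statistic you pick (especially a posteriori) decides the answer.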

On the peaks-and-valleys issue: again, let us stick to physics!

Cheers,
T.

Could you be, like, any more obvious?

Great post!

A small typo: '13 is the square root of 69'. This should of course be 169, not 69.

Thanks, fixed!
T.
"... fall into the simplest statistical traps." I think that most statistical traps are of two basic forms: (1) the trap of deliberately adjusting statistics to trap other people who are gullible, or (2) the trap of deluding oneself by making false assumptions.

Tommaso,

A related statistical question: how stable is the Higgs signal against shifting all bins by half a bin length? (E.g., instead of having bins whose centers are located at integer GeV values, having them at half-integer GeV values.) Has this ever been checked?

Dear Erik,
I assume we are talking about the winter 2012 Higgs analyses in the H->γγ mode by ATLAS or CMS; in any case the answer would not change if we took H->ZZ, or even other searches. All these are done with unbinned likelihood fits. The binning is only for display purposes, and it does not affect the result in any way, as it should.
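As a toy illustration of why an unbinned fit cannot depend on the binning: for a pure Gaussian peak, the unbinned maximum-likelihood estimate of the mean is just the sample mean, computed with no reference to bins, while a binned estimate changes (slightly) when the bin origin is shifted. All numbers below are invented for the sketch.

```python
import random
import statistics
from collections import Counter

random.seed(42)

# Toy "mass" sample: a pure Gaussian peak at 125 GeV with 2 GeV width
# (invented numbers; no background, for simplicity)
data = [random.gauss(125.0, 2.0) for _ in range(5000)]

# The unbinned maximum-likelihood estimate of a Gaussian mean is the
# sample mean: no binning enters the fit anywhere.
mle = statistics.fmean(data)

def binned_mean(sample, origin, width=1.0):
    # a binned estimate: histogram the sample, then average the bin centres
    counts = Counter(int((x - origin) // width) for x in sample)
    return sum((origin + (k + 0.5) * width) * n
               for k, n in counts.items()) / len(sample)

m_a = binned_mean(data, origin=0.0)   # bin edges at integer GeV
m_b = binned_mean(data, origin=0.5)   # edges shifted by half a bin width

print(f"unbinned: {mle:.3f} GeV, binned: {m_a:.3f} / {m_b:.3f} GeV")
```

The two binned numbers move when the edges shift; the unbinned estimate has no binning parameter to vary in the first place.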
Cheers,
T.
Tommaso,

one more point. How sure are people that the background does not have a local maximum at 125 GeV?

Any feature in a distribution requires an explanation. All the mass or energy distributions we study are ultimately shaped by the behaviour of parton distribution functions, which are smooth and monotonic in the range where they matter here. If we found a maximum in the background, it would be some other discovery...
Cheers,
T.