The Subtle Art Of Bump Hunting - Part I
By Tommaso Dorigo | April 20th 2009 06:02 PM | 11 comments

I am an experimental particle physicist working with the CMS experiment at CERN. In my spare time I play chess, abuse the piano, and aim my dobson...

If you have recently taken a close look at the search results on the Higgs boson that the CDF and DZERO experiments have been producing at a regular pace - every six months, for the summer and winter conferences - and your exposure to particle physics results is not broad, you might have gotten a biased perception of how searches for new particles are performed nowadays.

Indeed, all recent Tevatron searches for the Higgs feature a combination of heavy weaponry: quite advanced statistical methods which include multivariate likelihoods, neural network classifiers, boosted decision trees, and matrix-element discriminants. Hundreds of man-years of work by particle physicists have been spent refining those tools, as well as critically and uncompromisingly reviewing the performance they could show on control samples of data, on simulated event sets, or when tested one against the other.

The situation has evolved a lot in this respect during the last twenty years or so. When I started my career in particle physics, neural networks, for instance, were looked at with a lot of suspicion, and despite all evidence that their output could improve search results significantly, the first experimenters who ventured to use them in discovery searches were little short of crucified!

The problem was partly due to the fact that computers were not as powerful as they are today. Training a neural network could take a month of CPU time, while now a typically sized classification analysis is done in an afternoon. In a sense, the evolution of technology has naturally caused our statistical methods to evolve in synergy with it. The increased confidence we have acquired in our simulation programs - the so-called "Monte Carlo" generators on which we base our modeling of the detailed characteristics of the signals we search for - has been the other important factor in this evolution.

I will leave the story of the development of advanced analysis methods, as well as of the ensuing controversies (and crucifixions), to another day. What I care to discuss and explain here and in the next part of this piece, with a couple of examples, is that despite the trends particle physics has displayed in recent years, the question of whether an experiment is seeing a new, unexpected particle in a given dataset remains one where very basic statistics rules. Mind you: nasty little devils hide in the details of basic statistics. We are going to exorcise a couple here.

The basic problem: finding humps in mass distributions

Rest mass is a particle's footprint: its value can be measured by determining the energy and angle of emission of all its decay products. A full reconstruction of the mass is not always feasible; but whenever it is, the signal of a particle decay will universally appear as an enhancement in the number of event counts at a particular value of its reconstructed mass. Such a signal is what physicists improperly call a resonance. What resonated were the constituents of the system before it disintegrated, and not the decay products; but physicists are not philologists, and they use metonymy whenever it simplifies their jargon.

So, just to clarify what I am talking about here, let me show you a picture that makes my heart skip a beat every time I look at it. The three bumps shown in the graph on the left are three famous resonances, the Upsilon family: these are three of the most common states into which a bottom quark and an antibottom quark can bind together.

Those three bumps stand out so overbearingly on top of the smooth, "continuum" background, that no physicist would object to the picture of Upsilon mesons as constituted by a bottom-antibottom quark pair orbiting around one another, and resonating at any of a small set of very characteristic frequencies - the ones corresponding to the Upsilon mass values-, until they find a way to give back to the environment their energy in a decay, producing a pair of muons (in the case of the events displayed) which are then detected and measured.

Seeing an Upsilon meson at a particle collider is commonplace nowadays, but it took the ingenious minds of Leon Lederman and his collaborators to discover it, thirty-two years ago. Leon was awarded a Nobel prize for that important find, but as far as the experimental analysis technique is concerned, he did not invent anything new: bump hunting had been a favourite field of investigation for particle physicists since the early fifties, when the first hadronic resonances had started to appear in bubble chamber images.

Take the X(3872) meson as a very clear example. That mysterious particle, whose true quark composition is still the subject of debate, was discovered by Belle in 2003 using the decay chain $X \to J/\psi \pi^+ \pi^-$ to reconstruct it. A clear bump was observed in the invariant mass distribution of the four final-state bodies, and that was it: a new entry in the particle properties database, and a new paper on the arXiv, guaranteed to get hundreds of citations in the forthcoming years. It is interesting to note that the CDF collaboration had serendipitously seen that particle in their Run I data as far back as 1994, but had not been bold enough to claim evidence for it.

You can make up your own mind on whether my CDF colleagues were too conservative when they neglected to report on that result, by looking at the graph shown below. In it, you can clearly see the Psi(2S) signal (a resonance quite similar to the middle Upsilon bump shown previously, except for being formed by a lighter charm-anticharm quark pair) standing tall on the left, but you may also see a nagging two-bin fluctuation over the smooth background shape, sitting at about 3870 MeV.

Should CDF have called for the observation of a new unknown particle, based on the evidence of that bump alone?

Of course not. The two-bin fluctuation amounted to just a few tens of events, over a background of maybe sixty or seventy: it was simply not significant enough to be meaningful, despite the fact that we now know it was not a fluctuation at all, but a real resonance! To explain why the X at CDF in 1994 was not significant, I am going to discuss below some basic statistical facts which are known by heart by bump hunters.

(Before I do, let me do justice to CDF, which has taken a very satisfactory revenge on the B-factories with the measurement of the X(3872) particle: as shown in the graph on the right, CDF now holds by far the most precise determination in the world - the fourth bar from the bottom - of the very important mass of the X particle. Why the X mass is so important to determine precisely is explained elsewhere; here, just look at the graph, which shows the most recent and precise measurements of the X mass: CDF tops them all! Need I say it is my experiment?)

Signal significance and the "look elsewhere" effect

Imagine your experiment collects 100 events in a restricted region of a mass spectrum, where on average you expect to see 70 from known sources. Statisticians teach us that event counts follow a Poisson probability distribution: this means that the intrinsic fluctuation of the observed counts should be measured in units equal to the square root of the counts. For 100 events the statistical error therefore amounts to plus or minus 10 events: this says that if you were to repeat the data collection under the same experimental conditions, you would be about 68% likely to get a number between 90 and 110.

So if you observe 100 events where you expected to see 70, you are facing a three-standard-deviation excess, more or less like the one shown in the graph above.
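The counting arithmetic above fits in a few lines of code. Here is a minimal sketch in Python, using only the numbers quoted in the text:

```python
# The counting-experiment arithmetic from the text: 100 events observed
# where 70 are expected, with a Poisson (square-root-of-N) error on the count.
import math

def counting_significance(observed, expected):
    """Excess over expectation, in units of the sqrt-N statistical error."""
    return (observed - expected) / math.sqrt(observed)

print(counting_significance(100, 70))  # 3.0: a three-standard-deviation excess
```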

Physicists may get excited if a fluctuation in event counts reaches the level of "three-sigma", as they call it. That is dubbed "evidence" for some effect present in the data and not described by our predictions. If, however, the departure of observation from expectation reaches the level of "five-sigma" -five standard deviations away from the expectation-, one talks of "observation" of a new, yet unknown effect. Experience has shown that three-sigma fluctuations do happen in particle physics experiments out of sheer chance (discussing whether they do with the expected 0.3% frequency or at a higher rate might be entertaining, but let us forget the issue now), while "observation-level" excesses are credible indications of a discovery.
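The tail probabilities behind this jargon follow from the Gaussian distribution; a quick sketch (two-sided, which is what yields the ~0.3% figure quoted above):

```python
# Gaussian tail probabilities behind the "three-sigma" / "five-sigma" jargon:
# the chance of a fluctuation at least that many standard deviations away
# from expectation (two-sided).
import math

def two_sided_p(n_sigma):
    """Probability of a fluctuation beyond n_sigma standard deviations."""
    return math.erfc(n_sigma / math.sqrt(2))

print(f"3 sigma: {two_sided_p(3):.4%}")  # ~0.27%: "evidence"
print(f"5 sigma: {two_sided_p(5):.2e}")  # ~5.7e-07: "observation"
```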

Now, consider the mass spectrum shown in the figure above: not only is the clustering of events at 3872 MeV insufficient to reach "observation level"; the crucial thing here is that nobody had predicted that a J/Psi plus dipion resonance would sit exactly at that mass value! The problem is well known to experimental physicists: since, in the absence of a theoretical prediction, the experimenter cannot tell beforehand where an excess might appear in the mass spectrum, the chance of observing a random fluctuation resembling a signal is automatically enhanced by the fact that we allow it to appear anywhere. If those 30 excess events are concentrated in a mass region spanning one hundredth of the total width of the explored spectrum, we have to account for that allowance somehow.

A back-of-the-envelope calculation of the mentioned effect is easy to perform: since small probabilities combine additively, the 3-permille probability of the 30-event excess fluctuation gets multiplied by the number of places where it might have occurred. A 0.3% effect thus becomes something you should expect to see 30% of the time, and you should retrieve the champagne from the fridge and put it back in the cellar.

(I should probably explain better, to the most curious of you, what I mean when I say that small probabilities "combine additively": the exact way of saying it is that when one searches for the occurrence of at least one of several independent effects, each of them having a very small chance of happening, the total probability is well approximated by the simple sum of each independent probability. Take the example of three simultaneous bets on number zero at three roulette tables: since each has a P=1/37 chance of winning, the above recipe says that your total chance of winning at least one of the three bets is roughly 3P=3/37. The true number is instead computed as one minus the chance to win none of the bets, or $P_{exact}=1-(1-P)^3=1-1+3P-3P^2+P^3=3P-3P^2+P^3$, which in our example equals 3/37-3(1/37)^2+(1/37)^3: as you can see, the difference between exact and approximated result is small -exactly as small as the chance of winning on more than one table at the same time.)
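The additive recipe and the exact formula above can be checked against each other directly; a minimal sketch, using the roulette numbers from the text and an illustrative count of ~100 mass windows for the look-elsewhere estimate:

```python
# The "small probabilities combine additively" recipe from the text, checked
# against the exact formula on the roulette example: three simultaneous bets
# on zero, each with probability p = 1/37.
def approx_at_least_one(p, n):
    return n * p                     # additive approximation

def exact_at_least_one(p, n):
    return 1 - (1 - p) ** n          # 1 minus the chance of winning none

p = 1 / 37
print(approx_at_least_one(p, 3))     # ~0.0811
print(exact_at_least_one(p, 3))      # ~0.0787: close, as the text argues

# The look-elsewhere back-of-the-envelope: a 0.3% local fluctuation that
# could have appeared in any of ~100 distinct mass windows.
print(approx_at_least_one(0.003, 100))  # ~0.3, i.e. a 30% chance somewhere
```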

In truth, while physicists usually thrive on order-of-magnitude estimates, when it comes to measures of statistical significance they become really picky. The exact way to estimate the chance that a bump of N signal events occurs is then provided by the study of pseudoexperiments. These are mock mass spectra, each made to contain the same number of entries as the one under test. Entries are extracted at random from a background shape equal to the one observed in the data, ignoring the possibility that a signal is present on top of it.

A search for a Gaussian signal in each of these mock spectra - which, I must stress, contain no signal by construction! - then allows one to figure out how likely it is to see by chance a fake signal of at least N events in the real data: this probability is computed as the number of pseudoexperiments displaying such a signal divided by the total number of pseudoexperiments scanned.
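A stripped-down sketch of the pseudoexperiment technique, using only the Python standard library: the "signal search" is reduced to finding the most populated bin of a flat background-only spectrum, and all numbers (bins, entries, threshold) are illustrative assumptions, not taken from the actual X(3872) analysis.

```python
# Background-only pseudoexperiments: how often does a bump of at least
# `threshold` entries appear *somewhere* in a flat spectrum, purely by chance?
import random

def largest_excess(n_entries, n_bins, rng):
    """Fill a flat background-only spectrum; return the highest bin count."""
    counts = [0] * n_bins
    for _ in range(n_entries):
        counts[rng.randrange(n_bins)] += 1
    return max(counts)

def global_p_value(threshold, n_entries=1000, n_bins=100,
                   n_pseudo=1000, seed=1):
    """Fraction of pseudoexperiments in which some bin fluctuates up to
    at least `threshold` entries anywhere in the spectrum."""
    rng = random.Random(seed)
    hits = sum(largest_excess(n_entries, n_bins, rng) >= threshold
               for _ in range(n_pseudo))
    return hits / n_pseudo

# With 1000 entries in 100 bins (10 per bin on average), a 20-entry bin is a
# roughly 3-sigma local fluctuation, yet it shows up somewhere in a sizeable
# fraction of background-only spectra.
print(global_p_value(20))
```

Note the design choice: because the fake-signal probability is a simple counting ratio, its precision is limited by the number of pseudoexperiments generated, which is exactly the computing-cost issue discussed below.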

The only drawback of the above technique is that it is heavily computer-intensive: in order to measure very small probabilities one needs to generate very large numbers of pseudoexperiments. However, that is not a big issue nowadays, since computing power has become cheaper than the air we breathe. Pseudoexperiments are now such a standard technique that the most popular statistical package used by particle physicists provides the generation of these mock spectra with a single command.

The "Greedy Bump bias"

All the above is so well known by specialists that I feel a bit ashamed to have reported it here: it is neither new nor original material, and it is not particularly accurate either. However, I needed to provide you with an introduction to the more interesting part of this piece, which, unfortunately, has already become longer than it should be. Rather than asking you for too much of your precious time in a single chunk, I prefer to defer the discussion of the part which is indeed original (and unpublished) by a few days. Nonetheless, above and below are, respectively, an unexplained graph and a cryptic but faithful summary of what we will see in the next part: it is an effect which, as far as I know, has not been investigated in the literature (but I would be glad to know otherwise, if you have a reference!). I have dubbed it the "greedy bump bias":
When fitting a small Gaussian signal with a fixed width and a variable mean on top of a large, smooth background, the fit usually overestimates the size of the signal. The reason is that the statistical figure of merit of the fit gains more by fitting a signal larger than the one actually present in the data than by fitting fewer events. The fit does so by slightly moving the fitted mean away from the true value, finding a positive fluctuation to the left or right of it. The resulting bias in the number of fitted events appears to be a universal function of the ratio between the number of signal events and the square root of the number of background events sitting underneath it.
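A toy illustration of this effect can be cooked up with the standard library alone: fit a fixed-width Gaussian with a floating mean to background-only spectra (with Gaussian fluctuations approximating Poisson noise), taking at each candidate mean the least-squares signal amplitude. Picking the mean that maximizes the fitted yield gives a positive average "signal" even though none was generated. Everything here - the fit recipe, the bin counts, the width - is an illustrative assumption, not the author's actual study.

```python
# Toy demonstration: a floating-mean Gaussian fit on background-only data
# returns a positively biased signal yield.
import math, random

def best_fitted_yield(counts, bkg, width=2.0):
    """Scan the Gaussian mean across the spectrum; at each position fit the
    signal amplitude by least squares and return the largest yield found."""
    n = len(counts)
    resid = [c - bkg for c in counts]
    best = -float("inf")
    for mean in range(n):
        t = [math.exp(-0.5 * ((i - mean) / width) ** 2) for i in range(n)]
        amp = sum(ti * ri for ti, ri in zip(t, resid)) / sum(ti * ti for ti in t)
        best = max(best, amp * sum(t))  # amplitude times template area = yield
    return best

def average_bias(bkg=100.0, n_bins=50, n_toys=200, seed=1):
    """Average fitted signal yield over background-only pseudospectra."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_toys):
        # Gaussian approximation to Poisson fluctuations of a flat background
        counts = [rng.gauss(bkg, math.sqrt(bkg)) for _ in range(n_bins)]
        total += best_fitted_yield(counts, bkg)
    return total / n_toys

print(average_bias())  # positive: the floating-mean fit "finds" a signal
```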

So, I hope you will come back for the rest of this post soon!

UPDATE: part II is here.

If climate scientists had the kind of statistical pickiness particle physicists do, the global warming discussions would be much different.
Philosophy of science is not dead, and that makes me happy.
Yes, as long as we do science there will be somebody willing to try and make sense of the process :)

As for climate scientists, my impression (but I'm just a by-stander) is that there are good ones and very bad ones. For some reason, the bad ones seem to mostly belong to the naysayer category though.

Cheers,
T.

Alas, this subject matter is way out of my depth, but the statistical points made here ring loud in my ears.  There leaps into my memory an incident 30 or more years ago at a polymer conference, where someone appeared to extract two independent values out of some neutron scattering data on the basis "if you read it this way" and "if you treat it that way", and appeared to treat both as simultaneously valid.   I had recently come upon the stories of the Mullah Nasr-ed-Din, and this one appeared to fit the bill like a glove:

The Mullah went to market, and brought back 3 pounds of meat, telling his wife to cook it for his dinner.  That afternoon, some of his wife's friends came round, and she cooked it for them and they all ate it up completely.
That evening, the Mullah came back expecting his dinner.  "Where's the meat I bought today?" he demanded.
The wife looked around, and seeing the cat, said, "The cat ate it".
The Mullah said "we'll see about that", and taking a pair of scales, weighed the cat - 3 pounds exactly.
"There, you see!" cried his wife, triumphantly.
"Funny" said the Mullah "three pounds indeed.  But if this is the cat, where is the meat?  And if this is the meat, where is the cat?"
I am still somewhat tired after an exciting dielectric conference last week (something I can more or less understand), but God willing, I will be sharing an item with you all soon.

Robert H. Olley Quondam Physics Department University of Reading England
Ah Robert, just as Victory has many fathers, new signals tend to confirm many theories. Your fellow at the conference was very smart, but I bet he did not go past peer review with those two interpretations...
Cheers,
T.
Hi T.

What do you mean when you say reconstruct an event (or a particle's mass)?

Jack

Hello Jack,

what a wonderful question. Indeed, I have been using some jargon above, and it is a good idea to explain the concept a bit better. We call an event the collision between two particles produced by our accelerators: this provides energy for the creation of new states of matter. Sometimes a new particle is indeed created. This new particle will decay in a very short time into others, which - in some cases - can be seen in our detector. For instance, charged, long-lived ones such as pions and kaons, or muons, leave a track in the magnetic tracker with which CDF is endowed. CDF is just one example here, but it is the experiment whose results I have been discussing above.

Now, those tracks, if assumed to be all that the decayed new particle created, can be used to reconstruct the parent's mass. That is because we measured the curvature of the daughters in the magnetic field, and we thus know their momenta and energies. The mass of the decayed particle turned into energy of its daughters, so by doing some math (which involves relativistic mechanics) we can compute the mass of the parent. This is what we call reconstructing an event.
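The relativistic math I mention can be sketched for a two-body decay: the invariant mass of the measured daughters equals the parent's mass. Units are GeV, and the muon momenta below are made-up illustrative values chosen to land near the Upsilon(1S) mass.

```python
# Reconstructing a parent mass from its decay products:
# m^2 = (sum of E)^2 - |sum of p|^2, summed over the daughters.
import math

def four_momentum(mass, px, py, pz):
    """Build (E, px, py, pz) for a particle of known mass and momentum."""
    E = math.sqrt(mass**2 + px**2 + py**2 + pz**2)
    return (E, px, py, pz)

def invariant_mass(particles):
    """Invariant mass of a set of four-momenta (E, px, py, pz)."""
    E = sum(p[0] for p in particles)
    px = sum(p[1] for p in particles)
    py = sum(p[2] for p in particles)
    pz = sum(p[3] for p in particles)
    return math.sqrt(max(E**2 - px**2 - py**2 - pz**2, 0.0))

# Two muons (mass 0.1057 GeV) flying back to back with 4.7288 GeV of momentum
# each reconstruct to roughly the Upsilon(1S) mass of about 9.46 GeV.
mu1 = four_momentum(0.1057, 0.0, 0.0, 4.7288)
mu2 = four_momentum(0.1057, 0.0, 0.0, -4.7288)
print(round(invariant_mass([mu1, mu2]), 2))  # 9.46
```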

Cheers,
T.
Hi Tommaso,

Thank you for the explanation. So to reconstruct a particle is to make sure that we can get its mass from its decay products (through energy calculations) so we can say that we have seen it experimentally, right?

Now you also talked about "invariant mass", what is invariant mass? and why is it important?

Jack

Yes: we make a hypothesis that a few particles we observe are the sole decay products of a parent, and we reconstruct it. If we see enough of them, they always have the same reconstructed mass, and we can spot them as a bump in an otherwise flat distribution.

Invariant mass and mass are the same thing: the former is what we call the quantity computed from the combination of decay products, while the latter is an attribute of the parent.

Cheers,
T.
I've a question: why does a bump indicate a new particle? I understand the other facts, but I don't understand how an excess of events of 5-sigma over the background indicates a new particle rather than something else.

I try to explain better my question with an example.
If I have 3 particles that each decay into some other particles, but one of the decay products of each of the 3 initial particles is a muon with the same energy of 1 GeV, then in the plot of the events we can see a bump at 1 GeV, right? How can I distinguish a new particle from this occurrence?

Sorry if my english is not so good.

Dear Anon,

the attribute one tries to determine is mass, which is a fundamental parameter for an elementary particle. When a particle decays, its mass converts into the energy and mass of its decay products. If, and only if, you have measured the energy of ALL the decay products, and if you know what those decay products are, you can solve a simple relativistic equation that allows you to determine the mass of the decaying body. It is not just the energy of the decay products, or their sum: it is a more complicated function. I can explain this in more detail, but it takes space and time. Maybe I will write a didactical post about this.

In any case, what one does is select a data sample which one thinks might contain the decay of a new particle; one measures the energies of -say- pairs of particles which might be the result of the parent's decay; then one can make the hypothesis that they originated from the decay, reconstruct a tentative invariant mass, and make a graph. If one sees a bump, it might be due to the fact that in some cases those were really decays of a particle of well-defined mass into those pairs. The real decays will always have the same mass -save some resolution effects- and will stand up against the background, which will not have that feature and will thus appear flatter.

Cheers,
T.