UPDATE (4/7): I posted a link to a nice animated GIF which shows the (approximate) effect of scaling up the MC/data jet energy scale factor on the CDF new particle signal. See here.

UPDATE (4/7):
I added some considerations on the tentative CDF signal in a separate post today (4/7). You can find there a comparison with older semileptonic diboson searches at CDF and DZERO.

UPDATE (4/6): Okay, I won't wait until 5PM FNAL time after all, since I see that other online resources are already discussing it, including the New York Times online, which quotes the CDF co-spokesperson, Giovanni Punzi. I will let the previous post with the same title live below, but here I am attaching something on top.


By way of introduction, let me first of all explain the process through which an unexpected discovery is made in high-energy physics.

It so happens that when experimental particle physicists search for something known, they bump into something they do not understand about their data. Most of the time, it is just a bug in their code, so physicists are accustomed not to grow excited in any way, but rather to get a big cup of coffee and sit (possibly during nighttime) in front of the computer, painstakingly checking their code. Then they run it again, and if the unknown feature persists, they return to the code once again. Only after three or four iterations do they venture to start speculating that something might be going on in the data.

At that point, a more fun phase starts: the effect is studied systematically by using different simulations, and by checking for it in orthogonal samples of data which should not possess the same feature, the so-called "control samples". Usually, in this phase one is able to spot a deficiency of the background simulations to which one is comparing the data.

Sometimes, however, the feature is hard to explain away with bugs or with insufficiently trustworthy simulations. At that point, a physicist had better start thinking in terms of what new physics model could be producing the effect he or she is seeing in the data. Another obvious thing that the physicist needs to do at this stage is to search for the same signal in other datasets which are likely to be sensitive to it.

At the end of the lengthy process through which the physicists have tried to "kill" the signal they originally saw in the data, they will usually have grown confident enough to publish it. A statistical analysis should accompany the published signal, estimating how likely it is that it is a mere statistical fluctuation. This is what is given in "sigma" units: a "3-sigma" signal counts as evidence for a new effect, but might be produced by background fluctuations a few times in a thousand. A "5-sigma" effect is instead enough to grant the right to claim the "observation" of the new effect.
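To make the "sigma" language concrete, here is a minimal sketch of the conversion from a number of standard deviations to a tail probability, assuming a purely Gaussian test statistic. The function name is my own, and whether one quotes the one-sided or the two-sided number is a matter of convention.

```python
import math

def one_sided_p_value(n_sigma: float) -> float:
    """Upper-tail probability of a standard Gaussian beyond n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

# Compare the two thresholds mentioned in the text.
for n_sigma in (3.0, 5.0):
    p = one_sided_p_value(n_sigma)
    print(f"{n_sigma:.0f} sigma: one-sided p = {p:.2e}, two-sided p = {2.0 * p:.2e}")
```

The 3-sigma tail comes out at the level of one to three parts in a thousand depending on the convention, while the 5-sigma tail is several thousand times smaller.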

Some more background on dijet resonance searches

The above process, in a nutshell, is more or less what has happened in CDF when a significant signal of jet pairs produced together with W bosons was first noticed. The authors of the analysis were looking for diboson production, a rare but well-known process occurring when protons and antiprotons create two W bosons together, or a W and a Z boson.

These diboson processes can best be studied by searching for events with two, three, or four charged leptons (electrons or muons, that is, since tau leptons are much harder to collect due to their frequent decay into hadrons), because backgrounds are very small in that case. Indeed, CDF saw the first evidence of WW production as early as 1998, that is, before LEP II started to produce them in large numbers. In Run II, both CDF and DZERO studied WW, WZ, and ZZ production in detail by using the fully leptonic decay of the two bosons. However, one can use jets as well.

If one requires one W boson to decay into an electron-neutrino or a muon-neutrino pair, and the other W (or even a Z boson) to decay into a pair of quarks, the experimental signature is not in principle less striking than that of two leptons and missing energy: it presents a lepton, missing energy, and two hadronic jets. Alas, jets are the most common product of high-energy proton-antiproton collisions at the Tevatron, and the WW signal, with its small production rate (only one WW pair is produced every 5 billion collisions, and one WZ every 20 billion collisions), is therefore buried in a large background of single W production events which feature two hadronic jets emitted by QCD radiation. This makes the search for "semileptonic" WW or WZ decays a nightmare.

The authors of the CDF analysis, however, sought precisely that process, knowing that spotting it would be a very good starting point for searches for the Higgs boson. A light Higgs particle is in fact produced with a non-negligible rate at the Tevatron together with a W boson, and it decays into a pair of hadronic jets. The same signature arises, with some important differences that I will discuss later. WH production is much rarer than the rare WW and WZ processes, so one does not expect to run into the former when searching for the latter; but one may always dream...

The WW/WZ search (you cannot distinguish well the hadronic decay of a W from that of a Z, so the WW and WZ signals mix and are studied together in semileptonic diboson searches) was fruitful, and in fact two separate analyses were able to extract the semileptonic WW/WZ signal from the data in the last couple of years, once a sufficient number of collisions had been collected. I wrote about the searches in past articles, and I will place here a link as soon as I dig them out of my 2000+ posts... Let us however discuss the present signal here.

What's that bump down there?

The authors saw that their background to WW/WZ events was not modeled very precisely, and investigated the high-mass region with more care. They were in for a surprise. The bump was not an ephemeral feature of the data, and simulations of the background could not explain it away. This started a much deeper investigation, which eventually led to the paper you can read on the arXiv preprint server today.

I will now describe the analysis in some detail, but experts are better advised to read the paper themselves. The analysis is not too hard in itself, although the interpretation of the result eventually is.

The search starts with the so-called "high-pT lepton datasets": samples of events collected by requiring online that the detector sees an energetic electron or muon. These samples contain sizable fractions of W decays, and the signal can indeed be extracted with methods we learned from Carlo Rubbia almost 30 years ago: require that the lepton is energetic (rejecting some events where the electron or muon is faked by other particles) and isolated (to remove events where the lepton is due to the decay of a bottom or charmed hadron); and require that there is a significant amount of "missing transverse energy", an imbalance in the total energy flowing out of the collision in the plane transverse to the beam direction. In W decays missing energy is due to the escape of an energetic neutrino, while backgrounds produce missing energy only if the measurements of all particles in the calorimeter conspire to produce an asymmetry.
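For readers who like to see the definition spelled out, here is a minimal sketch of how a missing transverse energy could be computed from a list of energy deposits. The input format (a transverse energy in GeV and an azimuthal angle in radians per deposit) and the function name are assumptions of mine, not the CDF reconstruction code, which also applies corrections ignored here.

```python
import math

def missing_transverse_energy(deposits):
    """Return (met, met_phi) for an iterable of (et, phi) deposits.

    The missing transverse energy is the magnitude of the vector that
    balances the sum of all visible transverse energy vectors.
    """
    sum_x = sum(et * math.cos(phi) for et, phi in deposits)
    sum_y = sum(et * math.sin(phi) for et, phi in deposits)
    met = math.hypot(sum_x, sum_y)
    # The missing vector points opposite to the visible sum.
    met_phi = math.atan2(-sum_y, -sum_x)
    return met, met_phi

# Hypothetical event: one lepton and two jets, with nothing balancing them.
event = [(40.0, 0.3), (35.0, 2.0), (30.0, 2.9)]
met, met_phi = missing_transverse_energy(event)
print(f"missing ET = {met:.1f} GeV at phi = {met_phi:.2f} rad")
```

A perfectly measured event with no neutrino would return a value close to zero; in practice resolution effects smear it, which is why a sizable threshold is required.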

After those standard selection cuts, the data is already quite pure in W decays. But what about the other boson? It is selected by requiring the presence of two hadronic jets. The background, at this point, has the rather dull name of "W plus jets" production; some residual background from top quark pairs is also present, along with events where the leptonic signal is a fake and the event is actually a multijet one.

To purify the sample, some additional, less essential cuts are applied; the one to cite is perhaps the requirement that the missing energy points away from the jets in the transverse plane, which removes multijet background where the missing energy is due to jet energy measurement errors. At this point, one can already compute the dijet invariant mass, and try to figure out the reason for its shape.
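The dijet invariant mass itself is a simple quantity. Here is a minimal sketch, assuming that each jet is treated as massless and described by its transverse momentum, pseudorapidity, and azimuthal angle; this is the usual collider convention, but the function below is my own illustration and not the analysis code.

```python
import math

def dijet_mass(jet1, jet2):
    """Invariant mass in GeV of two massless jets given as (pt, eta, phi)."""
    pt1, eta1, phi1 = jet1
    pt2, eta2, phi2 = jet2
    mass_squared = 2.0 * pt1 * pt2 * (
        math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)
    )
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(mass_squared, 0.0))

# Two central 40 GeV jets, back to back in azimuth.
print(dijet_mass((40.0, 0.0, 0.0), (40.0, 0.0, math.pi)))
```

Note that the mass depends on the angular separation of the jets as much as on their energies, so a simulation that gets the jet angles wrong will also distort the mass spectrum.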

The comparison is made with Monte Carlo simulations of the predicted background processes. Together with W+jets, ttbar, and multijet events, one sees an enhancement due to the presence of the signal originally sought, namely WW/WZ semileptonic decays. You can well see them in the figure below.

The red histogram is the diboson signal, which is evident in the data: it peaks at 80-90 GeV, where the hadronic W or Z decay contributes. Note also that the W+jets background totally dominates. Note further that there is a mismodeling of the data in the region above 120 GeV: it seems like all data points are displaced by one bin toward the right with respect to background predictions.

So, the data overshoot the backgrounds in the region 120-160 GeV. First of all, we can ask ourselves a very basic question: is the effect statistical or systematic? Of course it is systematic: the data points are all higher than the predicted background, and by a significant amount. But what systematic effect is the cause of this mismatch? Different options are on the table.

Sources of the mismatch

The first thing a reasonable physicist might hypothesize is that there is an error in the jet energy scale which is set in the Monte Carlo. Imagine that when a 50-GeV parton from a real collision hits the detector it is reconstructed as a 50-GeV jet, but that when a simulated 50-GeV parton does the same, 45 GeV are estimated: this is an "energy scale" error, and it would cause the dijet invariant mass distribution of the estimated backgrounds to be displaced leftward by 10%, as is observed.
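The 10% figure follows from the fact that the mass of two (massless) jets scales linearly with a common energy scale factor. A minimal sketch of that arithmetic, with a hypothetical pair of jets and an opening angle I picked arbitrarily:

```python
import math

def dijet_mass_from_energies(e1, e2, opening_angle):
    """Invariant mass in GeV of two massless jets with energies e1, e2 (GeV)
    separated by opening_angle (radians)."""
    return math.sqrt(2.0 * e1 * e2 * (1.0 - math.cos(opening_angle)))

# Hypothetical event: two 50 GeV jets, 120 degrees apart.
angle = math.radians(120.0)
true_mass = dijet_mass_from_energies(50.0, 50.0, angle)

# The example in the text: a 50 GeV parton comes out as 45 GeV in the simulation.
simulation_scale = 45.0 / 50.0
simulated_mass = dijet_mass_from_energies(
    50.0 * simulation_scale, 50.0 * simulation_scale, angle
)

relative_shift = simulated_mass / true_mass - 1.0
print(f"relative mass shift: {relative_shift:+.1%}")
```

The shift does not depend on the chosen angle or energies, as long as the same scale factor applies to both jets.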

It goes without saying that if this were still an option on the table, CDF would not have published the paper we are discussing. The jet energy scale is studied in excruciating detail at the Tevatron, and the uncertainty on it is below the percent level (at least for quark jets). Top quarks are by now a very good source of calibration for the variable, since they contain W->jj decays with which one can verify whether the jet reconstruction in the data and in the simulations agree. So the jet energy scale does not seem a likely candidate for the mismatch.

The second possibility is that one of the backgrounds is mismodeled, either in shape or in normalization. If you look carefully at the figure above, you will see that the QCD background (the "multijet" processes, those that do not yield a real electron or muon) has a different shape from the W+jets background. So if one had underestimated the QCD background, maybe the total dijet mass would disagree with the data in the region where the background is falling...

It is unlikely that the multijet contribution has been so heavily underestimated, though. To make up for the observed excess in the 120-160 GeV region, one should hypothesize that the QCD multijet fraction has been underestimated by 100% or so; and since the QCD fraction is determined from the data (it is the only background component for which the experimenters did not trust the simulation, in fact), one can hardly believe that this is an option.

The third possibility is that this is just a mismodeling of the main background, the W+jets "green" component above. This is unlikely, but it remains the most likely hypothesis. The authors did try to deform the W+jets dijet mass shape by changing parameters in the simulation, making different assumptions, and reweighting events based on some other kinematical features of the jets; but the resulting systematic effects remain rather small.

And what if ... ?

So there is a fourth option on the table: there might be a new particle, a massive body with mass in the 140-150 GeV ballpark, which contributes to the data sample. This particle would be created in association with W bosons, and decay primarily into jets. The latter hypothesis is necessary because if one were to allow the particle to also decay into leptons one would find a logical inconsistency with the very precise measurements that CDF (and DZERO) have made in Run II of their fully leptonic datasets.

So let us have a look at a fit of the dijet mass spectrum which includes the Gaussian signal from a narrow resonance at 140 GeV. Mind you, I say "narrow" because I am implying that the width of the excess seen in the data is roughly the same as the one observed for the W/Z peak at 80-90 GeV, once rescaled to account for the change in resolution in going up by 50 GeV. Those two bosons have natural widths of 2.1 and 2.5 GeV respectively, so that their mass shape is almost perfectly Gaussian (as opposed to a Breit-Wigner resonance shape): the jet energy resolution effects dominate over the natural width. What I call "narrow" is therefore something whose natural width is small with respect to jet resolution effects.
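To put a number on "narrow", one can estimate how much a natural width of a couple of GeV broadens a peak whose shape is dominated by detector resolution. The sketch below uses a standard empirical approximation for the full width at half maximum of a Voigt profile (a Lorentzian convolved with a Gaussian). Two things are assumptions of mine: the 10 GeV Gaussian resolution, which is just a placeholder and not a CDF number, and the treatment of the Breit-Wigner as a simple Lorentzian whose full width equals the natural width.

```python
import math

# Ratio between the FWHM and the sigma of a Gaussian, about 2.355.
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))

def voigt_fwhm(natural_width, gaussian_sigma):
    """Approximate FWHM (GeV) of a Lorentzian of full width natural_width
    convolved with a Gaussian of standard deviation gaussian_sigma."""
    f_l = natural_width
    f_g = FWHM_PER_SIGMA * gaussian_sigma
    return 0.5346 * f_l + math.sqrt(0.2166 * f_l ** 2 + f_g ** 2)

ASSUMED_RESOLUTION = 10.0  # GeV, hypothetical placeholder

gaussian_only = FWHM_PER_SIGMA * ASSUMED_RESOLUTION
for name, width in (("W", 2.1), ("Z", 2.5)):
    total = voigt_fwhm(width, ASSUMED_RESOLUTION)
    print(f"{name}: FWHM {total:.1f} GeV versus {gaussian_only:.1f} GeV from resolution alone")
```

With these inputs the natural width changes the observed width by only a few percent, which is why a Gaussian is an adequate description of the W/Z peak and of any similarly narrow resonance.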

This fit is much better than the previous one, of course: we added degrees of freedom! There are statistical estimators that allow one to draw a conclusion from the fact that a Signal+Background fit is much better than a Background-only one. The authors studied those estimators, and were able to determine that the statistical significance of the feature in the data corresponds to 3.2 standard deviations, in Gaussian approximation. What that means is that it is very unlikely (a chance of roughly one in a thousand) that the excess is caused by a statistical fluctuation alone.
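One textbook way to turn "the fit got better" into a probability is to look at the improvement in chi-square between the two fits and assume that, for background-only data, it follows a chi-square distribution with as many degrees of freedom as the parameters that were added. Here is a minimal sketch under that asymptotic assumption; it is not necessarily the procedure of the paper, which also has to account for systematics and the look-elsewhere effect, and the chi-square improvement used below is a made-up number.

```python
import math
from statistics import NormalDist

def p_value_from_delta_chi2(delta_chi2, n_extra_params):
    """Tail probability of a chi-square improvement, for 1 or 2 added parameters."""
    if delta_chi2 < 0.0:
        raise ValueError("delta_chi2 must be non-negative")
    if n_extra_params == 1:
        return math.erfc(math.sqrt(delta_chi2 / 2.0))
    if n_extra_params == 2:
        return math.exp(-delta_chi2 / 2.0)
    raise ValueError("this sketch only handles 1 or 2 extra parameters")

def equivalent_sigma(p_value):
    """Number of Gaussian standard deviations with the same one-sided tail."""
    return NormalDist().inv_cdf(1.0 - p_value)

# Hypothetical improvement when adding a signal normalization and a mass.
p = p_value_from_delta_chi2(15.0, 2)
print(f"p = {p:.2e}, equivalent to {equivalent_sigma(p):.2f} sigma")
```

The more parameters one lets float (the mass of the bump, its width), the larger the chi-square improvement needed to reach the same significance, which is the quantitative content of the remark about added degrees of freedom.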

But is it an observation? Well, first of all, an observation of what? This particle is not called for by the most en vogue models: it cannot be a Higgs boson, because if a Higgs boson were sitting at 140 GeV with such a large production rate we would have seen it decay into WW pairs a long time ago. Furthermore, the Higgs would mostly decay into bottom quark jets, but these jets are not b-tagged; if they were, this would be similar to the analysis of WH production, which does not see any excess.

So, no Higgs. But it might be another fancy beyond-the-standard-model particle, right?

Well, of course. There are so many models that extend the standard model beyond our current observations that one can easily find a few that fit the extra particle. Take this paper, for instance. A light Z' boson with suppressed couplings to leptons might fit the bill.

In the paper (which appeared five days ago on the arXiv), the Z' model is fit to the CDF data, which were actually presented in the PhD thesis of Viviana Cavaliere a while ago. This is a theoretical paper of the "instant preprint" class which we have seen flourish with the first LHC publications... It is a sign of the times. The figure on the right is not qualitatively different from the one in the CDF publication, but it is a real new physics model which produces the blue bump, and not just a Gaussian fit. So one needs to take it seriously.

And there is, in principle, another reason for taking the Z' hypothesis seriously. The authors of the new theoretical paper claim that the Z', added to the standard model, produces an asymmetry in top production which is in the right direction, explaining away some of the discrepancy that top-antitop asymmetry analyses have seen in the recent past at the Tevatron. You can check that out in the figure on the left, where the Z' model produces the asymmetry prediction shown with a blue line. The CDF data are black points, and the SM prediction is the dashed line.

All in all, this is very intriguing, of course. I however have my own ideas. One of the things that I am not convinced about is the quality of the agreement of the background shape with the data in regions away from that where the tentative new signal arises. If you look carefully, you will see that the peak of the mass distribution, where the WW/WZ signal resides, is not well modeled. I already mentioned that above, but here I stress it to explain that one might go wrong with the dijet mass shape not just if one commits an error in the estimate of the jet energy scale, but also if one simulates jets with different kinematical properties. The angle between the two jets influences the mass distribution, and in fact this is discussed in some detail in the CDF paper. I do not find the result of that investigation conclusive enough, however, so I must keep my idea that one of the most likely explanations for the observed 120-160 GeV discrepancy is that the W+jet background is not modeled well enough by the simulation.

I add to that a consideration. The excess observed by the authors in the mass spectrum is mostly due to electrons (156±42 events versus 97±38 in the muon sample), and the electron sample is potentially richer with QCD background. While the two numbers are not inconsistent with one another, they leave one wondering...
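The statement that the two yields are "not inconsistent" can be checked with a back-of-the-envelope calculation: take the difference and divide it by the combined uncertainty. The sketch below assumes that the two uncertainties are Gaussian and uncorrelated, which is my simplification (part of the systematics is surely shared between the two samples).

```python
import math

def pull(value_a, sigma_a, value_b, sigma_b):
    """Difference of two measurements in units of their combined uncertainty,
    assuming independent Gaussian errors."""
    return (value_a - value_b) / math.hypot(sigma_a, sigma_b)

# Excess events quoted in the text: electrons versus muons.
electron_excess, electron_error = 156.0, 42.0
muon_excess, muon_error = 97.0, 38.0

difference_in_sigma = pull(electron_excess, electron_error, muon_excess, muon_error)
print(f"electron - muon difference: {difference_in_sigma:.2f} sigma")
```

The difference comes out at about one standard deviation, so there is no statistical tension; the worry is only about what a QCD-rich electron sample might be hiding.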

In conclusion

I do not particularly like to play the die-hard sceptic (this is after all a paper I myself reviewed and signed!), but I believe this is nothing but the umpteenth would-be new physics signal, destined to be buried by the analysis of further data, by the crafting of more precise simulations, or by the better understanding of Standard Model sources. Nevertheless, it is quite interesting to see this paper coming out now. Both DZERO and the LHC experiments ATLAS and CMS now have a lead to investigate in their own data! If they were to see a 3-sigma effect in the same mass range and in the same kind of events, it would already be time to put the champagne in the fridge...

Older post:

This post is for now just a placeholder for an article I am publishing after 5PM Fermilab time, to report about a new find by the CDF collaboration. Come back and reload in a few hours and you'll get more information... The reason is that I have promised I would wait to blog on this until a seminar takes place (at 4PM at Fermilab), but I wanted to let you know that the result is now public, such that I can even attach a plot below, one which is already available online.

As you see in the background-subtracted plot above, besides the peak due to W and Z bosons in red, there seems to be a resonance at 150 GeV or so (fit by the blue histogram), one that decays into pairs of jets and that is produced in association with a W boson in proton-antiproton collisions. The quoted significance of the Gaussian bump is 3.2 standard deviations, once look-elsewhere effect and systematics are accounted for. Yes, I know, this smells of Higgs, but it cannot be... For reasons I will explain later today.

And to give away what I think as a world premiere: the answers to the questions I placed in the title of this post are "No. No."

Further reading:

Peter Woit discusses this briefly
Lubos Motl with more links
(apologies to both for having their names in the same paragraph)
Resonaances discusses the latest news on the related topic of ttbar asymmetry
He also talks about this signal now
Michael Schmitt has a very in-depth discussion of the signal.
Physics and Physicists
Sean Carroll
Flip Tanedo
Philip Gibbs
Arcadian Pseudofunctor
Marco Frasca

Also see:
Physics World
CBS (which quotes this post)
New York Times online
Science news
20 minutes (in French)
More coming... Write more links in the comments thread!