Integrated luminosity is a measure of the number of interactions produced by a collider. For protons run against protons, one inverse femtobarn corresponds to roughly 80 trillion collisions. Of these, the detectors can store only a small fraction due to data-acquisition bottlenecks, but this is not a problem - most of the interactions are of absolutely no scientific value. The majority of them in fact release very little energy: either the protons "slide" elastically off one another without even breaking apart, or they interpenetrate and traverse one another without much happening.
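To see where the 80-trillion figure comes from, here is a back-of-the-envelope check (the ~80 millibarn inelastic cross section is an assumed round number for illustration, not an official measurement): the number of collisions is just the integrated luminosity times the cross section.

```python
# Back-of-the-envelope sketch: N = L * sigma, with an assumed inelastic
# proton-proton cross section of ~80 millibarns at LHC energies.
sigma_inelastic_mb = 80.0   # assumed cross section, in millibarns
mb_to_fb = 1e12             # 1 millibarn = 10^12 femtobarns
luminosity_invfb = 1.0      # one inverse femtobarn of integrated luminosity

n_collisions = luminosity_invfb * sigma_inelastic_mb * mb_to_fb
print(f"{n_collisions:.1e} collisions")  # 8.0e+13, i.e. roughly 80 trillion
```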

Physicists are rather interested in the small subset of very energetic collisions whereby a quark or a gluon in one proton hits a quark or a gluon in the other proton "head on", releasing a large fraction of the total kinetic energy of the system in a form which can then materialize new particles. The collider experiments are therefore equipped with sophisticated "triggering" systems which can recognize the energetic collisions in a number of ways, flagging the event as worth collecting and enabling the write-out to disk of the correspondingly large amount of data produced by the various detector components.

Four inverse femtobarns therefore do not correspond to 320 trillion events to analyze, but luckily to far fewer than that. Here, however, we should be concerned with what we might find in those data. And one thing we might find is additional evidence of Higgs boson decays. As you probably well know, last December the CMS and ATLAS collaborations released preliminary results of their Higgs boson searches (final results were then published a few months ago). Those results were based on the analysis of data corresponding to the five inverse femtobarns of integrated luminosity collected in 2011 from 7 TeV collisions.

Because the higher the total energy, the higher the expected production rate of rare particles such as the Higgs boson, the slightly higher centre-of-mass energy of the 2012 running makes the data collected until today roughly equivalent in discovery reach for a Higgs boson to the 2011 data. Since the two collaborations together found in 2011 a tentative Higgs boson signal at 124-126 GeV with a local significance of roughly four standard deviations, it is very likely that, if that signal is real, the 2012 data contain a signal of similar statistical power. Note that the new data have been blinded by the experiments: we do not know what is in there, but we will soon!

Now, forgive me for the oversimplification, but if one naively added in quadrature two four-sigma effects, one would obtain an over-five-sigma combined significance: this would then be a discovery by all standards. In principle, therefore, the chance that the Higgs boson is found is already there, with the data already safely stored on our multi-tiered storage systems.
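For concreteness, the naive quadrature combination can be spelled out in a few lines of Python. This is a sketch of the oversimplified argument above, valid only for independent measurements:

```python
import math

def combine_in_quadrature(*sigmas):
    """Naively combine independent significances: sqrt(s1^2 + s2^2 + ...)."""
    return math.sqrt(sum(s * s for s in sigmas))

# Two independent four-sigma effects:
combined = combine_in_quadrature(4.0, 4.0)
print(f"{combined:.2f} sigma")  # 5.66 sigma, above the five-sigma threshold
```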

I may perhaps be forgiven for my simplified statement after a look at the graph below, which shows the expected significance of a Higgs boson signal in LHC data as a function of the particle mass.

As you can see, for a 125 GeV Higgs boson the combined significance of CMS searches with a 10/fb dataset of 7 TeV collisions (which, as I explained above, is roughly equivalent to what is already in store when adding the 2011 and 2012 datasets) is just short of five standard deviations. But this is one single experiment! ATLAS is just about as sensitive as CMS, so the two experiments together certainly have enough sensitivity to grant a >50% chance of a Higgs boson discovery by now, if the particle is there.

Now, of course in the discussion above I have been significantly simplifying things for the sake of argument. In reality, one should take into account several factors that may affect the sensitivity of the 2012 datasets; among these is the higher "pileup" of the new data: the higher rate of data collection this year comes from having more protons in the beams, and this causes the interesting "Higgs production" collision to be accompanied by two dozen additional collisions which have nothing to do with it, and whose general effect is to make the reconstruction of the final-state particles harder.

One specific example of the smearing that pileup causes is the case of the Higgs decay to photon pairs: in order to reconstruct with precision the invariant mass of the hypothetical Higgs boson in an event containing two photons, experimenters need to make a hypothesis for the exact location on the beam axis of the proton-proton collision producing the Higgs, and this is hard if the Higgs only decays to two photons, because photons do not leave tracks in the detector! The location of the vertex is flagged by the softer particles produced by the protons' break-up when the Higgs boson is created, but if there are twenty such collisions piled together it becomes hard to pick the right vertex. The occasional mistake leads to a "smeared" mass distribution, which is less peaked and thus stands out less above the large backgrounds. This problem is mitigated by refined algorithms and smarter analyses, but I hope I have given you an idea of why every run is different, and why the above argument about the new data roughly corresponding to the old data is a handwaving one.
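As a toy illustration of the vertex problem (with made-up geometry and photon energies, not the actual CMS reconstruction), one can compute the diphoton mass m = sqrt(2 E1 E2 (1 - cos θ)) from the photon impact points on the calorimeter and an assumed vertex position, and see how a mispicked vertex shifts the mass:

```python
import math

R = 1.3  # assumed calorimeter radius in metres (toy value)

def diphoton_mass(e1, e2, z1, z2, z_vertex):
    """Toy diphoton mass: photons hit the calorimeter at (+R, z1) and (-R, z2)
    in an (r, z) plane; the opening angle depends on the assumed vertex."""
    v1 = (R, z1 - z_vertex)
    v2 = (-R, z2 - z_vertex)
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.sqrt(2 * e1 * e2 * (1 - cos_theta))

# Two 62.5 GeV photons from a collision at z = 0:
print(diphoton_mass(62.5, 62.5, 0.0, 0.0, 0.0))   # 125.0 GeV with the right vertex
print(diphoton_mass(62.5, 62.5, 0.0, 0.0, 0.10))  # slightly lower with a 10 cm mispick
```

Repeated over many events with randomly mispicked vertices, these small shifts broaden the mass peak - exactly the smearing described above.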

## Comments

"... if the particle is there." If nature is infinite and time is continuous, then nature should contain a Higgs field, or something very similar, in order to allow mass and to impart stability to the quantum fields that occur in nature. However, if the finite nature hypothesis is true, then nature has no need for a Higgs field. If nature is finite, then the Space Roar Profile Prediction is plausible. For those who believe in the Higgs field and the infinite nature hypothesis, consider this question: What is the explanation for the space roar?

http://en.wikipedia.org/wiki/Space_roar

Why have "... the new data ... been blinded by the experiments ..." ?

As you say, "... in the case of the Higgs decay to photon pairs ... the smearing that the pileup causes ... is solved by refining algorithms and smarter analyses ..."

What would be wrong with doing the algorithms/analyses on the data events as they occur and releasing the results

so that the 2012 basic histogram plots can be compared to the 2011 plots

to see whether or not the bumps are in the same places ?

Tony

PS - I know it is only about a month to the ICHEP conference where such 2012 plots will likely be released and I can wait that long,

but

my question is really a socio-political one:

Does announcement at a formal structured conference (as opposed to informal release as data is actually analyzed)

serve any purpose other than to feed the egos of "... the big people in the organization [who] come down to talk to the masses ..." ?

(quote from Doug Sweetser)

A number of factors determine the six-month cadence of releasing analyses. Of course the big shots want to make public announcements in front of as large audiences as possible, but that is not the only reason.

LHC experiments have been producing new results at an unprecedented speed. The data will be looked at only a week or two before the results are made public, and this requires a large number of checks and detailed studies. Of course not all analyses are done that way; but for the Higgs, for which we now have a clear hint at 125 GeV, doing things blindly ensures that the new result will be pristine and not polluted by involuntary tweaking; it also removes the look-elsewhere effect, for this time we will only be concerned with a small mass window.

Best,

T.

for us a "track" is a set of ionization deposits in the silicon strips of which the inner detectors are composed. Only charged particles leave such hits, so photons do not. Photons only produce electromagnetic showers as they pass by heavy nuclei in the electromagnetic calorimeters. There we measure their energy with high precision, but we do not determine their exact trajectory very well.

Note that a photon may convert into an electron-positron pair early on, by hitting a silicon nucleus. In that case the resulting pair is indeed tracked, and this provides a better pointing vector to determine the primary vertex. We do use those converted photons in the analysis, of course, but this only happens a fraction of the time.

Cheers,

T.

I'm curious what you mean by "blinded". Is it just that no one is allowed to look at the data until a certain date before ICHEP? Some experiments are blinded by introducing random offsets into the data which are only known to a few people not doing the analysis, and only after the analysis is done is the unknown offset revealed to correct the data. I imagine it would be hard to do this kind of blinding on LHC detectors, since every analysis would be affected and there are probably things that a determined individual could do to reverse engineer the offset. Why not just divide the data set into, say, 10 different subsets and have 10 different groups perform the important analyses on the subsets, and only at the end run all ten analyses on the full dataset?

Nice question. Indeed, we will look at the 2012 data for Higgs searches (and many other measurements) only shortly before ICHEP. For us "blinding" means different things for different analyses, but the general concept is the same - some portions of the mass spectrum, e.g., are not looked at. We filter those events out so that we do not get them in our distributions, until we have understood backgrounds and optimized the analysis.
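In pseudo-code terms, the blinding described here amounts to a simple filter. The mass window below is a hypothetical choice around the 125 GeV hint, not the actual one used by the experiments:

```python
BLIND_WINDOW = (120.0, 130.0)  # GeV; assumed blinded window, not the official one

def visible_events(events, window=BLIND_WINDOW):
    """Keep only events outside the blinded mass window (the sidebands)."""
    lo, hi = window
    return [e for e in events if not (lo <= e["mass"] <= hi)]

events = [{"mass": m} for m in (112.0, 124.8, 126.1, 143.5)]
print(visible_events(events))  # only the 112.0 and 143.5 GeV sideband events survive
```

Backgrounds are then modeled and the selection optimized on the sidebands alone; only once everything is frozen is the window opened.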

Doing 10 analyses on ten subsets would be ineffective I think, because if the "control" samples are also split across the subgroups these have lower statistical power for their predictions; if they instead used all the same data then the combination would be problematic.

Cheers,

T.

Tommaso, as to "blinding", you say:

"... some portions of the mass spectrum, e.g., are not looked at ...".

With respect to the basic histogram of observed events in

the digamma channel from 110 GeV to 150 GeV

how would such "blinding" work ?

Given the basic histogram,

I can see how further analysis could be blinded by giving

different analysts different segments in that region without

telling them exactly where the segments lie within the region,

but

I do not understand how to get the basic histogram itself

in a "blinded" way.

If the basic histogram is not "blinded",

then it ought to be easy to just look at it to see where the bumps are,

and to compare bump location with the 2011 histograms,

to see whether or not the 2012 data is supporting the 2011 bumps.

Of course, the results of exactly how many sigma of significance

could be held back by blinding of the sigma-analysis etc,

but

it seems to me that right now a lot of people can just look

at the basic digamma histogram and know whether

the 2012 data supports (or not) the 2011 bumps.

Tony

My comment was not so much about "leaking of information outside"

as

about the internal blinded-analysis process.

Given normal human curiosity is it really true that people on the analysis teams would "... not care at all to "see where the bumps are" ..." and somehow "... avoid doing that ..." ?

It seems to me that effective "blinding" would require "blinding" the basic digamma histogram itself, and that is a process that I do not understand and about which I was inquiring how it works.

Tony
