Working as an experimental particle physicist in a large scientific collaboration, such as the 3000-strong CMS experiment at the CERN LHC, is a (not too uncommon) privilege, for several reasons. 
One of those reasons is of a purely numerical kind: the number of publications that bear your name grows by the day, and may reach four-figure values in the course of a couple of decades (I am about to cross that point with my publication list, in fact). But what weight do those thousand articles carry when it comes to assessing your value as a scientist? Very little, indeed: in fact, all the selections in which I have participated in my career required candidates to specify their own contribution to each of the papers they wished to boast about.

There would be a whole book to write about the logic behind the practice of putting three thousand names on every publication CMS or ATLAS produces, but this post is actually about physics, so let me rather dispose of this introduction by getting to the point. 

What I need to say is that for theoretical physicists the situation is quite different - they are not guaranteed to get to sign articles by belonging to a collaboration. They do attempt to create groups and ties that allow them to sign more papers than the ones they personally write, but for a respected theorist these days an excellent goal is to reach the count of a hundred authored papers. Yet those papers have arguably "more weight", in the sense that they carry few signatures - something which automatically implies that one's personal contribution to the work is significant.

So today I am very happy to boast here about a new phenomenological paper I wrote together with three post-docs (Alexandra Oliveira, Mia Tosi, and Florian Goertz), a graduate student (Martino Dall'Osso) and an undergraduate (Carlo Gottardo); the latter two are students I advise. The article, titled "Higgs Pair Production: Choosing Benchmarks with Cluster Analysis", is a preliminary version of the text we will soon submit to JHEP, a prestigious and high-impact-factor journal. But what is the study about?

The 2012 discovery of the 125 GeV resonance we all call the Higgs boson has left many questions on the table. Is it the only Higgs boson or the first of several such states? And does it behave as the Standard Model predicts, or can we detect some anomaly in its detailed properties?

Of course, new physics could manifest in a number of ways at the LHC in Run 2, the new period of data taking at the higher 13-TeV centre-of-mass energy which has just started. But in the absence of anything new to concentrate on, for the time being it is a very wise thing to try and answer the above questions about the Higgs boson, the only true novelty that Run 1 offered.

Our study

Our work addresses a fundamental issue in the study of the detailed properties of the Higgs boson: by studying the processes that produce two of these particles simultaneously, one gains exclusive access to some of its most private properties - the self-couplings, for instance. The production of Higgs pairs is very hard to observe if the Standard Model is correct, as then the rate of these events is extremely low. But anomalous couplings (which mean new physics beyond the SM) may bring the rate up to a level which makes the process observable in the not too distant future.

The problem is the following: there are a number of possibilities for enhancing Higgs pair production via anomalous couplings. One can picture the situation by introducing five additional parameters in the Higgs Lagrangian density. The precise values of those parameters affect not only the rate of pair production, but crucially, also the kinematics of the resulting events.

When designing an experimental search for Higgs pair production, one is then facing a quandary: should one first go after the regions of this five-dimensional parameter space that predict the highest production rate, guaranteeing a sure early result (an exclusion of those narrow regions in the extremely wide parameter space) albeit a very specific and not very general one; or should one rather design a smallish set of experimental search strategies that can, taken together, "illuminate" the largest possible area of the unknown parameter space of new physics theories?

The first approach - search for the easy targets first - has been the one routinely adopted for many new physics searches in the past. Take Supersymmetry as an example: the vast SUSY parameter space has been excluded in patchwork fashion, by looking first for the very evident things and then progressively closing in on the subtler signatures. I call this the drunkard's watch approach, as it reminds me of the drunkard who lost his watch in the street at night, and searches for it under the street lamps. Sure, if it is there he's going to find it - but this is no strategy for a complete search.

The second approach is the one we advocate in our paper. As I mentioned, the kinematics of Higgs pair production is extremely varied as one moves around in the five-dimensional space spanned by those new physics parameters. But in contrast to SUSY, the final state is always the same: two Higgs bosons, which decay in exactly the same way whatever the value of those parameters (to first order). This homogeneity of the final state, together with the wide variation of the kinematics, calls for a clustering approach. Let me explain why.

Imagine you search for Higgs pairs in events with four b-quark jets in the final state - i.e. events where each Higgs produced two b-quark jets in its decay. This is the most common final state, as the branching fraction of Higgs decays to b-quarks is the largest. Yet what chances do you have to distinguish the signal from the extremely large background of 4-b production by strong interaction processes, which starts off at rates that are many orders of magnitude larger? Slim chances, unless you exploit the kinematics of the production process.

By tuning a multivariate classifier trained to distinguish the signal kinematics from that of the background, you have a chance of bringing out a signal. But as the kinematics of the signal strongly depends on the value of the anomalous coupling parameters, you will have to devise a gazillion different searches to cover all possibilities! Unless... Unless you focus on the most "typical" points of the parameter space, whose kinematics is "most similar" to that of a large number of nearby parameter space points.

What we did was therefore to study and categorize the kinematics of Higgs pair production, considering a grid of about 1500 different points in the parameter space, chosen wisely to not only cover the space, but also to concentrate on the regions where the kinematics exhibits the fastest variations. Then, with a clustering procedure, we grouped parameter space points according to the similarity of the final state kinematics they predicted. 
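
For the programmers among you, here is a toy sketch of what such a grouping can look like. It is not the code of the paper: the six histograms are invented numbers standing in for Higgs-pair invariant mass distributions at six hypothetical parameter space points, the function names are mine, and both the exact form of the similarity measure (a two-sample Poisson likelihood ratio for equally normalized histograms) and the merging rule (complete linkage) are assumptions made for illustration.

```python
import math

def poisson_ts(h1, h2):
    """Two-sample Poisson likelihood-ratio statistic between two binned
    distributions with equal totals (an assumed form, for illustration).
    It is 0 for identical histograms and grows as they differ."""
    ts = 0.0
    for n1, n2 in zip(h1, h2):
        mu = 0.5 * (n1 + n2)  # best common expectation for this bin
        if n1 > 0:
            ts += 2.0 * n1 * math.log(n1 / mu)
        if n2 > 0:
            ts += 2.0 * n2 * math.log(n2 / mu)
    return ts

def cluster(histograms, n_clusters):
    """Agglomerative clustering: start with one cluster per point, then
    repeatedly merge the two clusters whose most dissimilar members are
    closest (complete linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(histograms))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(poisson_ts(histograms[i], histograms[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Invented 4-bin histograms (100 events each) for six hypothetical points.
toy = [
    [70, 20, 8, 2], [68, 22, 7, 3], [72, 18, 8, 2],       # soft spectrum
    [10, 30, 40, 20], [12, 28, 41, 19], [9, 31, 39, 21],  # hard spectrum
]
groups = cluster(toy, n_clusters=2)
print(groups)  # prints [[0, 1, 2], [3, 4, 5]]
```

The only input the grouping needs is a pairwise measure of how alike two predicted distributions are; the production rate at each point never enters.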

The result is a list of a handful of "representative points": by studying those benchmarks, an experimentalist is guaranteed to be covering the large parameter space optimally. You can well understand how this strategy is entirely orthogonal to the drunkard's watch approach, as it forgoes the opportunistic choice of looking first for the easiest things (the parameter space points yielding the largest rates). We are convinced that, as we set out to search for Higgs boson pairs in the next decade, a principled approach and a wise definition of benchmarks which does not depend on the amount of integrated luminosity you have at hand today, but looks into the future and plans a systematic investigation, is the way to go.

Some more detail for the five of you who are still around

Below I offer some more detail if you really are interested. In the figure below you can see the invariant mass distribution of the Higgs pairs for parameter space points grouped into 13 clusters. The red distribution is the identified "benchmark": a Poisson likelihood test statistic identifies it as the sample "most similar" to all the others of the group. You may notice how the 13 benchmarks are quite different from one another, while they sort of "represent" well the kinematics of the other distributions within each cluster - to the level of detail that is experimentally accessible.
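
To make the idea of a "most similar" sample concrete, here is a minimal sketch of how a benchmark could be picked within one cluster. Again this is an illustration rather than the procedure of the paper: the three histograms are invented, the statistic is an assumed two-sample Poisson likelihood ratio for equally normalized histograms, and the selection rule (take the member whose worst-case dissimilarity to the others is smallest) is my own simplification.

```python
import math

def poisson_ts(h1, h2):
    """Assumed two-sample Poisson likelihood-ratio statistic for two
    histograms with equal totals: 0 if identical, larger if less alike."""
    ts = 0.0
    for n1, n2 in zip(h1, h2):
        mu = 0.5 * (n1 + n2)  # best common expectation for this bin
        if n1 > 0:
            ts += 2.0 * n1 * math.log(n1 / mu)
        if n2 > 0:
            ts += 2.0 * n2 * math.log(n2 / mu)
    return ts

def pick_benchmark(cluster_hists):
    """Return the index of the member whose largest dissimilarity to any
    other member of the cluster is smallest (hypothetical selection rule)."""
    def worst(i):
        return max((poisson_ts(cluster_hists[i], h)
                    for j, h in enumerate(cluster_hists) if j != i),
                   default=0.0)  # a one-member cluster is its own benchmark
    return min(range(len(cluster_hists)), key=worst)

# Invented 4-bin mass histograms (100 events each) for one cluster.
cluster_hists = [
    [60, 25, 10, 5],
    [70, 20, 8, 2],
    [80, 15, 4, 1],
]
print(pick_benchmark(cluster_hists))  # prints 1
```

The middle sample wins because in every bin it sits between the other two, so neither of them is far from it.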

And below, you can see some "slices" of the five-dimensional parameter space, showing with different markers and colours the parameter space points as they have been clustered into the 13 groups.

The study offers a definition of Higgs pair production benchmarks which we hope will be used by experiments in their searches in the coming years. As for CMS, as I will participate in those studies, I have leverage to convince my colleagues that it is a sound choice; as for ATLAS, it would be great if they also conformed to this procedure, as it is always a good thing if both experiments agree on benchmarks. But then, diversity is a point of strength in the study of fundamental physics, so maybe a different approach is also advisable, in parallel.