ATLAS Awesome Flavour Tagging Algorithms

The title of this post is not of my making - it is something you may read in a list of recent ATLAS results, in one of the otherwise dry and business-like web pages of the experiment:

Don't get me wrong, I am all for a bit of personality in such web outlets, so the above should be seen not as criticism but as an exhortation to my CMS colleagues (CMS being the experiment I am a member of) to mimic their competitor. I look forward to a listing of "CMS wondrous new results on Higgs physics", e.g. ...

Anyway, after noticing that adjective, I could not help having a closer look. But what are Flavour Tagging algorithms? Let me give some context here.

The ATLAS detector

ATLAS is, along with CMS, one of the two multi-purpose experiments deployed at the Large Hadron Collider, CERN's marvelous particle smasher. A marvel in itself, ATLAS is a seven-storey tall detector which sits 100 meters underground in a huge cavern. Every 25 nanoseconds, millions of electronic channels record the interactions of elementary particles with its detection elements. Those particles are produced by the hundreds in the proton-proton collisions delivered by the LHC. They say a picture is worth a thousand words, and I think it applies here, so let me cut the description short:

Wow, isn't it great? A giant built out of cutting edge electronics, meant to take pictures of the most energetic collisions we have ever had access to. I advise you to organize a visit to CERN to see ATLAS and/or CMS if you have a chance. But let us move on, as ATLAS is only tangentially the topic of this post.

Flavour Tagging

Flavour Tagging is an analysis technique meant to identify the originator (a quark or a gluon) of a given stream of particles - a "jet" - in the detector. As protons are made of quarks and gluons, it is only natural that their energetic collisions kick these constituents out in different directions. Quarks and gluons, however, cannot be extracted by themselves from their parents: when you pull one out at high energy, you generate a whole bunch of particles that are themselves made of quarks and gluons, which we collectively call "hadrons". Protons and neutrons are hadrons (they contain three quarks each), but so are pions and kaons (which contain a quark and an antiquark). Hadrons made of three quarks are also called "baryons", while quark-antiquark pairs are called "mesons". Ok, so much for terminology.

As quarks come in six different flavours (up, down, charm, strange, top, and bottom), it is only natural to ask ourselves, when we observe a jet of hadrons, what was the flavour of the originating quark. This is a very important question, in fact, as by sorting this out we can very effectively isolate collisions of high interest to us, such as ones where a Higgs boson was produced.

The Higgs boson predominantly decays to a pair of bottom quarks, which are very rarely produced in generic collisions (the bottom quark is heavy, and so it is produced more rarely), so if you need to find Higgs bosons and you have a means to recognize jets originated by bottom quarks, you select a pair of those. Mind you: this will not suffice to find the Higgs, as a Higgs boson is produced only once every 3 billion collisions at the LHC, while a pair of high-energy bottom quarks pops up once every million collisions or so; but you are on the right track for sure.

So, how does Flavour Tagging work? The ATLAS paper in question is a good place to start if you like digging into the details. Otherwise, I can quickly summarize the main points for you below. In general, you can recognize the jets originated from bottom quarks because these quarks are heavy - they weigh about five times as much as a whole proton. The other reason why they are somewhat easy to spot is that they create, within the jet they produce, a hadron made up of a bottom quark and another quark or two (a meson or a baryon), which invariably has a long lifetime.

The long lifetime of bottom hadrons is a truly amazing thing in the subnuclear world. Being very heavy, that quark knows dozens of ways to distribute its energy (equal to its rest mass) to lighter particles it could decay into. And in subnuclear physics there is a rule that would encourage that: a rule already found by Enrico Fermi in the forties, which says that the speed of the decay should depend on the fifth power of the available energy. So a hadron weighing 40 times more than another (if we e.g. compare a B hadron to a pion) should decay 100 million times faster!
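
The arithmetic behind that "100 million" can be checked in a couple of lines. This is only a sketch of the naive fifth-power scaling quoted above; the function name is made up for illustration.

```python
def naive_rate_ratio(mass_ratio: float, power: int = 5) -> float:
    """Ratio of decay rates if the rate scaled as (available energy)**power."""
    return mass_ratio ** power


# A hadron 40 times heavier than another, as in the B hadron vs pion comparison.
ratio = naive_rate_ratio(40.0)
print(f"{ratio:.3g}")  # 1.02e+08, i.e. about 100 million
```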

But bottom hadrons won't decay so fast. In fact, they live a whole picosecond or more! A picosecond is an eternity for a subnuclear particle. During that time, a fast-moving B hadron may travel several millimeters before decaying. When it does, it produces many energetic particles (at least two or three), which we can track with high precision if we endow the core of our detectors with silicon pixel sensors. That will do the trick.
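
The "several millimeters" follow from the standard relation L = (p/m) c tau between mean flight distance, momentum, mass, and lifetime. Below is a minimal sketch; the B hadron mass (about 5.28 GeV) and lifetime (about 1.5 ps) are approximate textbook values rather than numbers from this post, and the 50 GeV momentum is a hypothetical choice.

```python
C_MM_PER_PS = 0.299792458  # speed of light, in millimeters per picosecond


def mean_decay_length_mm(momentum_gev: float, mass_gev: float, lifetime_ps: float) -> float:
    """Mean flight distance L = (p / m) * c * tau, with p/m the relativistic boost beta*gamma."""
    boost = momentum_gev / mass_gev
    return boost * C_MM_PER_PS * lifetime_ps


# Assumed inputs: approximate B hadron mass and lifetime, hypothetical momentum.
length = mean_decay_length_mm(momentum_gev=50.0, mass_gev=5.28, lifetime_ps=1.5)
print(f"{length:.1f} mm")  # about 4.3 mm
```

At rest-frame scale (boost of 1) the same particle would cover less than half a millimeter on average, which is why the high momentum of the jet matters.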

And what about lighter quarks, such as charm? Charm weighs only a third of bottom and decays much faster, yet still slowly enough that the hadron it produces can sometimes be recognized by the distance it traveled. Once we became capable of distinguishing charm-quark jets from bottom-quark jets, we dropped the term "B tagging" in favour of "Flavour tagging" when we do not mean to talk explicitly of bottom quarks.
[By the way, one funny naming issue when we talk of b-quark and c-quark hadrons is that they are called "B" and "D" hadrons - confusing, I agree. I use the latter capitalized naming below.]

Other quarks are lighter and less conspicuous in their phenomenology, and it is way harder to distinguish, e.g., a strange quark from an up quark or from a gluon. And what about the top quark, which is by far the heaviest? The top quark does not need its own tagging algorithm in normal circumstances, because it produces three distinct and easily recognizable bodies when it decays: a jet (in fact a b-quark jet, easily taggable), a charged lepton, and a neutrino. We would need another post to discuss the top quark, so I'll stop here.

ATLAS awesome flavour tagging

Instead, let us look at the specific awesomeness of ATLAS Flavour Tagging techniques. ATLAS deploys a series of algorithms that exploit the long lifetime, high mass, and large decay multiplicity of hadrons containing bottom and charm quarks. They follow a two-stage approach: a first stage aims at reconstructing the secondary vertices from the tracks of charged particles, and a second stage exploits this information with multivariate classifiers.

At the core of flavour tagging algorithms for high-momentum B and D hadrons is the identification and definition of hadronic jets. These are constructed with a "particle-flow" algorithm - one which exploits all the sub-detectors to infer the various constituent particles. Once a jet is reconstructed, and the primary vertex it came from is measured, one may study the "impact parameter significance" of the tracks belonging to the jet. This quantity is defined as the ratio between the minimum distance of the track from the primary vertex and the uncertainty on that distance.
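
The definition of impact parameter significance can be written down directly. The sketch below is only an illustration of the ratio described above, not the ATLAS implementation; the track values are invented, and the sign convention (positive when the track appears displaced along the jet direction) is assumed to be already encoded in the impact parameter.

```python
def ip_significance(d0_mm: float, sigma_d0_mm: float) -> float:
    """Impact parameter divided by its measurement uncertainty."""
    if sigma_d0_mm <= 0:
        raise ValueError("the uncertainty must be positive")
    return d0_mm / sigma_d0_mm


# Hypothetical tracks: (signed impact parameter, uncertainty), both in mm.
prompt_track = (0.01, 0.02)     # compatible with the primary vertex
displaced_track = (0.30, 0.02)  # e.g. from the decay of a B hadron

print(ip_significance(*prompt_track))     # well below 1: consistent with zero
print(ip_significance(*displaced_track))  # far above 1: clearly displaced
```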

Tracks of particles generated by the decay of B and D hadrons do not "point back" to the primary interaction vertex, so they have a large value of impact parameter significance. In the graph below you can see the different distribution of the quantity for jets originated from different quark flavours.

You can see that the distribution peaks at zero for tracks that belong to particles originated in the primary interaction, while if the particles are coming from the decay of B or D hadrons you get long tails to large positive values of impact parameter significance. Putting together this information from several tracks, and combining it with other jet-related information, makes for high-discrimination flavour taggers.

A number of algorithms exploit the different properties of long-lived B and D hadrons. In the end, the output of these low-level taggers is combined using deep learning neural networks. These are trained by looking at top-pair events and Z decays to quark pairs in some magical 70%-30% mixture. The output of the most performant network, called DL1r, is three-prong - it classifies jets as originated from B-, D-, and light hadrons.
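
The text above does not say how three outputs become a single tagging decision. A common choice, assumed here, is a log-likelihood-ratio score that compares the b probability with a weighted mix of the other two. The function name, the example probabilities, and the default charm weight f_c are illustrative assumptions, not values taken from this post.

```python
import math


def b_tag_score(p_b: float, p_c: float, p_light: float, f_c: float = 0.018) -> float:
    """Log-ratio of the b probability to a weighted mix of c and light probabilities.

    f_c (an assumed, tunable parameter) sets how much the charm hypothesis
    weighs in the background mix.
    """
    background = f_c * p_c + (1.0 - f_c) * p_light
    return math.log(p_b / background)


# Invented network outputs for two jets (each triplet sums to 1).
b_like = b_tag_score(p_b=0.90, p_c=0.05, p_light=0.05)
light_like = b_tag_score(p_b=0.05, p_c=0.05, p_light=0.90)
print(b_like > 0 > light_like)  # True
```

A jet is then called b-tagged if its score exceeds a threshold, and moving that threshold trades efficiency against rejection.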

There are a number of ways to study the performance of these classifiers. One may look at how well light-quark and gluon jets are rejected for a given efficiency to select b-quark jets, or c-quark jets, or study combined quantities. This depends on what one wants to use the neural network output for. Below you can see the rejection factor for different b-jet efficiencies; the ATLAS paper gives much more detail.

The curves show the rejection factor of jets originated by light quarks or gluons, for different values of the b-quark jet selection efficiency. For a 70% b-tag efficiency, e.g., you can get rejection factors of 1000! When I started working as a particle physicist, 30 years ago, I recall seeing the first b-tagging efficiency graphs produced by CDF for its search for the top quark - the technology of silicon detectors had just been imported to hadron colliders, so those were pioneer days. I remember that a 20-25% b-jet tagging efficiency, with rejection factors of the order of 20 to 50, was already quite exciting then (and in fact, it produced the first evidence of top quark production in a 1994 article). Things have improved quite significantly since then!
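
For readers who want the two quantities spelled out: the efficiency is the fraction of true b-quark jets passing a cut on the tagger score, and the rejection factor is the inverse of the fraction of background jets passing the same cut, so a rejection of 1000 means one light jet in a thousand is mistagged. The sketch below uses made-up scores and a made-up threshold.

```python
def efficiency(scores, threshold):
    """Fraction of jets whose tagger score is above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def rejection(background_scores, threshold):
    """Inverse of the background mistag rate (infinite if nothing passes)."""
    mistag_rate = efficiency(background_scores, threshold)
    return float("inf") if mistag_rate == 0 else 1.0 / mistag_rate


# Toy tagger scores, invented for illustration only.
b_jet_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
light_jet_scores = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.45, 0.65]

print(efficiency(b_jet_scores, 0.5))     # 0.8: four of five b jets pass
print(rejection(light_jet_scores, 0.5))  # 10.0: one of ten light jets passes
```

Scanning the threshold and plotting one quantity against the other gives curves of the kind discussed above.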

An interesting graph is the one below, which details how well one can do charm tagging. Charm tagging requires one not only to separate charm jets from light-quark and gluon ones but also, crucially, to distinguish them from b-quark jets. Hence one may put the rejection power against both backgrounds on two axes, and see how well one is doing in terms of efficiency for charm jets.

One way to read the above curves is to say that, using DL1r, ATLAS may select 40% of charm-originated jets in a reasonably wide energy range while reducing the contamination of light-quark and gluon jets by a factor of 10 and that of b-quark-originated jets by the same factor; or it may accept 20% of charm jets and achieve a rejection factor of 300 against light-quark and gluon jets and of 20 against b-quark jets.
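
To see what these two operating points mean in practice, one can apply them to a sample of jets and compare the resulting charm purity. The efficiencies and rejection factors below are the ones quoted above; the sample composition is a hypothetical one, invented purely for illustration.

```python
def tagged_yields(n_c, n_light, n_b, eff_c, rej_light, rej_b):
    """Jets surviving the tag: charm scales with the efficiency,
    each background is divided by its rejection factor."""
    return n_c * eff_c, n_light / rej_light, n_b / rej_b


def charm_purity(n_c, n_light, n_b, eff_c, rej_light, rej_b):
    """Fraction of tagged jets that really come from charm quarks."""
    c, light, b = tagged_yields(n_c, n_light, n_b, eff_c, rej_light, rej_b)
    return c / (c + light + b)


# Hypothetical jet sample (not from the ATLAS paper).
sample = dict(n_c=1000, n_light=10000, n_b=1000)

loose = charm_purity(**sample, eff_c=0.40, rej_light=10, rej_b=10)
tight = charm_purity(**sample, eff_c=0.20, rej_light=300, rej_b=20)
print(f"loose: {loose:.2f}, tight: {tight:.2f}")  # loose: 0.27, tight: 0.71
```

With this invented composition the tighter selection keeps half as many charm jets but yields a much cleaner sample, which is the trade-off an analysis has to weigh.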

All in all, the ATLAS taggers are very effective, and their combination in the neural network classifiers represents an enormous amount of work. This work is the foundation on which the analyses looking for Higgs boson decays to b- and c-quark pairs are built, so the researchers and students who pulled it off need to be acknowledged for their heroic efforts. That is why I am all for dubbing these algorithms Awesome!
---

Tommaso Dorigo (see his personal web page here) is an experimental particle physicist who works for the INFN and the University of Padova, and collaborates with the CMS experiment at the CERN LHC. Dorigo is the President of the USERN organization. He also coordinates the MODE Collaboration, a group of physicists and computer scientists from twenty institutions in Europe and the US who aim to enable end-to-end optimization of detector design with differentiable programming. Dorigo is an editor of the journals Reviews in Physics and Physics Open. In 2016 Dorigo published the book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab", an insider view of the sociology of big particle physics experiments. You can get a copy of the book on Amazon, or contact him to get a free pdf copy if you have limited financial means.
