Machine Learning For Phenomenology

These days the use of machine learning is exploding, as problems that it can solve more effectively are ubiquitous, and the construction of deep neural networks or similar advanced tools is within reach of sixth graders. So it is not surprising to see theoretical physicists joining the fun. If you think that the work of a particle theorist is too abstract to benefit from ML applications, you had better think again.

Think again - that's exactly the aim of the workshop I attended over the past few days in Durham. It was probably meant to be titled "Machine Learning for Phenomenology", but the organizers also called it "for Theory" in their advertising material, so nobody really knows what the exact title of the workshop was - that's an irrelevant detail, thus worth mentioning here.

In order to think about possible applications of ML to particle phenomenology, and to offer interesting ideas, presentations were given by a few experts in the field as well as by some regular Joes who happened to have had some cool ideas on potential applications. I belong, of course, to the second category.

My presentation was indeed centered on a good idea rather than on technicalities of ML tools: how clustering algorithms (which belong to the class somewhat deceptively called "unsupervised learning") may be used to partition the often high-dimensional parameter space of new physics theories in an "optimal" way, in the sense that the resulting partition lends itself best to experimental investigation. For more on that see my presentation, or the article I published with a few collaborators a couple of years ago.

The discussions were lively, helped by the relaxed pace of the agenda (which you can find here, if you are really interested). Among the things I found most interesting was Bryan Ostdiek's talk. Bryan is a cool guy from the University of Oregon, who flew in from there for this workshop and the next one (the famous "IML" CERN workshop that takes place next week, and about which I will certainly blog soon). He made me realize that the idea of "weak supervision" is a really cool thing, which may offer solutions to data-driven classification problems. While I do not wish to go into the details, let me explain what this is about.

The idea of weak supervision is that you train a classifier with data which are only partly labeled: in layman's terms, rather than showing your algorithm "signal events" and "background events", such that it may learn to discriminate the ones from the others, you show it two sets which differ only in their signal contamination: say the first contains 60% signal, and the second only 20% of it. A theorem (ok, if you ask, one based on the Neyman-Pearson lemma and on the transferability of inference from the likelihood ratio to any monotonic function of it) proves that the classification power you get from those "partly labeled data" should be the same as if the samples were homogeneous, i.e. pure signal and pure background! That, however, is true only "asymptotically", i.e. when you have infinite amounts of training data. Still, this is quite cool, and I am sufficiently intrigued that I think I will play with these ideas in the near future.
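
To make the argument a bit more concrete, here is a toy sketch under assumptions that are not from the talk: a single observable, with signal and background drawn from unit-width Gaussians centered at +1 and -1. The function names (`gauss_pdf`, `pure_lr`, `mixed_lr`, `rankings_agree`) are made up for illustration; only the 60% and 20% signal fractions come from the example above. No classifier is trained here: the code just checks the asymptotic statement, namely that the likelihood ratio between the two mixed samples orders events exactly as the pure signal-to-background likelihood ratio does.

```python
import math
import random

# Toy assumption (not from the workshop): signal ~ N(+1, 1), background ~ N(-1, 1).
MU_SIGNAL, MU_BACKGROUND = 1.0, -1.0


def gauss_pdf(x, mu, sigma=1.0):
    """Density of a normal distribution with mean mu and width sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def pure_lr(x):
    """Likelihood ratio of pure signal over pure background."""
    return gauss_pdf(x, MU_SIGNAL) / gauss_pdf(x, MU_BACKGROUND)


def mixed_lr(x, f1=0.6, f2=0.2):
    """Likelihood ratio between two mixed samples with signal fractions f1 and f2."""
    p_s = gauss_pdf(x, MU_SIGNAL)
    p_b = gauss_pdf(x, MU_BACKGROUND)
    return (f1 * p_s + (1.0 - f1) * p_b) / (f2 * p_s + (1.0 - f2) * p_b)


def rankings_agree(events, f1=0.6, f2=0.2):
    """True if both discriminants sort the events in the same order."""
    idx = range(len(events))
    by_pure = sorted(idx, key=lambda i: pure_lr(events[i]))
    by_mixed = sorted(idx, key=lambda i: mixed_lr(events[i], f1, f2))
    return by_pure == by_mixed


# A shuffled grid of observable values stands in for a set of events.
events = [-4.0 + 8.0 * i / 999 for i in range(1000)]
random.Random(0).shuffle(events)
print(rankings_agree(events))
```

Writing L for the pure likelihood ratio, the mixed one equals (f1 L + 1 - f1) / (f2 L + 1 - f2), which grows with L whenever f1 > f2. A cut on either quantity therefore selects the same events, which is why an ideal classifier trained on the mixed samples loses nothing. Swapping the two fractions simply reverses the ordering.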

I should also like to report, if only for my own record, that after the end of the talks on the last afternoon there was a hilarious moment as Frank Krauss instructed the participants on how to spend their free time before the conference dinner. Using enjoyably colourful language, he pointed out a list of good bars along the way to the restaurant, eloquently appraising their relative merits. The audience was elated, myself included.
