Last Monday and Tuesday I gave a few lectures on Machine Learning at a Data Science school (IDPASC) in Braga, Portugal. I think that this topic has received so much attention in the last few years, with heaps of excellent resources now freely available online, that it is very difficult to be original and provide useful information to any student who is proactive enough to google "auto-encoders" by herself.

So what did I have to offer that could be of interest and stimulating to the 80 students who attended the school? I tried to leverage my own background and my personal way of viewing the available tools, and to give practical advice on how they could orient themselves in a field that is really brimming with different exciting ideas. And I tried to focus on what is of relevance to research in fundamental physics, as that was originally my personal motivation for learning some of the tricks of this relatively new branch of computer science.

Here I cannot of course replicate the lectures, or even provide a summary. You can find my slides online at the school web site if you are curious. What I want to do below is to pick one point from the material and offer you a bit of wisdom about the whole subject that you might find interesting.

The issue I want to discuss here is: Why is classification really so important?

Machine learning includes methods that address a very large number of different tasks, and classification is only one of them. So why do we insist so much on discussing it in detail?

First of all, machine learning addresses at least three very different main topics. The first one is called "supervised learning". In supervised learning, the algorithms have to learn structure in data which are "labelled". For an example (or event, or observation) in the data to be labelled means that, together with a vector of its observed features X, it comes equipped with a value Y, the target we want to learn: we write the pair as (X,Y).

In supervised classification, Y is a class label (e.g. "cat" or "dog", if you are sorting images into different subsets; or "signal" or "background", if you are searching for a new particle in LHC collision data); in supervised regression, Y instead takes on a continuous value. In both cases the task is to learn to predict Y* for each of a series of unlabelled examples for which only the features X* are known.
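To make the (X,Y) notation concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy Gaussian data, the one-dimensional feature, and the helper names `fit_centroids` and `classify`. The classifier is a simple nearest-centroid rule and the regression is an ordinary least-squares line; neither is a method taken from the lectures.

```python
import random

random.seed(0)

# Labelled training data (X, Y) with a single feature per example.
# Toy assumption: "signal" clusters around x = 2, "background" around x = -2.
train = [(random.gauss(2.0, 1.0), "signal") for _ in range(200)]
train += [(random.gauss(-2.0, 1.0), "background") for _ in range(200)]


def fit_centroids(examples):
    """Learn the mean feature value of each class label."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def classify(centroids, x_star):
    """Predict Y* for an unlabelled X* as the class with the closest centroid."""
    return min(centroids, key=lambda y: abs(x_star - centroids[y]))


centroids = fit_centroids(train)
prediction_high = classify(centroids, 3.0)   # an unlabelled X* far on the right
prediction_low = classify(centroids, -3.0)   # an unlabelled X* far on the left

# Regression: Y is continuous. Fit y = slope * x + intercept by least squares
# to noisy points generated (by assumption) from y = 3x + 1.
xs = [i / 10 for i in range(50)]
ys = [3.0 * x + 1.0 + random.gauss(0.0, 0.1) for x in xs]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

print(prediction_high, prediction_low, round(slope, 2), round(intercept, 2))
```

The only difference between the two halves is the nature of Y: a discrete label in the first case, a real number in the second.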

The second topic is "unsupervised learning". Here the data X do not come with a label that specifies their type: in unsupervised learning the algorithm is tasked with the much fuzzier goal of interpreting the data by finding structure in them, or in the features that characterize each observation X. For example, given some data the algorithm may be able to identify that they comprise different classes; this is addressed by clustering algorithms. Or the task might be to identify some anomalous example in an otherwise homogeneous set. Finally, an important task which belongs to this area is density estimation. If you only have examples taken from an unknown distribution, the problem of determining its shape is complex and rich.
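As an illustration of the clustering case, the sketch below runs a bare-bones k-means on unlabelled one-dimensional data. The data, the function name `kmeans_1d`, and the choice of starting the centres evenly spaced between the smallest and largest point are all assumptions made to keep the example short and deterministic; real clustering code would handle more dimensions and smarter initialisation.

```python
import random

random.seed(1)

# Unlabelled data: two populations mixed together, with no Y provided.
data = [random.gauss(-3.0, 0.5) for _ in range(150)]
data += [random.gauss(3.0, 0.5) for _ in range(150)]
random.shuffle(data)


def kmeans_1d(points, k=2, iterations=20):
    """Very small k-means for 1D points (requires k >= 2)."""
    lo, hi = min(points), max(points)
    # Initial centres spread evenly over the range of the data.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        # Assignment step: each point joins its nearest centre.
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group;
        # an empty group keeps its previous centre.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)


centers = kmeans_1d(data)
print([round(c, 2) for c in centers])
```

The algorithm is never told that there are "left" and "right" populations; it recovers the two groups from the distribution of X alone.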

The third topic is called "reinforcement learning". Here the algorithm must try and devise a strategy to maximize some reward, given some possible actions and a continuous stream of feedback from the system. A very nice example of this is the attempt to let a simulated human-like structure of joints and muscles, subject to the known laws of dynamics, learn how to walk by providing a "reward" if the system can move itself in some direction. The video below shows how this task can be learned effectively by a smart enough system.
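The walking example is far too complex for a few lines of code, but the loop of action, reward and updated strategy can be sketched with a much simpler stand-in: a three-armed bandit solved with an epsilon-greedy rule. The hidden mean rewards, the value of `epsilon`, and the function name `pull` are assumptions chosen for illustration only.

```python
import random

random.seed(2)

# Hypothetical environment: three possible actions, each with a hidden
# mean reward that the agent does not know in advance.
true_means = [0.2, 0.5, 0.8]


def pull(action):
    """Return a noisy reward for the chosen action."""
    return random.gauss(true_means[action], 0.1)


estimates = [0.0, 0.0, 0.0]  # running estimate of each action's reward
counts = [0, 0, 0]           # how many times each action was tried
epsilon = 0.1                # fraction of steps spent exploring at random

for step in range(2000):
    if random.random() < epsilon:
        action = random.randrange(3)                      # explore
    else:
        action = max(range(3), key=lambda a: estimates[a])  # exploit
    reward = pull(action)
    counts[action] += 1
    # Incremental mean: nudge the estimate towards the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = max(range(3), key=lambda a: estimates[a])
print(best_action, counts)
```

Nobody tells the agent which action is best; it finds out from the stream of rewards, and ends up spending most of its steps on the most rewarding one.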

Now, in this forest of different applications, classification really has an important role. How to explain why? One could try and make the point that the task is important because of its clear definition and the availability of easy metrics to ascertain the performance of the possible solvers.
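To give an idea of how easy such metrics are, here is a small sketch that computes accuracy, true positive rate and false positive rate from a list of true and predicted labels. The labels and the function name `classification_metrics` are made up for the example.

```python
def classification_metrics(true_labels, predicted_labels, positive="signal"):
    """Return (accuracy, true positive rate, false positive rate)."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive:
            if guess == positive:
                tp += 1   # signal correctly selected
            else:
                fn += 1   # signal lost
        else:
            if guess == positive:
                fp += 1   # background mistaken for signal
            else:
                tn += 1   # background correctly rejected
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)  # fraction of signal retained
    fpr = fp / (fp + tn)  # fraction of background accepted
    return accuracy, tpr, fpr


truth = ["signal", "signal", "signal", "background",
         "background", "background", "background", "signal"]
guess = ["signal", "background", "signal", "background",
         "signal", "background", "background", "signal"]

accuracy, tpr, fpr = classification_metrics(truth, guess)
print(accuracy, tpr, fpr)  # prints 0.75 0.75 0.25
```

In a particle physics search the last two numbers are what one usually cares about: how much signal survives the selection, and how much background leaks through.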

Or one could equally well explain how fundamental science has always made great leaps forward by performing classification of the observed natural phenomena or entities: Mendeleev's table is a prime example. By tabling chemical elements by their atomic weight and valence (their behaviour in forming compounds), the Russian scientist was able, 150 years ago, to identify holes in the table, and thus predict the existence of new, as yet undiscovered elements. But there are heaps of other possible examples, from the classification of species to the quark model of hadrons.

My take is that classification is important for a different reason. If we think about what learning really is, we are bound to identify analogies as the real building blocks of our process of understanding the world, learning to talk, to move around without falling down, or whatever else we slowly master in the first years of our life. We really do learn by analogy: we understand the behaviour of new or unknown objects, systems, or phenomena by analogy with things we already have experience of. New behaviour can be predicted in unknown systems by analogy with known behaviour in systems we are already familiar with.

At the heart of the process of constructing analogies what do you find? The classification task, of course. The very first, and crucial, step in the process of creating an analogy, and thus of learning about the world we experience with our senses, is the classification of unknown systems into their equivalence classes.

So I argue here that classification is the real building block of our learning process. If we are tasked with constructing an artificial intelligence, classification must be one of the crucial ingredients. Of course, this does not mean that one cannot spend one's life working on different ML problems. There are in fact so many absolutely exciting problems to attack that focusing on only one of them is indeed reductive. But the central role of classification makes it a topic which must be discussed in detail in any self-respecting lecture on machine learning. So that is what I did in the six hours of lectures in Braga.


Tommaso Dorigo is an experimental particle physicist who works for the INFN at the University of Padova, and collaborates with the CMS experiment at the CERN LHC. He coordinates the European network AMVA4NewPhysics as well as research in accelerator-based physics for INFN-Padova, and is an editor of the journal Reviews in Physics. In 2016 Dorigo published the book “Anomaly! Collider physics and the quest for new phenomena at Fermilab”. You can get a copy of the book on Amazon.