These days I am in the middle of a collaborative effort to write a roadmap for the organization of infrastructure and methods for applications of Artificial Intelligence in fundamental science research. In the process I wrote a few paragraphs concerning benchmarks and standards. I thought this could be interesting for readers here, so I reused the text and inserted some explanations. It also serves as a record of what I wrote initially: the paper is a collaborative one, so the text will undergo many more iterations and may change significantly...
Benchmarking and Standards for AI in Fundamental Science
1. Introduction
A large variety of machine-learning-powered methods has become available in the past two decades, and these have been tailored and used by the community of fundamental scientists in inventive ways: to extract optimal inference from complex datasets, to optimize instruments, reconstruction, and pattern-recognition procedures, and more generally to support all of their experimental activities.
This breadth of available techniques is an asset as well as a liability: a user faced with a specific new problem may not immediately see which tool would be the most efficient and effective to employ, and this can significantly increase the time needed to develop solutions and to reach optimal results.
One way to address this difficulty, which has a potentially very significant impact on the person-years invested in developing and tuning models, is to build and curate a knowledge base of the tools best suited to the different classes of tasks that are common in fundamental science experimentation. This is of course a complex, long-term endeavour, which can only succeed if the whole community contributes to it.
2. The Higgs Kaggle Challenge: An example to follow
A way to expedite the collection of an initial core set of preferred techniques for different typical problems is to organize community-wide challenges centered on carefully chosen problem specifications. These should be of broad interest and of medium to high complexity, in order to attract wide participation in the development of proposed solutions.
A prime example of this kind of activity is the 2014 Higgs challenge, organized on the Kaggle platform by ATLAS members (reference). Over 1700 solutions were submitted to the problem of correctly classifying Higgs decays to tau-lepton pairs in a large collider experiment, for which the organizers had provided simulated signal and background data for the training and testing of models.
A massive amount of information could be drawn by comparing the performance of all the proposed models. Of particular significance is the fact that the interest of such a challenge, and the associated money prize, attracted the participation of many computer scientists; indeed, the winning solution came from one such participant, Gabor Melis. It demonstrated the power of pooling different models and of stringent cross-validation techniques in achieving effective classification power in high-dimensional spaces. These kinds of lessons are extremely useful for the community, and they indicate that something like the Higgs Kaggle challenge could be a way to harvest the data we need.
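To give a purely illustrative idea of what "pooling of models plus stringent cross-validation" means in practice, the snippet below combines a few generic classifiers into a soft-voting ensemble and evaluates it with stratified cross-validation. It uses a synthetic dataset and standard scikit-learn components; it is not the actual winning solution nor the challenge data.

```python
# Minimal sketch of model pooling and cross-validation on a synthetic
# stand-in for a high-dimensional signal/background classification problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic "signal vs. background" dataset (placeholder for simulated samples)
X, y = make_classification(n_samples=5000, n_features=30, n_informative=15,
                           random_state=0)

# Pool of diverse models, combined by averaging their predicted probabilities
ensemble = VotingClassifier(
    estimators=[
        ("bdt", GradientBoostingClassifier(random_state=0)),
        ("nn", make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(64, 64),
                                           max_iter=300, random_state=0))),
        ("logreg", make_pipeline(StandardScaler(),
                                 LogisticRegression(max_iter=1000))),
    ],
    voting="soft",
)

# Stratified k-fold cross-validation of the pooled model
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(ensemble, X, y, cv=cv, scoring="roc_auc")
print(f"AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```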
→ Proposal: The way forward could therefore be for the community to define a set of typical use cases covering as large a part as possible of the core data-analysis activities and related problems in fundamental physics, and to choose one benchmark for each use case, which then becomes a testing ground for candidate models and solutions.
3. Developing surrogate models for faster investigation of parameter spaces
Another organized activity, related to the one described above and one that our community could benefit from in developing solutions to its complex data-reduction problems, is the creation of a library of surrogate parametrizations. These can be useful for quick studies of detector concepts, as well as a first step in building optimization pipelines for end-to-end modeling.
Surrogate models describe the physics processes that take place when radiation (e.g., particles produced in proton-proton collisions at the Large Hadron Collider, or extensive air showers produced by energetic cosmic rays impinging on the atmosphere) interacts with matter (our detector environment). They simplify the complexity of the stochastic data generation, thereby bypassing the high CPU demand of full simulations such as those provided by GEANT4 or CORSIKA, while making the processes differentiable, which crucially enables gradient descent and thus the optimization of reconstruction procedures and/or detector geometry.
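To make this concrete, here is a minimal, purely illustrative sketch of the idea on a toy one-parameter problem: the "expensive_simulation" function is an invented stand-in for a full stochastic simulation, a neural network is fitted to its outputs, and the resulting differentiable surrogate is then used to optimize a design parameter by gradient descent. It is not a real detector model.

```python
import torch

torch.manual_seed(0)

def expensive_simulation(thickness):
    """Stand-in for a full (e.g. GEANT4-like) simulation: a stochastic,
    effectively non-differentiable energy resolution vs. a layer thickness."""
    resolution = 0.1 / torch.sqrt(thickness) + 0.02 * thickness
    return resolution + 0.01 * torch.randn_like(thickness)

# 1) Sample the expensive simulator to build a training set
thickness = torch.rand(2000, 1) * 9.0 + 1.0          # thickness in [1, 10]
resolution = expensive_simulation(thickness)

# 2) Fit a differentiable neural-network surrogate to the sampled responses
surrogate = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(2000):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(surrogate(thickness), resolution)
    loss.backward()
    optimizer.step()

# 3) Freeze the surrogate and optimize the design parameter by gradient descent
for p in surrogate.parameters():
    p.requires_grad_(False)
design = torch.tensor([[5.0]], requires_grad=True)
design_optimizer = torch.optim.Adam([design], lr=0.05)
for _ in range(500):
    design_optimizer.zero_grad()
    objective = surrogate(design).sum()   # differentiable proxy of the objective
    objective.backward()
    design_optimizer.step()
    with torch.no_grad():                 # stay inside the surrogate's valid range
        design.clamp_(1.0, 10.0)

print(f"thickness minimizing the predicted resolution: {design.item():.2f}")
```

The same pattern scales to many-parameter detector geometries: as long as the surrogate is an accurate, differentiable emulator of the full simulation, the design parameters can be updated by backpropagating through it.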
A significant number of novel applications of gradient-based optimization to detector design, and to the co-design of hardware and software of our experiments, have indeed appeared in the very recent past. These developments are showing how we could plan our future experiments in fundamental science in a principled, cost-effective way. The experience acquired so far shows that a significant part of the development of these solutions is spent constructing suitable surrogate models. Much of that work does not need to be re-invented every time, because the underlying data-generation procedures share a common denominator. Here, too, an investigation of the space of machine learning solutions would provide precious additional input to the developers of future models.
→ Proposal: The community should encourage researchers developing surrogate models to share them (e.g., on GitHub), and to ensure that these models are thoroughly documented, easily re-usable, and readily re-purposable for the study of similar use cases.
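As an illustration of what "well documented and easily re-usable" could mean in practice, here is a hypothetical sketch of a minimal shared-surrogate interface; the class name, file layout, and metadata fields are invented for the example and do not refer to any existing package.

```python
import json
import torch


class CalorimeterSurrogate:
    """Differentiable surrogate of a calorimeter response (illustrative only).

    Maps the incident particle energy (GeV) to the predicted mean and width
    of the reconstructed energy, as learned from full-simulation samples.
    """

    def __init__(self, model: torch.nn.Module, metadata: dict):
        self.model = model
        self.metadata = metadata   # e.g. training ranges, simulator version

    @classmethod
    def load(cls, weights_path: str, metadata_path: str) -> "CalorimeterSurrogate":
        """Rebuild the surrogate from shared weight and metadata files."""
        with open(metadata_path) as f:
            metadata = json.load(f)
        model = torch.nn.Sequential(
            torch.nn.Linear(1, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
        )
        model.load_state_dict(torch.load(weights_path))
        return cls(model, metadata)

    def predict(self, energy: torch.Tensor) -> torch.Tensor:
        """Return (mean, width) predictions; differentiable w.r.t. the input."""
        lo, hi = self.metadata["valid_energy_range_gev"]
        if energy.min() < lo or energy.max() > hi:
            raise ValueError("input energy outside the surrogate's validity range")
        return self.model(energy.unsqueeze(-1))
```

The important ingredients are a documented physical meaning of inputs and outputs, explicit validity ranges, and a loading interface that lets another group drop the surrogate into their own optimization pipeline without re-deriving it.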