"And why do we measure areas with square centimeters?"
"Because it would be much harder to fit in there round centimeters, silly!"
(From a conversation with my daughter)
The problem of classifying elements of a data set as belonging to one class or another, depending on their characteristics, is a very, very well-studied one, and one which is particularly important in particle physics.
Imagine, for instance, that you collect events with four high-transverse-momentum leptons (electrons or muons) with the ATLAS or CMS detector, and you wish to sort out which of these better fit the hypothesis of originating from the decay of a Higgs boson into two Z bosons (with each Z boson in turn producing a lepton pair), rather than the alternative hypothesis of being due to the incoherent production of a pair of Z bosons - a process that has nothing to do with Higgs bosons. This means you need to classify the data events using their observed features.
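To make the idea concrete, here is a minimal toy sketch of such a classification, based on a likelihood ratio between the two hypotheses. Everything in it is an assumption made for illustration: the single feature (the four-lepton invariant mass), the analysis window, the Gaussian signal shape and the flat background shape are hypothetical stand-ins, not what ATLAS or CMS actually use.

```python
import math

# Toy assumptions (not from any real analysis): the only feature is the
# four-lepton invariant mass in GeV; the signal is a Gaussian peak and the
# background is flat inside a hypothetical mass window.
MASS_LO, MASS_HI = 100.0, 150.0   # hypothetical analysis window
SIG_MEAN, SIG_WIDTH = 125.0, 2.0  # hypothetical peak position and resolution


def log_signal_pdf(m):
    """Log of a Gaussian density for the H -> ZZ -> 4 leptons hypothesis."""
    z = (m - SIG_MEAN) / SIG_WIDTH
    return -0.5 * z * z - math.log(SIG_WIDTH * math.sqrt(2.0 * math.pi))


def log_background_pdf(m):
    """Log of a flat density for non-resonant ZZ production in the window."""
    return -math.log(MASS_HI - MASS_LO)


def log_likelihood_ratio(m):
    """Positive values favour the signal hypothesis, negative the background."""
    if not MASS_LO <= m <= MASS_HI:
        raise ValueError("mass outside the toy analysis window")
    return log_signal_pdf(m) - log_background_pdf(m)


def classify(m, threshold=0.0):
    """Label an event by comparing its log likelihood ratio to a threshold."""
    if log_likelihood_ratio(m) > threshold:
        return "signal-like"
    return "background-like"


if __name__ == "__main__":
    for mass in (110.0, 124.0, 125.0, 140.0):
        print(mass, classify(mass))
```

Raising the threshold gives a purer but smaller signal-like sample, and lowering it does the opposite. A real analysis would use several features and shapes derived from simulation rather than a single mass with hand-picked densities.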
This morning I arrived at my office with one idea to develop, and I decided to work on the blackboard that hangs on the wall opposite where I sit. I seldom use it, but for some reason it seems that writing with coloured markers on that white surface is more thought-inspiring than my usual scribbling in a notebook.
One clear practical advantage of the (white) blackboard is that whenever my train of thought hits a dead end or I write some nonsense, I just erase it and start over, keeping the good stuff untouched and still in sight; in the notebook this is not possible, as one needs to turn the page. On the negative side, there is less backward traceability - if I had a good idea and left it alone, it is lost forever.
Statistical data analysis is one of those things that experimental physicists learn along the way. It is not a topic usually included in the curriculum studiorum of physics students at universities: only a few basic ingredients are taught during laboratory courses, and not much is added to that during a typical Ph.D. program.
One usually learns on the job the most common tools to fit histograms, combine measurements, and estimate uncertainties, as these things are always needed to produce publishable physics results. But several key statistical concepts often remain fuzzy and obscure in the minds of a large fraction of experimental physicists throughout their careers. I know this because it happened to me, too - for quite a few years after my graduation.
[The title of this article comes from a T-shirt with ten pieces of advice on what to do when everything else fails]
It has always surprised me to realize how confident we physicists are of the good faith of our colleagues. We may argue endlessly over one graph or result, getting to the point of publicly casting doubts on the dexterity or intelligence of our peers (yes, I've seen that), but we never seem to doubt - privately or otherwise - their scientific integrity.