Narsky And Porter: A Nice Review Of Analysis Techniques
    By Tommaso Dorigo | January 31st 2014 03:33 AM
    About Tommaso

    I am an experimental particle physicist working with the CMS experiment at CERN. In my spare time I play chess, abuse the piano, and aim my dobson...

    Yesterday I received a copy of the brand-new book by Ilya Narsky and Frank Porter, "Statistical Analysis Techniques in Particle Physics" (Wiley-VCH, 2014), and I would like to offer here my impressions and thoughts on the material.

    The book comes in soft cover (I do not know whether a hardcover will also be available any time soon), and is printed on nice, good-smelling acid-free paper. This might look like a detail to you, but a good part of my judgement about a book comes from the way it smells ;-). The book costs $89 at Wiley, and you can get a good preview of the material and sample chapters at this site.

    The cover shows a CMS heavy-ion event display overlaid on a few typical graphs of multivariate techniques. I should mention that the subtitle reads "Fits, Density Estimation and Supervised Learning". Indeed, the book is geared toward multivariate techniques in data analysis, and cannot be mistaken for a general-purpose book on statistical techniques; despite that, the authors have made an effort to include in the first two substantive chapters (chapters 2 and 3, since chapter 1 is an introduction) a reminder of the most important techniques. Of course this can be no more than a quick reminder aimed at readers who already know the topics dealt with there: goodness-of-fit tests, which are the main subject of chapter 3, are dispatched in 15 pages; confidence intervals, in chapter 2, occupy even fewer.

    The main part of the book is a discussion of the many multivariate techniques that exist for advanced data analysis, with a few examples taken from particle physics and astrophysics. Rather than trying to be a deep treatise (others exist on the market), the book deals with every topic at a level suitable for anybody who is only generically familiar with statistical tools and who wants to learn the basics of methods that most of us have never even heard of. For better or worse, these methods have become extremely important in basic science now that computing power has surpassed the level required to handle them easily.

    Every chapter is complemented with a few exercises and a selected list of references. I am not a big fan of books that give you work assignments without offering a solution at the back, or at least some trace of the solution; I hope the authors will make an effort to provide worked-out solutions to at least a sample of the exercises on their site. That would be a great addition to the material they offer.

    All in all, I believe this is a very useful book for researchers who wish to learn more about the techniques that have become available in the course of the last 20 years to achieve more powerful inference from their data. Not knowing what kernel estimation, the bootstrap, or random forests are has become increasingly embarrassing if you work in the field.


    Amazon (USA) has a Kindle Edition (ebook) for $59.99

    As you know, Tommaso, I am very backward and old-school with respect to advanced statistical analysis of data from colliders etc.,
    so I am happy to see this book, which might help me learn at least some basics of present-day statistical techniques.

    Speaking of data analysis, do you have any idea why the Planck Collaboration has yet to release:
    Planck 2013 results: XXXI. Consistency of the data
    The Planck ESA web site still says "2014 In preparation"


    Hi Tony,

    I am afraid I have no information on the Planck data.

    As for advanced stat methods: the fact is that these techniques are relatively
    new in HEP and elsewhere, since the required CPU power has become available only
    in the course of the last 15 years. I myself developed an advanced algorithm
    in 1992, and found that I could not study it with the CPU resources available
    at the time.

    The attitude of old schoolers is not bad, but we have to realize that the suspicion with
    which many have looked at these advanced techniques is less justifiable now
    than it used to be, since the available CPU allows one to cross-check extensively that these
    methods, despite being hard to grasp and "visualize", do indeed provide a strong boost
    to our analysis capabilities. Acceptance in HEP has been slow and gradual - take
    VISTA and SLEUTH, signature-based inclusive searches for new physics at CDF and DZERO in
    the late 1990s and early 2000s: they had a lot of trouble getting their results approved,
    and even now these methods are not yet well accepted. But they are sound and
    we should look at them with a more open mind.

    Finally: I will contact you in the course of the next month as I would like
    to get some information from you on the Sliwa-Dalitz story, which I sort of
    remember you know some details about.