The Quote of the Week
    By Tommaso Dorigo | October 19th 2012 11:20 AM | 5 comments
    "The problem of averaging data containing discrepant values is nicely discussed in Ref. x [...] It is difficult to develop a procedure that handles simultaneously in a reasonable way two basic types of situations: (a) data that lie apart from the main body of the data are incorrect (contain unreported errors); and (b) the opposite - it is the main body of data that is incorrect. Unfortunately, as Ref. x shows, case (b) is not infrequent [...] the choice of procedure is less significant than the initial choice of data to include or exclude."

    The Review of Particle Properties 2004, p.16.
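
To make the last point of the quote concrete, here is a minimal sketch in Python. It is an illustration only, not the Review's own code: the measurements are invented, and the two "procedures" compared are a plain inverse-variance weighted average and the same average with its uncertainty inflated by a scale factor S = sqrt(chi2/(N-1)) when S > 1. The function names are hypothetical.

```python
import math

def weighted_average(values, sigmas):
    """Inverse-variance weighted mean and its uncertainty."""
    weights = [1.0 / s**2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)

def scaled_average(values, sigmas):
    """Same mean, but the uncertainty is inflated by S = sqrt(chi2/(N-1)) if S > 1."""
    mean, err = weighted_average(values, sigmas)
    chi2 = sum(((x - mean) / s) ** 2 for x, s in zip(values, sigmas))
    dof = len(values) - 1
    scale = math.sqrt(chi2 / dof) if dof > 0 else 1.0
    return mean, err * max(scale, 1.0), scale

# Hypothetical measurements (made up for illustration); the last one is discrepant.
values = [10.1, 9.9, 10.0, 10.2, 12.0]
sigmas = [0.2, 0.2, 0.2, 0.2, 0.2]

all_plain = weighted_average(values, sigmas)             # procedure 1, all data
all_scaled = scaled_average(values, sigmas)              # procedure 2, all data
cut_plain = weighted_average(values[:-1], sigmas[:-1])   # procedure 1, outlier excluded

print("all data, plain :", all_plain)
print("all data, scaled:", all_scaled)
print("outlier excluded:", cut_plain)
```

With these invented numbers, switching procedure changes only the quoted uncertainty, while dropping the discrepant point moves the central value itself. Whether dropping it is correct is exactly what the quote says one cannot know in advance: the "outlier" may be the only measurement that is right.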


    "the choice of procedure is less significant than the initial choice of data to include or exclude."

    But the inclusion/exclusion of data affects the average as well. This is a place where biases can creep in.
    Never is a long time.
    Can you confirm the citation? I find the Review of Particle Physics, but no Review of Particle Properties. DOI?

    The source is, Section 4.3. "Ref. x" is B.N. Taylor, "Numerical comparisons of several algorithms for treating inconsistent data in a least-squares adjustment of the fundamental constants", U.S. National Bureau of Standards NBSIR 81-2426 (1982).

    Reminds me of a lesson from high school chemistry class about the definition of accurate vs. precise. The teacher used a dart board. If one's experiment (the darts) hit where all the other darts did, it was precise, but not always accurate. If one's experiment hit the bullseye, it was accurate but not precise.
    In real life, if one's experiments or theories are more accurate, but everyone else gets a different precise result, we assume the outliers are wrong.
    Science advances as much by mistakes as by plans.
    IMHO, statistics is a tool for answering questions.
    If you choose the wrong data and/or method, the answer you get will not match the question you wanted to ask.

    If I may offer an analogy, it's like radio: tuning the receiver to the proper frequency and modulation corresponds to the data choice. The amplification and noise reduction correspond to the method.

    The method becomes important only if your data really contains the required information.

    Or, according to Douglas Adams:
    The answer is definitely 42. Your problem is that you don't know the question. :-)