    A Statistical Issue In The Opera Measurement
    By Tommaso Dorigo | January 3rd 2012 01:28 PM | 33 comments

    Update: I have modified the title of this post [originally: "Opera's Statistical Booboo"] and the text below (in places I marked accordingly) upon realizing, thanks to a very good point raised by a reader in the comments thread below, that the idealization I was making of the measurement described below made my conclusions too hasty. Read the text to the end if you want more detail.

    In data analysis, Gaussian distributions are ubiquitous. But that does not mean they are the only distributions one encounters! In fact, it is unfortunately all too common to find unmotivated Gaussian assumptions about the probability density function from which the data are drawn. Usually one makes such assumptions implicitly, so they remain hidden between the lines; the result is almost never dramatic, and those few among us who believe that doing science requires rigor are the only ones left complaining.

    These days I happen to be putting together a course on basic statistics for HEP experimentalists, which I will be giving in the nice setting of Engelberg (Switzerland) later this month. Since I want to give a HEP example of the pitfall of the Gaussian assumption, which one avoids in data analysis by always keeping clearly in mind the question "What is the pdf from which I am drawing my data?", I dusted off one complaint I had with the Opera measurement of neutrino velocities - the reloaded analysis, published one month ago here.

    Opera collected 20 good neutrino interactions from narrow-bunched proton spills at CNGS at the end of October. These 20 events are a golden set because in practice every single one of them is sufficient to measure the time spent by neutrinos to travel from CERN to Gran Sasso: one need not mess with the proton bunch time profile anymore, since the latter is a 3 ns wide spike in this case. You can see the 20 timing measurements in the histogram on the right.

    Unfortunately, it transpires that the Opera time measurement is subject to a ±25 ns jitter, which means that they only have 50-nanosecond accuracy on their delta-t measurements. In other words, their measured times are all displaced by a random amount within a box 50 nanoseconds wide. Those 20 events are therefore sampled from a uniform distribution, and all that is left to measure, in order to find the best estimate of the travel time, is the center of that distribution.

    But here comes the implicit Gaussian assumption: Opera proceeds to take the sample mean (the arithmetic average, <t> = (1/N) Sum_i t_i) of the 20 timing measurements t_i in order to estimate the center of the 50 ns box. The horror, the horror. By doing that, they are in practice assuming that their data {t_i} are sampled from a Gaussian, because the arithmetic mean is the maximum likelihood estimator of the Gaussian mean (the same holds for other distributions, which I am even less willing to consider as their implicit choice).

    At this point, you could well be a little puzzled, asking yourself what the hell I am talking about: surely the sample mean is a good way to estimate the location of a distribution, even a non-Gaussian one, right?

    Wrong. The sample mean is a very stupid thing to use for a box distribution. The reason is best understood by considering the following alternative procedure: order your data {t_i} from the smallest to the largest, take the smallest measurement t_min and the largest t_max, and throw away all the others (18 precious neutrino interactions are thus flushed down the toilet! Can you hear them scream?). The location estimator which happens to be the uniformly minimum variance unbiased one, in this case, is the sample midrange t_mid = (t_max+t_min)/2.

    What happens is that since all the t_i are sampled from a uniform distribution, the information that each of the 18 remaining events carries about the location of the box, once you have picked the minimum and the maximum, is null! They carry no information whatsoever, because each of these additional timing measurements, painfully obtained by Opera, amounts to drawing a number at random within the bounds [t_min, t_max]! Pure noise!

    In practical terms, the sample mean is NOT the minimum variance estimator: the sample mean <t> is affected by the noise that the sample midrange t_mid instead discards. It is the sample midrange, not the sample mean, which is the UMVU (uniformly minimum variance unbiased) estimator. A few lines of code allow us to estimate the expected uncertainties of the two estimators: the expected statistical error on the sample mean of 20 values drawn from a box distribution of 50 ns width is 3.3 ns, while the expected statistical error on the sample midrange is 1.66 ns.
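    A minimal toy Monte Carlo along these lines (a Python sketch, not the exact code behind the numbers above) could look like this:

```python
# Toy Monte Carlo: spread of the sample mean vs the sample midrange
# for 20 timings drawn from a uniform ("box") distribution 50 ns wide.
import numpy as np

rng = np.random.default_rng(1)
n_events, n_toys, half_width = 20, 100_000, 25.0  # half-width in ns

t = rng.uniform(-half_width, half_width, size=(n_toys, n_events))
mean = t.mean(axis=1)                             # sample mean <t>
midrange = 0.5 * (t.min(axis=1) + t.max(axis=1))  # sample midrange t_mid

print(f"rms of sample mean:     {mean.std():.2f} ns")
print(f"rms of sample midrange: {midrange.std():.2f} ns")
```

    The toy agrees with the analytic expectations for a box of width w = 50 ns and N = 20 events, w/sqrt(12N) ≈ 3.2 ns for the mean and w/sqrt(2(N+1)(N+2)) ≈ 1.65 ns for the midrange, close to the values quoted above; histogramming the two estimators over the toys reproduces the narrow and wide distributions described below.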

    Below you can see the distribution of the two estimators <t> and t_mid for 100,000 repeated experiments where 20 timings are drawn at random from a box having range [-25,+25]. On the top the two are plotted in a semi-logarithmic graph, and on the bottom they are shown on a linear scale. You can clearly see that the black histogram (representing t_mid) is twice as narrow as the blue one (representing <t>). In case you ask, the former has a Laplace distribution (technically it is Nt which has a Laplace distribution, but that's a detail), while the latter is of course Gaussian.



    Update: As mentioned at the top, I am leaving the "interim" conclusions live below, but I am further commenting at the end, revising them significantly.

    In summary, by quoting <t> = 62.1 ns, Opera not only failed to make the best use of their data; in so doing, they also doubled the statistical uncertainty of their measurement! They quote <t> = 62.1 ± 3.7 ns, while using the midrange they would have quoted t = 65.2 ± 1.7 ns instead.

    If this sounds irrelevant to you, consider that the CERN beam ran for a couple of weeks (from October 22nd to November 6th) to deliver those 20 interactions to Opera. They could have run for just a week instead, and Opera would still have obtained a 3 ns-ish statistical uncertainty, had they used the right estimator.

    I wonder if Ereditato will now be presented with a bill from CERN for the 10 extra days of useless running. It should amount to a few million euros I guess.

    Update: revised conclusions below.

    As you can read in the comments thread, a reader points out that if each timing measurement suffers from a random Gaussian smearing, the conclusion on whether the UMVU estimator is still the sample midrange, or rather the sample mean, can change. Indeed, going back to my toy Monte Carlo, I find that if, in a 20-event experiment, each measurement is subject to a Gaussian smearing larger than 6 ns, then the best estimator is again the sample mean. The reason is that the distribution of the data becomes a convolution of a Gaussian and a uniform, and the events "within" the spanned range [t_min, t_max] again carry useful information on the location of the distribution.

    On the right you can see the rms of the two estimators (in ns) as a function of the sigma of the Gaussian smearing (also in ns). The sample midrange is the UMVU for a box, but it is not a robust estimator!
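    A sketch of that scan, under the same idealization as above (each timing smeared by a Gaussian of width sigma, then shifted by a uniform jitter in [-25, +25] ns), could look as follows; under these assumptions the crossover should appear near the 6 ns quoted above:

```python
# Toy scan: rms of sample mean and sample midrange for 20 events, as a
# function of the Gaussian smearing applied on top of the ±25 ns box jitter.
import numpy as np

rng = np.random.default_rng(2)
n_events, n_toys = 20, 50_000

for sigma in (0.0, 2.0, 4.0, 6.0, 8.0, 10.0):  # Gaussian smearing, ns
    t = (rng.normal(0.0, sigma, size=(n_toys, n_events))
         + rng.uniform(-25.0, 25.0, size=(n_toys, n_events)))
    rms_mean = t.mean(axis=1).std()
    rms_mid = (0.5 * (t.min(axis=1) + t.max(axis=1))).std()
    # the two rms values should cross over near sigma ~ 6 ns, as stated above
    print(f"sigma = {sigma:4.1f} ns: mean {rms_mean:.2f} ns, midrange {rms_mid:.2f} ns")
```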

    I am unable to determine whether Opera's timing measurements are indeed subject to that effect; from a reading of their paper I am inclined to believe they are. Therefore, they may in the end have done the right thing. Nevertheless, I believe the article above retains the didactic value it was meant to have. Maybe I should just point out that Gaussians are indeed ubiquitous!

    Comments

    "If this sounds irrelevant to you, consider that the CERN beam ran for a couple of weeks (from October 22nd to November 6th) to deliver those 20 interactions to Opera. They could have ran for just a week instead, when Opera could still obtain a 3ns-ish statistical uncertainty, had they used the right estimator.

    I wonder if Ereditato will now be presented with a bill from CERN for the 10 extra days of useless running. It should amount to a few million euros I guess."

    Unless you know which points come in when, you don't know the value of a two-week run vs. a four-day run. Small numbers are fluky, and it is entirely possible that the data points with the most statistical influence could have come in the last two days; it is also possible that there were several days that produced no data points at all. You can't tell a priori how much data you are going to get when you schedule the run, and even if you could, it doesn't really matter that much, because whether you are at 1.7 ns or 3.7 ns, statistical error has always been the less significant component of the error relative to systematic error in this experiment.

    dorigo
    Oh well ohwill, let me tell you: my argument was clearly a hyperbole. But if you argue about it in detail, I'm willing to play the game.

    So imagine we want at least 7 events (one would need to run a few toys to verify that they bring an expected variance equal to that of the Opera Gaussian case with 20 events, but it is a good bet). We know the rate (it has been measured over the last three years; the number of neutrino interactions per proton on target is known to better than 1% accuracy, so I will ignore that uncertainty). If we want to be 95% sure that we get at least 7 events, we just have to find the Poisson mean x such that its integral from 0 to 6 is less than 0.05. This yields x >= 11.9. So they could have run for 60% of the 16 days they did (here I am just assuming that 20 is the expected number of interactions they get in that much running time, in the absence of a better estimate), and they would have been >95% sure to get at least 7 events.

    So you see, 0.4*16 = 6.4 days of spared beam. Close to a week, as I said eyeballing it, without the need for a more precise calculation such as the one above.
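    (As an illustrative check of the threshold above, assuming scipy is available, the same number can be found numerically:)

```python
# Find the smallest Poisson mean mu such that P(N <= 6) <= 0.05,
# i.e. at least a 95% chance of observing 7 or more events.
from scipy.stats import poisson

mu = 0.0
while poisson.cdf(6, mu) > 0.05:
    mu += 0.01
print(f"required mean: mu = {mu:.2f}")  # ~11.85, close to the 11.9 quoted above
```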

    Cheers,
    T.



     
    rholley
    What does this mean for the «superluminal» claim?
    Robert H. Olley / Quondam Physics Department / University of Reading / England
    very little, I would imagine ;-)

    if anything, this tightens their result a bit...

    dorigo
    Indeed, quite little. The 20 events are overkill for what they are meant to demonstrate - i.e., that the bunch-shape systematics are not the main concern. Of course all the systematics due to other timing sources remain in the small-statistics measurement as they were in the 16000-event measurement.

    My bet that the cause of the systematics is either a one-click offset in the 20 MHz clock or the 8 km long light fiber still stands.

    Cheers,
    T.
    I was wondering, do you think the one click offset is due to the issue raised in:
    http://golem.ph.utexas.edu/~distler/blog/archives/002461.html#more ?

    Hi T,
    Let's say there is a Gaussian effect in the timing measurement with sigma > 50 ns. In that case, with enough neutrino events we would expect events falling in numerous adjacent 50 ns "boxes". The PDF within each box would of course be uniform, but the overall PDF would approach a Gaussian. Surely in this case the OPERA method is correct?

    First time commenter but long time reader and enjoyer of this wonderful blog!



    dorigo
    Hello Se',

    You may have waited too long to comment here, since your first comment makes a very, very good point. Indeed, upon considering it for a minute, it caused me to go back to my toy program and check the result of a smearing, and this will ultimately cause me to modify my conclusions - a partial retractatio, if you will. Let me explain in detail.

    What you are describing is not a systematic shift (a bias in the measurement due, for instance, to an uncorrected-for offset of some kind, like an incorrect determination of the light fiber transit time), but rather a smearing of each timing measurement. Of course such smearing does exist, and the issue is whether it is large enough to modify the pdf significantly enough to change the UMVU from sample midrange back to sample mean.

    Contrary to what you describe (events jumping from one box to an adjacent one), the effect occurs the other way around: a timing measurement, spread around a true value t* with a Gaussian distribution, then gets moved by a random amount in the range [-25, +25] ns. That is what the jitter does, in fact, for asynchronous events in Opera.

    According to my toy Monte Carlo, the break-even point between midrange and mean for 20 events is when the measurements are spread around t* with a sigma of 6ns. For larger spreads, the Gaussian regime takes over, i.e. the distribution of the 20 events within [t_min,t_max] starts to contain more relevant information than noise, and one should indeed use the sample mean.
    [Note that this conclusion also depends on the number of events, albeit only weakly.]

    Now, if we look at the paper, we see that the full width at half maximum of the time profile of the bunches in the CNGS beam is 3 ns, so the "sigma" they contribute (roughly FWHM/2.35) is of the order of 1.5 ns; but on page 27 we read that there are additional smearings. I am unable to determine exactly how to interpret what I read there, but it might well be that they cause the Gaussian regime to take over.

    I am therefore modifying the text above, for the sake of accuracy and to be honest with the researchers in Opera. I was wrongly led to believe, by conversations with members of Opera and other informed colleagues, that the Opera researchers had been a bit naive in using the sample mean; now I suspect that at least a few of them knew better.

    Cheers,
    T.
    Quote: "...and the jitter of ± 25 ns related to the tagging of the external GPS signal by the OPERA master clock."

    Is this the 50 ns "box distribution" you're talking about?

    dorigo
    Yes. At least, that is what I understood from a seminar on the result. I am accustomed to jitters causing funnier distributions than uniform smearings, but not in this case.

    Cheers,
    T.
    But this jitter was not there in the original OPERA experiment. Why did it come about here?

    Oh, sorry, the jitter applied in the original experiment as well.

    "The OPERA master clock is disciplined by a high-stability oscillator Vectron OC-050 with an Allan deviation of 2×10-12 at an observation time of 1 s. This oscillator keeps the local time in between two external synchronisations given by the PPmS signals coming from the external GPS. This signal is tagged with respect to the uncorrelated internal frequency of the 20 MHz OPERA master clock, thus producing a ± 25 ns time jitter."

    I don't see this jitter being accounted for in the error estimates in the original experiment. Well, I'm missing it, I hope it is there.

    I don't want to go into details but I would bet this can't be the problem. Most of these hypothetical errors resulting from non-Gaussianities of various distributions etc. do cancel, at least at the leading order, and the subleading order is already too tiny to matter.

    dorigo
    Hi Lubos,

    if you refer to "the problem" being the 62ns offset from zero, of course, there's no question. We are only discussing minor differences between estimators and their uncertainties here.
    However, while what you say is in general true and I agree with it, in the article above I am making very quantitative statements concerning the case at hand: 20 events, box vs Gaussian, estimator values and uncertainties. These kinds of statements are always to be preferred to abstract reasoning IMO... if only for their didactic value, which is where I put the emphasis here, as is clear from the opening of the post.

    Cheers,
    T.
    Is it therefore appropriate to fit with the convolution of a (well motivated, specific) box shape and a (general purpose or default) Gaussian?
    That way, the data between the two most extreme points would clearly "carry some information (or weight)" for determining the fit parameters, too.

    p.s.
    "<b> boldd text </b>" is renderered as "bold text",
    "<i> italicized text </i>" is renderered as "italicized text",
    "text<sub> subscript </sub>" is renderered as "textsubscript text" ...

    dorigo
    Hi Frank,

    if one knows one's pdf well, the minimum variance estimators are found by maximizing the resulting likelihood function. Otherwise, adding the Gaussian smear as a free parameter to it is probably overkill in this specific case.
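    (For concreteness, here is a minimal sketch of such a maximum-likelihood fit, on hypothetical numbers rather than Opera data, assuming the 50 ns box width and the Gaussian resolution are both known:)

```python
# Sketch of a maximum likelihood fit for the box centre t0, using the pdf of
# a 50 ns wide uniform distribution convolved with a Gaussian of known sigma.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
sigma, half_width = 5.0, 25.0   # ns, assumed known for this illustration
t_true = 60.0                   # hypothetical true offset, ns
t = t_true + rng.uniform(-half_width, half_width, 20) + rng.normal(0.0, sigma, 20)

def nll(t0):
    # box(width 50 ns) convolved with Gaussian(sigma), centred at t0
    p = (norm.cdf((t - t0 + half_width) / sigma)
         - norm.cdf((t - t0 - half_width) / sigma)) / (2 * half_width)
    return -np.sum(np.log(p))

fit = minimize_scalar(nll, bounds=(t.min(), t.max()), method="bounded")
print(f"maximum likelihood estimate of t0: {fit.x:.1f} ns")
```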

    As for the html tags: the strike-throughs in the text above are intentional... It is text that appeared in the original version of the piece but which I needed to remove in a transparent way.

    Cheers,
    T.
    Fred Phillips
    the result is almost never dramatic...
    It seems pretty well established that the fall of the hedge fund Long Term Capital Management - as well as the larger financial crash that caused our current recession - stemmed from risk managers assuming a Gaussian distribution of returns on financial derivatives, when in actuality it was a power law. Power laws having longer, "fatter" tails.

    That's drama! Of the tragic sort.
    dorigo
    Well, you are certainly right, Fred. In real life there's much at stake, and that's why risk analysis in decision theory is so important. In HEP, we tend to stop at getting our posterior distributions. Even if we underestimate the tail of a systematic, that's not "dramatic", in the sense that at most we overestimate the significance of an observed result... Nothing to do with going broke.

    Cheers,
    T.
    I have always suspected that GPS is the problem.
    It is instructive to read the Wikipedia page on GPS, to get an idea of what kinds of accuracy are intrinsic to the system, and what kinds of error correction techniques must be applied to get the full advertised accuracy for military applications. Of course, I am assuming OPERA has full military authorization for their GPS decoding.
    There are clock synchronization issues with the satellites that are on the order of several nanoseconds, as well as variable ionospheric propagation delays and satellite "ephemeris" (tracking changes in the orbit) that must be corrected for. These are routinely corrected for in practice, but it is not a trivial problem, and there is ample opportunity for errors.

    From a paper called "Sources of Errors in GPS":
    Ionospheric effects ± 5 meters
    Shifts in the satellite orbits ± 2.5 meter
    Clock errors of the satellites' clocks ± 2 meter
    Multipath effect ± 1 meter
    Tropospheric effects ± 0.5 meter
    Calculation and rounding errors ± 1 meter

    For a total error of approximately 15m!

    There is no reason to think that the OPERA staff would not have addressed all of these and more. However, it is interesting.

    dorigo
    DDT, what on earth brings you to add errors from uncorrelated sources linearly?
    The total error from what you list is about 6.2 meters, not 15.
    Cheers,
    T.
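    (As an illustrative check, the six quoted error sources added in quadrature rather than linearly:)

```python
# Add the quoted GPS error sources in quadrature instead of linearly.
sources = (5.0, 2.5, 2.0, 1.0, 0.5, 1.0)  # meters
print(sum(x**2 for x in sources) ** 0.5)  # ~6.1 m (the linear sum is 12 m)
```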
    Of course, that's a ridiculous thing to do. Most of these are random, so doing many measurements over time would yield a very good average. Presumably this has been done.
    I'm just looking for anything that might account for the 60 nsec.
    My intent here is just to raise the GPS issue and show that admitted and known errors as of 2009 were not negligible, and that subtle mistakes could create an error big enough to invalidate the OPERA results.
    Also they have to keep the two GPS receivers synchronized 730 km apart, which adds to these errors. In addition they probably used multiple (4 minimum) satellites, which may have increased or decreased the final error. BTW: The use of a 20MHz clock anywhere in this data stream is a little scary: 20 MHz = 50 nsec!!!

    It seems to me that Satish Ramakrishna has already solved the 60ns mystery.

    http://arxiv.org/ftp/arxiv/papers/1111/1111.1922.pdf

    Hi Tommaso,

    I followed Zaybu's arxiv link to Satish Ramakrishna's paper
    Common-view mode synchronization as a source of error in measurement of time of flight of neutrinos at the CERN-LNGS experiment
    Do you know how Opera used the GPS satellites to synchronize their clocks? Is Ramakrishna's explanation correct?
    Cheers,
    Martin

    I just read Satish's paper. He seems to have found the error in GPS I had sensed was there, a feeling I shared with several theoretical physicists the day of the original announcement. I think we have the result we all knew was inevitable.
    I just love the irony of relativistic effects unseating the challenge to relativity. But with all the physicists at LNGS and CERN, this is a little embarrassing.

    DDT, did CERN not refute the paper "Time-of-flight between a Source and a Detector observed from a Satellite" by Ronald A.J. van Elburg, which was essentially saying the same thing as Ramakrishna? AFAIK, the orbital speeds of the satellites are taken care of in the GPS system itself and play no role in the timing measurements.

    Tommaso,

    There are 20 independent measurements. The chance of getting all twenty to be superluminal by random fluctuation is almost exactly one in a million (0.5 raised to the 20th power). That is the statistical probability that what OPERA saw are faster-than-light neutrinos. You could quibble about whether it is 60 ns or not, but the point is that errors in measuring standard deviations or issues with statistics can no longer explain the FTL part. The only possibility is a systematic error biasing every observation. The true issue is whether the measurements are greater than 'c', and that probability drops exponentially with every new superluminal measurement. By the time you get to 20 measurements, it is time to check things other than standard deviations of the measured errors.

    dorigo
    Dear AK,

    you missed entirely the point of the article above, which has nothing to do with the statistical significance of the Opera result (which is absolutely NOT computed the way you imply in the first two lines above - in fact Opera finds a 6.1 sigma significance with the other measurement, and does not bother to evaluate it for the 20-event set). You seem to have just computed (1/2)^20, which is quite a funny procedure to determine a significance for that set.

    As for the fact that the problem is systematic and not statistical, I of course agree - but you could have realized that by checking the several other articles I published on this topic, where I point out that the issue is most likely the measurement of a time delay in a light fiber, or GPS systematics.

    Cheers,
    T.
    Regarding the GPS systematics, the paper by Satish mentioned in another comment should be examined carefully; there is a non-negligible chance that he has spotted the problem.

    I am not saying OPERA computed their statistics this way. This is just another valid way to look at the issue (comparing against the null hypothesis is quite common in the social sciences, btw). I am saying that what needs to be verified at this point is only what can produce systematic errors, that is, errors biasing every single measurement the same way. Yes, that is what you used to say, but in this article I don't see you mention that. The stats part is now really irrelevant. If OPERA has measured their delays and position coordinates right, even if the error margins of those measurements are larger than what they state, their result will stand. If they have a systematic bias in either of those measurements, the result will be wrong. What happened with the fiber-delay re-measurement? I know they were planning to use the RPC detector to repeat the experiment. Would they use the same fiber and FPGA for that case or a different one? As for the GPS stuff, it seems to me we are stuck - even a MINOS replication would not help. I haven't seen any reports of how to sync clocks across such a long distance without using GPS.

    dorigo
    You are insisting on a point that is of no interest to me in this post. I am interested in a statistics issue, as the title of the post says. Comment somewhere else if you want to discuss systematic issues in the measurement! As I already said, there are at least three other posts where I discuss that.
    I do not care that the statistical issue is irrelevant to whether neutrinos are superluminal here; rather, your comment is irrelevant to the matter at hand.
    Cheers,
    T.
    Sorry, typo there: "That is the statistical probability that what OPERA saw are *not* FTL neutrinos." In technical terms, that is the probability of the null hypothesis at this point.

    The OPERA experiment and its flawed GPS measurements cause neutrinos to appear superluminal. Relativistic time is where the flaw is, since GPS adjusts time according to relativity’s equations. However, being confused about time has always been a tradition amongst physicists; we show below how physicists never even understood the nature of time in classical physics!
    What fantastic claim can be shown by doing an OPERA-like experiment to measure the speed of light instead of neutrinos? It can be shown that not only do neutrinos travel faster than light, but light travels faster than itself too! Of course light cannot travel faster than itself (and you don’t have to be an Einstein to realize that)!
    Speed = Distance / Time. If measured this way the neutrinos will not be faster than light, provided Distance and Time can be correctly measured. But OPERA used the latest technology – GPS, which adjusts time according to relativity’s equations. So in OPERA’s physics, Speed = (GPS Measured Distance) / (GPS Measured Time). You can get neutrinos back under light speed if you can eliminate GPS from the measurements. Relativity-worshipping physicists can (and probably will) escape the faster-than-light neutrinos dilemma by dodging GPS and thus dodging relativistic time equations. So relativity-worshippers may well have the last laugh. But look at the irony – they have to dump relativity’s time adjustments to fix the neutrino issue and save relativity’s light postulate!
    We have been saying since 2005 in our paper at physicsnext.org that, while different observers will measure different times in many cases, Einstein’s time-dilation equations are wrong, and we give the correct equations. Our paper at physicsnext.org contains a COUNTER-EXAMPLE that PROVES that (though Einstein’s postulates are correct) Einstein’s claim of having derived the Lorentz transformations is wrong; yes, a COUNTER-EXAMPLE — and at least one Nobel prize winner takes this realization seriously.
    Forget Einstein’s equations about time dilation, even Einstein’s claim about time itself being an independent physical quantity (his equations require this) is wrong. Our paper at physicsnext.org also shows how to build a clock that will not undergo any time dilation at all.
    Physicists are wrong even about what "classical physics" (i.e. pre-Einstein physics) said about time. Physics books widely state that in classical physics time was "absolute", by which they mean that it was an independent quantity that "flows" at a constant pace. No equation of classical physics implies any such thing about time, and if physics is equations, then these writers failed to even understand Newton’s laws and equations, and what these equations imply about time. (They went for some Newton quotation from here and there, but those quotes are not physics, they are just secondary opinion with no physics to back it up. You can either go after the superfluous OR you can try to understand physics and what it implies; physics writers have unanimously chosen the former when it comes to Newtonian physics and time.) Einstein’s physics and Einstein’s equations were the first time that physics was required to incorporate the claim that time "flows" as an actual independent physical quantity; this claim about time we consider to be highly questionable philosophically, and this was a major motivation in our finding alternative equations that are consistent with Einstein’s two postulates.
    The paragraph below, taken unedited from p. 7 of our paper at physicsnext.org, gives the actual situation about classical and relativistic time.
    While doing away with the concept of "absolute time," relativity presented a new thesis of "relative time flow" between inertial frames. We do not take the absolute time of Newtonian physics to have meant that time itself "flows" as an independent physical quantity -- it only meant that the equations worked in such a way that all observers measured the same time for the same event. We could attempt to make a similar statement about observers in different frames and relativity's relative time -- however, in relativity time is an independent physical quantity and we have actual time dilation.