What triggered this post is the recently reported observation of a new particle, claimed at a significance corresponding to the coveted 5 standard deviations, after earlier evidence at 3.8 standard deviations had been extracted from 40% less data. The matter has left me slightly dubious about the precision of the latter claim.
Now, before I state the problem, let me briefly explain how significance is calculated in these kinds of new-particle searches.
A typical way to extract the significance of a signal is to compare the likelihoods of two fits performed on the data: one assuming that the signal is not there (the background-only hypothesis), with likelihood L_B; and the other obtained by adding the hypothetical signal to the background model (the signal-plus-background hypothesis), with likelihood L_S. The logarithm of the ratio between these numbers can be used to compute a p-value, and from it the corresponding significance of the added signal: by Wilks' theorem, under standard regularity conditions and if no signal is present, -2 ln(L_B/L_S) is asymptotically distributed as a chi-square with as many degrees of freedom as the parameters added by the signal model. There is nothing mysterious about this methodology, which is well-known basic statistics. References can be provided on demand.
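For concreteness, here is a minimal sketch of that calculation for the simplest possible case, which is an assumption made only for illustration: a single-bin counting experiment with a known expected background b and one free, non-negative signal yield. The fitted signal is then n - b, the factorials cancel in the likelihood ratio, and -2 ln(L_B/L_S) = 2[n ln(n/b) - (n - b)]; with one added parameter the significance is its square root, and the one-sided p-value follows from the Gaussian tail. The function names are hypothetical.

```python
import math

def counting_significance(n_obs, b_exp):
    """Significance of an excess in a single-bin Poisson counting experiment.

    Assumes a known expected background b_exp and a free, non-negative signal
    yield, so that the signal-plus-background fit returns s_hat = n_obs - b_exp.
    """
    if n_obs <= b_exp:
        return 0.0  # no excess: the best-fit signal is zero and L_S = L_B
    # q0 = -2 ln(L_B / L_S); the n! terms cancel in the ratio.
    q0 = 2.0 * (n_obs * math.log(n_obs / b_exp) - (n_obs - b_exp))
    return math.sqrt(q0)  # one added parameter: Z = sqrt(q0) in the large-sample limit

def one_sided_p_value(z):
    """Upper-tail probability of a standard Gaussian beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical example: 25 events observed where 10 are expected from background.
z = counting_significance(25, 10.0)
print(f"Z = {z:.2f} sigma, p = {one_sided_p_value(z):.1e}")
```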
Now, the issue is the following. Imagine you see a 2- or 3-sigma signal in your data, and that the signal is indeed genuine (although you do not know it yet): in such a case, if you collect more data the significance of the signal will, on average, increase; since the expected significance grows with the square root of the amount of data, four times as much data as in the original set should roughly double it. In principle, therefore, there is a simple recipe to decide when to stop collecting data and re-analyzing it in search of a 5-standard-deviation signal: "take the initial dataset, and perform a toy-Monte-Carlo study to predict the amount of data which will yield an observation-level significance 90% of the time".
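The toy-Monte-Carlo recipe can be sketched as follows, under simplifying assumptions: a single-bin counting experiment with known background, with the signal and background yields estimated from the initial dataset (10 and 20 events here, purely hypothetical numbers) taken as the truth and their uncertainties ignored. For each candidate data size, expressed as a multiple of the initial sample, it throws Poisson-distributed pseudo-datasets and returns the smallest multiple on a grid for which at least the requested fraction of them reaches 5 sigma. It also checks the square-root scaling: evaluated at the expected counts, the significance doubles when the data are quadrupled. All function names are hypothetical.

```python
import bisect
import math
import random

def counting_significance(n_obs, b_exp):
    """Z = sqrt(-2 ln(L_B/L_S)) for a Poisson count with known background b_exp."""
    if n_obs <= b_exp:
        return 0.0
    return math.sqrt(2.0 * (n_obs * math.log(n_obs / b_exp) - (n_obs - b_exp)))

def poisson_cdf_table(lam):
    """Cumulative Poisson probabilities, computed in log space to avoid underflow."""
    k_max = int(lam + 10.0 * math.sqrt(lam) + 10)
    cdf, total = [], 0.0
    for k in range(k_max + 1):
        total += math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        cdf.append(total)
    return cdf

def prob_observation(s_rate, b_rate, factor, n_toys, rng, z_target=5.0):
    """Fraction of toys reaching z_target with `factor` times the initial data."""
    lam = factor * (s_rate + b_rate)  # the signal is assumed to be genuine
    b_exp = factor * b_rate
    cdf = poisson_cdf_table(lam)
    hits = 0
    for _ in range(n_toys):
        # Inverse-CDF sampling of the observed count n.
        n = min(bisect.bisect_left(cdf, rng.random()), len(cdf) - 1)
        if counting_significance(n, b_exp) >= z_target:
            hits += 1
    return hits / n_toys

def required_factor(s_rate, b_rate, confidence=0.9, n_toys=2000, seed=1,
                    step=0.25, max_factor=50.0):
    """Smallest data-size multiple (on a grid) giving P(Z >= 5) >= confidence."""
    rng = random.Random(seed)
    factor = step
    while factor <= max_factor:
        if prob_observation(s_rate, b_rate, factor, n_toys, rng) >= confidence:
            return factor
        factor += step
    return None  # target not reached within max_factor

# Hypothetical yields read off the initial dataset: 10 signal, 20 background events.
S, B = 10.0, 20.0
# With n at its expected value, q0 grows linearly with the data, so Z grows as its
# square root: four times the data doubles the expected significance.
print(counting_significance(S + B, B), counting_significance(4 * (S + B), 4 * B))
k90 = required_factor(S, B, confidence=0.9)
k50 = required_factor(S, B, confidence=0.5)
print(f"data needed: x{k90} (90% of toys), x{k50} (50% of toys)")
```

The Poisson CDF table is built once per data size, so each toy costs only a binary search; the point of the exercise is that the multiple is fixed before any new data are looked at.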
Of course, one could be less demanding and replace the 90% quoted above with 50%. This is beside the point. The point is that one should decide beforehand when to redo the analysis. If one fails to do so, and just adds data to the original sample in small bits, checking how much the significance has increased each time until it reaches the 5-sigma level, the procedure is biased.
The bias, which I dub "Keep-Looking Bias", results from what is known in the statistics literature as "sampling to a foregone conclusion". In other words, the analyst assumes that the signal is there, and impatiently keeps checking that hypothesis until he or she has a strong enough case to publish a new observation.
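In its textbook form, sampling to a foregone conclusion is usually illustrated with no signal at all: if the test statistic is checked after every new chunk of data and the analysis stops at the first crossing of a threshold, the chance of an eventual false claim grows with the number of looks. A minimal sketch of that background-only case follows, for a single-bin counting experiment with known background; the chunk size, the number of chunks and the 2-sigma threshold (chosen low so that the effect is visible with a few thousand toys) are hypothetical choices, not taken from any real analysis.

```python
import math
import random

def counting_significance(n_obs, b_exp):
    """Z = sqrt(-2 ln(L_B/L_S)) for a Poisson count with known background b_exp."""
    if n_obs <= b_exp:
        return 0.0
    return math.sqrt(2.0 * (n_obs * math.log(n_obs / b_exp) - (n_obs - b_exp)))

def poisson(lam, rng):
    """Knuth's multiplication method; fine for the small per-chunk means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def false_claim_rates(b_per_chunk=4.0, n_chunks=50, z_claim=2.0, n_toys=4000, seed=2):
    """Background-only toys: one look at the full sample vs. a look after every chunk."""
    rng = random.Random(seed)
    fixed_hits = peek_hits = 0
    for _ in range(n_toys):
        n, crossed = 0, False
        for i in range(1, n_chunks + 1):
            n += poisson(b_per_chunk, rng)  # no signal: only background is added
            if counting_significance(n, i * b_per_chunk) >= z_claim:
                crossed = True  # an impatient analyst would stop and claim here
        if counting_significance(n, n_chunks * b_per_chunk) >= z_claim:
            fixed_hits += 1
        if crossed:
            peek_hits += 1
    return fixed_hits / n_toys, peek_hits / n_toys

fixed, peek = false_claim_rates()
print(f"false excesses, single look: {fixed:.3f}; look after every chunk: {peek:.3f}")
```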
I am presently working on quantifying the bias with pseudoexperiments. The problem is academic in nature, since it rests on the assumption that a signal is indeed present in the data; however, please note that there is no agreed-upon practice in experimental High-Energy Physics for avoiding the bias. Some experiments do just what I described: they continue to sample, and as soon as they hit the 5-sigma line, they claim discovery. The bias arises because fluctuations in the data make the significance bounce up and down as it grows with the dataset: one should not be looking at the highest values this fluctuating quantity takes, but rather at its average (or rather, median) value.
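A pseudoexperiment study of this kind might look like the sketch below, under assumptions made only for illustration: a single-bin counting experiment with known background, a genuine signal of 10 events on a background of 20 per unit of data (one unit being the size of the initial sample), and a look at the data after every tenth of a unit. Each toy is followed up to the data size at which the expected significance reaches 5 sigma. The keep-looking rule claims discovery at the first look with Z >= 5, which amounts to picking the highest points of the fluctuating significance, while a single analysis at the final size reflects its median behavior. The probability of a claim comes out higher under the first rule, and the surplus is made of toys whose significance touched 5 sigma and then fell back. Names and numbers are hypothetical.

```python
import math
import random

def counting_significance(n_obs, b_exp):
    """Z = sqrt(-2 ln(L_B/L_S)) for a Poisson count with known background b_exp."""
    if n_obs <= b_exp:
        return 0.0
    return math.sqrt(2.0 * (n_obs * math.log(n_obs / b_exp) - (n_obs - b_exp)))

def poisson(lam, rng):
    """Knuth's multiplication method; fine for the small per-chunk means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def keep_looking_toys(s_unit, b_unit, chunk, n_looks, z_claim=5.0, n_toys=4000, seed=3):
    """Per toy: data size of the first look with Z >= z_claim (None if never), and Z at the last look."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_toys):
        n, first, z = 0, None, 0.0
        for i in range(1, n_looks + 1):
            n += poisson(chunk * (s_unit + b_unit), rng)  # genuine signal is present
            z = counting_significance(n, i * chunk * b_unit)
            if first is None and z >= z_claim:
                first = i * chunk  # an impatient analyst would claim discovery here
        results.append((first, z))
    return results

S, B, CHUNK = 10.0, 20.0, 0.1  # hypothetical yields per unit of data; look every 0.1 units
# The expected (Asimov) significance scales as sqrt(data): find where it reaches 5 sigma.
target = (5.0 / counting_significance(S + B, B)) ** 2
n_looks = round(target / CHUNK)
toys = keep_looking_toys(S, B, CHUNK, n_looks)
single = sum(z >= 5.0 for _, z in toys) / len(toys)  # one look at the final size
keep = sum(first is not None for first, _ in toys) / len(toys)  # a look after every chunk
print(f"P(5-sigma claim) with x{n_looks * CHUNK:.1f} data: "
      f"single look {single:.2f}, keep looking {keep:.2f}")
```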
In the coming days I will post some results here. For now, I would just like to ask whether any of you, dear readers, have considered this problem in the past. Please use the comments thread for your ideas and comments!