Although time is a scarce resource for me these days, and my "working time balance" is deep in the red, I am presently spending some of it to investigate a very interesting statistical effect. It is of a general nature, but it is particularly relevant to the issue of discovery thresholds in particle physics.

I was triggered by the recently reported observation of a new particle, which has been claimed at a significance corresponding to the coveted 5 standard deviations, after earlier evidence had been extracted from 40% less data at 3.8 standard deviations. The matter has left me slightly dubious about the precision of the latter claim.

Now, before I state the problem, let me briefly explain how significance is calculated in these kinds of new-particle searches.

A typical way to extract the significance of a signal is to compare the likelihoods of two fits performed on the data: one (the background-only hypothesis) assuming that the signal is not there, L_B; and the other (the signal-plus-background hypothesis) obtained by adding the hypothetical signal to the background model, L_S. The logarithm of the ratio of these two numbers can be used to compute a p-value, and from it a corresponding significance of the added signal. There is nothing mysterious about the above methodology, which is based on Wilks' theorem and is well-known basic statistics. References can be provided on demand.
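To make that recipe concrete, here is a minimal sketch in Python. It is not the code of any actual experiment: it simply assumes the two fits have already been performed and that a single signal-strength parameter, bounded at zero, has been added, so that Wilks' theorem gives the simple asymptotic conversion Z = sqrt(-2 ln(L_B/L_S)).

```python
# Minimal sketch (not any experiment's actual code): turning a likelihood
# ratio into a significance via Wilks' theorem, for one added signal parameter.
import numpy as np
from scipy.stats import norm

def significance_from_likelihoods(lnL_B, lnL_S):
    """lnL_B: log-likelihood of the background-only fit;
       lnL_S: log-likelihood of the signal-plus-background fit."""
    # Test statistic: q0 = -2 ln(L_B / L_S) = 2 * (lnL_S - lnL_B)
    q0 = max(2.0 * (lnL_S - lnL_B), 0.0)   # the S+B fit can only improve the likelihood
    # With one signal-strength parameter bounded at zero, q0 asymptotically follows
    # a half-chi-square with 1 d.o.f., so Z = sqrt(q0) and p = 1 - Phi(Z).
    Z = np.sqrt(q0)
    p_value = norm.sf(Z)
    return Z, p_value

# Example: lnL_S exceeds lnL_B by 12.5 units -> q0 = 25 -> Z = 5
print(significance_from_likelihoods(lnL_B=-1000.0, lnL_S=-987.5))
```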

Now, the issue is the following. Imagine you see a 2- or 3-sigma signal in your data, and that the signal is indeed genuine (although you do not know it yet): in such a case, if you collect more data the significance of the signal is bound to increase; since it grows roughly with the square root of the dataset size, four times as much data should typically double it. In principle, therefore, there is a simple recipe to decide when to stop collecting data and re-analyze it in search of a 5-standard-deviation signal: take the initial dataset, and perform a toy Monte Carlo study to predict the amount of data which will grant, 90% of the time, an observation-level significance.
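A sketch of what such a toy Monte Carlo could look like, for a simple counting experiment, is below. The signal and background yields are invented for illustration (in a real search they would come from the fits described above), and the significance formula is the standard asymptotic one for a Poisson count.

```python
# A toy-Monte-Carlo sketch of the "how much more data do I need?" recipe.
# The yields s0 and b0 are made up for illustration; in a real analysis
# they would come from the fit to the initial dataset.
import numpy as np

rng = np.random.default_rng(42)
s0, b0 = 40.0, 400.0          # hypothetical signal and background yields in the initial dataset

def counting_significance(n, b):
    """Asymptotic significance of observing n events over an expected background b."""
    if n <= b:
        return 0.0
    return np.sqrt(2.0 * (n * np.log(n / b) - (n - b)))

def fraction_above_5sigma(scale, n_toys=10000):
    """Fraction of pseudoexperiments reaching Z >= 5 when the dataset is scaled by 'scale'."""
    s, b = scale * s0, scale * b0
    n = rng.poisson(s + b, size=n_toys)    # toys generated under the signal hypothesis
    z = np.array([counting_significance(ni, b) for ni in n])
    return np.mean(z >= 5.0)

# Scan luminosity scale factors until 90% of the toys give an observation-level significance
for scale in np.arange(1.0, 20.0, 0.5):
    if fraction_above_5sigma(scale) >= 0.90:
        print(f"Scale factor ~{scale}: 90% of toys reach 5 sigma")
        break
```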

Of course one could be less restrictive, and replace the 90% quoted above with 50%. This is beside the point. The point is that one should decide beforehand when to repeat the analysis. If one omits doing so, and just adds data in small bits to the original sample, checking each time how much the significance has increased until he or she reaches the 5-sigma level, the procedure is biased.

The bias, which I dub the "Keep-Looking Bias", results from what is known in the statistics literature as "sampling to a foregone conclusion". In other words, the analyzer assumes that the signal is there, and impatiently keeps checking that hypothesis until he or she has a strong enough case to publish a new observation.

I am presently working with pseudoexperiments to quantify the bias. The problem is academic in nature, since at its basis is the assumption that a signal is indeed present in the data; however, please note that there is no agreement in experimental High-Energy Physics on avoiding the bias. Some experiments do just that: they continue to sample, and as soon as they hit the 5-sigma line they claim a discovery. The bias arises because fluctuations in the data make the significance bounce up and down as it grows with the size of the dataset, and one should not be looking at the highest points that this fluctuating quantity reaches, but rather refer to its average (or better, its median) value.
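For the curious, here is the kind of pseudoexperiment I have in mind, sketched in Python with invented yields (they are not taken from any real search). Each toy adds data in small increments, recomputes the counting significance at every step, and stops at the first crossing of the 5-sigma line; the comparison is with the step at which the median significance over all toys reaches 5 sigma.

```python
# Sketch of the pseudoexperiment study of the Keep-Looking Bias.
# The per-increment signal and background yields below are made up.
import numpy as np

rng = np.random.default_rng(1)
s_per_bit, b_per_bit = 4.0, 100.0      # hypothetical yields per increment of data
n_bits, n_toys = 250, 5000

# Cumulative observed counts in each toy after 1, 2, ..., n_bits increments
counts = rng.poisson(s_per_bit + b_per_bit, size=(n_toys, n_bits)).cumsum(axis=1)
bkg = b_per_bit * np.arange(1, n_bits + 1)           # expected background after each increment

# Asymptotic counting significance, set to zero where the data fluctuate below the background
z = np.sqrt(2.0 * (counts * np.log(counts / bkg) - (counts - bkg)))
z = np.where(counts > bkg, z, 0.0)

# "Keep-looking" stopping rule: first increment at which each toy crosses 5 sigma
crossed = z >= 5.0
first_crossing = np.where(crossed.any(axis=1), crossed.argmax(axis=1), n_bits)

# Reference: the increment at which the *median* significance over toys reaches 5 sigma
median_z = np.median(z, axis=0)
print("Median increment of first 5-sigma crossing (keep-looking):", int(np.median(first_crossing)))
print("Increment at which the median significance reaches 5 sigma:", int((median_z >= 5.0).argmax()))
```

With these made-up numbers the first crossing typically occurs a fair number of increments before the median curve reaches 5 sigma, which is exactly the effect described above: the stopping rule picks out favourable fluctuations rather than the typical behaviour of the significance.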

In the following days I will post some results here. For now, I would just like to ask whether any of you, dear readers, have considered this problem in the past. Please use the comments thread for your ideas and suggestions!