In 2011, science was confronted with several high-profile, awkward situations in which it had to explain why standard methods like classical significance analysis are acceptable in, for example, medical studies on the safety of a new vaccine, but not when results put orthodoxy into doubt. The most infamous among them is the 6 sigma significance of the OPERA confirmation of the faster than light neutrinos previously indicated by MINOS. Second place: evidence for precognition in a work [1] that abides by all the usual scientific methods and passed peer review in a top tier journal. Strong hints for quantum brain processes [2] (discussed here) may come in third. The latter is another of the many phenomena that skeptics out to defend scientism argue to be impossible [3].

Such novel results are of course doubted as they should be, with a healthy dose of scientific skepticism. However, they are also bitterly argued against. Scientists who seriously consider possible consequences of the novel results are portrayed as crackpots and made to look ridiculous. This is perpetrated also by scientists who claim to defend the scientific method, and the sad thing is that the majority of the scientists who are visible to the public engage in such unscientific bullying and silencing. The public does not understand the fine details, and the perception that arises is that yet again orthodoxy and bias trump the scientific method; public trust in all things science vanishes accordingly. Knee jerk skepticism backfires.

This article will introduce, on a lay level but in somewhat more detail, how biased skeptics actually argue and where they employ pseudoscience in the name of their “war against pseudoscience”. This includes a hopefully memorable (mnemonic) introduction of Bayesian Probability that may be worth reading by itself and derives Cromwell's Rule, which is basically the rule that skeptics exploit.

Extraordinary Claims require Extraordinary Authority

A fashionable reply is Marcello Truzzi’s "Extraordinary claims require extraordinary proof". Carl Sagan popularized this as "Extraordinary claims require extraordinary evidence", though it may rather be based on Laplace’s "The weight of evidence for an extraordinary claim must be proportioned to its strangeness." Anyway, "Extraordinary claims require extraordinary evidence" sounds nice, which is the most important aspect of it, because this motto serves mainly as a truncheon in the skeptic’s tool bag. It hides the Argument from Authority behind one more step: Who decides what counts as extraordinary rather than expected?

In order to have the argument from authority appear like a proper scientific argument, they employ Bayesian Probability, which is indeed the most advanced way of dealing with uncertainty. We will see below how it is corrupted, but first let us discuss that this is done increasingly often; I hope this helps readers to spot occurrences of such false arguments. In short: Any undesired scientific significance can be diminished by mixing undesired data with a so called ‘prior’ probability, which is often merely a belief held on false support.

Bayesian Probability: A verdict in a criminal trial should not be based on the probability of guilt, but rather the probability of the evidence, given that the defendant is innocent.

Bayesian Updating whenever something is undesired? They wouldn’t, would they?

You may have heard "Extraordinary claims require extraordinary evidence", but people employing Bayesian probability to hide an argument from authority: is that actually done? I personally just had a vague suspicion, nourished by online comments claiming that one should use Bayesian Inference and that doing so would argue against, for example, faster than light neutrinos. Yet I did not think much of it. Until I saw this: Wagenmakers, et al.: “Why psychologists must change the way they analyze their data: The case of psi.” [4].

Effectively, undesired evidence is to be multiplied with a prior equal to zero, namely the dogma insisting that faster than light travel, or influence from the future, or the earth orbiting the sun, is impossible, period. Those who did not get caught up in orthodoxy know that relativity looks very much like a merely emergent, low-energy phenomenon [5], thus faster than light phenomena are expected at extremely high energies. It is simply not true [6] that faster than light particles necessarily travel back in time and kill your grandpa (they don’t). On closer inspection, an unpopular phenomenon often becomes expected and perhaps fundamentally happens all the time (read: is ordinary) rather than extraordinary. That the world is not flat is “extraordinary” simply because of ordinary orthodoxy.

People are increasingly aware of the history of science, so just calling something “extraordinary” is of course not sufficient. Scientific significance of empirical evidence is based on statistical measures. Statistics allows for any desired level of sophistication.

The Bayesian Method

Many still fight over what probability is at all. In the Classical interpretation, probability is the ratio of favorable outcomes to possible outcomes. This is circular, because it assumes that the different possibilities have some probability assigned already, say by symmetry arguments: the two sides of a coin are equivalent as far as the falling and landing are concerned when tossing one, leaving heads and tails equiprobable. In the Frequentist approach, probability is strictly counting the occurrences of the different outcomes over many trials. If you did not practically count yet, you assume the system under investigation, say a coin, to be similar to one you experimented with before.

The Bayesian approach (pronounced BAYZ-ee-un) is a mixture of both. Classical probability enters in the beginning and is often expressed in terms of subjective degree of belief in some proposition. Empirical counting then feeds into the “Bayesian updating”. How does this work?

Well, if you want a PhD in physics, you need to work with P, H and D, namely probabilities P, hypotheses H, and data D. The joint probability of both hypothesis and data being true at the same time, we write PH&D. It is obviously the probability of the hypothesis given the data, which we write PHgD (the g stands for "given"), multiplied with the probability of the data, which is expressed via PD:

PH&D = PHgD PD

PH&D is the same as PD&H, therefore we may as well write PDgH PH on the right hand side. The interpretation is now different: PDgH is the likelihood of the data given the truth of (assuming) the hypothesis. This is multiplied with the probability of the hypothesis. Now you know the difference between probability and likelihood.

The advantage of starting like this is that the thus easily remembered line

PHgD PD = PH&D = PDgH PH

has two not as easily remembered formulas inside of it. The left equation gives you the conditional probability of H given D (whatever H and D stand for), namely PHgD = PH&D / PD. The right equation gives you Bayes' theorem:

PHgD = PDgH PH / PD

This one has an interesting interpretation. The first term is also just the conditional probability of H given D. However, it is now called the posterior probability, because it is the new probability that your hypothesis has taken on after taking into account the data! PH, the probability of the hypothesis before new data are taken into account, is called the prior probability, or prior for short, because it came first. In words: “Posterior = Likelihood * Prior / Data”.

Bayesian updating goes as follows: As you accumulate more data, you update your previous prior probability with this formula to get the improved probability, called posterior probability. The next time around, you use this posterior as the new prior. This is the most consistent way to calculate probabilities, because there are for example no tricks like Dutch books possible with this method. In case a true probability exists (say I rigged a game and let you play), you will via accumulation of ever more data eventually see the posterior probability approach the true probability, even if starting from a very wrong prior assumption. This is excellent science, because contrary to the classical approach, starting with a wrong concept will not chain you to that wrong concept. The truth is out there; you find it in the data.
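
The updating loop just described can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the text: the rigged game is taken to be a coin that truly lands heads three times out of four (H2), compared against the hypothesis of a fair coin (H1). The names `update` and `run` and the toss record are made up for the example.

```python
from fractions import Fraction

# Assumed toy setup (not from the text): H1 = fair coin, H2 = rigged coin
# that lands heads 3 times out of 4. The game is in truth rigged (H2).
P_HEADS_H1 = Fraction(1, 2)
P_HEADS_H2 = Fraction(3, 4)


def update(prior_h1, like_h1, like_h2):
    """One Bayesian update; returns the posterior of H1 (PH1gD)."""
    prior_h2 = 1 - prior_h1
    p_data = like_h1 * prior_h1 + like_h2 * prior_h2  # PD
    return like_h1 * prior_h1 / p_data


def run(prior_h1, tosses):
    """Feed tosses ('H' or 'T') one by one; each posterior is the next prior."""
    p = prior_h1
    for toss in tosses:
        if toss == "H":
            p = update(p, P_HEADS_H1, P_HEADS_H2)
        else:
            p = update(p, 1 - P_HEADS_H1, 1 - P_HEADS_H2)
    return p


# A hypothetical record of 200 tosses with three heads in every four.
tosses = "HHHT" * 50
for prior in (Fraction(1, 100), Fraction(1, 2), Fraction(99, 100)):
    posterior = run(prior, tosses)
    print(f"prior PH1 = {float(prior):.2f} -> posterior PH1 = {float(posterior):.2e}")
```

Whether one starts out almost certain that the coin is fair or almost certain that it is rigged, the posterior of the fair-coin hypothesis ends up tiny after enough tosses. The one exception is a prior of exactly 0 or 1, which never moves.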

Let me give a simple example and then show how to corrupt it.

A Simple Black and White Example:

There are two kinds of urns with ten balls each. The first type of urn has one black ball and nine white ones. The second type of urn has four black ones and six white ones. You are given one urn and draw a ball at random. It turns out to be black. What is the probability that you were given the first type of urn?

Call the hypothesis that you were given the first or second type of urn “H1” and “H2”, respectively. Since you do not know the chances with which you were given the first rather than the second type of urn, you naturally assume that the prior probabilities of these hypotheses, PH1 and PH2, are equal, namely both are 50% (or PH1 = PH2 = 1/2). The data, here your black (B) ball, will now be used to improve on this prior assumption.

The posterior probability given the data D= B and asking for hypothesis H1 is given by Bayes' theorem stated above, namely PHgD = PDgH PH / PD becomes PH1gB = PBgH1 PH1 / PB.

The likelihood PBgH1 is 1/10, because the first urn has one black ball out of ten in total. The probability of the data PD is obtained by adding all possible ways in which the data can arise, weighting each way by its probability. In general this is PD = PDgH1 PH1 + PDgH2 PH2.

For the data being just the black ball, D = B, we therefore have

PB = PBgH1 PH1 + PBgH2 PH2 = 1/10 * 1/2 + 4/10 * 1/2 = 5/20

And so we are almost finished. The result is: PH1gB = PBgH1 PH1 / PB = (1/10 * 1/2) / (5/20) = 1/5.

Only one out of five! Naturally, we expected this: because the first type of urn contains so many white balls, the hypothesis that it was the first type of urn is not supported by the data of having drawn a black ball. You can do the same calculation for the second hypothesis H2:

PH2gD = PDgH2 PH2 / PD becomes PH2gB = PBgH2 PH2 / PB = (4/10 * 1/2) / (5/20) = 4/5.

In the beginning, you knew nothing much, so your priors were unbiased (= 1/2), but now your data has given you some insight about what type of urn you were given: H1 has posterior 1/5 while H2 has 4/5; the total is 100%.
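
For readers who like to check such arithmetic by machine, here is a minimal sketch of the urn calculation in Python. The function name `posteriors` is made up for this illustration; exact fractions are used so that the results can be compared directly with the numbers above.

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """Bayes' theorem for mutually exclusive hypotheses that cover all cases."""
    # PD: every way the data can arise, weighted by the prior of that way
    p_data = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / p_data for l, p in zip(likelihoods, priors)]


priors = [Fraction(1, 2), Fraction(1, 2)]           # PH1, PH2
p_black = [Fraction(1, 10), Fraction(4, 10)]        # PBgH1, PBgH2

post = posteriors(priors, p_black)
print(post)  # [Fraction(1, 5), Fraction(4, 5)]
```

The same function works for more than two types of urn, as long as the priors add up to one.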

Enter Faster Than Light Neutrinos

The above is routine in science. For example, the balls could be faster than light (FTL) neutrino data. Mathematical methods apply generally. We are living in some universe, probably a multiverse. The particular physics we “are given” we do not know, especially not the physics applicable to our problem at hand; that is why we do the experiments in the first place. Hypothesis H1 assumes that FTL neutrinos do not exist, so they pop up only seldom in the data, but sometimes they do, say because of statistical flukes in measurement devices or systematic errors. H2 is the hypothesis that FTL particles should show up at high energies since such is possible in emergent relativity and relativity looks very much like an emergent symmetry. It is expected from several cutting edge models which are vital to unify physics and there are even hints from previous experiments.

Ha, experiments! Great, we have a good prior, namely the posterior from the previous experiments. Sadly, not only did those experiments have low significance, it is also hard to put them into actual numbers. In the case of MINOS neutrino data, they may indicate either low superluminal velocities or, if taken together with supernova data and OPERA data, one better interprets them as very high initial superluminal phenomena over short distances. If it is not even agreed whether the velocity is below that of light, a little above it, or very much above it, how can we set the probability PH1 to any particular number? Since the previous data are highly controversial, the prior must be unbiased as before (as in unbiased science).

The results stay the same: H1, the hypothesis that has very few opportunities for FTL particles to turn up in the data, turns out unlikely. H2 allows FTL neutrinos and is supported by the FTL data. Serious scientists take them yet more seriously and perhaps one can start to use these new, highly significant posteriors as the priors next time.

Enter Scientists' fixed Beliefs

A sober scientific analysis is sadly not what usually happens in controversial cases, where a sober mind would be most necessary. Many plainly refuse the hypothesis H2. They dogmatically believe in H1! They argue that FTL is impossible on principle, drawing on whatever arguments are convenient. Indeed, some go on to call everybody who does not agree that PH2 = 0, period, a crackpot and Einstein denier. In this case, Bayesian updating does not care about any new data: the prior belief is so strong that, whatever your data are, the posterior stays equal to the prior. This is the subject of Cromwell's Rule:

“The reference is to Oliver Cromwell, who famously wrote to the synod of the Church of Scotland on August 5, 1650 saying

    “I beseech you, in the bowels of Christ, think it possible that you may be mistaken.”

As Lindley puts it, if a coherent Bayesian attaches a prior probability of zero to the hypothesis that the Moon is made of green cheese, then even whole armies of astronauts coming back bearing green cheese cannot convince him. Setting the prior probability (what is known about a variable in the absence of some evidence) to 0 (or 1), then, by Bayes' theorem, the posterior probability (probability of the variable, given the evidence) is forced to be 0 (or 1) as well.” Source: Cromwell's Rule

In our example, PH2 = 0 lets the following happen: PD = PDgH1 PH1 + PDgH2 PH2 becomes PD = PDgH1 + 0 and thus equal to PDgH1. Therefore, the posteriors equal the priors:

PH1gB = PBgH1 PH1 / PB = PBgH1 * 1 / PBgH1 = 1

PH2gB = PBgH2 PH2 / PB = PBgH2 * 0 / PBgH1 = 0
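
The same can be seen numerically. The sketch below reuses the urn likelihoods from above and adds one assumption that is not in the text: black balls are drawn repeatedly with replacement from the same urn, so that each posterior can serve as the next prior. The function names are made up for this illustration.

```python
from fractions import Fraction

P_BLACK_H1 = Fraction(1, 10)  # PBgH1
P_BLACK_H2 = Fraction(4, 10)  # PBgH2


def posterior_h2(prior_h2):
    """Posterior of H2 after one black ball (PH2gB)."""
    prior_h1 = 1 - prior_h2
    p_black = P_BLACK_H1 * prior_h1 + P_BLACK_H2 * prior_h2  # PB
    return P_BLACK_H2 * prior_h2 / p_black


def after_black_draws(prior_h2, n_draws):
    """Assumes n_draws black balls in a row, drawn with replacement."""
    p = prior_h2
    for _ in range(n_draws):
        p = posterior_h2(p)
    return p


# A prior of exactly zero stays exactly zero, however much data arrives.
print(after_black_draws(Fraction(0), 100))
# A tiny but nonzero prior is eventually overwhelmed by the data.
print(float(after_black_draws(Fraction(1, 10**6), 20)))
```

Even a prior as small as one in a million ends up above 99% after twenty black balls, whereas a prior of zero cannot be rescued by any number of draws.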

Scientists do not just put in a prior of zero. That would be too obvious. They argue that PH2 is so small that it may, for all practical purposes, as well be zero. They massage arguments, entering assumptions and dismissing previous studies, until they have a prior PH2 that is small enough so that PH2gB is much smaller than PH1gB.

The trick is to make PBgH2 PH2 much smaller than PBgH1 PH1 = PBgH1 (1 - PH2). One can for example contrive arguments until PBgH1 and PBgH2 are as desired, but even if those were known, you can still simply argue until PH2 is much smaller than PBgH1 (1 - PH2) / PBgH2 in order to refuse letting the reality of the data change your world view. In our example above, claiming that PH2 must be assumed to be much smaller than 1/5 does the job.
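
Where the 1/5 comes from can be checked with a short sketch. Setting PBgH2 PH2 equal to PBgH1 (1 - PH2) and solving for PH2 gives the break-even prior PBgH1 / (PBgH1 + PBgH2); any prior below it leaves H2 with the smaller posterior. The function names below are made up for this illustration.

```python
from fractions import Fraction


def posterior_h2_given_black(prior_h2, p_black_h1, p_black_h2):
    """PH2gB for a chosen prior PH2, with PH1 = 1 - PH2."""
    prior_h1 = 1 - prior_h2
    p_black = p_black_h1 * prior_h1 + p_black_h2 * prior_h2  # PB
    return p_black_h2 * prior_h2 / p_black


def break_even_prior_h2(p_black_h1, p_black_h2):
    """Prior PH2 at which PBgH2 PH2 = PBgH1 (1 - PH2)."""
    return p_black_h1 / (p_black_h1 + p_black_h2)


p_b_h1, p_b_h2 = Fraction(1, 10), Fraction(4, 10)   # urn likelihoods from above
print(break_even_prior_h2(p_b_h1, p_b_h2))          # 1/5

for prior_h2 in (Fraction(1, 2), Fraction(1, 5), Fraction(1, 100)):
    print(prior_h2, "->", posterior_h2_given_black(prior_h2, p_b_h1, p_b_h2))
```

With the unbiased prior of 1/2 the black ball lifts H2 to 4/5. At a prior of 1/5 the two hypotheses come out even, and at an "argued down" prior of 1/100 the very same data leave H2 at only 4/103.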


If we allow belief to enter controversial priors, the scientific method will be rendered impotent. If scientists may do so, there is no reason why intelligent design (ID) should not insist on its own prior that evolution is impossible. If you are a real skeptic who is interested in science outreach and reestablishing the trust of the public, you do not force feed your beliefs via a perversion of the scientific method. You agree to unbiased priors and let the experimental data speak.


[1] Daryl J. Bem: “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect.” Journal of Personality and Social Psychology, 100, 407-425 (2011) [NOTE: Quoting peer-reviewed papers does not imply commitment to the truth of their claims!]

[2] Erik M. Gauger, Elisabeth Rieper, John J. L. Morton, Simon C. Benjamin, and Vlatko Vedral: Phys. Rev. Lett. 106(4), 040503 (2011)

[3] M. Tegmark: “Importance of quantum decoherence in brain processes.” Phys. Rev. E 61(4), 4194 (2000)

[4] Wagenmakers, et al.: “Why psychologists must change the way they analyze their data: The case of psi.” Journal of Personality and Social Psychology, 100, 426-432 (2011)

[5] Vongehr, S.: “Supporting Abstract Relational Space-Time as fundamental without Doctrinism against Emergence.” arXiv:0912.3069v2 (2009)

[6] Liberati, S., Sonego, S., Visser, M.: “Faster-than-c Signals, Special Relativity, and Causality.” Annals of Physics 298, 167–185 (2002)

