You, A Bayesian
    By Tommaso Dorigo | October 2nd 2009 02:54 AM | 33 comments | Print | E-mail | Track Comments
    About Tommaso

    I am an experimental particle physicist working with the CMS experiment at CERN. In my spare time I play chess, abuse the piano, and aim my dobson...

    View Tommaso's Profile
    Everyday use of a mathematical concept

    The concept of probability is not alien even to the least mathematically versed among us: even those who do not remember the basic math they learned in primary school use it routinely in their daily reasoning. I find the liberal use of the word "probability" (and its derivatives) in common language interesting, for two reasons. One, because the word has in fact a very definite mathematical connotation. And two, because the word is often used to discuss our knowledge of a system's evolution in time without a clear notion of which of two strikingly different sources is responsible for our partial or total ignorance.

    1. As far as the mathematical flavour of the word is concerned, it is sufficient to note that its loose usage directly involves the eyeballing of a real number taking on values between 0 and 1. Whether you are talking about the chance that a colleague missed the train given that she is not showing up, or whether you are wondering who might win at Roland Garros next June, the moment you say "Susan is usually on time, so it is highly probable that she missed the train" or "Nadal has a high probability of winning next year, especially if Federer does not play" you are guessing that a real number is significantly larger than zero. You might even unconsciously be assigning it a definite value, if you are making a decision: start the meeting anyway, or accept your friend's bet against Nadal.

    2. Few of us, in using the word "probability", spend a moment pondering the real meaning of the word, and the fact that what it usually does for us is make explicit our imperfect knowledge of the state of a system or, in many cases, of its past or future evolution in time. Our ignorance may be due to two quite different sources, although we usually fail to realize the difference: the intrinsic randomness of the phenomenon we are considering (say, if we try to predict the future, like the participation list of next year's Roland Garros), or the insufficient data at our disposal on a system which is perfectly deterministic, that is, one that has already taken a definite state, however unknown to us (Susan took the train or missed it).

    Note that for statisticians the above statements on Susan or Nadal are guesses, not estimates: estimates come exclusively from measurements! Yet similar sentences get as close as anything to scientific thinking for non-scientists in their everyday life. The reason why they do is that they implicitly use Bayesian reasoning. If you do not know what Bayesian reasoning is, read on: you might realize you are smarter than you thought.

    Two schools of thought

    There are two schools of thought when discussing the probability of different hypotheses. One school is called "Frequentist". If I were using frequentist thinking, I would say that observing the behavior of an infinite number of readers N of this post would allow me to compute the probability that you leave it before getting to the end, by taking the limit, for large N, of the number of early departures divided by N.
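    The limiting-fraction idea is easy to sketch in code. Below is a hypothetical simulation, not anything from the post: I invent a "true" departure chance of 0.7 and show that the frequentist estimate, departures divided by N, homes in on it as N grows.

```python
import random

# A made-up illustration of the frequentist definition: simulate N readers
# whose true (unknown to us) chance of leaving before the end is 0.7, and
# estimate the probability as the observed fraction of departures.
random.seed(42)

def frequency_estimate(n_readers, true_p=0.7):
    """Fraction of simulated readers who leave the post early."""
    departures = sum(random.random() < true_p for _ in range(n_readers))
    return departures / n_readers

for n in (100, 10_000, 1_000_000):
    print(n, frequency_estimate(n))
```

    As N grows the estimate stabilizes near 0.7; the frequentist "probability" is the value this fraction converges to in the infinite-N limit.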

    The other one is called "Bayesian" after Thomas Bayes, who formulated in the mid-1700s the famous theorem named after him. Bayes' theorem was published posthumously in 1763 by a friend, beating the mathematician Pierre-Simon Laplace to the result. For a Bayesian, probability is not a number defined as a fraction of outcomes, but rather a quantity defined irrespective of the possibility of actually counting a fraction of successes: one which has to do with the degree of belief of the experimenter. This belief sizes up the likelihood one may assign to the occurrence of a phenomenon, based on his or her prior knowledge of the system.

    Being a Bayesian -at least in everyday life-, instead of fantasizing about infinite readers I use my prior experience with similar articles I have written, and the data provided in this blog's access statistics, to try and put together my guess: in statistical jargon, I assign a "subjective prior probability" to the dropping out of readers during a long piece, depending on its length and subject. Since the number in this case comes out pretty close to unity -a certain outcome-, let me get to the heart of this post quickly. I want to convince you that you, too, are a Bayesian. But I need to explain the theorem to you first!

    Bayes' Theorem

    If you are here in search of a rigorous definition, you have come to the wrong place. I will give in the following a simplified, inaccurate explanation of this important tool in statistics, by considering an example. As I just had occasion to note, I am a Statistician only by election...

    It all revolves around the meaning of conditional probability. We may call C a restricting condition on the occurrence of a phenomenon A, and write the probability of occurrence of the condition C as P(C). Similarly, the probability of the phenomenon A may be labeled P(A), and the probability of occurrence of A under the restricting condition that C also occurs as P(A|C).

    In our Nadal/Federer example above, we might assign to C the meaning "Federer plays Roland Garros 2010"; A will mean "Nadal wins RG2010", and A|C is what we want to know, namely "Nadal wins RG2010 if Federer plays". Bayes' theorem then can be stated as follows:

    The probability P(A|C) that "Nadal wins RG2010 if Federer plays" can be computed as the product of three terms: P(A), the probability that Nadal wins; P(C|A), the likelihood that Federer plays if Nadal wins; and 1/P(C), the inverse of the probability that Federer plays. In symbols,

    P(A|C) = P(A) P(C|A) / P(C).

    Let us give P(A) a 30% chance: the overall chance that Nadal wins. This is our subjective prior, our degree of belief in Nadal winning. Then if P(C), the chance that Federer participates, is 90%, and the likelihood of Federer being there if Nadal wins is 60%, we get for P(A|C) the number 0.3*0.6/0.9=0.2, or 20%. In this example, a good part of the overall chance of Nadal winning Roland Garros comes from the scenarios in which Federer does not play. Note that somebody else might assign a different probability to P(A), and his result for P(A|C) will in general be different. This is at the heart of what makes Bayes' theorem controversial among Statisticians.
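    With numbers in hand, the theorem is a one-line computation. A minimal sketch, using the 30%, 60%, and 90% figures chosen above as subjective inputs:

```python
def bayes(p_a, p_c_given_a, p_c):
    """Bayes' theorem: P(A|C) = P(A) * P(C|A) / P(C)."""
    return p_a * p_c_given_a / p_c

# P(A) = 0.30 (Nadal wins), P(C|A) = 0.60 (Federer plays if Nadal wins),
# P(C) = 0.90 (Federer plays)
p_nadal_wins_if_federer_plays = bayes(0.30, 0.60, 0.90)
print(round(p_nadal_wins_if_federer_plays, 2))  # 0.2
```

    Change the prior P(A) and the answer changes with it; that sensitivity to the prior is exactly what the controversy among Statisticians is about.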

    I think what is counter-intuitive in Bayes' theorem is the fact that to compute a conditional probability P(A|C) we need to consider the case where we switch the places of the conditioning clause and the main phenomenon which is the object of our attention, by evaluating P(C|A). But there is really nothing mysterious about it, if you consider a graphical interpretation of the theorem.

    To interpret the figure above, which I have stolen from the enlightening slides of a course held last summer by a colleague, you are asked to liken probabilities to areas on a plane. Our "universe of possibilities" is the big rectangular box; then, the occurrence of phenomena in this universe can be represented by smaller areas within the box. The other thing you need in order to decrypt the slide is the meaning of the symbols for union and intersection of sets: a "U" symbol between two clauses denotes union, and indicates the logical or of the two clauses, while the symbol looking like an upside-down "U" denotes intersection, and indicates the logical and of the two clauses.

    I find the graphical demonstration above quite simple to grasp, and I hope you will concur. I have always wanted to put together a similar slide but I was happy to be saved the work... Thanks Bob!


    I think you will be wondering why on earth you should consider yourself a Bayesian, since the theorem above looks obscure to you and you never even dreamt of computing conditional probabilities or subjective priors. Well, I think you do, unconsciously. When you have to make a decision in your daily life you are most likely to rely on your past experience of similar situations; in the toughest cases that experience will not be available, and what you will draw from it instead is a "degree of belief", which you build by extrapolation or other kinds of logical reasoning. If you ask yourself what is the chance that Susan took the car to drive to work today, given that she is not there by the time the meeting starts, you will need to consider the probability that Susan is late at work when she takes the car, and this may be known to you from past experience; but you also need the overall probability of Susan taking the car, and this might be available to you only as a personal degree of belief. You will most probably not end up multiplying P(took car) by P(late|took car) to get your feeling for the chance that she is arriving by car, but your intuition will be based on those concepts anyway.
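    The Susan reasoning can be sketched numerically. All the probabilities below are made-up degrees of belief, chosen only to show the mechanics of the update:

```python
# Hypothetical degrees of belief, not measured frequencies:
p_took_car = 0.4          # prior: how often Susan drives to work
p_late_given_car = 0.5    # past experience: she is often late by car
p_late_given_train = 0.1  # the train usually gets her there on time

# Total probability that Susan is late, summed over the two ways to commute:
p_late = p_took_car * p_late_given_car + (1 - p_took_car) * p_late_given_train

# Bayes' theorem: given that she IS late, the belief that she took the car
p_car_given_late = p_took_car * p_late_given_car / p_late
print(round(p_car_given_late, 2))  # the 40% prior climbs to about 77%
```

    The observation "she is late" has raised the belief that she drove from 40% to roughly 77%; that is the informal update your intuition performs without ever writing the formula down.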

    Then, there is another reason why you are probably using Bayesian reasoning in your everyday life, and that is its application to risk analysis. But I think I need to stop this article here: my risk analysis of dropping readers tells me that another page is not worth the while... It will be a wonderful topic for another post, if I manage to put together a few particle physics examples. Not your everyday bread and butter, I know, but at least more interesting than discussing the chances of winning at the Casino. Or so I think.


    Nice argument. I have, however, two skeptical comments. First, in a sense it is trivial to say that all of us are 'Bayesians'; Bayes' theorem is a theorem, so everyone accepts it (if she is minimally rational); it is like discussing whether we are all 'al-Khwarizmians' or not, from the fact that we accept the laws for solving algebraic equations.
    Second, in a sense, it is false! For we actually do not assign numerical probabilities (how do you know if your subjective probability of the Pope dying next year is 0.56 or 0.558987645637?), and each of us maintains lots of contradictory beliefs.

    Nice article, Tommaso.

    Well, as Jesus Christ has suggested, I also accept Bayes's theorem, and use it to argue in many situations. Still, I don't consider myself Bayesian - and most scientists don't. As a champion of the alternative (in fact, it is Bayes who is alternative) school of frequentism, I consider only the probabilities that can be measured, at least in principle, by a repetition of the same situation - i.e. measured with an accuracy that can be increased - to be physically/scientifically meaningful.

    Bayesian probabilities are always subjective because they depend on the immediate knowledge of a person as well as her subjective decisions about what the priors should be, which evidence should be taken into account and how, and which evidence should be considered independent, etc. So one can't expect scientists to objectively agree on any Bayesian probability, numerically. Bayesian inference is just a "good strategy", not a law of physics or an objective law of physics.
    Hi Lubos,

    it is strange, although possibly true, that most scientists are not Bayesians. Because we often deal with experiments that further our knowledge only if we are capable of formulating a meaningful prior probability for unmeasurable clauses. But I agree that the ground is contested.

    Well, right, Tommaso. Formulating sensible expectations - priors - is a part of the art of physics, at least I view it this way, and art itself is not yet physics (although it is needed for a physicist). That's a part of the "positivist" heritage - not to excessively speak about numbers that we can't define operationally. 
    But the art is needed for the actual work. In physics, we need to give reasonably high priors to all possibilities, even those that seem to violate religious preconceptions etc. We must be creative enough not to forget any possible explanation. And we must punish by low priors excessively "awkward" explanations. One needs a good taste to see what is awkward, and this taste is also built by experience.
    But as I said, I think that the Bayesian probabilities are uncertain not only because of the priors but also because of the selection of the evidence. Bayesian probabilities are designed to sound just as quantitative as the frequentist probabilities - N_i / N calculated from |psi|^2 in quantum mechanics etc. They formally work in the same way except that the exact numbers don't mean much.

    For example, if the IPCC says that the probability that most of 20th century global warming is due to CO2 emissions is 90%, what does it mean? It's just a statement about the feelings of some people. The right answer is, of course, either 0% or 100%. My answer is almost certainly lower than 50%. The right number between 0% and 100% will never be known accurately because the right number is actually 0% or 100%. ;-) 

    I am NOT (or so I guess) Jesus Christ; that was someone who happens to have the same name as me (Jesús, which, by the way, is a very, very common name here in Spain). Never mind.
    My point was basically to ask why the question whether people have to accept (or do accept) Bayes' theorem and its logical consequences is regarded as a more relevant philosophical or epistemological problem than whether people accept the theorems of algebra or arithmetic.

    Thanks for telling me you're not Christ. Otherwise I wouldn't know you were not the Son of God. ;-) Seriously, I know it is a frequent Spanish name.
    The problem with the Bayes theorem - that goes beyond the indisputable proof that Tommaso sketched in his pictures - is philosophical. The problem is that the areas within the rectangles are actually interpreted in various ways, and whether or not such interpretations are objectively meaningful is what people disagree about.
    Well, ok, I do not think we have to enter that minefield just because I explained the theorem here... I know that Bayes' theorem is one of the gateways to philosophical questions, but here the question is simpler: do we, in our daily life, calculate probabilities as frequency limits, or as degrees of belief? I think we do the latter. Then, as far as scientific measurements are concerned, one can be a Bayesian or a frequentist, and these are equally good positions. For instance, the frequentist will not have to assign a prior probability to "SUSY is the correct theory of nature". There are situations, however, when such a standpoint is really required.

    Tommaso, you have surely entered the philosophical mine field already if you claim things such as "probabilities are calculated [only] as degrees of belief in the everyday life".
    In the daily life, we surely use probabilities in both ways - so the answer is "Both". And I guess that frequentists like me will say that we more often calculate probabilities as (approximate) limiting ratios of cases. In "primitive" everyday science, if we want to state the probability that one catches an infection under some conditions, we estimate it by computing the event ratios that have already been seen in the past, relevant for the same situation.

    In science, the probabilities calculated in statistical physics as well as quantum mechanics (from squared amplitudes) are surely meant to be limiting ratios of event numbers and they don't require any "beliefs". That's how they're being verified, too. Quantum mechanics or statistical physics may sound excessively sophisticated compared to everyday life but the everyday life questions are really simplified versions of the same thing.

    I agree that we need the "degree of belief" kind of probabilities to say anything about the expectations that SUSY will be seen, or any other expectation about a thing whose full model doesn't exist. But that's why it's not yet objective science - it's just a belief. My belief about this question is surely better than yours - because its priors etc. are more sensibly constructed out of the known experience in physics - but it's still belief and will remain so until the probabilities are converted to ratios of event numbers of some kind.
    Sure, Lubos. I was referring to the kind of intuitive inference that we do when we reason that "dad will be home by now" or that "I'll need an umbrella if I travel to London next week".
    So was I, Tommaso....

    I mostly use frequentist techniques to settle such questions. If I go to London next week, I review the ensemble of similar visits in the past (either mine or all people's visits) with similar weather forecasts for the destination as we see now (and/or similar other things that may matter), and compute the ratio of these past visits in which I actually needed an umbrella. That's the estimated probability that I will need an umbrella next week. But it's been measured as the (approximate) frequency probability, hasn't it?

    If this probability of needing an umbrella is smaller than a certain number - that is determined by the additional hassle by adding one umbrella to the trunk - then I take the umbrella. That's what I consider the "intuitive inference" and it is based on frequency probabilities. Of course, in reality, all these considerations are done approximately, intuitively, instinctively, but the instincts are inherently using the same logic.

    How do you determine whether you need the umbrella next week? We may be using different algorithms. But in this particular question, of rain, I haven't used the Bayesian formula - explicitly or heuristically/approximately/instinctively - too many times in my life. I have almost never dealt with "hypotheses predicting some conditional probabilities". Why would I be doing that if such things can be directly extracted from the experience, by computing ratios of events (frequency probabilities)?

    And if the estimated confidence level depends too much on beliefs that don't have these "ensembles of precedents" (admitting a frequency analysis) or otherwise plausible ways to quantitatively calculate things, I just don't trust them. Because of these reasons, science is primarily about a direct measurement - and statistics of cases - that doesn't require any hypothesis whatsoever. What the hypotheses are needed for is to properly define the otherwise problematic notion of "similar things" and to correlate the past experience with some "parameters" with predicted phenomena that depend on "other values of the parameters". 

    Because the relationships between things can be functional, we can relate many more types of events that were previously disconnected. We are seeing many more patterns all the time. That's where the whole construction of scientific theories and their evidence is inserted. But at the very end, it must boil down to frequency statistics applied to the experience in the past, otherwise it's not science (or it's not a rational approach to daily life).
    I would argue that scientists, almost without exception, are Bayesian in their thinking but Frequentist in their calculations. That is, they carry out a Frequentist analysis and then interpret the results as if they had come from a Bayesian analysis. Sometimes this is harmless because both schools of thought happen to lead to similar numerical results.

    But sometimes this is dangerous. The most common example is p-values. Scientists interpret a p-value like a Bayesian probability of error. You got a 0.05 p-value, so there's a 95% chance that your conclusion is correct. Wrong! Maybe wrong by an order of magnitude. You can get a small p-value and still have a very high probability of being wrong.
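    This warning can be made concrete with a back-of-the-envelope Bayes calculation. The base rate, power, and threshold below are hypothetical, picked only to show how far the posterior error rate can drift from the p-value:

```python
# Suppose only 10% of the hypotheses we test are actually true effects,
# our tests have power 0.8, and we declare "significant" at alpha = 0.05.
prior_real = 0.10
power = 0.80
alpha = 0.05

# Probability of obtaining a "significant" result at all:
p_reject = prior_real * power + (1 - prior_real) * alpha

# Bayes: among the significant results, the fraction that are false alarms
p_false_given_reject = (1 - prior_real) * alpha / p_reject
print(round(p_false_given_reject, 2))  # 0.36 -- far above the naive 0.05
```

    With these (invented) numbers, more than a third of the "p < 0.05" discoveries are wrong, even though each individually came with a reassuring-looking p-value.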

    I completely agree with that. Claiming that some probabilities are "Bayesian" or "calculated from some formulae" can't save one from methodological, statistical errors (such as the hockey stick mining, to mention a recent example), or from hugely unrealistic priors (notice the people who say that the arrow of time shouldn't exist because the initial [sic] states should always be thermal), or from the error of overlooking some possible explanations, or from incorrectly assuming that two pieces of evidence are independent, or from a wishful thinking, or from a qualitatively wrong theory or wrong parameterization of the problem.
    Experimentally measured frequentist probabilities are, of course, free of most of these possible problems. And theoretically predicted frequentist probabilities are comparably error-free as long as the theory has been verified by a sufficient number of frequentist measurements (even though I sometimes need the Bayesian inference to determine how much testing is enough to be sure about a theory).
    Hi John,
    in particle physics, we often do pseudoexperiments to check p-values, so I at least in part concur that we act frequentist under the table. Anyway, I do not have much experience with other sciences, so I will abstain from giving here my worthless opinion. Let's just say that scientists have on average too little knowledge of the statistics on which they base their results.
    Hi Tommaso,

    For a lay person like myself this again is interesting reading in mathematical application toward science. Thank you.

    This thinking correlates in my mind with the logical approach of Venn logic and its applicability, through my research, toward computerization.

    From historical perspective, Blaise Pascal, Pierre de Fermat became the founders of the theory of probability.

    A Short History of Probability

    "A gambler's dispute in 1654 led to the creation of a mathematical theory of probability by two famous French mathematicians, Blaise Pascal and Pierre de Fermat. Antoine Gombaud, Chevalier de Méré, a French nobleman with an interest in gaming and gambling questions, called Pascal's attention to an apparent contradiction concerning a popular dice game. The game consisted in throwing a pair of dice 24 times; the problem was to decide whether or not to bet even money on the occurrence of at least one "double six" during the 24 throws. A seemingly well-established gambling rule led de Méré to believe that betting on a double six in 24 throws would be profitable, but his own calculations indicated just the opposite." See: A Short History of Probability


    Johannes Koelman
    Nice article Tommaso. Indeed, in daily life we are all Bayesians (anyone who claims (s)he isn't, I invite to a game of poker).

    Problem is that in real life we are forced to deal with Bayesian estimates in an imprecise way. Few people do this with the correct intuition. We are Bayesians, but poor ones at that.
    Dear Johannes, indeed, Bayesian inference may be *useful* to play poker. However, "useful" is not the same thing as "true": the rules-of-thumb to play the game efficiently are not about a "scientific truth" by themselves. To make them scientific, one needs to look at the game of poker from a higher level, and do the statistics of all possible types of games....
    So a guy who plays poker is not a "scientist" himself but he is an object studied by science, and this science can study its objects without any vague and mysterious discussion of priors. Science - in this case game theory - can study the players and their average profits for different strategies purely with the frequency probabilities, without any reference to "hypotheses" and their "priors" that the players may be using.

    So playing poker requires one to use the brain but playing it is not science. One can ask, for N players who use N different (or pairwise equal) algorithms, what is the probability that a certain player wins (or the average profit he earns). This can be calculated exactly - at least assuming that we also know the distribution for the ordering of the cards (probably the obvious random one, giving 1/M! weight to each ordering in the stack).

    Obviously, a good player wants to have a strategy that beats all other strategies. A "player" wants to borrow some conclusions of the "science" which is otherwise above him. In chess etc., one can in principle define the "ideal players". If they have a guaranteed path to win from a given situation, they will follow it. If they don't have a guaranteed path, they may try to confuse the enemy (and it's never quite well-defined what is the best way to confuse the foe if I am in trouble, because it depends on the foe's psychology: but obstructions - trying to make the game longer - usually help because they give the foe more chances to make a mistake). So configurations in chess can be divided into won, lost, and tied ones. The chessboard at the beginning is probably a tie - two ideal players will inevitably end up with a tie. It's still not proven for chess, as far as I know, but it's likely. For much easier games, all such things are known exactly.

    For games that depend on chance, including card games, there are usually no strategies to "win for sure". There are strategies that are better than other strategies. But one is not quite guaranteed that they're strictly ordered. A certain strategy may be great at defeating most enemies, but it may be vulnerable to a particular, different, otherwise "mostly unsuccessful" strategy. At least I think so: it's because good strategies need to make some assumptions about the foe's behavior which can't be calculated mathematically. This fact won't matter for most decisions, where you can say what is wiser or less wise (and where you assume that your opponent is using a similar strategy which is "usually efficient") - but it may matter sometimes. Sometimes the success of your cool strategy is affected by the other players' strategies. So there's no "objective truth" here. The reason why these particular Bayesian probabilities are imprecise is inherent to the problem: they can't ever be rigorous because one can't ever objectively say what is the probability that the opponent will use one algorithm or another to play.
    Johannes Koelman
    indeed, Bayesian inference may be *useful* to play poker. However, "useful" is not the same thing as "true"
    Ok, let's assume the two of us are heads-up in a game of Texas Hold 'Em. The blinds are high. Cards are dealt and you receive a pair of kings. You are first to act, and you decide to make a significant raise. Now I re-raise, taking you all-in.

    What do you do?

    Well, your optimal action depends on my hand. If I hold a pair of aces, you should fold. However, if I have any other hand, you should call.

    Now, the a-priori chance that I was dealt two aces is about 0.0045. That is a very, very small chance. So you should call?

    If you would call without giving it further thought, you had better stay away from the poker table.

    Obviously, you will update your view on the likelihood that I hold two aces in my hand, based on the information that I re-raised your raise. And yes, that means that you will evaluate P(AA|re-raise) based on Bayes' theorem using your views of the conditional probabilities P(re-raise|AA) and P(re-raise|not AA) (with the latter including the likelihood that I am bluffing).  

    This is more than just 'useful'. This simply is the only correct approach that any experienced player will follow. And this holds even if the player has not attended statistics classes and has never heard about Bayes.
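    The update described here can be sketched with numbers. The prior comes from counting hands, while the two conditional probabilities (the "reads") are invented below purely for illustration:

```python
from math import comb

# Prior: 6 ways to hold two of the four aces, out of C(52, 2) = 1326 hands.
p_aa = comb(4, 2) / comb(52, 2)
print(round(p_aa, 4))  # ~0.0045, the number quoted above

# Invented reads -- how often each hand type re-raises all-in here:
p_reraise_given_aa = 0.95      # aces nearly always re-raise
p_reraise_given_not_aa = 0.10  # everything else, bluffs included

p_reraise = (p_aa * p_reraise_given_aa
             + (1 - p_aa) * p_reraise_given_not_aa)
p_aa_given_reraise = p_aa * p_reraise_given_aa / p_reraise
print(round(p_aa_given_reraise, 3))  # the tiny prior grows about ninefold
```

    Even with these made-up reads, the re-raise multiplies the belief in pocket aces by roughly a factor of nine: the mechanics an experienced player runs in his head without ever naming Bayes.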
    And that brings us back to the point Tommaso is making.
    Dear Johannes, we just lost a Sunday canasta tournament, by a whopping 4,000 points again - it's the 10th time almost in a row when our team did so. ;-) And because I don't play poker, I won't try to pretend that I could beat you in a gedanken experiment of poker. Let me assume that your recipes to play are correct.
    But I don't understand why you think that this is more than just "useful". A strategy that an experienced player will follow is the same thing as something "useful". It's nothing more than that unless one can show that it's more than that. And if the strategy only works given some assumptions about the behavior of the other players, it's not even a collection of robust mathematical or scientific propositions. It's just heuristic social science that may become invalid if the behavior of the other players changes.

    If I am thinking about the right card to throw away in canasta, I may want to prevent the next player from privatizing the stack - sorry, I won't try to use the right terminology because I can't speak English too well in such unusual contexts. In order to estimate what the next players may be collecting, I must have some idea about her behavior. She may prefer to collect some types of cards and their groups - but there's no "eternally universally canonically valid" choice to do so because her optimal strategy also depends on the strategy of others that she doesn't fully know.

    Maybe in poker, the strategy of others doesn't matter. Except that I am pretty sure that it does matter, too. So there can be better or worse strategies but none of them is the "exact truth" because the efficiency of any strategy still depends on the assumptions about the strategies followed by the other players. ... I understand the point he was making. I am just saying that the Bayesian reasoning from everyday life is not science and card games are the best example. 
    Johannes Koelman
    Dear Lubos -- sorry to hear you again got your ass kicked in a royal way at the card table. Don't despair, keep practicing! ;-)
    I don't understand why you think that this is more than just "useful". A strategy that an experienced player will follow is the same thing as something "useful". It's nothing more than that unless one can show that it's more than that. And if the strategy only works given some assumptions about the behavior of the other players, it's not even a collection of robust mathematical or scientific propositions. It's just heuristic social science that may become invalid if the behavior of the other players changes.
    Heuristic social science? The example I just gave is part of the optimal (unbeatable) strategy (the Nash equilibrium strategy for two-player short stack play).

    I believe poker is a prototypical example to demonstrate that Bayes matters in real life. It perhaps does not occur as frequently as it does at the poker table, but in daily life we all have to take decisions in the face of uncertainty. Bayes gives a mathematically correct guide on how to optimally respond given this uncertainty. That is: if you can estimate the (conditional) probabilities. That's the catch.
    Nice picture indeed. I doubt very much that 1 (picture) = 1000 (words) ever was a Chinese proverb, but it is certainly true in this case.

    Here are a couple of Bayesian links. The first is An Intuitive Explanation of Bayes' Theorem, which the author describes as "Bayes' Theorem for the curious and bewildered; an excruciatingly gentle introduction".

    The other is Probability Theory With Applications in Science and Engineering by E. T. Jaynes, which is downloadable in PDF format. 
    Robert H. Olley / Quondam Physics Department / University of Reading / England
    Hello Robert,
    good links, thanks for the contribution.
    If Dr. Motl isn't Bayesian, then how come he tells me that I don't stand a chance of having anything useful to say about physics?

    Dear Carl, it's because saying that one is not Bayesian (and is a frequentist instead) doesn't mean that he has no beliefs or can have no beliefs. Being a frequentist means realizing that beliefs are beliefs, and that the confidence levels quantifying such beliefs don't have an objective quantitative scientific meaning.
    So if my guessed probability that you can say something useful and new about physics is 0.01%, there's no objective way to show that the correct result can't be 0.001% or something else. However, most sensible methodologies will agree that it's small. Is it clear? One could approximately calculate the probabilities in a frequentist way, too. Just take many people in the past who were similar to you (which is not quite well-defined, but it's good enough) and calculate the ratio.

    One Bayesian inference added, update: The numbers 0.01% and 0.001% above should be reduced by a factor of ten because I just read your comment at Dirac Sea Shores that atoms must have photon bags with photons ready to be released - and that this insight is more sensible for you than everything you learned about the emission of light at your grad school. ;-)

    I think these introductions to Bayesian inference usually miss the mark, in that they don't emphasize the right thing. Specifically, everyone seems to focus, mistakenly IMHO, on subjective priors. Most Bayesians aren't Bayesian because they decided they'd really like to be more subjective about everything.

    The real difference between frequentists and Bayesians, again IMHO, is that frequentists condition on hypotheses and Bayesians condition on data. That is, a frequentist will treat a hypothesis as provisionally certain, and ask what is the probability that different observations could result from this hypothesis. This naturally leads to a hypothesis-testing framework.

    A Bayesian, on the other hand, will treat the observed data as certain (since you observed it!), and ask what is the probability that a particular hypothesis could have produced this data. This naturally leads to a framework of assigning probability distributions to hypotheses.

    To a frequentist, it is meaningless to talk about "the probability of a hypothesis being true". A hypothesis is either true or it isn't: its probability is always 0 or 1, but you may not be sure which. You can only look at the set of data which could have been (but was not) observed and talk about how many of these hypothetical data sets would look like what you actually saw.

    To a Bayesian it misses the point to talk about "the probability of the observed data". The observed data is observed, with probability 1. What you're uncertain about is the hypothesis, the process that generated the data. That's the probability you're interested in.

    The difference between the two schools is what you're uncertain about: the hypothesis, or the data.
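A minimal sketch of this contrast, with made-up coin-flip numbers (mine, not the commenter's): the frequentist fixes the hypothesis and computes the probability of data at least as extreme under it, while the Bayesian fixes the data and compares hypotheses by how well they explain it (here with equal prior weight, so the posterior odds equal the likelihood ratio):

```python
from math import comb

# Toy data (illustrative assumption): observe 9 heads in 12 flips.
n, k = 12, 9

# Frequentist: fix the hypothesis "the coin is fair" (p = 0.5) and ask
# how probable data at least this extreme is under it (one-sided p-value).
p_value = sum(comb(n, j) * 0.5**n for j in range(k, n + 1))

# Bayesian: fix the data and assign probabilities to hypotheses.
def likelihood(p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# With equal prior weight on "fair" (p = 0.5) vs "biased" (p = 0.75),
# the posterior odds reduce to the likelihood ratio.
post_odds = likelihood(0.75) / likelihood(0.5)

print(round(p_value, 3))    # 0.073: probability of the data given the hypothesis
print(round(post_odds, 2))  # 4.81: relative support for the biased hypothesis
```

Same data, but the two numbers answer different questions: one is about the data conditioned on a hypothesis, the other about hypotheses conditioned on the data.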

    In order to get at "the probability of the hypothesis, conditioned on the data", p(H|D), Bayes's theorem says you have to combine the likelihood p(D|H), the data-generating process, with a prior assessment of the hypothesis's probability, p(H). This is where the subjective prior stuff comes in. But that's really a side effect of one's desire to compute p(H|D) instead of p(D|H), rather than the main motivation for Bayesianism. Frequentists think it's meaningless to talk about p(H|D) since only one hypothesis can be true and governs the real world, so they limit themselves to p(D|H) (the likelihood) instead. For a Bayesian, p(D|H) is only a stepping stone, to be combined with p(H), in order to arrive at the real quantity of interest, p(H|D).

    This opened a whole philosophical debate as to what the prior "means". You could use population statistics, in a frequentist sense, as a prior. ("I know that the base prevalence of the disease in the population is X".) In that sense, you can interpret Bayesian probability in a rather objective frequentist sense. But more generally, when you don't have that kind of information, it's not clear what this prior is supposed to be. That's where the subjective Bayesian school was born: it's supposed to be some fuzzy encoding of one's "beliefs".
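To make the base-prevalence example concrete (all the rates below are illustrative assumptions of mine), using the population frequency as the prior p(H) in Bayes' theorem gives:

```python
# Toy base-rate example: a population frequency used as the prior.
prevalence = 0.01          # p(H): 1% of the population has the disease
sensitivity = 0.95         # p(D|H): test is positive given disease
false_positive = 0.05      # p(D|not H): test is positive without disease

# Bayes' theorem: p(H|D) = p(D|H) p(H) / p(D),
# with p(D) expanded by the law of total probability.
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
posterior = sensitivity * prevalence / p_positive

print(round(posterior, 3))  # 0.161: a positive test is still mostly a false alarm
```

Because the prior here is an objectively measurable frequency, even a strict frequentist can accept this calculation; the philosophical trouble starts when no such population statistic exists.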

    But not all Bayesians subscribe to this subjective interpretation. Some of them think of priors as a kind of "regularization" of frequentist statistics, to keep inferences away from obviously absurd values. Common examples are avoiding nonzero probability of negative values for a quantity which physically cannot be negative, or controlling situations where the maximum likelihood solution runs away to odd values, or enforcing sparsity assumptions to make the inference better conditioned (like ridge regression). E.T. Jaynes subscribed to a form of "objective" Bayesianism where "ignorance" priors are governed by the symmetry of the problem; this, along with Jeffreys's work, led to modern "reference" priors. If you talk to Andrew Gelman, an applied Bayesian, his motivation for Bayesianism has little to do with "subjective priors", or the use of priors at all. For Gelman, Bayesianism is about a simple, consistent, coherent way of constructing and working with hierarchical models, to regularize inference, and to automatically apply shrinkage estimates and parsimony penalizations. (But speaking of priors, Gelman favors "weakly informative priors", which are specific enough to rule out crazy inferences but vague enough to learn from the data.)
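The ridge-regression remark can be made concrete: with a Gaussian prior beta ~ N(0, τ²I) and Gaussian noise of variance σ², the MAP estimate coincides with the ridge solution for penalty λ = σ²/τ². A small numerical sketch (toy data; σ² and τ² are assumed values of mine):

```python
import numpy as np

# Toy regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=50)

sigma2, tau2 = 0.25, 1.0   # noise variance and prior variance (assumed)
lam = sigma2 / tau2        # implied ridge penalty

# Ridge (penalized least squares), closed form:
# beta = (X'X + lam I)^{-1} X'y.
ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# MAP estimate: maximizing log p(y|beta) + log p(beta) and setting the
# gradient to zero yields the same normal equations, rescaled by sigma2.
map_est = np.linalg.solve(X.T @ X / sigma2 + np.eye(3) / tau2, X.T @ y / sigma2)

print(np.allclose(ridge, map_est))  # True: the two estimates coincide
```

This is the sense in which a prior can act purely as a regularizer, with no subjective "belief" interpretation required.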

    Nicely written, and I almost completely subscribe to that! The frequency approach assumes that a theory is a fixed given starting point, and computes from it the predicted probability of some observed data. The Bayesian approach wants to reverse this: assume the data, and compute the probabilities of individual hypotheses.

    Both of these approaches are kind of legitimate, but when one looks at what kind of reasoning they imply, it turns out that the frequency calculation is well-defined, because well-defined theories imply well-defined, quantitative, accurate outcomes (such as probabilities that X is measured in various intervals).

    On the other hand, the reversed, Bayesian approach isn't straightforward. The data don't imply hypotheses. One needs inference, and when one looks at how the probabilities of hypotheses must be calculated, it turns out that they depend on some priors for the hypotheses, too. Well, that's surely a "side-effect" of this whole approach. Except that the word "side-effect" masks the fact that it is also a key derived feature of Bayesian reasoning - something that one can't get rid of.

    A priori, it may be unclear that the Bayesian direction of conditional probabilities will force us to decide about psychological priors. But a posteriori, it is true. So in this sense, you may start with prior probabilities that would imply that the Bayesian reasoning is more robust than the frequentist reasoning. But if you take the evidence into account and calculate the new likelihood that the Bayesian or frequentist approaches are more robust, you will end up with the Bayesian posterior probabilities that show that the frequentist framework is actually more robust than the Bayesian one! The evidence shows it and the evidence matters - even according to the Bayesian viewpoint. ;-)

    According to this simple Bayesian argument, frequentist probability is more scientific than the Bayesian one. So if you started with the Bayesian one, you should end up with the frequentist one, anyway. :-)

    You indicate that it is "more logical" to assume that the data are given (because they're seen), and the hypotheses should be derived from them. Well, we have to do so if we're deducing the right theories (and initial conditions) describing the world around us. Except that the real world around us works exactly in the opposite direction: it has a fixed theory and/or initial conditions, and it produces observed data (and final states) out of them! ;-)

    So the frequentist approach appreciates how the real world actually works, which is why the numbers calculated in this framework are "objective", at least if we want them to be accurate. On the other hand, the Bayesian approach appreciates the messy problem that people actually have to solve to figure out the right theory (and initial conditions). It's not surprising that the frequentist approach is more popular among those who view the world as a result of some laws of physics that exist independently of us, while the Bayesian approach appeals to those who prefer "applied science" and view calculations as a tool to figure out what is useful for us (that's why it has appeared in the game-theory contexts above).

    Oh gosh, Tommaso, I got through the end of your article, but now I have a splitting headache! LOL But, it was worth it. ;-)

    This reminds me when I was doing graduate work in meta-logic and systems of inductive reasoning. I'm quite comfortable being a Bayesian and have been for over a couple of decades now.

    I have had many discussions with other logicians as to the question: is science, in general, truly inductive in nature? Well, I think you've answered that question very nicely, and the answer is a most definite, YES! ;-)
    Oh yeah Eric, that was the part I hid - I lingered on the small chance of readers getting to the bottom of the post, but I omitted to say the remaining ones would have a headache by then :)
    LOL Tommaso! ;-)
    I lingered on the small chance of readers getting to the bottom of the post, but I omitted to say the remaining ones would have a headache by then...
    Working on the probability that I would already know 50% of this stuff, I speed-read it at double my normal rate, thus avoiding the probability of a headache.  :-)
    I did that too, Patrick, but I still got the headache! LOL ;-)
    Eric: I'll let you in on my little secret: since I knew 50% of this stuff, I only read every other word.  It seemed to make more sense that way.

    Only kidding, Tommaso. ;-)
    Well, I already knew 90% of this stuff, so I only had to read one in ten words. So there should be a greater probability that you should have had a bigger headache than I! LOL ;-)

    Also, only kidding, Tommaso. ;-)
    You don't worry about what you write here folks. I know you are kidding, and I doubt anybody else is reading this column by now ;-)