I sometimes like to read the arXiv preprint physics site. It's where a lot of papers go before they are in journals. It was open access, a way to see what scientists were working on before the results were locked behind a corporate journal paywall, before open access was even a thing.

It's also fun to see what crazy stuff some theoretical physicists are writing. They used to be a rather niche part of the field - jobs where you were paid to think rather than do were once rare, even after Einstein and Heisenberg became science rock stars (1) - but now it's the experimental physicists who have a harder time finding jobs. The pop culture phenomenon of "String Theory" led to a whole rush of theoretical physicists being hired, and that led to a whole lot of arXiv papers where math plus philosophy could show other strange stuff, like how time travel might be possible.(2)

Or this gobbledy-gook.

Theoretical physicists, you don't need to rush to Twitter to block me, for two reasons. One, if you are a pop culture writer pumping out books and articles on this philosophical nonsense, you've already blocked me, and two, I am not saying all theoretical physics is bad. I admire and respect many of you. I am just noting that math is a language, and language can create fiction, like time travel stories.

There is no need for an arXiv version of biology where theoreticians write stories they hope scientists in the future will make real, because such stories are already published in dozens of epidemiology journals as serious content.

Epidemiology has too often become Apocalyptic fan fiction

Like theoretical physics, epidemiology was once a rather rare job. It wasn't easy to get employed saying you wanted to find things scientists don't find and you would do it using statistics. Smoking changed all that. We did not do human experiments for smoking. Having people take up smoking in a clinical trial and do it for 40 years is obviously unethical, so like in other cases (especially where people were dead) we used epidemiology.

Epidemiologists were able to find the literal smoking gun that was killing people and it was the beginning of the end for cigarettes. It made epidemiologists cool. Next we got the International Agency for Research on Cancer (IARC), which was a kind of Justice League for epidemiologists. Founded in 1965 and first directed by one of my heroes, Dr. John Higginson, IARC wanted to end many causes of cancer by discovering what lifestyle or environmental causes there were. We now know that smog kills people - we obviously knew that in a common-sense way after the London Fog deaths, but the cause was assumed to be a compound in the smog rather than the air quality itself. It turned out to be both, but that set the world on course for the terrific air quality we enjoy today.(3) We have epidemiology to thank for it.

But once epidemiology did a few authoritative things, people began to rush into it and when that happens in any field there will be people who do it for the wrong reasons also. With the big problems discovered, some epidemiologists began to declare obscure hazards with data based on diaries and surveys, like Food X is linked to Disease Y.(4)

Science-y forms of epidemiology, seemingly created to tell a story rather than show evidence, are how we can get a study linking diet soda to heart disease risk.

A p-value (probability value) is, in the right hands, valuable, but it is too often not in the right hands. Null hypothesis significance testing sounds complex, but the p-value is really just the probability of obtaining results at least as extreme as the ones observed if the null hypothesis is true. That is not the probability that the hypothesis is true, or that the results were produced by random chance. Nor is it the size or importance of the effect; something can be statistically significant in a paper and meaningless in the real world. If I tell you a chemical in Scotch is hazardous to your health, you might be worried, unless I tell you that it is only a hazard if you drink 10,000 shots of Scotch per day.
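To make the gap between "statistically significant" and "important" concrete, here is a minimal sketch. The counts are hypothetical numbers invented for illustration, not data from any study, and the function name is mine. It uses a standard pooled two-proportion z-test with a normal approximation, which is only one of several ways such a comparison might be run.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical numbers, invented for illustration only.
n = 2_000_000
events_unexposed = 20_000  # 1.00% of the unexposed group
events_exposed = 21_000    # 1.05% of the exposed group

p_value = two_proportion_p_value(events_unexposed, n, events_exposed, n)
risk_difference = events_exposed / n - events_unexposed / n

print(f"p-value: {p_value:.2g}")
print(f"absolute risk difference: {risk_difference:.4%}")
```

With a big enough sample, the p-value sails under .05 even though the absolute difference in risk is a twentieth of a percentage point. The p-value tells you nothing about whether that difference matters.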

Yet ignoring such relative importance is what has led to California Prop 65, where Walmart will just slap cancer warning labels on the entire building, because 900 different cancer warning labels are meaningless. IARC is not making errors, they are doing it on purpose, and they wrap themselves in the flag of p-value. It's become such a fetish that epidemiologists will write papers using it, while pretending to not be misleading the public, knowing full well this can happen:


Data dredging, p-hacking and HARKing (hypothesizing after the results are known) are all too common.

As you saw in the XKCD cartoon above, they ran the problem multiple times and only the one positive got noticed. While that is hilarious, most papers don't run multiple tests of one cause and effect. They'll run one model with lots of causes and effects, and it leads to results that would be hilarious if they didn't panic the public and cost companies billions of dollars.

Here's how.

How to show random coin flips are not random and claim a p-value under .05

Here is why it's easy to get a result you can publish. Say you have a coin flip generator do 10 runs of 61 flips. Seven out of 10 times, it will have a run of 5 heads in a row and you can claim in a paper that coin flips are not actually random because the statistical significance is right there: a sequence of five heads has a probability of 0.5^5 = 0.03125 of occurring - that is a p-value of under .05. Publish that paper!

Except it's nonsense. They won't be five heads in a row in the same place. But in food studies, if you include dozens of foods (the first Harvard Food Frequency Questionnaire did use 61) and ask a lot of questions,(5) you are guaranteed to find something statistically linked to something.(6) Ask enough questions and you can even get a food that will both cause and cure something. This is due to "multiple testing" and this "multiplicity" problem has become so rampant some journals are even banning papers whose source of authority is a claim to have a p-value of .05.
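The coin-flip example can be checked without flipping anything. Below is a minimal sketch that assumes a fair coin and counts only runs of heads; the function name is my own. It compares the 0.03125 chance of five heads at one pre-chosen spot with the chance of finding five heads in a row somewhere in 61 flips.

```python
def prob_run_of_heads(n_flips, run_length):
    """Exact probability that n_flips fair flips contain at least one run of
    run_length consecutive heads."""
    # state[k] = probability that no qualifying run has occurred yet and the
    # sequence currently ends in exactly k consecutive heads.
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(n_flips):
        new_state = [0.0] * run_length
        for k, prob in enumerate(state):
            new_state[0] += prob * 0.5          # tails resets the streak
            if k + 1 < run_length:
                new_state[k + 1] += prob * 0.5  # heads extends the streak
            # otherwise heads completes the run and that probability
            # leaves the "no run yet" states for good
        state = new_state
    return 1.0 - sum(state)

naive_p = 0.5 ** 5                           # five heads at one pre-chosen spot
chance_somewhere = prob_run_of_heads(61, 5)  # five heads anywhere in 61 flips

print(f"five heads at a spot chosen in advance: {naive_p:.5f}")
print(f"five heads somewhere in 61 flips:       {chance_somewhere:.3f}")
```

The "significant" streak is the expected outcome once you are allowed to look everywhere and report only where you found it.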

And a short while ago I was a signatory on a Nature paper asking for p-values to either be used correctly or not at all. A paper last month in Stroke, one of the in-house magazines of the American Heart Association, declared that diet soda was "linked" to more heart issues in postmenopausal women. The data were women who could recall drinking a can of diet soda in the previous three months. Using the results - older women who claimed they drank two diet sodas had a higher risk of a heart issue than older women who claimed to drink them once a week or not at all - allowed lead author Yasmin Mossavar-Rahmani, epidemiologist at the Albert Einstein College of Medicine, to conspiratorially speculate, "What is it about these diet drinks? Is it something about the sweeteners? Are they doing something to our gut health and metabolism? These are questions we need answered."

Or not. There are numerous confounders outside the obvious, which are recall bias and people just not telling the truth. The biggest factors are obesity and existing heart disease. Someone might drink more diet soda because they are obese, whereas someone not obese may be drinking whisky, an actual carcinogen. There was no stroke linkage for women who were of normal weight or even overweight. In reality, the frequent diet soda drinkers were 16 percent more likely to die from any cause than women who drank diet beverages less than once a week or not at all. Worldwide the premature death rate for obese women is 14-16 percent greater. That is not coincidence.

The authors acknowledge all of the limitations but then rationalize that 'weight of evidence' is on their side. If enough statistical correlations agree diet soda is bad then there must be something to it, they insist. Astrologers and psychics can use weight of evidence the same way. Does that make their beliefs scientific?

Another paper on diet soda found that it reduces the chance of colon cancer recurring. What to believe?

No answer is forthcoming from epidemiologists because they are the time traveling theoretical physicists of the health world. All they have to do is throw around some math, get press attention, and then tell doctors and scientists to figure out a plausible mechanism for their statistical conclusion.


(1) For a fun read, see Dr. Johannes Koelman explain the quantum world using socks in a sock drawer. You still won't understand it, but you'll be entertained in not knowing.

And see this bonus picture of Einstein and Niels Bohr sitting around in the 1927 Solvay meeting. They are probably talking about baseball or cigars or making fun of Heisenberg, though. They didn't agree on physics and it got no better with age.

(2) Those days are over. When the American comedy "The Big Bang Theory" debuted, its most irascible, and therefore considered most brilliant by the public, physicist was a string theorist. He has since somehow abandoned the field, likely because the creator and writers realized it was all made up. That's not possible in the real world, unless you have tenure. 

(3) You wouldn't know it by reading epidemiology. Once it was shown that smog (PM10) causes deaths, air quality began to improve a lot, but in the 1990s, despite having some of the best air quality in the world, California Air Resources Board maps began to use PM2.5, allowing them to claim we are on the verge of collapse every time there is a wildfire, which happens numerous times per year in the state. Yet actual death records can't link PM2.5 with any mortality, including during wildfires. But it allows air quality maps that would be green or yellow to be a scary red, so media love them, because they don't understand the particles are so small it takes an electron microscope to detect them.

(4) Sometimes this was intentional, sometimes it was not understanding the statistics that epidemiologists should be grounded in, and sometimes it was desperation to put out a paper. Since math is a language, it became worrisome that claims about science seemed more like Jean Plaidy historical fiction novels. Nothing wrong with those. I own some, but they are what they are.


(5) 61 foods (rows) and 10 health effects (columns). A “1” indicates statistical significance, p<0.05, and a “0” indicates no nominal statistical effect. Each “1” is a statistical false positive. For each “1” a paper could be written about a finding that would not be expected to replicate. Source: Drs. Stan Young and Henry Miller, Genetic Literacy Project.
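A grid like that is easy to produce from pure noise. The sketch below is my own illustration, not the method Young and Miller used. It assumes every one of the 61 x 10 tests is independent and that no food has any real effect, so each p-value is just a uniform random number between 0 and 1. Real questionnaire answers are correlated, so treat it as a rough picture only.

```python
import random

N_FOODS = 61
N_OUTCOMES = 10
ALPHA = 0.05

def simulate_null_grid(rng):
    """Return a foods-by-outcomes grid of 1s (p < ALPHA) and 0s when no food
    has any real effect on any outcome."""
    # Assumption: under a true null hypothesis a well-behaved test gives a
    # p-value that is uniform on [0, 1], so one uniform draw stands in for
    # each food/outcome test.
    return [
        [1 if rng.random() < ALPHA else 0 for _ in range(N_OUTCOMES)]
        for _ in range(N_FOODS)
    ]

rng = random.Random(2019)  # arbitrary seed so the sketch is repeatable
grid = simulate_null_grid(rng)

n_tests = N_FOODS * N_OUTCOMES
false_positives = sum(sum(row) for row in grid)
expected_false_positives = n_tests * ALPHA
chance_of_at_least_one = 1 - (1 - ALPHA) ** n_tests

for row in grid[:5]:  # first five foods only, to keep the printout short
    print(" ".join(str(cell) for cell in row))
print(f"'significant' cells out of {n_tests} tests: {false_positives}")
print(f"expected by chance alone: {expected_false_positives:.1f}")
print(f"chance of at least one: {chance_of_at_least_one:.6f}")
```

With 610 tests at the .05 level, about 30 false positives are expected from nothing at all, and getting zero is all but impossible.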

(6) Dr. Stan Young and I went to the National Institutes of Health where he used Dungeons and Dragons dice. He can show how to find statistical significance from random stuff without torturing the data - because unlike in D&D, the trolls are often in charge.

Here is a graphical representation created using a 6-sided die. You can see that results are not as "random" as people believe random is, so when you start and stop the data makes a huge difference in the result.