Recent research has shown that an alarming number of peer-reviewed papers are irreproducible, and the problem isn't limited to social science surveys or weak observational studies. It extends to fields like biology.


The ability to duplicate an experiment and its results is a central tenet of the scientific method, but it is getting harder to require. Sometimes that is because the experiments are not easy to reproduce - it took $10 billion to find the Higgs boson - but perhaps it is also because reproducibility isn't stressed enough in the culture of science.

There is something to that. As we have seen with SCIgen, whose computer-generated papers were accepted by IEEE and Springer, when companies are willing to take anything in return for money, people are going to take shortcuts. But for the most part reproducibility is difficult because papers might be based on six years of work. There is no shortcut to that.

But math and statistics professors say they can help with some of it - and they may be right, because it might not be the data that would fall apart under peer review, but the statistics. A paper claiming that a particular intervention led to a 42% rise in income, for example, drew its conclusion from a small sample of Jamaican students. That rightfully concerned people who know statistics, and anyone with common sense, but it passed peer review.
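To see why a small sample alone is reason for caution, here is a minimal Python sketch. It does not use or reproduce the data from the paper mentioned above; the "incomes" are synthetic draws from a made-up normal distribution, and the function names (mean_ci, ci_width) are hypothetical. It only illustrates the general point that, for the same underlying population, the uncertainty around an estimated mean shrinks roughly with the square root of the sample size.

```python
import math
import random
import statistics


def mean_ci(sample, z=1.96):
    """Approximate 95% confidence interval for the mean.

    Uses the normal approximation; for very small samples a
    t-based interval would be somewhat wider still.
    """
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se


def ci_width(n, rng):
    """Width of the interval for n synthetic observations."""
    # Assumption: invented population with mean 100 and spread 30.
    sample = [rng.gauss(100.0, 30.0) for _ in range(n)]
    low, high = mean_ci(sample)
    return high - low


rng = random.Random(0)  # fixed seed so the run is repeatable
small = ci_width(20, rng)
large = ci_width(2000, rng)

if __name__ == "__main__":
    print(f"interval width with n=20:   {small:.1f}")
    print(f"interval width with n=2000: {large:.1f}")
```

The interval from 20 observations comes out far wider than the one from 2,000, which is the sort of thing a statistically minded reviewer would want to see reported alongside a headline percentage.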

Using software may take away some of the burden, say scholars from Smith College, Duke University and Amherst College, who looked at how introductory statistics students responded to a curriculum modified to stress reproducibility. The impetus came last year when, on the heels of several retraction scandals and studies showing reproducibility rates as low as 10 percent for peer-reviewed articles, Nature dedicated a special issue to the concerns over irreproducibility - one article is linked in the opening paragraph.

Nature's editors announced measures to address the problem and encouraged the science community and funders to direct their attention to better training of young scientists. PLOS has done something similar, though it claims to peer review 10,000 articles per year, so requiring third-party reproducibility, at taxpayer expense, is noble but may cause the effort to collapse. When researchers who have slid through the system with editorial review and a checklist of four questions now have to pay for third-party data verification before they can pay to have an article published, they may balk and choose a different publisher.

"Too few biologists receive adequate training in statistics and other quantitative aspects of their subject," the editors wrote. "Mentoring of young scientists on matters of rigor and transparency is inconsistent at best."

Very true, whereas physicists live and die on understanding the pitfalls of statistics. The statistics and math scholars would be horrified if they examined the wonky statistical significance filters used in fMRI studies and the social sciences.

So the authors of the new paper looked to their own classrooms for ways to incorporate the idea of reproducibility. 

"Reproducing a scientific study usually has two components: reproducing the experiment, and reproducing the analysis," said Ben Baumer, visiting assistant professor of math and statistics at Smith College. "As statistics instructors, we wanted to emphasize the latter to our students."

The grade school maxim to "show your work" doesn't hold in the average introductory statistics class, said Mine Cetinkaya-Rundel, assistant professor of the practice in the Duke statistics department. In a typical workflow, a college-level statistics student will perform data analysis in one software package, but transfer the results into something better suited to presentation, like Microsoft Word or Microsoft PowerPoint.

Though standard, this workflow divorces the raw data and analysis from the final results, making it difficult for students to retrace their steps. The process can give rise to errors, and in many cases, the authors write, "the copy-and-paste paradigm enables, and even encourages, selective reporting."

"Usually, a data analysis report, even a published paper, isn't going to include the code," Cetinkaya-Rundel said. "But at the intro level, where this is the first time students are exposed to this workflow, it helps to keep intact both the final results and the code used to generate them."

Enter R Markdown, a document format and package for the programming language R that interleaves code, its output and written text. The team chose R Markdown for its ease of use -- students wouldn't have to learn a new computer syntax -- and because it combines the raw data, computing and written analysis into one HTML document. The researchers hoped a single HTML file would give students a start-to-finish understanding of assignments, as well as make studying and grading easier.
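The idea of generating the report from the analysis, rather than pasting results into it, can be sketched in a few lines. The following is not R Markdown and not the authors' course material; it is a rough Python analogy, with invented data and hypothetical function names (analyze, render_report), showing a report whose numbers are always recomputed from the raw data it displays.

```python
import statistics
from html import escape

# Hypothetical data, invented purely for illustration.
scores = [72, 85, 90, 66, 78, 95, 88]


def analyze(values):
    """Compute the summary statistics the report will quote."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def render_report(values):
    """Build the HTML report directly from the data, so every number
    shown is recomputed rather than pasted in by hand."""
    stats = analyze(values)
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{value:.2f}</td></tr>"
        for name, value in stats.items()
    )
    return (
        "<html><body>"
        "<h1>Homework report</h1>"
        f"<p>Raw data: {escape(str(values))}</p>"
        f"<table>{rows}</table>"
        "</body></html>"
    )


if __name__ == "__main__":
    print(render_report(scores))
```

Because the table is produced by the same run that reads the data, changing a data point and regenerating the report updates every figure in it, which leaves no room for a stale or selectively copied number.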

The study introduced R Markdown to 417 introductory statistics students (272 from Duke University, 145 from Smith College) during the 2012-2013 school year. Instructors emphasized the lesson of reproducibility throughout each course and surveyed 70 students about their experience using R Markdown for homework assignments.

The survey, conducted once at the beginning of the semester and once at the end, showed gradual gains in student preference for R Markdown. The percentage of respondents who indicated they found R Markdown to be frustrating at first but eventually got the hang of it jumped from 51 to 75 percent. The students vastly preferred it to the alternative, with 70 percent strongly disagreeing that they'd rather use the copy-and-paste method.

The research team also found that even when students had no prior computing experience or expressed negative attitudes toward R Markdown, their grades did not appear to suffer. Future surveys will ask more pointed questions about how much of the lesson on reproducibility students absorb from the modified curricula.

As the use and analysis of big data become increasingly sophisticated, the team writes, the ability of researchers to retrace steps and achieve the same statistical outcomes will only grow in significance.


Citation: Baumer, Ben; Cetinkaya-Rundel, Mine; Bray, Andrew; Loi, Linda; Horton, Nicholas J., 'R Markdown: Integrating A Reproducible Analysis Tool into Introductory Statistics', Technological Innovations in Statistics Education, Volume 8, Issue 1, 2014. uclastat_cts_tise_20118. Source: Duke University.