Cambridge computer scientists have set a new gold standard for openness and reproducibility in research. At a talk today at the 12th USENIX Symposium on Networked Systems Design and Implementation in Oakland, they will unveil peer-reviewed results backed by 200 GB of data and 20,000 lines of code.
Their award-winning paper describes a new method of making data centers more efficient. Every experimental figure and table in the final version is clickable, and each link leads to a website where the researchers give a technically detailed description of the methods behind that experiment. These descriptions include the original data sets and tools used to produce the figures, as well as free and open source access to all of the source code that the researchers wrote and modified.
Computer science has embraced open access more than many disciplines, but as academics find themselves in a 'publish or perish' culture, the reliability of research results has come into question. Beyond computer science, a number of high-profile incidents of errors, fraud or misconduct have cast doubt on quality standards in research. This has thrown the issue of reproducibility - that a result can be reliably repeated given the same conditions - into the spotlight.
"If a result cannot be reliably repeated, then how can we trust it?" asked Matthew Grosvenor, a PhD student from the University's Computer Laboratory and the paper's lead author. "If you try to reproduce other people's work from the paper alone, you often end up with different numbers. Unless you have access to everything, it's useless to call a piece of research open source. It's either open source or it's not - you can't open source just a little bit."
In the past, this exhaustive level of openness might have been costly, but thanks to cheap cloud storage, the researchers have made nearly 200 GB of data and 20,000 lines of code freely available under an open-source license.
"It now should be possible for anyone with a collection of computers to follow our instructions and produce our exact graphs," said Grosvenor. "We think that this is the way forward for all scientific publications and so we've put our money where our mouth is and done it."