In the most recent edition of Physics World, there are two articles that on the face of it have little to do with each other: one is about Jan Hendrik Schön, the physicist formerly famous for creating the first organic superconductor and the first single-molecule transistor, and now most famous for having simply made up all of those results out of thin air, arguably the biggest case of scientific fraud in physics.

The other article is about how the internet is transforming scientific communications, looking at which new means of scientific communication failed (such as Physics Comments and scientists contributing to Wikipedia -- although Scholarpedia is taking off quickly at the moment, probably because its signed and peer-reviewed authorship model is more in line with academic customs than Wikipedia's semi-anarchistic one) and which succeeded (the arXiv) in making the dissemination of scientific results quicker and more transparent.

On closer inspection, however, the two topics are closely intertwined.

Schön's deception was only possible because the researchers who tried and failed to replicate his results didn't have access to his primary data. Once doubts had been raised over the appearance of two completely identical graphs supposedly representing two completely different sets of experimental data, Schön's primary data were subjected to close scrutiny and were found to be non-existent -- his lab books had been destroyed, and his samples were damaged beyond recovery.

This raises the question of whether anyone could even have contemplated such a fraud in an environment where scientists are genuinely expected to hide nothing, and in particular to make their primary data publicly available after publication.

The more radically open schemes, such as the open notebook science proposed and practiced by fellow scientific blogger Jean-Claude Bradley, where raw data are made public before publication, are unlikely to take off, largely because of concerns over the enormous potential for plagiarism. But once results have been published and the original authors have thus established priority, there is no immediately obvious reason not to allow other researchers to perform their own analyses of the primary data, either to confirm (or possibly refute) the original analysis, or to extract results that the original authors did not obtain (whether because they weren't interested or because they didn't have the relevant analysis methods at their disposal). Some access controls are needed, of course, to ensure that later researchers duly acknowledge their use of the original group's datasets.

It is hard to see how a fraud like the Schön case could have occurred under a scheme like this; the groups who wasted years trying in vain to replicate his results would likely have uncovered the fraud if they had had access to Schön's lab books.

As with the arXiv (which started out as a specialised High Energy Physics preprint server and has since revolutionised publishing in most of physics and mathematics, and in some parts of computer science, biology and finance), particle physicists are again pushing ahead, this time with schemes to open up access to raw data. In lattice QCD with dynamical fermions, the most computationally expensive step is generating the configurations of gauge fields that are then further analysed to obtain the masses and other properties of hadrons.

Many groups with very interesting ideas about which particles and phenomena to study, and with which new methods, simply cannot afford to generate their own unquenched ensembles of gauge configurations (we are talking many teraflop-months here). They would be stuck with the quenched approximation (which amounts to ignoring the effects of dynamical quarks) if it weren't for the fact that an increasing number of collaborations make their ensembles available after performing their own initial analysis.
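For readers who would like to see what "ignoring the effects of dynamical quarks" means in a formula, here is a minimal sketch in standard textbook notation. The symbols are assumptions introduced for illustration and do not come from the articles discussed above: U denotes the gauge field, S_g the gauge action, M_f the Dirac operator for quark flavour f, and O the observable being measured.

```latex
% Unquenched (dynamical) expectation value after integrating out the quark fields:
\langle O \rangle
  = \frac{1}{Z} \int \mathcal{D}U \; \Big( \prod_f \det M_f[U] \Big) \, O[U] \, e^{-S_g[U]},
\qquad
Z = \int \mathcal{D}U \; \Big( \prod_f \det M_f[U] \Big) \, e^{-S_g[U]}

% Quenched approximation: drop the fermion determinant from the sampling weight,
\det M_f[U] \;\longrightarrow\; 1
\quad\Longrightarrow\quad
\langle O \rangle_{\text{quenched}}
  = \frac{1}{Z_g} \int \mathcal{D}U \; O[U] \, e^{-S_g[U]},
\qquad
Z_g = \int \mathcal{D}U \; e^{-S_g[U]}
```

The determinant is what makes unquenched ensembles so expensive to generate, since it has to be accounted for at every update of the gauge field. Once an ensemble exists, though, any number of observables O can be measured on the same configurations, which is why sharing them pays off.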

Configurations have been available for a while at The Gauge Connection (the name is a pun that only particle theorists will appreciate), and are now quickly becoming available on the International Lattice Data Grid (ILDG). This way, the many CPU cycles invested in generating these ensembles are put to even better use by enabling other groups to run their own analyses on them.

As in the case of the arXiv, it may take a while for other disciplines to follow suit. But if and when more and more scientists choose to make their raw data public after publication (and those who don't therefore become increasingly subject to suspicion by their peers), it appears likely that a fraud case like that of Jan Hendrik Schön will eventually become all but impossible.