Last week I had the pleasure of getting interviewed by Janet Babin at the WHYY studio in Philly. Janet is putting together a piece on Open Notebook/Open Source Science for her Marketplace series on NPR.
It was encouraging to see how much interest is being generated on this topic lately, especially in the popular media. If you have listened to her pieces, such as the one on MIT's OpenCourseWare initiative, you would appreciate the pains to which she goes to provide a balanced perspective.
So it was interesting to see the issues that she asked me to address, based on her interviews with other parties. One of the concerns expressed about Open Notebook Science is that scientists would not want their raw data available to others without their interpretations.
I am used to dealing with questions about intellectual property rights, priority and obstacles to publishing in prestigious journals. But the idea that it may be useless (or even perceived as irresponsible) to publish raw data without full analysis by the head scientist of a group is probably also an important barrier to the adoption of Open Notebook Science, or at least more open forms of science.
When my group publishes experiments on UsefulChem, the general order is typically: the experimental plan, the log, the results as raw data, observations and then conclusions. Error correction based on feedback occurs at all points in the process.
So by default we almost always have much of our raw data available without interpretation for long periods. And probably most of that information will never get interpreted (at least by us) because we don't need it to meet the narrow objectives of our experiments now or in the future.
But it is essential for these raw data to be available openly to humans and automated agents if we want Science2.0 to explode. (The data also need to be tagged and formatted properly to truly leverage automation - but more on that later)
The evolution of an experiment page is messy. Doing science is messy. There are errors to correct and faulty assumptions to confront and remove as we get more information and analyze an experiment.
I think that learning about science almost exclusively through polished journal articles can be discouraging, especially to new students. Like attorneys, scientists tend to write papers (at least the good ones) with arguments using selective evidence to support a clear point. There is nothing wrong with this, and I think that humans need this type of format much of the time to process new information.
However, this approach leaves out a lot about how science actually gets done. For example, if a chemist has developed a new reaction, the typical way to publish it is to try the reaction under different conditions and with different reactants. In principle this is very simple: do the reactions, fill in a table with yields, then publish. In practice, at least in the organic chemistry labs where I have done my time, it usually does not work that way. Yields will vary between people and sometimes the reaction just won't work for a reason that never gets elucidated.
So what is the actual yield that will be reported? The best one? The worst one? The average? If you use the average, do you remove outliers? If some of the product was spilled, do you still take that yield into consideration or completely scrap that run?
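To make the ambiguity concrete, here is a minimal Python sketch. The yields and the spilled run are made-up numbers chosen purely for illustration, not data from any real experiment, and the names (yields, spilled_runs, summarize) are hypothetical. It only shows how the same set of runs can support several different "reported" yields depending on the choices described above.

```python
from statistics import mean, median

# Hypothetical percent yields from five repeats of the same reaction.
# These numbers are invented for illustration only.
yields = [62.0, 58.0, 65.0, 12.0, 60.0]

# Hypothetical: the run at index 3 is the one where product was spilled.
spilled_runs = {3}


def summarize(values):
    """Return the candidate numbers someone might put in a yield table."""
    return {
        "best": max(values),
        "worst": min(values),
        "mean": round(mean(values), 2),
        "median": median(values),
    }


# Option 1: keep every run, including the spilled one.
all_runs = summarize(yields)

# Option 2: scrap the spilled run before summarizing.
kept = [y for i, y in enumerate(yields) if i not in spilled_runs]
without_spill = summarize(kept)

print("all runs:     ", all_runs)
print("without spill:", without_spill)
```

With these invented numbers the mean moves from 51.4 to 61.25 just by deciding whether the spilled run counts, while the best yield stays at 65.0 either way. The raw list itself carries none of that ambiguity.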
Every scientist who has written a paper has had to make a decision about what to do with the ambiguity of raw scientific information. And every day that the paper is not written and submitted because of ambiguity, the world doesn't know about it.
When we report our raw laboratory logs and data, we are not concerned about a number that will show up in a table (eventually hopefully) in a printed journal. We are concerned about truthfully reporting what we did, observed and thought at that time. There is no carpet under which to sweep ambiguity. All scientists should be doing that in their laboratory notebook. By sharing it in real time the world can benefit immediately.
Coincidentally, around the time of my NPR interview, I finished Diane Rehm's wonderful autobiography "Finding My Voice". In that book she talks about the evolution of talk radio in the 70s, including her pioneering efforts. People were discovering a new way to communicate and nobody really knew where it would end up. I think we are in a similar position now with science and new web technologies.