Chemical research has traditionally been organized in either experiment-centric or molecule-centric models. This makes sense from the chemist's standpoint. When we think about doing chemistry, we conceptualize experiments as the fundamental unit of progress. This is reflected in the laboratory notebook, where each page is an experiment, with an objective, a procedure, the results, their analysis and a final conclusion, ideally one that directly answers the stated objective.

When we think about searching for chemistry, we generally imagine molecules and transformations. This is reflected in the search engines that are available to chemists, most of which allow at least the drawing or representation of a single molecule or class of molecules (via substructure searching).

But these are not the only perspectives possible. What would chemistry look like from a results-centric view? Let's see with a specific example. Take EXP150, where we are trying to synthesize a Ugi product as a potential anti-malarial agent and identify Ugi products that crystallize from their reaction mixture. If we extract the information contained here based on individual results, something very interesting happens. By using some standard representation for actions we can come up with something that looks like it should be machine readable without much difficulty:
  • ADD container (type=one dram screwcap vial)
  • ADD methanol (InChIKey=OKKJLVBELUTLKV-UHFFFAOYAX, volume=1 ml)
  • WAIT (time=15 min)
  • ADD benzylamine (InChIKey=WGQKYBSKWIADBV-UHFFFAOYAL, volume=54.6 ul)
  • VORTEX (time=15 s)
  • WAIT (time=4 min)
  • ADD phenanthrene-9-carboxaldehyde (InChIKey=QECIGCMPORCORE-UHFFFAOYAE, mass=103.1 mg)
  • VORTEX (time=4 min)
  • WAIT (time=22 min)
  • ADD crotonic acid (InChIKey=LDHQCZJRKDOVOX-JSWHHWTPCJ, mass=43.0 mg)
  • VORTEX (time=30 s)
  • WAIT (time=14 min)
  • ADD tert-butyl isocyanide (InChIKey=FAGLEPBREOXSAC-UHFFFAOYAL, volume=56.5 ul)
  • VORTEX (time=5.5 min)
  • TAKE PICTURE
It turns out that for this CombiUgi project very few commands are required to describe all possible actions:
  • ADD
  • WAIT
  • VORTEX
  • CENTRIFUGE
  • DECANT
  • TAKE PICTURE
  • TAKE NMR
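To make the "machine readable" claim concrete, here is a minimal Python sketch that parses action lines written in the style above into records. The verbs come from the list above; the function name `parse_action`, the record layout (`verb`, `target`, `params`) and the regular expression are assumptions made for illustration, not an accepted format. It also assumes that chemical names contain no parentheses and that parameter values contain no commas, which holds for the EXP150 lines but not for chemical names in general.

```python
import re

# Verbs from the CombiUgi list; two-word verbs come first so that
# "TAKE PICTURE" is not mistaken for an unknown verb "TAKE".
VERBS = ("TAKE PICTURE", "TAKE NMR", "ADD", "WAIT",
         "VORTEX", "CENTRIFUGE", "DECANT")

# "<head> (<key=value, key=value>)", where the parenthesised part is optional.
LINE_RE = re.compile(r"^(?P<head>[^()]+?)\s*(?:\((?P<params>.*)\))?$")


def parse_action(line):
    """Parse one action line into {'verb', 'target', 'params'} (hypothetical layout)."""
    text = line.strip().lstrip("•").strip()
    match = LINE_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse action line: {line!r}")

    head = match.group("head").strip()
    verb = next((v for v in VERBS if head == v or head.startswith(v + " ")), None)
    if verb is None:
        raise ValueError(f"unknown verb in action line: {line!r}")

    # Whatever follows the verb is the thing acted on (e.g. the reagent name).
    target = head[len(verb):].strip() or None

    params = {}
    if match.group("params"):
        for item in match.group("params").split(","):
            key, _, value = item.partition("=")
            params[key.strip()] = value.strip()

    return {"verb": verb, "target": target, "params": params}
```

For example, `parse_action("ADD methanol (InChIKey=OKKJLVBELUTLKV-UHFFFAOYAX, volume=1 ml)")` gives a record with verb `ADD`, target `methanol` and the two parameters as strings. Quantities are deliberately left as text here; splitting values from units would be a separate step.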
By focusing on each result independently, it no longer matters if the objective of the experiment was reached or if the experiment was aborted at a later point. Also, if we recorded chemistry this way we could do searches that are currently not possible:
  • What happens (pictures, NMRs) when an amine and an aromatic aldehyde are mixed in an alcoholic solvent for more than 3 hours with at least 15 s vortexing after the addition of both reagents?
  • What happens (pictures, NMRs) when an isonitrile, amine, aldehyde and carboxylic acid are mixed in that specific order, with at least 2 vortexing steps of any duration?
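As a rough sketch of how the first query might be evaluated against one recorded action list, the Python below walks the EXP150 steps and measures how much time and vortexing follow the point where the solvent, the amine and the aromatic aldehyde are all in the vial. Everything structural here is an assumption: the record layout, the helper names, and especially the hand-written `CLASSES` table, which stands in for what a real system would derive from the structures behind the InChIKeys. It also reads "at least 15 s vortexing" as total vortexing time after both reagents are present, which is only one possible interpretation.

```python
UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600}


def act(verb, target=None, **params):
    """Build one action record (hypothetical layout)."""
    return {"verb": verb, "target": target, "params": params}


def to_seconds(text):
    """Convert a duration such as '5.5 min' to seconds."""
    value, unit = text.split()
    return float(value) * UNIT_SECONDS[unit]


# EXP150, abridged to the fields this query needs.
EXP150 = [
    act("ADD", "container", type="one dram screwcap vial"),
    act("ADD", "methanol", volume="1 ml"),
    act("WAIT", time="15 min"),
    act("ADD", "benzylamine", volume="54.6 ul"),
    act("VORTEX", time="15 s"),
    act("WAIT", time="4 min"),
    act("ADD", "phenanthrene-9-carboxaldehyde", mass="103.1 mg"),
    act("VORTEX", time="4 min"),
    act("WAIT", time="22 min"),
    act("ADD", "crotonic acid", mass="43.0 mg"),
    act("VORTEX", time="30 s"),
    act("WAIT", time="14 min"),
    act("ADD", "tert-butyl isocyanide", volume="56.5 ul"),
    act("VORTEX", time="5.5 min"),
    act("TAKE PICTURE"),
]

# Hand-written stand-in for a structure-based classification.
CLASSES = {
    "methanol": "alcohol",
    "benzylamine": "amine",
    "phenanthrene-9-carboxaldehyde": "aromatic aldehyde",
    "crotonic acid": "carboxylic acid",
    "tert-butyl isocyanide": "isonitrile",
}


def mixing_summary(actions, classes, wanted):
    """Return (elapsed_s, vortex_s) counted once every class in `wanted`
    has been added, or None if some class is never added."""
    missing = set(wanted)
    elapsed = vortex = 0.0
    for action in actions:
        if action["verb"] == "ADD":
            missing.discard(classes.get(action["target"]))
        elif not missing and action["verb"] in ("WAIT", "VORTEX"):
            seconds = to_seconds(action["params"]["time"])
            elapsed += seconds
            if action["verb"] == "VORTEX":
                vortex += seconds
    return None if missing else (elapsed, vortex)


def matches_first_query(actions):
    """Amine + aromatic aldehyde in an alcohol, > 3 h, >= 15 s vortexing."""
    summary = mixing_summary(actions, CLASSES,
                             {"alcohol", "amine", "aromatic aldehyde"})
    if summary is None:
        return False
    elapsed, vortex = summary
    return elapsed > 3 * 3600 and vortex >= 15
```

On the steps listed for EXP150, the picture is taken 46 minutes after the aldehyde goes in (with 10 minutes of that spent vortexing), so this particular result would not match the 3-hour condition; a later picture from the same vial could. The second query, which depends on the order of addition, would need a different check and is not covered by this sketch.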
I am not sure if we can get to that level of query control, but ChemSpider will investigate representing our results in a database in this way to see how far we can get.

Note that we can't represent everything using this approach. For example, observations made in the experiment log don't show up here, nor does anything unexpected. Therefore, at least as long as we have human beings recording experiments, we're going to continue to use the wiki as the official lab notebook of my group. But hopefully I've shown how we can translate from freeform to structured format fairly easily.

Now one reason I think that this is a good time to generate results-centric databases is the inevitable rise of automation. It turns out that it is difficult for humans to record an experiment log accurately. (Take a look at the lab notebooks in a typical organic chemistry lab - can you really reproduce all those experiments without talking to the researcher?) But machines are good at recording dates and times of actions and all the tedious details of executing a protocol. This is something that we would like to address in the automation component of our next proposal.

Does that mean that machines will replace chemists in the near future? Not any more than calculators have replaced mathematicians. I think that automating result production will leave more time for analysis, which is really the test of a true chemist (as opposed to a technician). Here is an example of an analysis module making a simple point, useful to the chemistry community, and linking back to result modules that ultimately link back to the original experiment in the online laboratory notebook:
Context: obtaining precipitates in the CombiUgi project
Ugi reactions in methanol where the solution is supersaturated with Ugi product may give false negatives for precipitation. For example, a Ugi product rapidly crystallized at the 17th hour (RESULT0003) after addition of all reagents, while appearing as a clear solution at the 15th hour (RESULT0002). It is therefore recommended that the vials be submitted to vortexing (15 s) prior to taking a picture.
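Once actions are recorded in structured form, a recommendation like this one can be checked mechanically. The Python sketch below flags any picture that was not taken right after a vortexing step of at least 15 s. The `(verb, seconds)` tuple layout and the function name are assumptions for illustration, and reading "prior to taking a picture" as "the immediately preceding action" is my interpretation rather than part of the analysis module.

```python
def unvortexed_pictures(actions, min_vortex_s=15.0):
    """Return the indices of TAKE PICTURE actions that are not immediately
    preceded by a VORTEX lasting at least `min_vortex_s` seconds.

    `actions` is a list of (verb, seconds) tuples; `seconds` is None for
    actions without a duration (hypothetical layout).
    """
    flagged = []
    for i, (verb, _seconds) in enumerate(actions):
        if verb != "TAKE PICTURE":
            continue
        previous = actions[i - 1] if i > 0 else None
        if (previous is None
                or previous[0] != "VORTEX"
                or previous[1] < min_vortex_s):
            flagged.append(i)
    return flagged
```

A vial that sat for hours and was photographed without vortexing, e.g. `[("WAIT", 7200.0), ("TAKE PICTURE", None)]`, is flagged, while the end of the EXP150 sequence (a 5.5 min vortex followed by the picture) is not.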
We'll be recording these analysis and result modules on UsefulChem wiki pages. We'll be using InChIKeys for compact unambiguous identification of molecules (and convenient indexing in Google) and the terms in this post for action options. Anyone is free to automatically incorporate these in a database, as long as attribution is provided. (If anyone knows of any accepted XML for experimental actions, let me know and we'll adopt that.) I think this takes us a step further along the path from freeform Open Notebook Science to the chemical semantic web, something that both Cameron Neylon and I have been discussing for a while now.