During the London 2012 Olympics, Google Knowledge Graph (Ref. 1) proved very useful for looking up various aspects of the games, such as player statistics and the medal tally of each country. Google Knowledge Graph is a gigantic network containing millions of objects and billions of facts and relationships, where objects are represented by nodes and the relationships between them are represented by edges (Figure 1).

Figure 1 : Google Knowledge Graph
Source : http://www.google.com/insidesearch/features/search/knowledge.html

Figure 2 : Google search result for "london". The profile of London fetched from Google Knowledge Graph is shown in the rightmost column.
Source : Google

Google search displays profiles of named entities fetched from the Google Knowledge Graph in an attractive way (Figure 2). It set me thinking: can we build a chemical knowledge graph that can be searched for a wide range of chemical information?

As luck would have it, I came across a triad of back-to-back papers by Bartosz Grzybowski and co-workers in Angewandte Chemie (August 6, 2012) on applications of Chematica, a network of millions of chemical compounds and chemical reactions. The name Chematica was inspired by Mathematica, a popular software package used in scientific computing.

Chematica caters only to organic synthesis, as network analysis of inorganic compounds is difficult with their current chemical representation. Like Google Knowledge Graph, the Chematica network consists of nodes and edges, which represent molecules and reactions respectively. Chemical rules have been built into this network, enabling it to predict optimal routes for synthesizing a compound of interest (Ref. 2, 3, 4).
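To make the nodes-and-edges picture concrete, here is a minimal sketch of a reaction network in Python. It is only an illustration: the `ReactionNetwork` class, its method names and the four-molecule data set are my own assumptions and have nothing to do with how Chematica is actually implemented or what it contains.

```python
# Toy reaction network: nodes are molecules, directed edges are reactions.
# The class and the data are hypothetical; they are not taken from Chematica.

class ReactionNetwork:
    def __init__(self):
        # Maps each molecule to a list of (product, reaction name) pairs.
        self._edges = {}

    def add_reaction(self, reactant, product, name):
        """Add a directed edge reactant -> product labelled with the reaction."""
        self._edges.setdefault(reactant, []).append((product, name))
        # Make sure the product is also registered as a node.
        self._edges.setdefault(product, [])

    def products_of(self, molecule):
        """Return the molecules reachable from `molecule` in one reaction."""
        return [product for product, _ in self._edges.get(molecule, [])]

    def molecules(self):
        """Return all molecules (nodes) in alphabetical order."""
        return sorted(self._edges)


network = ReactionNetwork()
network.add_reaction("benzene", "nitrobenzene", "nitration")
network.add_reaction("nitrobenzene", "aniline", "reduction")
network.add_reaction("aniline", "acetanilide", "acetylation")

print(network.molecules())
print(network.products_of("benzene"))
```

A real network would also need to handle reactions with several reactants and products, which a simple molecule-to-molecule edge cannot express.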

The three papers in Angewandte Chemie describe three use cases of Chematica :

1. Discovering one-pot versions of multi-step syntheses (Ref. 5)

2. Optimizing existing syntheses (Ref. 6)

3. Identifying synthetic routes leading to harmful substances (Ref. 7)

These papers demonstrate how applying the Chematica knowledge base could help bring down synthesis costs in the pharmaceutical and chemical industries. The database is the fruit of a decade of research and has the potential to be a path-breaking technology for chemistry. A comparative study of Chematica and current chemical reaction databases would help chemists get the best out of them.

In the life sciences, the study of networks such as chemical-protein interactions (Figure 3) and protein-protein interactions to find better drug targets is a mature field. What network analysis has done for the life sciences, Chematica might be able to do for chemistry: it could lead to greater use of network analysis and visualization approaches in chemistry (Ref. 8, 9, 10).


Figure 3 : Network visualization showing interaction of aspirin with proteins.
Source : http://stitch.embl.de 

In the 1990s, playwright John Guare popularized the concept of "six degrees of separation", according to which any two people in the world are connected through at most five intermediaries (Figure 4). The pervasiveness of social networks has brought people even closer: a recent network analysis of Facebook users revealed that the degree of separation is about five (i.e. connected through four intermediaries) (Ref. 11). One interesting application of this concept is the Bacon number (also known as six degrees of Kevin Bacon).

Similarly, the Chematica network has demonstrated its potential to decrease the degree of separation between reactants and products by computing shorter synthetic routes between them, using algorithms based on chemical rules (Ref. 12).
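The idea of a "degree of separation" between a reactant and a product can be sketched with a plain breadth-first search, which finds the route with the fewest steps in an unweighted network. This is a deliberately simplified stand-in and not Chematica's method: the function `shortest_route` and the abstract network with nodes A to E are hypothetical, and the search knows nothing about chemical rules, yields or costs.

```python
from collections import deque

def shortest_route(network, start, target):
    """Breadth-first search for the route with the fewest steps.

    `network` maps each node to the list of nodes reachable in one step.
    Returns the route as a list of nodes, or None if no route exists.
    """
    if start == target:
        return [start]
    previous = {start: None}      # remembers how each node was reached
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in network.get(current, []):
            if neighbour in previous:
                continue          # already reached by an equal or shorter route
            previous[neighbour] = current
            if neighbour == target:
                # Walk backwards from the target to rebuild the route.
                route = [neighbour]
                while previous[route[-1]] is not None:
                    route.append(previous[route[-1]])
                return route[::-1]
            queue.append(neighbour)
    return None

# Hypothetical network: D can be made from A in three steps (via B and C)
# or in two steps (via E).
toy_network = {
    "A": ["B", "E"],
    "B": ["C"],
    "C": ["D"],
    "E": ["D"],
    "D": [],
}

route = shortest_route(toy_network, "A", "D")
steps = len(route) - 1
print(route, steps)  # ['A', 'E', 'D'] 2
```

Here the three-step route through B and C is passed over in favour of the two-step route through E, which is the sense in which a network search can shorten the separation between a starting material and its target.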

Figure 4 : Six degrees of separation 
Source : Wikipedia

Only time will tell how Chematica will impact chemistry. It may turn out to be the much-needed catalyst for synthesizing a chemical knowledge graph that encompasses all branches of chemistry.

References :

1. Google Knowledge Graph

2. Bartosz Grzybowski: Chematica is an internet for chemistry
    Interview by Ian Tucker, The Guardian, 2012

3. Northwestern Scientists Create Chemical Brain

4. The Automatic Chemist
    Philip Ball, Chemistry World, 2012

5. Rewiring Chemistry: Algorithmic Discovery and Experimental Validation of One-Pot Reactions in the Network of Organic Chemistry
    Chris M. Gothard, Siowling Soh, Nosheen A. Gothard, Bartlomiej Kowalczyk, Yanhu Wei,
    Bilge Baytekin, Bartosz A. Grzybowski, Angewandte Chemie, 2012

6. Parallel Optimization of Synthetic Pathways within the Network of Organic Chemistry 
    Mikołaj Kowalik, Chris M. Gothard, Aaron M. Drews, Nosheen A. Gothard, Alex Weckiewicz,  
    Patrick E. Fuller, Bartosz A. Grzybowski, Kyle J. M. Bishop, Angewandte Chemie, 2012 

7. Chemical Network Algorithms for the Risk Assessment and Management of Chemical Threats
    Patrick E. Fuller, Chris M. Gothard, Nosheen A. Gothard, Alex Weckiewicz, Bartosz A.
    Grzybowski, Angewandte Chemie, 2012 

8. Chemical-protein interaction networks and visualization tools

9. Global mapping of pharmacological space
    Gaia V Paolini, Richard H B Shapland, Willem P van Hoorn, Jonathan S Mason, Andrew L  
    Hopkins, Nature Biotechnology, 2006

10. http://stitch.embl.de/ 

11. http://en.wikipedia.org/wiki/Six_degrees_of_separation

12. The ‘wired’ universe of organic chemistry
    Bartosz A. Grzybowski, Kyle J. M. Bishop, Bartlomiej Kowalczyk, Christopher E. Wilmer,
    Nature Chemistry, 2009