Molecular biologists have long operated on the principle that knowing the structure of a biological entity is critical for understanding how it works. Most famously, this was the premise behind one of biology's most iconic discoveries, Watson and Crick's model of the structure of DNA. Structure-function studies have been the foundation of much of molecular biology ever since.
Although the structure of DNA yielded almost immediate insight into an important biological problem, solving structures hasn't always resulted in a eureka moment. The same year Watson and Crick received their Nobel Prize, two other scientists, John Kendrew and Max Perutz, were also awarded the Nobel for determining the structure of a biological molecule. Unfortunately for Kendrew and Perutz, instead of a flash of insight the result was incomprehension. They had determined the structures of two related proteins, myoglobin and hemoglobin, and these structures at first glance looked like just an irregular mass of thousands of atoms.
Happily, the befuddlement didn't last long. Scientists quickly learned how protein structures explain their function, and today we have amazing structural snapshots of proteins in action. These studies of structure have helped biologists understand the gritty details of key biological processes, such as how membrane-embedded ion pumps enable our nerves to conduct electrical signals. Using a protein's structure to understand its function has now become routine.
But today biologists are facing another moment of incomprehension. We're staring at structures of a different type of biological entity: a network, built not from an irregular mass of atoms but from connections. We know that biological networks give cells their ability to make sense of the world, to process information, to sense the environment or the cells' own internal state, and to take appropriate action. Scientists have been mapping these networks in great detail for years now, but the result is frequently just a giant, molecular hairball (or 'ridiculogram', as a friend calls it).
In other words, scientists are facing yet another giant structure-function problem. How do the structures of biological networks result in something functional?
Hemoglobin (left) and the Yeast Protein Interaction Network (right)
Biologists began to make functional sense of protein structures by detecting patterns. In the first structures, Max Perutz and John Kendrew identified helices, which had been predicted from theory by the physical chemist Linus Pauling. As more structures came out, scientists began identifying recurring protein folds, called domains, which are modular structures that can be involved in carrying out specific tasks. These domains have been reused over and over in nature - nearly all proteins, of the millions of proteins identified in the thousands of organisms sequenced, contain at least one of several thousand known protein folding domains. These domains are usually clearly recognizable from the amino acid sequence of the protein, which means that often scientists do not need to actually do the experiment to determine a protein's structure - enough of the structure can usually be predicted computationally.
Domains are structures that provide important clues to the function of a protein. Certain protein domains bind to DNA, for example, and thus if you discover a new protein that contains a homeodomain (the domain encoded by a DNA sequence called a homeobox), you can make a good bet that your new protein binds DNA.
Protein structure information tells us more than just what domains make up a protein. By carrying out detailed structure-function experiments, biologists are able to provide a physical explanation of how enzymes and other proteins work. Why is hemoglobin more likely to bind a second oxygen molecule after binding one? Because when the first oxygen binds, it distorts the structure of the protein and changes the shape of the binding site for the next oxygen molecule. How does the digestive enzyme pepsin chop up other proteins? An aspartate amino acid positioned just right 'activates' a water molecule, which can in turn break a peptide bond.
What this means is that our knowledge of how proteins work is based on efforts to understand how structure generates function. At first, the functional properties of protein structures were not intuitively obvious, but after detailed, hypothesis-testing experiments, scientists quickly figured out some general ideas, and now structure-function studies are routine (though not easy by any means).
Network Structure-Function Studies
Biologists are now facing a new structure-function challenge: networks. New technologies have made it possible to map cellular networks on a scale not possible just a decade ago. For many biological systems, we have a good picture of which proteins interact with each other, which regulators control which genes, and what molecular path signals follow as they are passed from the outside environment of the cell to the relevant cellular machinery.
But these maps are just static masses of data. What scientists really want to know is how the dynamics work - how a cell makes a circadian oscillator, or how regulatory circuits produce "sniffers, buzzers, toggles and blinkers." How does a cell flip a switch and keep it on? How is a gene timed to come on and shut off at just the right moment?
As in the case of protein structures, scientists are tackling these questions by looking for patterns in the network maps. Protein structures are largely defined by modular domains, and it turns out that biological networks are also made up of modular structures. These structures, called network motifs, provide important clues to understanding how the structure of a regulatory circuit produces its effects. Network motifs are small sets of interacting genes that make up various types of feedback or feed-forward loops, which are just small biological circuits.
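To make the idea of pattern-hunting concrete, here is a minimal Python sketch that counts one kind of motif, the feed-forward loop, in a list of directed regulatory interactions. The function name `count_feed_forward_loops` and the toy network of genes A through D are made up for illustration; they are not real data, and real motif-finding tools are considerably more sophisticated.

```python
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count ordered triples (x, y, z) with edges x->y, y->z and x->z.

    `edges` is an iterable of (regulator, target) pairs.
    """
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return sum(
        1
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    )


# Hypothetical toy network: the gene names are placeholders, not real data.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "A")]

if __name__ == "__main__":
    print("feed-forward loops:", count_feed_forward_loops(toy_edges))
```

In the toy network, A regulates C both directly and through B, which is the feed-forward pattern; the path from C through D back to A is a feedback loop and is not counted by this function.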
These network motifs were identified because they show up in biological networks much more frequently than you would expect if those networks were just wired together randomly, giving researchers a hint that these motifs were playing a significant role in the function of these networks. Now these network motifs are the subject of intense focus. Researchers are studying them in simple systems like bacteria, which can be exposed to an environmental stimulus and measured for a defined response. Some network motifs operate as a response-delaying mechanism, preventing the activation of a response until the cell can be sure a stimulus is real and not just background noise. Other motifs form negative feedback loops, like your thermostat, that activate a pathway when it is needed and shut it off when the job is done. In many cases a cell needs to integrate several signals before making a decision, and these network motifs can do the job, functioning as AND or OR logic gates.
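As a rough illustration of the response-delaying idea, the Python sketch below simulates a feed-forward loop in which X switches on Y, and Z is made only when X is on AND Y has built up past a threshold. Everything specific here is an assumption chosen for illustration rather than taken from a measured circuit: the function name `simulate_ffl`, the simple on/off gate, the rate constants, and the crude Euler integration.

```python
def simulate_ffl(pulse_length, t_end=6.0, dt=0.001,
                 beta=1.0, alpha=1.0, k_yz=0.5):
    """Euler-integrate a feed-forward loop with AND logic.

    X is an input that is on for `pulse_length` time units. X drives Y,
    and Z is produced only while X is on AND Y exceeds the threshold k_yz.
    Returns (delay, peak_z): the time at which Z production first starts
    (None if it never starts) and the highest level Z reaches.
    All parameter values are illustrative assumptions.
    """
    y = z = 0.0
    peak_z = 0.0
    delay = None
    steps = int(round(t_end / dt))
    for i in range(steps):
        t = i * dt
        x = 1.0 if t < pulse_length else 0.0
        # AND gate: both the input X and enough accumulated Y are required
        gate = 1.0 if (x > 0 and y > k_yz) else 0.0
        if gate and delay is None:
            delay = t
        y += dt * (beta * x - alpha * y)      # Y: production minus decay
        z += dt * (beta * gate - alpha * z)   # Z: gated production minus decay
        peak_z = max(peak_z, z)
    return delay, peak_z


if __name__ == "__main__":
    for pulse in (0.3, 3.0):
        delay, peak = simulate_ffl(pulse)
        print(f"pulse={pulse}: delay={delay}, peak Z={peak:.3f}")
```

In this toy setup a brief blip of X ends before Y reaches the threshold, so Z is never made, while a sustained signal switches Z on after a lag. Z production also stops as soon as X disappears, so the delay applies only to switching on.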
A major challenge in the effort to make sense of all this is the fact that, as was the case with proteins, the function of a network motif is not obvious from its structure. You need to do the math by building analytical models. And in fact, depending on the numbers you plug into your model, the actual output of a network motif can show dramatically different behaviors. Within certain parameters, you may get a simple response curve, but with other parameters, you get an oscillator. The challenge then becomes determining which behavior is occurring in the cell, and that involves difficult experiments to measure the critical parameters.
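A toy example shows how much the numbers matter. The Python sketch below simulates a ring of three genes, each repressing the next, which is one simple kind of negative feedback loop. The equations, the parameter names `strength` and `hill`, and the values used are assumptions chosen for illustration, not measurements from any real cell. The wiring is identical in both runs; only the production strength changes.

```python
def simulate_ring(strength, hill=3.0, t_end=100.0, dt=0.01):
    """Euler-integrate a ring of three repressors (each represses the next).

    Each protein is made at rate strength / (1 + repressor**hill) and
    degraded at rate 1. Returns the peak-to-trough swing of the first
    protein over the second half of the run; a swing near zero means
    the system has settled to a steady level.
    All parameter values are illustrative assumptions.
    """
    # Unequal starting levels: a perfectly symmetric start would keep
    # all three proteins in lockstep and hide any oscillation.
    p = [1.0, 0.0, 0.0]
    steps = int(round(t_end / dt))
    lo, hi = float("inf"), float("-inf")
    for i in range(steps):
        # protein k is repressed by the previous protein in the ring
        dp = [strength / (1.0 + p[k - 1] ** hill) - p[k] for k in range(3)]
        p = [p[k] + dt * dp[k] for k in range(3)]
        if i >= steps // 2:
            lo, hi = min(lo, p[0]), max(hi, p[0])
    return hi - lo


if __name__ == "__main__":
    for strength in (1.0, 20.0):
        print(f"strength={strength}: swing={simulate_ring(strength):.4f}")
```

With weak production the three proteins settle to a steady level and the measured swing is essentially zero; with strong production the same wiring should keep cycling. Telling which regime a real circuit is in requires measuring the actual rates.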
In fact, parameterizing a biological network model can be a major experimental challenge - so difficult at times that you may be tempted to ask, why even bother? Why bother making detailed parameter measurements to plug into a mathematical model, when instead you could just go ahead and do the experiments to find out what is actually happening inside the cell? If you want to know whether a network motif is producing oscillations, why not just do the experiment, instead of trying to model oscillations with a network map?
Why Build Models?
This is largely how molecular biology has been done for years. Rather than developing theories based on mathematical models, it has been much easier to experimentally determine a cell's qualitative behavior, and scientists have made tremendous progress with this approach. But there are at least three good reasons why we should turn to making models of networks.
First, network models can help us test whether our maps are correct, or whether we are missing critical interactions. Scientists can ask, 'can this network structure possibly produce this behavior?' Models can tell us, for example, that the network connections we have cannot possibly produce the oscillator that we observe in our experiments, and thus we're missing some critically important component in our network map. This is what good models should do: generate new hypotheses, which we can go and test.
Second, it is important to understand not just what is going on in the cell, but how things actually work. It is possible to study proteins in the complete absence of any structural information, and this is what scientists did for decades before the first protein structure came out. But that kind of phenomenological study of proteins does nothing for our ability to look at a brand new protein sequence and predict what that protein does. Without understanding how the structure of a protein produces its function, we also can't predict what the impact of a mutation will be, or engineer a protein to have a new function.
The same is true of networks. We want to know how those networks produce their effects, so that we can predict and understand the effects of changes to the network (such as when we knock out one component with an anti-cancer drug), or even design new biological circuits to have new functions.
And third, modeling networks helps us get around the misleading question of why - why a network is structured a particular way. It's often not very helpful to answer the question of why, because a big part of that answer is evolutionary contingency - a regulatory circuit is structured the way it is simply because of its evolutionary history, and not because that was the best design for a given function. A better question to ask is how - how the structure of a network gives rise to its function. To answer that requires modeling.
All of this means that biologists have to start thinking seriously about how to do the hard experiments necessary to parameterize network models. Non-quantitative, genome-scale experiments have generated masses of network maps, but these experiments are poorly suited to the kinds of measurements needed for modeling. Like physicists, biologists now have to be comfortable with mathematical theory, and with hard experiments that generate the detailed measurements to test those theories. At least biologists can take comfort in the fact that their situation isn't quite as bad as the physicists', who, in order to test their models, have to build multi-billion dollar colliders that cannot even be run all 12 months of the year.
For some further reading:
"Modeling the chemotactic response of Escherichia coli to time-varying stimuli", Tu, Shimizu, and Berg, 2008.
"Dynamics and Design Principles of a Basic Regulatory Architecture Controlling Metabolic Pathways", Chin, Chubukov, Jolly, DeRisi, Li, 2008.
An Introduction to Systems Biology, Uri Alon, 2007.
The protein network image is from Zotenko, et al., "Why Do Hubs in the Yeast Protein Interaction Network Tend To Be Essential: Reexamining the Connection between the Network Topology and Essentiality", PLoS Comp. Biol 4(8): e1000140.