Walter Fontana, a Professor of Systems Biology at Harvard, reflects on models in biology:

Models will play a central role in the representation, storage, manipulation, and transmission of knowledge in systems biology. Models that are capable of fulfilling all these purposes will likely differ from the familiar styles deployed with great success in the physical sciences. "Classical" flavors of models may be viewed on a continuum between two major types:

Models that are of heuristic nature.
Although formal (mathematical), these models are not primarily intended for data analysis. They represent highly idealized or stylized situations aimed at discovering necessary and sufficient conditions for specific behaviors. Such models pursue a commitment to explanatory principles more than quantitative prediction. They are thought-experiments whose complexity often demands a computational infrastructure for execution.

Models that aim at being realistic.
Models of this kind are designed to capture as many detailed mechanistic features as are needed for generating quantitative fits with experimental data. Success is cast in terms of quantitative prediction, that is, the computation of numerical ranges for critical system variables and parameters. Models of this kind are often hailed as providing guidance for new experiments aimed at modifying system dynamics, such as the identification of drug targets. Realistic models are important in assessing when we know enough about a system to effect specific interventions at a given experimental resolution.


I think I would add a third kind: models (let's call them machine learning models, although machine learning can happen in other types of models as well) that aim to be quantitatively predictive, but without capturing detailed mechanistic features - in this sense they are not 'realistic'. These models take large amounts of data as input and find correlations that enable them to make predictions about similar data sets. An example is a linear regression model of the contributions of quantitative trait loci (QTL) to a quantitative trait. These models don't pursue the "explanatory principles" of Fontana's heuristic models, nor do they relate physical mechanism to system output. They're black-box, data-crunching models, and they've been useful for tasks like scanning genomes for transcription factor binding sites or estimating how much variance in a trait is explained by a set of QTL.
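To make the QTL example concrete, here is a minimal sketch using simulated data rather than any real study: genotypes at a few loci are coded as minor-allele counts, a trait is generated from assumed additive effects plus environmental noise, and an ordinary least squares fit recovers the effects and the fraction of trait variance they explain. Every number and name here is an illustrative assumption, not a claim about any actual data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 500 individuals genotyped at 3 biallelic QTL,
# with genotypes coded as minor-allele counts (0, 1, or 2).
n_individuals, n_qtl = 500, 3
genotypes = rng.integers(0, 3, size=(n_individuals, n_qtl)).astype(float)

# Assumed additive effects per QTL plus noise; values are illustrative.
true_effects = np.array([0.8, 0.3, -0.5])
trait = genotypes @ true_effects + rng.normal(scale=1.0, size=n_individuals)

# Ordinary least squares: trait ~ intercept + additive QTL effects.
X = np.column_stack([np.ones(n_individuals), genotypes])
coef, *_ = np.linalg.lstsq(X, trait, rcond=None)

# R^2 = fraction of trait variance explained by the fitted QTL model.
residuals = trait - X @ coef
r_squared = 1.0 - residuals.var() / trait.var()

print(f"estimated effects: {coef[1:].round(2)}")
print(f"variance explained (R^2): {r_squared:.2f}")
```

Note that the fit says nothing about how these loci physically influence the trait; it only summarizes the correlation structure, which is exactly the black-box character described above.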

These are probably the easiest models to build, because you don't have to concern yourself with deep explanatory principles or the underlying physical mechanism producing the process you're modeling. But they're also the least satisfying, at least if you're interested in understanding.



