With hot, new technologies, biologists are taking higher-resolution snapshots of what's going on inside the cell, but the results are stirring up controversy. One of the most interesting recent discoveries is that transcription is everywhere: DNA is transcribed into RNA all over the genome, even DNA long thought to be non-functional. What is all of this transcription for? Does the 'dark matter' of the genome have some cryptic, undiscovered function?

Unfortunately, in all of the excitement over possible new functions, many biologists have forgotten how to frame a null hypothesis - the default scenario that you expect to see if there is no function to this transcribed DNA. As a result, the literature is teeming with wild, implausible speculation about how our excess DNA might be beneficial to us.

So here, let's step back and look at what we expect from DNA when it's playing absolutely no functional role; in other words, let's look at the null hypothesis of genomic junk and transcriptional noise. We can then take our null hypothesis and use it to look at a fascinating new study of how genomic parasites sculpt transcription in our cells.

There is no such thing as inert DNA

What bothers me most about the recent hullabaloo over pervasive transcription is the tacit assumption that non-functional DNA has to be inert. This is a terrible assumption from both a biochemical and an evolutionary viewpoint. We have every reason to expect non-functional DNA to be noisy.

Kevin Struhl, a Harvard biologist with a long track record of pioneering studies in the field of transcription, has laid out the case for transcriptional noise in a back-of-the-envelope calculation.

Inside of a yeast cell, there are about 20,000 molecules of RNA Polymerase II, the enzyme that carries out most transcription in the cell. Of these 20,000 RNA Polymerase molecules, only about 12,000 are bound to DNA; these are the molecules that are actually producing the transcription in the cell. Of these 12,000, only ~10% are necessary to produce all of the observed transcription from the ~6,000 yeast protein-coding genes (see Struhl's paper for details). In other words, up to 90% of the DNA-bound RNA polymerase is not transcribing protein-coding genes.
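The arithmetic in that paragraph is easy to lay out explicitly. The sketch below only restates the round numbers quoted above; the variable names are mine, not Struhl's.

```python
# Round figures quoted in the text for a single yeast cell.
total_pol2 = 20_000      # RNA Polymerase II molecules per cell
dna_bound = 12_000       # of those, bound to DNA
needed_fraction = 0.10   # share of bound Pol II needed for protein-coding genes

# Molecules that account for all observed protein-coding transcription.
needed_for_genes = round(dna_bound * needed_fraction)

# Everything else that is bound to DNA but not needed for those genes.
surplus = dna_bound - needed_for_genes

print(f"Bound Pol II needed for the ~6,000 genes: {needed_for_genes}")
print(f"Bound Pol II left over: {surplus} ({surplus / dna_bound:.0%} of bound)")
```

The leftover share is an upper bound on polymerase that is bound somewhere other than a protein-coding gene, not a measurement of how much noisy RNA actually gets made.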

Now a yeast genome is very streamlined compared to ours - most of the genome consists of protein-coding genes and the immediately associated regulatory sequences. (In our genome, these regions add up to less than 5% of our total DNA.) What Struhl's calculation shows is that there is a lot of potential to produce transcriptional noise, which is exactly what we expect from the biochemistry anyway: no biological molecule is 100% specific. Struhl argues that, given the size of the genome, it's not surprising at all that only about 10% of the DNA-bound RNA Polymerase molecules are transcribing from the right spot on the DNA:

On the basis of the above considerations, the biological specificity, and hence the fidelity, of the Pol II initiation machinery can be estimated. There are 2,000 nucleotides or potential mRNA initiation sites at an average yeast gene (6,000 yeast genes in a 12-megabase genome). Given 10% correct initiation events, initiation from a single correct site in an average gene is 200-fold more likely than from an average position, although incorrect initiation will not occur equally at all sites... Interestingly, this specificity factor is comparable to that observed for typical sequence-specific DNA-binding proteins binding a high-affinity site, as compared with a nonspecific site. This level of fidelity is also similar to the frequency at which incorrect nucleotides or amino acids are incorporated into growing polymers of RNA and protein, although it is far below the 10^-8-fold specificity seen for DNA replication, which depends on proofreading mechanisms. Thus, Pol II initiation from correct promoters seems to be rather error-prone, with a level of fidelity that is roughly comparable to those of other biological processes that are considered to be specific.
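Here is one way to reproduce the 200-fold figure from the numbers in the quote. This is my reading of the estimate, not a calculation taken from Struhl's paper, and the second comparison (against an average incorrect site) is my own variant, added to show that the answer barely moves.

```python
# Figures taken from the quoted passage.
genome_size_bp = 12_000_000   # 12-megabase yeast genome
n_genes = 6_000               # yeast protein-coding genes
correct_fraction = 0.10       # share of initiation events at the correct site

# Candidate initiation sites per average gene.
sites_per_gene = genome_size_bp // n_genes

# Compare the correct site with an "average position", taken here to
# receive 1 / sites_per_gene of all initiation events.
fold_vs_average = correct_fraction * sites_per_gene

# Variant (assumption, not in the quote): compare with an average
# incorrect site, which shares the remaining 90% with all other wrong sites.
per_wrong_site = (1 - correct_fraction) / (sites_per_gene - 1)
fold_vs_wrong = correct_fraction / per_wrong_site

print(f"Sites per gene: {sites_per_gene}")
print(f"Fold preference vs. average position: {fold_vs_average:.0f}")
print(f"Fold preference vs. average wrong site: {fold_vs_wrong:.0f}")
```

Either way the preference for the right site comes out at a couple of hundred-fold, which is the scale the quote compares with ordinary sequence-specific DNA binding.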

Transcriptional noise is the default assumption - and the problem is much worse in a huge mammalian genome, where the potential for an RNA polymerase to initiate from an 'incorrect' site is much, much larger. Conclusion: transcription does not equal function. With new technologies, we're getting better and better at detecting even very weak transcription, which also means we're getting better at detecting biological noise.

Genetic Drift Generates Complex Transcription Networks

From an evolutionary perspective, genomic junk and transcriptional noise are what we expect, although this lesson doesn't always sink in. A recent paper put it this way:

Almost 30 years ago, Gould and Lewontin launched a crusade against the adaptive paradigm that functional differences must be adaptive, that is, caused by natural selection. Although their arguments were controversial at the time, they have been highly influential on the community of evolutionary biologists. As a new generation of biologists with a background in genomics, molecular biology or bioinformatics has taken leadership in the field of genomic evolutionary biology, the old lessons from Gould and Lewontin seem to have been forgotten. It is a desirable addition to a story of selection to identify possible functional reasons why selection might be acting, but it will never be a method for identifying or verifying selection.

Michael Lynch, a biologist at Indiana University, has also taken genome biologists to task for seeing function everywhere. He's put together a null hypothesis of what we expect to see in the absence of natural selection:

Many physicists, engineers and computer scientists, and some cell and developmental biologists, are convinced that biological networks exhibit properties that could only be products of natural selection; however, the matter has rarely been examined in the context of well-established evolutionary principles...

Although contrarian in tone, the models presented here provide the seeds for the development of biologically realistic null hypotheses for the origins of pathway complexity, a tool that many feel is essential to the development of a rigorous theory for network evolution. If we are to be confident that the architectural features of genetic networks are advanced by natural selection, it should be possible to formally reject the possibility of neutral evolution. There is room for disagreement with what constitutes an appropriate null model for network evolution but, given that evolution is a population-level process, we start with the assumption that any such model must incorporate the fundamental principles of population genetics. Such an approach is a significant departure from most previous work in this area.

Lynch goes on to show that complex (but non-functional) regulatory interactions can evolve under genetic drift. This shouldn't be surprising to anyone with an understanding of evolution: functional features in the genome don't appear by magic; there has to be some raw material for evolution to work with. The complex structures that evolve through genetic drift can provide a rich source of potential function - all they need is a mutational tweak to make them functional and thus malleable under natural selection.
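To make the drift part concrete, here is a minimal sketch of neutral drift in a textbook Wright-Fisher population. It is not Lynch's network model; it only illustrates the underlying principle that a variant with no effect on fitness can still spread to the whole population by chance. The function names, population size, starting frequency and seed are all arbitrary choices of mine.

```python
import random

def drift_until_fixed(n_copies, start_freq, rng):
    """Neutral Wright-Fisher drift: True if the variant fixes, False if it is lost."""
    count = round(n_copies * start_freq)
    while 0 < count < n_copies:
        freq = count / n_copies
        # Each copy in the next generation picks a parent at random.
        # No copy is favoured, so there is no selection anywhere in this loop.
        count = sum(rng.random() < freq for _ in range(n_copies))
    return count == n_copies

def fixation_rate(n_copies=20, start_freq=0.2, replicates=500, seed=1):
    """Share of independent replicate populations in which the variant fixes."""
    rng = random.Random(seed)
    fixed = sum(drift_until_fixed(n_copies, start_freq, rng)
                for _ in range(replicates))
    return fixed / replicates

if __name__ == "__main__":
    print(f"Share of neutral variants reaching fixation: {fixation_rate():.2f}")
```

In this kind of model a neutral variant is expected to fix with a probability equal to its starting frequency, so a sizeable minority of useless variants end up in every member of the population without selection doing anything.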

Noise and Junk in the Genome

It should be clear by now that genomic junk and transcriptional noise are the default hypotheses against which we have to compare any new claims of functional DNA. Refreshingly, a recent genome paper in Nature Genetics has a healthy perspective on this issue.

Using some of the latest DNA sequencing technology, a group of scientists looked at transcription from virus-like genomic parasites known as transposable elements. They found that hundreds of thousands of transposable elements are able to drive transcription all over the genome. Interestingly, much of this transcription takes place in some cell types but not others: a feature of a lot of functional transcription (e.g., muscle genes get expressed in muscle cells, but not fat cells, etc.).

Before you get excited though, think about the null hypothesis! Genomic junk and transcriptional noise are perfectly capable of being restricted only to certain cell types, thus mimicking truly functional genes. The authors of this paper take a balanced view:

Rather than advocating universal utility for retrotransposons, we instead suggest that they contain active promoters and that at least some of these are functional. The high frequency of retrotransposons in mammalian genomes ensures the concurrence of many thousands of well-expressed retrotransposon promoters proximal to protein-coding genes, with widespread effects upon the regulation and evolution of those genes.

That's a great bottom line: Surely some of this is functional, but we shouldn't expect all of it to be (and function has to be demonstrated, not assumed). More importantly, functional or not, transposable elements (as this paper shows in great detail) exert a very powerful force on the overall transcription in the genome. Their enormous, dynamic presence inevitably has an impact on both functional and non-functional regions of the genome.

UPDATE: I need to add a clarification: Kevin Struhl argues that up to 90% of RNA polymerase binding is non-specific, but he's not arguing that 90% of transcription is noise. Much of that ~90% non-specific binding produces very low levels of transcription. Another paper, using sequencing, found that 75% of the yeast genome is transcribed, but much of it is transcribed at low levels. The same trend is generally seen in mammalian cells.