I first became interested in genome size because of its tie-ins with important evolutionary questions in which I was (and still am) interested, such as punctuated vs. gradual patterns, levels of selection, and adaptive vs. non-adaptive processes. What I didn't realize was that one component of the question, the quantity of DNA that is non-functional (but not necessarily inconsequential) with regard to the phenotype of the organism, is such a hot-button issue. I had vague inklings at first that young-earth creationists would object to the idea of non-functional DNA -- because God, as they say, don't make no junk. (Why intelligent design proponents, who purport to take a strictly scientific view of the question, also assume that non-coding DNA cannot be non-functional remains unstated). And of course there has always been a persistent undertone in biology that non-coding DNA must be doing something or it would have been deleted. This latter view, which derives directly from a hardcore adaptationist approach, destroys the argument by creationists that "Darwinism" has prevented researchers from considering functions for non-coding DNA. Indeed, the main motivation for the early papers on "selfish DNA" was to counter this adaptationist assumption (Doolittle and Sapienza 1980).

Creationist nonsense about DNA does not surprise me. What has intrigued me much more is the debate among biologists about this, and the rather questionable claims, suppositions, and extrapolations that get made not just by the media but by various scientists themselves.

Take Francis Collins. He's a major player in genome biology and led the charge by the public Human Genome Project. And yet, he makes claims that non-coding DNA may be present in the genome "just in case" it needs to be put to use in the future. This makes no sense from an evolutionary perspective. It would be tempting to attribute this to Collins's adherence to the notion of theistic evolution, but in fact one can find this sort of fuzzy foresight argument being brought up by lots of authors. I suppose it's just disappointing that there is not better communication between genome biology and evolutionary biology.

The case that frustrates me most is that of John Mattick. He of the worst figure ever is one of the primary promulgators of the view that scientists have overlooked possible function for non-coding DNA and that this is "one of the biggest mistakes in the history of molecular biology" that can only be corrected by a "new paradigm", and so on. Basically, the argument seems to be that much of the non-coding portion of a given genome is involved in regulation and such. In the past, Mattick has refrained from pinning down an estimate of how much non-coding DNA he believes is functional, but his presentation of (extremely selective) data left little doubt that he considers more non-coding DNA to be correlated with greater complexity. But now we're starting to get some more explicit and increasingly bold claims.

As Check (2007) pointed out in a news article in Nature,

Mattick thinks scientists are vastly underestimating how much of the genome is functional. He and Birney have placed a bet on the question. Mattick thinks at least 20% of possible functional elements in our genome will eventually be proven useful. Birney thinks fewer are functional.
Now consider this quote by Comings (1972), who was the first person to use the term "junk DNA" extensively (even before Ohno's (1972) coinage appeared in print):
These considerations suggest that up to 20% of the genome is actively used and the remaining 80+% is junk. But being junk doesn't mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded. There are several possible functions for junk DNA.

So, even if Mattick is right about 20% of the human genome being functional, which is considered a rather high estimate on the basis of available data, he still would be merely agreeing with the author of the first major discussion about junk DNA.

Now, I should point out that I do not have a vested interest in how much of the human genome is functional. 5%? Fine. 20%? Fine. 50%? Ok. I will go where the data indicate. My reason for rejecting the notion of "more complexity means more DNA" is comparative: I refer you to the "onion test" for a simple illustration. However, as readers of my Genomicron blog already know, I find it rather irksome when people take any new finding about (potential) function in some part of the human genome and extrapolate this to mean that all DNA in every genome must be serving some role.

Anyway, back to what Mattick suggests. As noted, for the most part he has gone about arguing for large-scale function more by hint than by direct claim. However, finally he says the following (Phaesant and Mattick 2007).

Thus, although admittedly on the basis of as yet limited evidence, it is quite plausible that many, if not the majority, of the expressed transcripts are functional and that a major component of genomic information is rapidly evolving regulatory DNA and RNA. Consequently,it is possible that much if not most of the human genome may be functional. This possibility cannot be ruled out on the available evidence, either from conservation analysis or from genetic studies, but does challenge current conceptions of the extent of functionality of the human genome and the nature of the genetic programming of humans and other complex organisms. [Emphasis added]

It seems to me that "we can't rule this out" is not a reason to think that something is plausible, let alone true. In fact, the existence of mechanisms such as transposable element spread and the pseudogenization of duplicate genes suggests that there is good reason to expect much (probably most) of the genome to be non-functional unless data show otherwise. Some TEs have taken on a function, some cause disease, some are merely benign or only slightly detrimental. The proportions of non-coding elements in each of these categories remain to be determined, but they are not all equally likely by default.

The question of which sequences are functional, and in what way, is one of the more contentious and therefore interesting ones in genome biology. On the one hand, new information from various sources including the ENCODE project indicates that much non-coding is transcribed, though it remains an open question whether this has to do with function or noise. On the other hand, a recent analysis has suggested that as many as 4,000 sequences within the human genome initially thought to be genes are not really genes after all (Clamp et al. 2007), bringing the total count down to around 20,000.

Some people, mostly creationists and strict adaptationists (strange bedfellows, I agree) desperately want the vast non-coding majority of eukaryote DNA to have a function. They latch onto any new discovery of function in some segment of the genome or another (or indeed, any mere restatement of what many authors have been saying since the 1970s) and consider their position supported. The rest of us will just have to wait and see.



Check, E. (2007). Genome project turns up evolutionary surprises. Nature 447: 760-761.

Clamp, M., B. Fry, M. Kamal, X. Xie, J. Cuff, M.F. Lin, M. Kellis, K. Lindblad-Toh, and E.S. Lander (2007). Distinguishing protein-coding and noncoding genes in the human genome. Proceedings of the National Academy of Sciences USA 104: 19428-19433.

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Ohno, S. 1972. So much "junk" DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Phaesant, M. and J.S. Mattick (2007). Raising the estimate of functional human sequences. Genome Research 17: 1245-1253.