When people began estimating genome sizes (amounts of DNA per genome) in the late 1940s and early 1950s, they noticed that genome size is largely constant within a species. In other words, if you look at nuclei in different tissues within an organism, or in different organisms from the same species, the amount of DNA per chromosome set is constant. (There are some interesting exceptions to this, but they were not really known at the time.) This observed constancy in DNA amount was taken as evidence that DNA, rather than proteins, is the substance of inheritance.
These early researchers also noted that some "less complex" organisms (e.g., salamanders) possess far more DNA in their nuclei than "more complex" ones (e.g., mammals). This complicated the picture: on the one hand, DNA amount was thought to be constant because DNA is what genes are made of; on the other hand, the amount of DNA ("C-value", for "constant") did not correspond to assumptions about how many genes an organism should have. This apparently self-contradictory set of findings became known as the "C-value paradox" in 1971.
This "paradox" was solved with the discovery of non-coding DNA. Because most DNA in eukaryotes does not encode a protein, there is no longer a reason to expect C-value and gene number to be related. Not surprisingly, there was speculation about what role the "extra" DNA might be playing.
In 1972, Susumu Ohno coined the term "junk DNA". The idea did not come from throwing his hands up and saying "we don't know what it does so let's just assume it is useless and call it junk". He developed the idea based on knowledge about a mechanism by which non-coding DNA accumulates: the duplication and inactivation of genes. "Junk DNA," as formulated by Ohno, referred to what we now call pseudogenes, which are non-functional from a protein-coding standpoint by definition. Nevertheless, a long list of possible functions for non-coding DNA continued to be proposed in the scientific literature.
In 1979, Gould and Lewontin published their classic "spandrels" paper (Proc. R. Soc. Lond. B 205: 581-598) in which they railed against the apparent tendency of biologists to attribute function to every feature of organisms. In the same vein, Doolittle and Sapienza published a paper in 1980 entitled "Selfish genes, the phenotype paradigm and genome evolution" (Nature 284: 601-603). In it, they argued that there was far too much emphasis on function at the organism level in explanations for the presence of so much non-coding DNA. Instead, they argued, self-replicating sequences (transposable elements) may be there simply because they are good at being there, independent of effects (let alone functions) at the organism level. Many biologists took their point seriously and began thinking about selection at two levels, within the genome and on organismal phenotypes. Meanwhile, functions for non-coding DNA continued to be postulated by other authors.
As the tools of molecular genetics grew increasingly powerful, there was a shift toward close examination of protein-coding genes in some circles, and something of a divide emerged between researchers interested in particular sequences and others focusing on genome size and other large-scale features. This became apparent when technological advances made it possible to contemplate sequencing the entire human genome: a question asked in all seriousness was whether the project should bother with the "junk".
Of course, there is now a much greater link between genome sequencing and genome size research. For one, you need to know how much DNA is there just to get funding. More importantly, sequence analysis is shedding light on the types of non-coding DNA responsible for the differences in genome size, and non-coding DNA is proving to be at least as interesting as the genic portions.
- The term "junk DNA" was not coined on the basis of not knowing what it does. It was not a cop-out or a surrender. Susumu Ohno coined the term in 1972 in reference to a specific mechanism of non-coding DNA formation that he thought accounted for the discrepancies in genome size among species: gene duplication and pseudogenization. That is, a gene is duplicated and one of the copies becomes degraded by mutation to the point of being non-functional with regard to protein coding. (Sometimes the second copy takes on a new function through "neofunctionalization", or the two copies may split the original function through "subfunctionalization"). "Junk" meant something that was functional (a gene) but now isn't (a pseudogene).
- Since the first discussions about DNA amount there have been scientists who argued that most non-coding DNA is functional, others who focused on mechanisms that could lead to more DNA in the absence of function, and yet others who took a position somewhere in the middle. This is still the situation now.
- Lots of mechanisms are known that can increase the amount of DNA in a genome: gene duplication and pseudogenization, duplicative transposition, replication slippage, unequal crossing-over, aneuploidy, and polyploidy. By themselves, these could lead to increases in DNA content independent of benefits for the organism, or even despite small detrimental impacts, which is why non-function is a reasonable null hypothesis.
- Evidence currently available suggests that about 5% of the human genome is functional. The least conservative guesses put the possible total at about 20%. The human genome is mid-sized for an animal, which means that in many larger genomes an even smaller percentage is likely to be functional. None of the discoveries suggest that all (or even more than a minor percentage) of non-coding DNA is functional, and the corollary is that there is indirect evidence that most of it is not.
- Identification of function is done by evolutionary biologists and genome researchers using an explicit evolutionary framework. One of the best indications of function that we have for non-coding DNA is to find parts of it conserved among species. This suggests that changes to the sequence have been selected against over long stretches of time because those regions play a significant role. Obviously, you cannot talk about evolutionarily conserved DNA without evolutionary change.
- Examples of transposable elements acquiring function represent co-option. This is the same phenomenon that is involved in the evolution of complex features like eyes and flagella. In particular, co-option of TEs appears to have happened in the evolution of the vertebrate immune system. Again, this makes no sense in the absence of an evolutionary scenario.
- Most transposable elements do not appear to be functional at the organism level. In humans, most are inactive molecular fossils. Some are active, however, and can cause all manner of diseases through their insertions. To repeat: some transposons are functional, some are clearly deleterious, and most probably remain more or less neutral.
- Any suggestion that all non-coding DNA is functional must explain why an onion needs five times more of it than you do. So far, none of the proposed universal functions has done this. It therefore remains most reasonable to take a pluralistic approach in which only some non-coding elements are functional for organisms.
You can tell that someone knows very little about the science or history of "junk DNA" when they make one or more of the following claims: 1) All scientists have always thought it was all totally irrelevant to the organism. 2) New evidence is suggesting that it is all functional. 3) "Darwinism" led to the assumption that non-coding DNA is non-functional. The opposite is true in each case.
- A word about "junk DNA".
- The onion test.
- An opportunity for ID to be scientific.
- Human gene number: surprising (at first) but not paradoxical.
- Ultraconserved non-coding regions must be functional... right?
- Gene number and complexity.
- What's wrong with this figure?
- Dog's Ass Plots (DAPs).
- Genome size and gene number.