    Non-coding DNA Function... Surprising?
    By Michael White | February 10th 2011 03:51 PM
    About Michael

    Welcome to Adaptive Complexity, where I write about genomics, systems biology, evolution, and the connection between science and literature,

    ...

    The existence of functional, non-protein-coding DNA is all too frequently portrayed as a great surprise uncovered by genome sequencing projects, both in large media outlets and in scientific publications that should have better quality control in place. Eric Lander, writing a Human Genome Project 10th anniversary retrospective in Nature, explains the real surprise about non-coding DNA that was revealed by big omics projects. Despite ravings about the newly identified mysteries of the 'dark genome', it remains a fact that functional, non-protein-coding DNA has been known for more than half a century, well before such interesting things as microRNAs, ribozymes, and long ncRNAs were discovered. The diversity of functional (and dubiously functional) RNAs has been genuinely interesting, but, in my humble opinion, not nearly as surprising as the discovery made about the relatively small slice of the human genome that shows strong evolutionary conservation (and is therefore most likely to be functional). Lander writes:
    The most surprising discovery about the human genome was that the majority of the functional sequence does not encode proteins. These features had been missed by decades of molecular biology, because scientists had no clue where to look.* Comparison of the human and mouse genomes showed a substantial excess of conserved sequence, relative to the neutral rate in ancestral repeat elements [4]. The excess implied that at least 6% of the human genome was under purifying selection over the past 100 million years and thus biologically functional. Protein-coding sequences, which comprise only ~1.5% of the genome, are thus dwarfed by functional conserved non-coding elements (CNEs). Subsequent comparison with the rat and dog genomes confirmed these findings.
    In other words, of the conserved (and most likely functional) portion of the genome, the ratio of non-coding to protein-coding DNA is 3:1. That was mind-blowing. The other, non-conserved ~94% of the genome does contain some interesting features, but most of the real non-coding action is limited to a small ~4.5% slice.
    *This particular sentence is flat-out wrong. It can most easily be refuted by a paper such as this one (note the pre-Human Genome Project date):
    "An albumin enhancer located 10 kb upstream functions along with its promoter to direct efficient, liver-specific expression in transgenic mice." Pinkert CA, Ornitz DM, Brinster RL, Palmiter RD. Laboratory of Reproductive Physiology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, 19104. Genes Dev. 1987 May;1(3):268-76.
    Abstract: Transgenic mice were used to locate the cis-acting DNA elements that are important for efficient, tissue-specific expression of the mouse albumin gene in the adult. Chimeric genes with up to 12 kb of mouse albumin 5'-flanking region fused to a human growth hormone (hGH) reporter gene were tested. Remarkably, a region located 8.5-10.4 kb upstream of the albumin promoter was essential for high-level expression in adult liver and the region in between -8.5 and -0.3 kb was dispensable. The far-upstream region behaved like an enhancer in that its position and orientation relative to the albumin promoter were not critical; however, it did not function well with a heterologous promoter. Two of four DNase hypersensitive sites found in the 5'-flanking region of the albumin gene map to the far-upstream and promoter regions; the others may reflect regions involved in developmental or environmental control of this gene.
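    The 3:1 ratio discussed above can be checked directly from the two percentages in the Lander quote. This is only a back-of-the-envelope sketch: it assumes that all protein-coding sequence falls inside the conserved 6%, and the variable names are illustrative.

```python
# Figures quoted from Lander's retrospective (percent of the human genome).
conserved_pct = 6.0   # "at least 6%" under purifying selection
coding_pct = 1.5      # "~1.5%" protein-coding

# Assumption: every protein-coding base is counted within the conserved 6%.
noncoding_conserved_pct = conserved_pct - coding_pct
ratio = noncoding_conserved_pct / coding_pct
nonconserved_pct = 100.0 - conserved_pct

print(f"conserved non-coding: {noncoding_conserved_pct:.1f}% of the genome")
print(f"non-coding : coding within conserved sequence = {ratio:.0f}:1")
print(f"non-conserved remainder: {nonconserved_pct:.0f}%")
```

    Because Lander's 6% is a lower bound, the conserved non-coding share and the ratio computed here are lower bounds too.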

    Comments

    vongehr
    Maybe you could explain some more about the whole issue of non-coding. I read for example that much of the animal genome comes from relatively recent viral (e.g. ERV) sources and only a small part is e.g. 'human' and coding and functional. Well, if the viral code is not making proteins, what is it doing? It cannot possibly be that it has no function yet is selected for to stay around. It seems that all those texts about these issues that do not go over my head completely confuse non-protein-coding, non-human, and non-function. Could you shine some light here? As far as I understand neo-Darwinism, there is no 'junk' except for junk that somehow has at least functionality for its own survival. From that point of view, none of the findings are surprising.
    adaptivecomplexity
    Non-coding always refers to non-protein-coding, regardless of whether the DNA is regulatory sequence or 'viral'. Most of the human genome (and vertebrate genomes in general) consists of DNA related to viruses - 'transposable elements', which are not actually viruses that spread from person to person, but which are related to viruses (they have similar DNA sequences). Under certain conditions, transposable elements can 'hop' around the genome - i.e., make multiple copies of themselves in the genome. (And some of this DNA is in fact protein-coding - encoding the proteins necessary for the transposable elements to spread themselves.)
    This stuff is actually not that recent - families of these transposable elements have been hopping in our ancestors' genomes for hundreds of millions of years.

    'Junk' DNA certainly is possible within a Darwinian paradigm, for two reasons: 

    1) Transposable elements are, in a sense, like genomic parasites that have figured out how to get themselves passed on to the next generation. Most copies of transposable elements are dead elements - they have no capability to replicate themselves, and these are slowly eroded away by generations of mutation. But others, by chance and sheer numbers, remain 'live'.

    2) Unless 'junk' DNA causes active harm, there is not necessarily selection against it, and thus there is not often a mechanism to remove it. It can definitely stick around without a function.

    Most of this transposable element DNA simply exists for its own sake - there is no other explanation required for its existence. However, it can, on occasion, provide a rich source of new function in the genome, by doing things like moving non-coding regulatory sequence around the genome and generating new linkages within genetic networks.

    Mike
    vongehr
    "parasites that have figured out how to get themselves passed on to the next generation"
    Well, that is what life is all about. But over time, only those parasites somewhat useful to the host will stay around. Given that there are species with much less 'junk' and so on, I doubt that the junk is junk. OK, some of it is junk of course, nothing is perfect, but the large chunk of junk being there as such probably is good for something. You writing "it can, on occasion, provide a rich source of new function in the genome" hints at some sort of use that comes in if the environment changes rapidly.
    With the fast adapting water dweller that has recently been found to have the largest genome of all creatures on one hand and some species that have relatively little junk on the other, should there not be a good reason for how the fraction of 'junk' is selected for? If that is known, one can hardly further hold the view that the 'junk' is really junk (just like junk yards are there for a good reason - junk yards are not junk).
    adaptivecomplexity
    "But over time, only those parasites somewhat useful to the host will stay around."
    That is true in terms of transposable element sequence - most transposable elements are not conserved; their sequence is being scrambled by mutation, and what you have left is the detritus. But without an active mechanism to actually get rid of this DNA, which in itself is not harmful, it will stick around in the genome.
    "With the fast adapting water dweller that has recently been found to have the largest genome of all creatures on one hand and some species that have relatively little junk on the other, should there not be a good reason for how the fraction of 'junk' is selected for? If that is known, one can hardly further hold the view that the 'junk' is really junk (just like junk yards are there for a good reason - junk yards are not junk)."
    Keep in mind that the bulk of this non-coding DNA shows no evidence of selection. The conclusion to draw from variation in genome size is just the opposite of the conclusion you draw. I quote Ryan Gregory's Onion test:
    The onion test is a simple reality check for anyone who thinks they have come up with a universal function for non-coding DNA. Whatever your proposed function, ask yourself this question: Can I explain why an onion needs about five times more non-coding DNA for this function than a human? The onion, Allium cepa, is a diploid (2n = 16) plant with a haploid genome size of about 17 pg. Human, Homo sapiens, is a diploid (2n = 46) animal with a haploid genome size of about 3.5 pg. This comparison is chosen more or less arbitrarily (there are far bigger genomes than onion, and far smaller ones than human), but it makes the problem of universal function for non-coding DNA clear. Further, if you think perhaps onions are somehow special, consider that members of the genus Allium range in genome size from 7 pg to 31.5 pg. So why can A. altyncolicum make do with one fifth as much regulation, structural maintenance, protection against mutagens, or [insert preferred universal function] as A. ursinum?
    Mike
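    The ratios behind the onion test quoted above are easy to verify from the genome sizes it gives. A minimal sketch, with illustrative variable names; note that it compares total haploid genome sizes, which only approximates the non-coding comparison, on the assumption that protein-coding sequence is a small fraction of each genome.

```python
# Haploid genome sizes in picograms, as quoted in the onion test.
onion_pg = 17.0                            # Allium cepa
human_pg = 3.5                             # Homo sapiens
allium_min_pg, allium_max_pg = 7.0, 31.5   # range across the genus Allium

onion_vs_human = onion_pg / human_pg
allium_spread = allium_max_pg / allium_min_pg

print(f"onion / human: {onion_vs_human:.1f}x")
print(f"largest / smallest Allium: {allium_spread:.1f}x")
```

    Both ratios come out close to five, so the spread within a single genus is about as large as the onion-to-human gap.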
    vongehr

    Let's be clear about terms: ‘Junk’ = non-(protein)coding DNA = maybe 5% functional somehow (regulatory???) and 95% random drift neutrally mutated non-harmful Real Junk (RJ) we may as well cut out.

    RJ has a certain size S in each species, for example S_onion (Allium cepa) = 5 S_human, where S_human = 2/3 S_total (the total human genome).

    You say that S and/or the fraction S/S_genome itself is not selected for [= no function and no harm except for some obvious upper limits like the volume of the nucleus]. Moreover, these factors vary up to a factor of 5 or more even between different onions (all wild?), so there is no room for an argument such as onions needing to adapt faster than humans.

    If that is so, then do you agree that we could put our heads together and come up with a simulation/model in which fairly simple statistical assumptions about how the size S fluctuates (log-normal distribution, Gaussian, …), with at most two or three adjustable parameters (mean, standard deviation, kurtosis) fitted in one species or genus or family or some such X, would reliably predict the variation of S in other X'? (Has it been done?)

    The more complex the actual distributions look, the more it would be justified to speak of selection pressure. If there are for example clearly bimodal distributions with peaks correlating with an environmental parameter, S and thus RJ is functional.
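    As a rough illustration of what such a two-parameter model might look like, the sketch below fits a log-normal to a set of genome sizes (mean and standard deviation of the log sizes) and reports the fold range that fit would predict. Everything here is an assumption made for illustration: the function names are hypothetical, and the three input values are simply the Allium figures from the onion test quoted earlier in this thread (7, 17, and 31.5 pg), which are far too few, and too biased toward the extremes, for a real fit.

```python
import math
import statistics


def fit_lognormal(sizes_pg):
    """Return (mu, sigma) of log genome size: the two adjustable parameters."""
    logs = [math.log(s) for s in sizes_pg]
    return statistics.mean(logs), statistics.stdev(logs)


def predicted_fold_range(sigma, z=1.96):
    """Fold difference between the upper and lower ~95% bounds of the fit."""
    return math.exp(2 * z * sigma)


# Illustrative input only: Allium genome sizes (pg) taken from the onion test.
allium_pg = [7.0, 17.0, 31.5]
mu, sigma = fit_lognormal(allium_pg)

print(f"fitted median: {math.exp(mu):.1f} pg, sigma of log size: {sigma:.2f}")
print(f"predicted ~95% fold range: {predicted_fold_range(sigma):.1f}x")
```

    To test the idea properly, one would fit these parameters on many measured genome sizes in one group X and compare the predicted spread with what is observed in another group X'.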