Snipe hunting in the genome
    By Michael White | April 20th 2009 04:34 PM | 12 comments

    Welcome to Adaptive Complexity, where I write about genomics, systems biology, evolution, and the connection between science and literature.


    Sigh - I was going to recommend this piece about recent human genome research in Scientific American, by a leading researcher in comparative genomics, Katherine Pollard, until I came to the last paragraph:
    Experimental and computational studies now under way in thousands of labs around the world promise to elucidate what is going on in the 98.5 percent of our genome that does not code for proteins. It is looking less and less like junk every day.
    Anyone, especially a genome scientist, who implies that most of our genome is packed full of functional sequences should back that up with some specifics, starting with answers to these two questions:
    What makes you think that most of the 1 million Alu elements in the genome are functional?

    Ryan Gregory's onion test: Why does an onion need five times more non-coding DNA than a human? In fact, why does one species in the onion genus need almost five times as much non-coding DNA as a closely related species?

    Just to reiterate: of course there are more functional non-coding sequences in the human genome waiting to be identified. But those who set out to find function in all or most of the genome are engaging in a giant, molecular snipe hunt.


    Do you think it's an issue or an attempt at clever wording gone awry?

    It got 331 votes on Digg, which tells you all you need to know about whether or not Digg readers read articles.
    Well, most of the piece is fine - Pollard has done some significant work in the field and describes these discoveries well in the article.
    It could be clever wording gone awry, but I've heard others, primarily computational people, make similar statements. I don't know what the issue is - it could be that a lot of people who do computational analysis of the genome come into the field with more of a computer science and less of a biology outlook, and they sometimes lose track of the bigger biological picture.
    I don't know, it seems pretty ambiguously worded to me. Surely not all of it is functional, but likely some (most?) is there for some reason. The answer to "What's going on?" could just be "nothing" for a lot of it. I don't necessarily take it as her implying that she thinks all of it is actually functional, just that people are doing lots of work to figure it all out.

    but likely some (most?) is there for some reason
    Some is, but it is implausible that most of it is. Most of it is there because LINEs, SINEs, and other repetitive elements, which make up the bulk of our genomes, can expand by known mechanisms without any need for functional selection.

    So the default hypothesis is that any given Alu element, for example, is non-functional. Given what we know about how these things expand, and the widely varying amount of this DNA in closely related species (as Ryan points out in his onion test), there is a very strong burden of proof on those who want to suggest that most of this stuff is functional.
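    The point that repeats can expand without selection can be made concrete with a toy model. This is a minimal sketch, not a description of real LINE or Alu biology: it assumes (hypothetically) that each copy independently spawns one new copy with a fixed probability per generation and is lost with another fixed probability, with no term anywhere for function or selection. The function names and all parameter values are illustrative assumptions.

```python
import random


def simulate_copy_number(initial_copies, insert_prob, delete_prob, generations, seed=0):
    """Toy neutral model of a copy-and-paste element (hypothetical parameters).

    Each generation, every existing copy independently spawns one new copy
    with probability insert_prob and is lost with probability delete_prob.
    Nothing in the model depends on the copies doing anything useful.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    copies = initial_copies
    history = [copies]
    for _ in range(generations):
        births = sum(1 for _ in range(copies) if rng.random() < insert_prob)
        deaths = sum(1 for _ in range(copies) if rng.random() < delete_prob)
        # deaths can never exceed the current number of copies, so this stays >= 0
        copies = copies + births - deaths
        history.append(copies)
    return history


def expected_copy_number(initial_copies, insert_prob, delete_prob, generations):
    """Mean of the same process: N_t = N_0 * (1 + a - d) ** t."""
    return initial_copies * (1 + insert_prob - delete_prob) ** generations


if __name__ == "__main__":
    run = simulate_copy_number(10, insert_prob=0.05, delete_prob=0.01, generations=100)
    print("simulated final copies:", run[-1])
    print("expected final copies:", round(expected_copy_number(10, 0.05, 0.01, 100)))
```

    In this toy model, whenever the per-copy insertion probability exceeds the deletion probability, copy number tends to grow geometrically, purely as a consequence of copying, with no selection involved.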

    The wording in the article bothers me, because she's suggesting that "the other 98.5%" is "looking less and less like junk every day." Most of it still looks like junk.
    The hypothesis that non-coding regions are mostly junk needs to be proven, just like the alternative hypothesis that they are mostly not junk. Are there any studies that prove or show the former? Otherwise, it is still an open question, and it is too early to draw a conclusion, given that we may not know what we do not know.

    The hypothesis that a DNA sequence is non-functional is the null hypothesis.  It is what we expect if there is nothing but randomness going on.  Technically, you cannot prove a negative.  So, you will be waiting a long time for that proof.
    I guess the null hypothesis applies in statistical hypothesis testing, but I am not sure it applies to a biological hypothesis like this one. It is theoretically possible (though not technically feasible) to prove whether non-coding regions are mostly non-functional by removing them from the genome and assessing the phenotypic effect.

    Sounds simple, right?  But when I remove that chunk of DNA what do I replace it with?  Do I just put the ends back together?  Do I replace it with another piece of DNA?  How do I know that I'm not creating a new functional location by either action?  I could remove the DNA and put it in front of a reporter, but then I don't know if I have the context for the DNA function correct.

    It is very useful to have an expectation of what happens when there is no phenomenon. Because the genome is so large and complex, and context matters so much, the null hypothesis is especially useful in genomics.
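    One common way to make the null expectation concrete is to ask whether a stretch of sequence has changed less between two species than neutral drift alone would predict. The sketch below is a deliberately simplified version of that idea. It assumes a single neutral per-site divergence (for example, estimated from sequence presumed to be non-functional) and ignores multiple hits, rate variation, and alignment error; the function name and numbers are hypothetical. A large p-value only means the null cannot be rejected; it is not a proof of non-function.

```python
from math import comb


def neutral_pvalue(n_sites, observed_subs, neutral_rate):
    """One-sided P(X <= observed_subs) for X ~ Binomial(n_sites, neutral_rate).

    Null hypothesis: the element evolves like neutral (non-functional) sequence,
    so each aligned site differs between the two species with probability
    neutral_rate, independently of the others. A small value means the element
    changed less than the null predicts; a large value means only that the null
    cannot be rejected.
    """
    if not 0 <= observed_subs <= n_sites:
        raise ValueError("observed_subs must be between 0 and n_sites")
    if not 0.0 <= neutral_rate <= 1.0:
        raise ValueError("neutral_rate must be a probability")
    return sum(
        comb(n_sites, k) * neutral_rate**k * (1.0 - neutral_rate) ** (n_sites - k)
        for k in range(observed_subs + 1)
    )


if __name__ == "__main__":
    # Hypothetical 300-bp element with an assumed neutral divergence of 10%.
    for subs in (3, 30):
        p = neutral_pvalue(300, subs, 0.10)
        print(f"{subs:>2} substitutions in 300 sites: P = {p:.3g}")
```

    With 30 expected substitutions under the null, observing 3 would be surprising, while observing 30 is exactly what the null predicts and says nothing either way about function.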
    There is another straw man lurking in this argument: the idea that all non-coding DNA is junk was debunked long ago.
    Justin is right - Pollard's wording suggests that functional noncoding DNA is a new idea. It's not. Scientists knew of functional noncoding DNA before the term junk DNA was ever coined, and the term was never meant to encompass all noncoding DNA. But as Josh says, the default explanation for most of this stuff is that it's non-functional. Why? Because we already know that things like LINEs can expand in the genome without serving any functional purpose. We have a well-verified explanation for why they exist. When I walk into a friend's house and see cobwebs, my default assumption is not that he's cultivating spiders - it's that he's neglected to remove them.
    Gerhard Adam
    Might it not be possible that it is non-functional but useful just the same?  In other words, without considering the possibility of any other kind of mechanism, doesn't the existence of a large quantity of non-functional DNA actually increase the likelihood that mutations would occur in non-coding parts versus coding portions?

    In other words, if the DNA were 100% functional, then any copy error would automatically require error correction, and it would seem that a much higher mutation rate would occur. With 98+% of the DNA being non-coding, there is a substantially higher probability that copying errors would occur in sections that are essentially non-functional and would therefore place less "strain" on the error-correcting capabilities of the molecules.

    While a greater length would increase the number of errors, so too would it reduce the probability of errors occurring in more critical parts. So I'm suggesting that while non-functional DNA may not actually perform any direct service, it may indirectly benefit the DNA molecule by reducing the likelihood of serious errors.
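    Whether extra non-coding DNA shields coding sequence depends on how copying errors scale with genome length. The sketch below compares two hypothetical models: in model A each base has the same independent error probability, and in model B each genome copy receives a fixed total number of errors spread uniformly. The function names, the 50 Mb coding length, the 1e-8 per-site rate, and the budget of 60 errors are illustrative assumptions, not measured values. Mutation rates are conventionally reported per site per generation, which is model A's assumption.

```python
def coding_hits_per_site_rate(coding_bp, junk_bp, rate_per_site):
    """Model A: each base has the same independent error probability.

    Expected errors landing in coding sequence depend only on coding length;
    junk_bp is accepted for symmetry with model B but deliberately unused.
    """
    return rate_per_site * coding_bp


def coding_hits_fixed_budget(coding_bp, junk_bp, errors_per_copy):
    """Model B: a fixed number of errors per genome copy, spread uniformly.

    Extra junk dilutes the fraction of those errors that hit coding sequence.
    """
    return errors_per_copy * coding_bp / (coding_bp + junk_bp)


if __name__ == "__main__":
    coding = 50_000_000  # illustrative coding length in bp (hypothetical round number)
    for junk in (0, 1_000_000_000, 3_000_000_000):
        a = coding_hits_per_site_rate(coding, junk, rate_per_site=1e-8)
        b = coding_hits_fixed_budget(coding, junk, errors_per_copy=60)
        print(f"junk = {junk:>13,} bp   model A: {a:.2f}   model B: {b:.2f}")
```

    Under model A the expected number of coding hits stays the same as junk is added; under model B it falls in proportion to the coding share of the genome, which is the effect the comment describes.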

    One could almost postulate a "junk-collecting" function, so that the more non-coding DNA there is, the more stable the DNA and consequently the less prone to mutation the organism is. The less "junk", the greater the mutation rate (possibly "encouraging" more mutations and more rapid selection options).

    This could also pass the "onion test", since the only significance of possessing a larger genome would be in reducing the impact of random mutations rather than in accounting for complexity. In that case, it may be reasonable to ask why there is such a large variation between humans and the onion or a salamander. (My point is that perhaps this is one reason why some animals can retain their essential characteristics over millions of years despite the impact of mutations along the way.)
    Mundus vult decipi
    Might it not be possible that it is non-functional but useful just the same?  In other words, without considering the possibility of any other kind of mechanism, doesn't the existence of a large quantity of non-functional DNA actually increase the likelihood that mutations would occur in non-coding parts versus coding portions?
    You raise a good point - which is part of the subject of one of my next posts, hopefully going up today - we should have this conversation there.
    But a few brief comments:

    There are several ways to think about this. Scenario 1: large swaths of non-functional DNA have been preserved by natural selection as either a mutation buffer or a source of new function. There is really no evidence to support this perspective, and I think it's implausible in the first place, BUT...

    Scenario 2 is that this stuff accumulates because it's hard to eliminate (i.e., there is no selective pressure to do so), yet because it's there it serves as a mutational buffer and/or source of new function. This is the more likely scenario. As Don Hucks pointed out, transposable elements have clearly served, on many occasions, as sources of new function.