If genes were lights on a string of DNA, the genome would look like an endless flicker, with thousands of genes switching on and off at any given moment. Tim Hughes, a Professor at the University of Toronto's Donnelly Centre, is set on figuring out the rules behind this tightly orchestrated light show, because when it fails, disease can occur.

Genes are switched on or off by proteins called transcription factors. These proteins bind to precise sites on the DNA that serve as guideposts, telling transcription factors that their target genes are nearby.

In their latest paper, published in Nature Biotechnology, Hughes and his team carried out the first systematic study of the largest family of human transcription factors, the C2H2 zinc finger (C2H2-ZF) proteins.

Despite their important roles in development and disease, these proteins have been largely unexplored because they posed a formidable challenge for researchers.

The C2H2-ZF family counts more than 700 proteins, encoded by around three per cent of all human genes. To make matters more complicated, most human C2H2-ZF proteins are very different from those in other organisms, such as mice, so scientists could not simply apply insights gained from animal studies to human C2H2-ZFs.

Hughes' team found something remarkable: the reason C2H2-ZFs are so abundant and diverse -- which makes them difficult to study -- is that many of them evolved to defend our ancestral genome from damage caused by the notorious "selfish DNA."

Selfish DNA consists of parasitic stretches of DNA whose only purpose is to multiply, a kind of virus within our genome. These sequences seize a cell's resources to make copies of themselves, which they insert randomly across the genome, causing harmful mutations along the way.

Almost half of the human genome is made up of selfish DNA, which probably came from ancient retroviruses that, like their modern counterparts, inserted their DNA into the host's genome. When this happens in an egg or sperm cell, the viral DNA is passed on to the next generation, and the inherited selfish DNA is then known as endogenous retro-elements (EREs).

Evolutionary biologists believe that selfish DNA was instrumental in making genomes bigger, giving natural selection additional DNA material to tinker with.

But Hughes' data suggest that EREs took centre stage in an evolutionary arms race, and that this conflict spawned the C2H2-ZFs as a new group of proteins.

It is an enthralling tale of "conquer and enslave," one that stretches from before mammals existed to the present day.

Hughes says that C2H2-ZFs initially evolved to switch off EREs. As new EREs invaded the genome of our lizard-like ancestor, new C2H2-ZFs arose to prevent them from disrupting gene function.

This would explain not only how the C2H2-ZFs came to be so abundant but also why they differ so much among organisms.

"What I think was not appreciated until this study is that retro-elements are really a driving force in the evolution of transcription factors themselves. All mammals have a whole bunch of custom transcription factors that came about to silence the EREs," says Hughes, who is also a Professor in U of T's Department of Molecular Genetics and a Senior Fellow of the Canadian Institute for Advanced Research. "But the EREs and these new transcription factors are different even for different vertebrates."

These EREs are now harmless because they are millions of years old. Over time they accumulated mutations, which pepper the genome at a roughly constant rate, and as a result they lost their ability to multiply and move around.

The C2H2-ZFs, on the other hand, took on new jobs.

C2H2-ZF proteins began using the EREs scattered across the genome as DNA docking sites, from which they could take control of nearby genes. The conquered EREs were finally enslaved.

Hughes describes a neat example of this process. One C2H2-ZF family member, a transcription factor called ZNF189, evolved to silence an ancient retro-element known as LINE L2, which is a staggering 100 million years old. L2 is now inactive, but ZNF189 still binds to its remnants, which it now uses to reach other genes.

Relics of L2 sequences happen to be near genes that drive brain and heart development. And so ZNF189 could take on a new role in shaping these organs, an arrangement preserved by natural selection because it was beneficial to the embryo.

ZNF189 likely puts the "brakes" on "brain genes", much as it once did with L2. But in heart cells it may actually switch genes on, because there it lacks the part that makes the "off switch."