Genome-wide association studies are increasingly used to discover genetic variants that raise the risk of common diseases like heart disease and type 2 diabetes. Intuitively they're quite straightforward: take a few thousand individuals with a disease (cases), a few thousand healthy individuals (controls), examine hundreds of thousands of genetic variants in both groups using new large-scale genotyping technologies, and see which variants are more common in cases than controls. This simple approach has turned out to be a powerful tool, uncovering genes involved in a multitude of common diseases.
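
To make the case/control comparison concrete, here is a minimal sketch of the test for a single marker: count how often an allele turns up in cases versus controls, then compute an odds ratio and a chi-square test. The function name and the allele counts are invented for illustration, and real studies use dedicated software with corrections (for ancestry, genotyping quality and so on) that are not shown here.

```python
import math


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Return (odds_ratio, chi_square, p_value) for a 2x2 table of allele counts."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    # Odds of carrying the allele in cases divided by the odds in controls.
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square statistic for a 2x2 table (1 degree of freedom).
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 2,000 cases and 2,000 controls, two alleles per person.
odds_ratio, chi_square, p_value = allelic_association(1000, 3000, 800, 3200)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi_square:.1f}, p = {p_value:.1e}")
```

A genome scan repeats this kind of test for every marker on the chip, which is why the number of markers matters so much for what counts as a convincing result.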

However, the limitations of the genetic association study are now becoming clear: once the "low-hanging fruit" - the genes with large effect - have been uncovered, each subsequent association study adds only a few new genes, typically of small effect. These limitations are well illustrated by a new article in Nature Genetics.

First, a bit of background: celiac disease is a relatively common and unpleasant auto-immune disease of the gut in which the body's immune system reacts to gluten proteins in wheat, rye and barley, leading to inflammation and intestinal damage whenever these foods are eaten. Around 1% of individuals of European ancestry suffer from this disease, and there is a strong genetic contribution to risk. In an advance online publication (full text for subscribers only) in Nature Genetics, a European consortium describes a reasonably fruitful approach to identifying novel genetic risk factors for this disease, resulting in seven new genetic regions associated with increased risk.

Members of this group had previously performed a genome-wide scan for susceptibility genes, which confirmed the well-known effect of variation in the HLA gene cluster (one of the usual suspects in almost all auto-immune diseases) and identified a novel genetic region containing the immune system genes IL2 and IL21. However, a large fraction of the genetic risk remained unaccounted for.

One of the problems with genome-wide scans is what is known as the multiple-testing problem: when you're looking at hundreds of thousands of genetic markers at once, a disease variant has to have an incredibly strong effect to stand out from the crowd - otherwise its signal is drowned out in the statistical noise from all the other markers. In the previous genome scan the authors found thousands of markers that looked as though they might have some association with celiac disease, but weren't strong enough to be statistically significant.

That set the stage for this paper. Basically, the authors took 1,020 of those likely candidates and looked at them in a brand new set of 1,643 celiac patients and 3,406 controls. Because they were now looking at a smaller number of markers, they could in theory identify disease-associated variants with much smaller effects - and indeed they did, reaping a harvest of no fewer than seven previously unknown regions that were clearly associated with disease risk.
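
A rough way to see why testing fewer markers helps is the Bonferroni correction, which divides the significance level by the number of tests. The sketch below is only an illustration: the figure of 300,000 markers is an assumed stand-in for "hundreds of thousands", the 0.05 significance level is a conventional choice, and the paper's own significance criteria may well differ from a plain Bonferroni threshold.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-marker p-value cut-off that keeps the overall error rate near alpha."""
    return alpha / n_tests


ALPHA = 0.05                    # conventional significance level (assumption)
GENOME_WIDE_MARKERS = 300_000   # assumed size of the original scan
FOLLOW_UP_MARKERS = 1_020       # number of candidate markers in the follow-up

genome_wide = bonferroni_threshold(ALPHA, GENOME_WIDE_MARKERS)
follow_up = bonferroni_threshold(ALPHA, FOLLOW_UP_MARKERS)
relaxation = follow_up / genome_wide

print(f"genome-wide cut-off: p < {genome_wide:.1e}")
print(f"follow-up cut-off:   p < {follow_up:.1e}")
print(f"the follow-up cut-off is about {relaxation:.0f} times less stringent")
```

Under these assumptions a marker needs a far less extreme p-value to pass in the follow-up set, which is what lets variants of smaller effect come through.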

Reassuringly, six of the regions identified contain genes known to be involved in immune function, and the authors have plausible mechanistic explanations for two of the associated markers: one alters the sequence of a protein in a way that is likely to alter its function, and another variant has a significant effect on the levels of expression of a nearby gene.

However, while the markers associated with celiac disease were common (with a frequency of more than 10%) they each had a very subtle effect on disease risk, raising the odds of suffering from the disease by only 19-34%. Together these variants explain only 3-4% of the genetic risk for this disease; even when you add that to the ~35% which is explained by variation in the HLA region, the majority of the genetic disease risk remains missing in action.
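
To get a feel for how small these effects are, the increase in odds can be converted into an absolute risk. This is back-of-the-envelope arithmetic rather than anything from the paper: it assumes the ~1% population prevalence can stand in for the baseline risk of someone without the risk variant, and that the 19-34% figures apply as simple odds ratios of 1.19 and 1.34.

```python
def risk_with_odds_ratio(baseline_risk, odds_ratio):
    """Convert a baseline risk and an odds ratio into the resulting absolute risk."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


BASELINE_RISK = 0.01  # ~1% prevalence, used here as the baseline (assumption)

for odds_ratio in (1.19, 1.34):
    risk = risk_with_odds_ratio(BASELINE_RISK, odds_ratio)
    print(f"odds ratio {odds_ratio}: risk goes from {BASELINE_RISK:.2%} to {risk:.2%}")
```

On these assumptions a single risk variant moves an individual's risk from about 1% to somewhere between roughly 1.2% and 1.3%, which is why such variants say very little about any one person.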

It's clear from this study just how far we still have to go to define the genetic basis of complex conditions like celiac disease. Simply adding more and more samples to these traditional types of association studies is likely to yield diminishing returns - much of the heritable risk comes from areas that simply aren't explored by standard genome-scan approaches:

  1. rare variants of moderate to large effect, which are completely invisible to traditional genome scans that rely on common markers;
  2. other types of genetic variation that can't be seen clearly by existing chips, such as copy number differences; and
  3. non-genetic heritable factors, such as epigenetic modifications.

Identification of these components of disease risk will require new approaches: large-scale sequencing for rare variants, targeted scans for copy number variation, and brand new methods to identify epigenetic changes. We'll see these approaches becoming more and more common as the technology evolves over the next few years.


Hunt, K.A., Zhernakova, A., Turner, G., Heap, G.A., Franke, L., Bruinenberg, M., Romanos, J., Dinesen, L.C., Ryan, A.W., Panesar, D., Gwilliam, R., Takeuchi, F., McLaren, W.M., Holmes, G.K., Howdle, P.D., Walters, J.R., Sanders, D.S., Playford, R.J., Trynka, G., Mulder, C.J., Mearin, M.L., Verbeek, W.H., Trimble, V., Stevens, F.M., O'Morain, C., Kennedy, N.P., Kelleher, D., Pennington, D.J., Strachan, D.P., McArdle, W.L., Mein, C.A., Wapenaar, M.C., Deloukas, P., McGinnis, R., McManus, R., Wijmenga, C., van Heel, D.A. (2008). Newly identified genetic risk variants for celiac disease related to the immune response. Nature Genetics DOI: 10.1038/ng.102