DNA barcoding is a movement to catalog all life on earth by a simple standardized genetic tag, similar to stores labeling products with unique barcodes. The effort promises foolproof food inspection, improved border security, and better defenses against disease-causing insects, among many other applications.
But the approach as currently practiced churns out some results as inaccurately as a supermarket checker scanning an apple and ringing it up as an orange, according to a new Brigham Young University study.
With the International Barcode of Life project seeking $150 million to build on the 400,000 species that have been "barcoded" to date, this worthy goal warrants more careful execution, the BYU team says.
"To have that kind of data is hugely valuable, and the list of applications is endless and spans all of biology," said study co-author Keith Crandall, professor and chair of the Department of Biology at BYU. "But it all hinges on building an accurate database. Our study is a cautionary tale – if we're going to do it, let's do it right."
Proponents of DNA barcoding seek to establish a short genetic sequence as a way of identifying species in addition to traditional approaches based on external physical features. Their aim is to create a giant library full of these sequences. Scientists foresee a future handheld device like a supermarket scanner – a machine that would sequence a DNA marker from an organism, then compare it with the known encyclopedia of life and spit out the species' name.
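The lookup step such a scanner would perform can be sketched in a few lines of Python. This is only an illustration of the idea, not the method used by any barcoding project: the reference library, the species names, the 20-letter sequences, and the 90 percent cutoff are all made up for the example, and the sequences are assumed to be already aligned to the same length.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identify(query: str, library: dict, threshold: float = 0.9):
    """Return (best-matching name or None, best score) for a query barcode."""
    best_name, best_score = None, 0.0
    for name, reference in library.items():
        score = identity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    # Below the (hypothetical) cutoff, report "no match" rather than guess.
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical toy library; real barcodes are hundreds of bases long.
LIBRARY = {
    "species_A": "ATGGCATTAGCCTGAATCGG",
    "species_B": "ATGGCTTTCGCATGGATAGG",
}

# A read that differs from species_A at one position.
name, score = identify("ATGGCATTAGCCTGAATCGA", LIBRARY)
print(name, score)
```

The cutoff matters: a sequence that falls below it for every entry in the library is reported as unmatched, which is how an unfamiliar sequence can end up looking like a new species.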
This new approach requires only part of a sample. A feather left behind by a bird struck by an airliner, for example, would be enough to indicate its species and give officials clues about how to prevent future collisions. And organisms can be identified no matter what stage of life they are in – larvae of malaria-carrying mosquitoes contain the same DNA as the adult version of the insect targeted for eradication.
The portion of the gene selected as the universal marker by the barcoding movement is part of the genome found in an organism's mitochondria. But the BYU study showed that current techniques can mistakenly record the "broken" copy of the gene found in the nucleus of the organism's cells instead. This non-functional copy can be similar enough for the barcoding technique to capture, but different enough to be recorded as a separate species, which would be a mistake. It is often difficult and time-consuming to identify this type of contamination, which could lead to overestimating the number of species in a sample by more than several hundred percent, according to the BYU study.
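One simple way to see what "broken" means here is to look for stop signals in the middle of a sequence that should code for a protein from end to end. The Python sketch below shows that single check. It is an illustration under stated assumptions, not the set of procedures the BYU paper recommends: it assumes the read starts on a codon boundary, uses the invertebrate mitochondrial genetic code (in which TAA and TAG are stop codons and TGA is not), and the function names and example reads are hypothetical.

```python
# Assumption: invertebrate mitochondrial genetic code, where TAA and TAG
# terminate translation and TGA codes for an amino acid.
STOP_CODONS = {"TAA", "TAG"}


def has_internal_stop(seq: str, frame: int = 0) -> bool:
    """True if any complete codon in the given reading frame is a stop codon."""
    coding = seq.upper()[frame:]
    usable = len(coding) - len(coding) % 3  # ignore a trailing partial codon
    codons = (coding[i:i + 3] for i in range(0, usable, 3))
    return any(codon in STOP_CODONS for codon in codons)


def screen_reads(reads: dict) -> dict:
    """Keep only reads that pass the stop-codon check."""
    return {name: seq for name, seq in reads.items()
            if not has_internal_stop(seq)}


# Hypothetical reads: the second carries a stop codon (TAA) mid-sequence.
reads = {
    "read_1": "ATGGCATGAGCC",
    "read_2": "ATGGCATAAGCC",
}
print(sorted(screen_reads(reads)))
```

A check like this catches only some non-functional copies. A copy that has drifted without acquiring a stop codon would pass, which is part of why this kind of contamination is slow to find.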
BYU scientist Hojun Song, a post-doctoral researcher working in the laboratory of Michael Whiting, professor of biology, was preparing a paper based on his genetic analysis of grasshoppers. He noted that his sequencing turned up many of these problematic "numts" (nuclear mitochondrial pseudogenes), as scientists call these bits of inactive genetic code. When Crandall saw the unpublished paper, he recognized similar results from an analysis of cave crayfish conducted by his doctoral student, Jennifer Buhay, and recommended the two teams collaborate. The result is the PNAS paper, with Song as lead author and Buhay and Whiting among the co-authors, which recommends specific quality control procedures to ensure that the correct genes are captured.
"I recognize that some who do DNA barcoding may be upset by this study, but that is the nature of science," Song said. "Building a genetic library of all life is a great goal, but we need to be careful to pay attention to the data that go into that library to make sure they are accurate."
Song and Crandall hope that when funding agencies hand out grants for projects such as the International Barcode of Life, applicants will be required to use the procedures identified in the new paper, which would screen out a large portion of the numts that might otherwise go unfiltered.
Funding for the study came from two National Science Foundation Tree of Life grants – Whiting's was $700,000 and Crandall's was $1.5 million.