Markers can aid selection for target alleles that are not easily assayed in individual plants, minimize linkage drag around the target gene, and reduce the number of generations required to recover a very high percentage of the recurrent parent genetic background. Improvements in marker detection systems and in the techniques used to identify markers linked to useful traits have enabled great advances in recent years.
Although restriction fragment length polymorphism (RFLP) markers have been the basis for most of the work in crop plants, valuable markers have also been generated from random amplification of polymorphic DNA (RAPD) and amplified fragment length polymorphism (AFLP). Simple sequence repeat (SSR), or microsatellite, markers have been developed more recently for major crop plants, and this marker system is predicted to lead to even more rapid advances in both marker development and implementation in breeding programs.
Identification of the markers linked to useful traits has been based on complete linkage maps and bulked segregant analysis. However, alternative methods, such as the construction of partial maps and combination of pedigree and marker information, have also proved useful in identifying marker/trait associations. A revision of current breeding methods by utilizing molecular markers in breeding programs is, therefore, crucial in the present scenario.
A genetic marker is a gene or DNA sequence with a known location on a chromosome and associated with a particular gene or trait. It can be described as a variation, which may arise due to mutation or alteration in the genomic loci, that can be observed. A genetic marker may be a short DNA sequence, such as a sequence surrounding a single base-pair change (single nucleotide polymorphism, SNP), or a long one, like minisatellites.
For many years, gene mapping in most organisms was limited to traditional genetic markers, which include genes that encode easily observable characteristics such as blood types or seed shapes. The small number of such characteristics in many organisms limited the mapping that could be done.
Some commonly used types of genetic markers are:
• RFLP (or Restriction fragment length polymorphism)
• AFLP (or Amplified fragment length polymorphism)
• RAPD (or Random amplification of polymorphic DNA)
• VNTR (or Variable number tandem repeat)
• Microsatellite polymorphism
• SNP (or Single nucleotide polymorphism)
• STR (or Short tandem repeat)
• SFP (or Single feature polymorphism)
• DArT (or Diversity Arrays Technology)
They can be further categorized as dominant or co-dominant. Dominant markers allow for analyzing many loci at one time, e.g. RAPD. A primer amplifying a dominant marker could amplify at many loci in one sample of DNA with one PCR reaction. Co-dominant markers analyze one locus at a time. A primer amplifying a co-dominant marker would yield one targeted product.
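The practical consequence of this distinction shows up when genotypes are scored from a gel. The following is a minimal sketch, not taken from any marker-scoring software: the allele labels `A` (amplifiable) and `a`, and both function names, are assumptions made for illustration. It shows that a dominant marker gives the same band pattern for `AA` and `Aa`, whereas a co-dominant marker separates all three genotypes.

```python
def score_dominant(genotype):
    """Dominant marker: a band appears if at least one amplifiable allele 'A' is present."""
    return "band" if "A" in genotype else "no band"


def score_codominant(genotype):
    """Co-dominant marker: each allele gives its own band, so the set of alleles is visible."""
    return tuple(sorted(set(genotype)))


for g in ("AA", "Aa", "aa"):
    # Hypothetical diploid genotypes at one locus
    print(g, score_dominant(g), score_codominant(g))
```

With a dominant marker the heterozygote cannot be told apart from the `AA` homozygote, which is why co-dominant inheritance is usually preferred when heterozygotes must be identified.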
Restriction fragment length polymorphism
In molecular biology, the term restriction fragment length polymorphism, or RFLP, (commonly pronounced “rif-lip”) refers to a difference between two or more samples of homologous DNA molecules arising from differing locations of restriction sites, and to a related laboratory technique by which these segments can be distinguished. In RFLP analysis the DNA sample is broken into pieces (digested) by restriction enzymes and the resulting restriction fragments are separated according to their lengths by gel electrophoresis. Although now largely obsolete, RFLP analysis was the first DNA profiling technique cheap enough to see widespread application. In addition to genetic fingerprinting, RFLP was an important tool in genome mapping, localization of genes for genetic disorders, determination of risk for disease, and paternity testing.
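The idea behind RFLP can be sketched as an in-silico digest. The example below is only an illustration under stated assumptions: the two allele sequences are invented, the function name `digest` is hypothetical, and the enzyme is modeled on EcoRI (recognition site GAATTC, cut after the first G). A single base change that destroys one restriction site changes the set of fragment lengths that would be seen on a gel.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting a linear sequence at every occurrence of `site`.

    cut_offset is the position inside the site where the cut falls (EcoRI: G^AATTC).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Invented example alleles: allele_2 carries a point mutation in the first site
allele_1 = "ATGC" * 5 + "GAATTC" + "TTAGC" * 4 + "GAATTC" + "CCGTA" * 3
allele_2 = allele_1.replace("GAATTC", "GAGTTC", 1)

print(digest(allele_1))  # three fragments
print(digest(allele_2))  # two fragments, one of them longer
```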
Amplified fragment length polymorphism
Amplified Fragment Length Polymorphism PCR (or AFLP-PCR or just AFLP) is a PCR-based tool used in genetics research, DNA fingerprinting, and in the practice of genetic engineering. Developed in the early 1990s by Keygene, AFLP uses restriction enzymes to cut genomic DNA, followed by ligation of adaptors to the sticky ends of the restriction fragments. A subset of the restriction fragments is then amplified using primers complementary to the adaptor and part of the restriction site fragments (as described in detail below). The amplified fragments are visualized on denaturing polyacrylamide gels either through autoradiography or fluorescence methodologies.
AFLP-PCR is a highly sensitive method for detecting polymorphisms in DNA. The technique was originally described by Vos and Zabeau in 1993. In detail, the procedure of this technique is divided into three steps: 
1. Digestion of total cellular DNA with one or more restriction enzymes and ligation of restriction half-site specific adaptors to all restriction fragments.
2. Selective amplification of some of these fragments with two PCR primers that have corresponding adaptor and restriction site specific sequences.
3. Electrophoretic separation of amplicons on a gel matrix, followed by visualisation of the band pattern.
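The selectivity in step 2 can be illustrated with simple arithmetic. In the protocol of Vos et al. (1995), each primer carries a few extra "selective" nucleotides at its 3' end, and a fragment is amplified only if it matches them. The sketch below assumes that the four bases are equally likely at each selective position, so that every selective nucleotide cuts the amplified subset roughly fourfold; the function name and the starting fragment count are hypothetical.

```python
def expected_amplified(n_fragments, selective_bases_primer1, selective_bases_primer2):
    """Expected number of fragments amplified, assuming equal base frequencies."""
    total_selective = selective_bases_primer1 + selective_bases_primer2
    return n_fragments / 4 ** total_selective


# Hypothetical digest yielding 400,000 adaptor-ligated fragments, 3 + 3 selective bases
print(expected_amplified(400_000, 3, 3))
```

Under these assumptions the result falls within the range of 50 to 100 fragments per reaction that is typically resolved on a gel.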
A variation on AFLP is cDNA-AFLP, which is used to quantify differences in gene expression levels.
Applications of AFLP:
The AFLP technology has the capability to detect various polymorphisms in different genomic regions simultaneously. It is also highly sensitive and reproducible. As a result, AFLP has become widely used for the identification of genetic variation in strains or closely related species of plants, fungi, animals, and bacteria. The AFLP technology has been used in criminal and paternity tests, in population genetics to determine slight differences within populations, and in linkage studies to generate maps for quantitative trait locus (QTL) analysis.
There are many advantages to AFLP when compared to other marker technologies including randomly amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), and microsatellites. AFLP not only has higher reproducibility, resolution, and sensitivity at the whole genome level compared to other techniques, but it also has the capability to amplify between 50 and 100 fragments at one time. In addition, no prior sequence information is needed for amplification (Meudt & Clarke 2007). As a result, AFLP has become extremely beneficial in the study of taxa including bacteria, fungi, and plants, where much is still unknown about the genomic makeup of various organisms.
Random amplification of polymorphic DNA
RAPD (pronounced "rapid") stands for Random Amplification of Polymorphic DNA. It is a type of PCR, but the segments of DNA that are amplified are random. The scientist performing RAPD creates several arbitrary, short primers (8-12 nucleotides), then proceeds with the PCR using a large template of genomic DNA, hoping that fragments will amplify. By resolving the resulting patterns, a semi-unique profile can be gleaned from a RAPD reaction.
No knowledge of the DNA sequence for the targeted gene is required, as the primers will bind somewhere in the sequence, but it is not certain exactly where. This makes the method popular for comparing the DNA of biological systems that have not had the attention of the scientific community, or in a system in which relatively few DNA sequences are compared (it is not suitable for forming a DNA databank). Because it relies on a large, intact DNA template sequence, it has some limitations in the use of degraded DNA samples. Its resolving power is much lower than targeted, species specific DNA comparison methods, such as short tandem repeats. In recent years, RAPD has been used to characterize, and trace, the phylogeny of diverse plant and animal species.
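Why a single arbitrary primer yields a presence/absence pattern can be sketched in a few lines. The example is a simplification under stated assumptions: exact matches only (real RAPD tolerates some mismatches), an invented 10-mer primer, invented templates, and a hypothetical function name `rapd_amplicons`. A product forms only where the primer matches one strand and, further downstream, the opposite strand, within an amplifiable distance.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def rapd_amplicons(template, primer, max_len=2000):
    """Sizes of products expected from one arbitrary primer (exact matches only)."""
    n = len(primer)
    rc = reverse_complement(primer)
    # Positions where the primer can extend rightward (matches the top strand)
    forward = [i for i in range(len(template) - n + 1) if template.startswith(primer, i)]
    # Positions where the primer can extend leftward (matches the bottom strand)
    reverse = [i for i in range(len(template) - n + 1) if template.startswith(rc, i)]
    sizes = [r + n - f for f in forward for r in reverse
             if r >= f + n and r + n - f <= max_len]
    return sorted(sizes)


primer = "AGGTCACTGA"                      # invented arbitrary 10-mer
site = reverse_complement(primer)
sample_a = "CC" + primer + "A" * 30 + site + "GG"      # both priming sites present
sample_b = "CC" + primer + "A" * 30 + "T" * 10 + "GG"  # second site lost

print(rapd_amplicons(sample_a, primer))  # one band
print(rapd_amplicons(sample_b, primer))  # no band
```

The loss of either priming site removes the band entirely, which is why RAPD markers behave as dominant markers.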
1. KeyGene Quantar Suite Versatile marker scoring software
2. Zabeau, M and P. Vos. 1993. Selective restriction fragment amplification: a general method for DNA fingerprinting. European Patent Office, publication 0 534 858 A1, bulletin 93/13.
3. Vos P, Hogers R, Bleeker M, et al. (November 1995). "AFLP: a new technique for DNA fingerprinting". Nucleic Acids Res. 23 (21): 4407–14. PMID 7501463. PMC 307397. http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=7501463.
First you need to know a few key terms:
As you go through the subsequent discussion, you may need to jump back here to refresh your memory on various definitions.
A 'plasmid' is a small, circular piece of DNA that is often found in bacteria. This innocuous molecule might help the bacteria survive in the presence of an antibiotic, for example, due to the genes it carries. To scientists, however, plasmids are important because (i) we can isolate them in large quantities, (ii) we can cut and splice them, adding whatever DNA we choose, (iii) we can put them back into bacteria, where they'll replicate along with the bacteria's own DNA, and (iv) we can isolate them again - getting billions of copies of whatever DNA we inserted into the plasmid! Plasmids are generally limited to sizes of 2.5-20 kilobases (kb).
The term 'BAC' is an acronym for 'Bacterial Artificial Chromosome', and in principle, it is used like a plasmid. We construct BACs that carry DNA from humans or mice or wherever, and we insert the BAC into a host bacterium. As with the plasmid, when we grow that bacterium, we replicate the BAC as well. Huge pieces of DNA can be easily replicated using BACs - usually on the order of 100-400 kilobases (kb). Using BACs, scientists have cloned (replicated) major chunks of human DNA. This, as you will see later, is critical to the Human Genome Project.
The 'vector' is generally the basic type of DNA molecule used to replicate your DNA, like a plasmid or a BAC.
The 'insert' is a piece of DNA we've purposely put into another (a 'vector') so that we can replicate it. Usually the 'insert' is the interesting part, consequently. In the case of the Human Genome Project or other sequencing projects, the insert is the part we want to sequence - the part we don't know. Usually we know the complete DNA sequence of the vector.
Shotgun sequencing is a method for determining the sequence of a very large piece of DNA. The basic DNA sequencing reaction can only get the sequence of a few hundred nucleotides. For larger pieces (like BAC DNA), we usually fragment the DNA and insert the resultant pieces into a convenient vector (a plasmid, usually) to replicate them. After we sequence the fragments, we try to deduce from them the sequence of the original BAC DNA.
Shotgun sequencing: assembly of random sequence fragments
To sequence a BAC, we take millions of copies of it and chop them all up randomly. We then insert those into plasmids and for each one we get, we grow lots of it in bacteria and sequence the insert. If we do this to enough fragments, eventually we'll be able to reconstruct the sequence of the original BAC based on the overlapping fragments we've sequenced!
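The reconstruction step can be sketched with a toy greedy assembler. This is only an illustration, not the method used by real assemblers: it assumes error-free reads from one strand, a sequence without repeats, and invented read data; the function names are hypothetical. At each round it merges the pair of reads with the longest suffix-to-prefix overlap.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the best-overlapping pair; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: report separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


original = "ATGGCGTACGTTAGCCTAGGATCCA"  # invented "BAC" sequence
reads = [original[0:10], original[6:16], original[12:22], original[17:25]]
print(greedy_assemble(reads))
```

Real genomes contain repeats longer than a read, which is where a simple greedy strategy like this one can join the wrong fragments.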
What is a primer?
A primer is a short synthetic oligonucleotide which is used in many molecular techniques from PCR to DNA sequencing. These primers are designed to have a sequence which is the reverse complement of a region of template or target DNA to which we wish the primer to anneal.
DNA sequencing reactions all use a primer to initiate DNA synthesis. This primer will determine the starting point of the sequence being read, and the direction of the sequencing reaction.
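As a small illustration of the reverse-complement relationship, the sketch below derives a primer for a chosen region of a template written 5' to 3'. The template sequence, the function names and the default primer length are assumptions for illustration; real primer design also considers melting temperature, secondary structure and specificity, which are ignored here.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primer(template, start, length=20):
    """Primer (5'->3') that anneals to template[start:start + length]."""
    target = template[start:start + length]
    return reverse_complement(target)


template = "ATGGCGTACGTTAGCCTAGGATCCA"  # invented template, written 5'->3'
print(design_primer(template, 5, 10))
```

Extension from this primer proceeds toward the 5' end of the template as written, which is what fixes the direction of the sequencing read.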
Most DNA sequencing reactions use dideoxy nucleotides (ddNTPs) to stop DNA synthesis at specific nucleotides. For example, if a ddCTP is incorporated into a growing strand of DNA, the lack of a free 3´ OH group prevents the next nucleotide from being added, and the chain terminates.
In automated sequencing we use a different fluorescent label attached to each of the four dideoxy nucleotides (ddA, ddC, ddG and ddT). Thus we can determine the terminal base in each fragment of DNA.
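The logic of reading a sequence from terminated fragments can be sketched as follows. This is a simplified model with assumptions stated up front: the primer is ignored, every possible termination product is present exactly once, and the template is an invented string written 3' to 5' so that the new strand is read 5' to 3'; the function names are hypothetical. Sorting the fragments by length, as electrophoresis does, and reading each terminal label recovers the synthesized sequence.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragments(template_3to5):
    """(length, terminal base) for every chain-terminated product of the new strand."""
    new_strand = "".join(COMPLEMENT[b] for b in template_3to5)
    return [(n, new_strand[n - 1]) for n in range(1, len(new_strand) + 1)]


def read_gel(fragments):
    """Order fragments by length and read the labelled terminal base of each."""
    return "".join(label for _, label in sorted(fragments))


template = "TACCGCATG"           # invented template, written 3'->5'
frags = terminated_fragments(template)
random.shuffle(frags)            # products are mixed together in the reaction tube
print(read_gel(frags))
```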
FUNCTIONAL MOLECULAR MARKERS
In the past few years, functional genes, ESTs and genome sequences have facilitated
development of molecular markers from the transcribed regions of the genome. Among
the important molecular markers that can be developed from ESTs are single-nucleotide
polymorphisms (SNPs) (Rafalski 2002) and simple sequence repeats (SSRs) (Varshney et al.,
2005a). Putative functions for the markers derived from ESTs or genes can be deduced
using homology searches (BLASTX) with protein databases (e.g. NR-PEP and SWISSPROT).
Therefore, molecular markers generated from expressed sequence data are known as
‘functional markers’ (FMs) (Andersen and Lubberstedt 2003). FMs have been developed
236 PLANT GENETIC TRANSFORMATION AND MOLECULAR MARKERS
extensively for the plant species in which ESTs or gene sequence data are available (Gupta
and Rustgi 2004). By screening the unigene (see glossary) consensus sequences from over
50 plant species, Rudd et al. (2005) demonstrated the feasibility of predicting molecular
markers (e.g. SSRs and SNPs) that can be used to develop FMs for several species. FMs
have some advantages over random markers that are generated from anonymous regions
of the genome, because FMs are linked to the desired trait allele. Such markers are derived
from the gene responsible for the trait of interest and target the functional polymorphism
in the gene, so they allow selection in different genetic backgrounds without revalidating the
marker–quantitative-trait-locus (QTL) allele relationship. An FM allows breeders to track
specific alleles within pedigrees and populations, and to minimize linkage drag flanking
the gene of interest. As markers become more abundant, breeders can develop strategies
that are compatible with financial resources and breeding goals. Markers are increasingly
being applied for selection of parental materials and for accelerated selection of loci
controlling traits that are difficult to select phenotypically. Such examples include
pyramiding of genes for disease resistance, quality traits and those that interact with the
environment. Linked deleterious alleles are a potential problem as the number of selected
loci increases, particularly if the donor parent is a related wild species. Therefore, closely
linked markers are most desirable for reducing linkage drag, although doing so requires larger
population sizes and more backcross generations.
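
Mining ESTs for SSRs, as mentioned above, can be sketched with a regular expression. This is a minimal illustration, not the pipeline used in the cited studies: it finds only perfect repeats, the thresholds (motifs of 2 to 6 bases, at least 4 copies) are assumed values, the EST sequence is invented, and `find_ssrs` is a hypothetical name.

```python
import re


def find_ssrs(seq, min_repeats=4, motif_lengths=range(2, 7)):
    """Return (start, motif, copies) for perfect tandem repeats in an uppercase sequence."""
    hits = []
    for k in motif_lengths:
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. 'AA', 'ATAT')
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Invented EST fragment containing a (CA)6 and a (GATT)4 repeat
est = "GCTTAG" + "CA" * 6 + "GGATC" + "GATT" * 4 + "GCCA"
print(find_ssrs(est))
```

Primers designed in the unique sequence flanking each hit would then amplify the repeat, and length differences between genotypes provide the polymorphism.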
THE GENOMICS REVOLUTION
Genomic resources for major crop species include high-density genetic maps,
cytogenetic stocks, contig-based physical maps and large-insert libraries. These tools have
facilitated isolation of genes via map-based cloning, localizing QTLs, sequencing and
annotation of large genomic DNA fragments (Figure 1) in several plant species. Systematic
whole genome sequencing provides critical information on gene and genome organization
and function, which may revolutionize our understanding of crop production and ability
to manipulate the traits contributing to high crop productivity (Pereira 2000). Completion
of the Arabidopsis thaliana genome sequence in 2000, followed by the nearly completed
genome sequence for rice in 2002 (Goff et al., 2002; Yu et al., 2002), has caused much
excitement among researchers. Rapidly following these landmark efforts, characterization
of the genomes of other crops, including maize (Zea mays L.), wheat (Triticum aestivum L.) and
legumes such as soybean [Glycine max (L.) Merr.] and barrel medic (Medicago truncatula
Gaertner) (Ware et al., 2002; Lunde et al., 2003; Shoemaker et al., 2002; Young et al., 2003),
was initiated. The accumulating information allows researchers to explore new paradigms
to address fundamental and practical questions in a multidisciplinary manner. Although
new research fields such as metabolomics have emerged as post-genomic era technologies
(Phelps et al., 2002), challenges still lie ahead in answering how genomics will aid in
crop improvement from a practical standpoint (Osterlund and Paterson 2002). Genome
sequence comparisons of two rice cultivars, representing the indica and japonica
subspecies (Goff et al., 2002; Yu et al., 2002; IRGSP 2005), have revealed many insertions
and deletions in the genomes (IRGSP 2005; Yu et al., 2005; Feng et al., 2002). A transcriptome
sampling strategy is a complementary approach to genome sequencing, which results in
a large collection of expressed sequence tags (ESTs) for important plant species. Comparative
sequence analysis can be used to facilitate isolation of genes in species lacking ESTs.
BIOINFORMATICS: KEY TO SUCCESS IN GENOMICS
Advances in genomics and bioinformatics show the true potential of biotechnology
for crop improvement. The rapid development of genome technologies, especially automatic
sequencing techniques, has produced a huge amount of data consisting essentially of
nucleotide and protein sequences. Bioinformatics facilitates analysis of genomic and postgenomic
data, and integration of data from the related fields of transcriptomics, proteomics
and metabolomics. The vast amount of sequence data coming out of the Genome Projects
can be managed and utilized with help of bioinformatics tools, as they play an important
role in candidate gene identification, gene finding, SNP detection, genotyping and genetic
analysis. Research and discovery in the life sciences were once limited to a single gene or
protein, but developments in bioinformatics have facilitated a shift to high-throughput
screening (thousands of samples every day) and high-content detection systems (thousands
of data points per sample). Several bioinformatic tools and databases have been developed
for DNA sequence analysis, marker discovery and analyzing information. Enhanced
bioinformatic tools, genome databases and integration of information from different fields
enable the identification of genes and gene products, and can elucidate the functional
relationships between genotype and observed phenotype (Edwards and Batley 2004). To
store, characterize and mine such a large amount of data requires many databases and
programs hosted on high-performance computers. Several databases are now available,
for example, GenBank (Benson 2004), UniProt (Apweiler et al., 2004), PDB
(Berman 2000), KEGG (Kanehisa et al., 2000), PubMed Medline, etc., covering not only
nucleotide and protein sequences but also their annotations and related research
publications. The programs include those for sequence alignment, prediction of genes,
protein structures and regulatory elements, etc., some of which are organized into packages
such as EMBOSS (Rice et al., 2000), PHYLIP (Felsenstein 1989) and GCG Wisconsin. BLAST
(basic local alignment search tool) is one of the most popular tools used by biologists.
It is an algorithm for searching large databases of protein or DNA sequences. The NCBI
provides a web-based implementation that searches its large collection of sequences and
annotated data. A more recent development is the Simple Object Access Protocol (SOAP)
(http://www.w3.org/TR/soap). SOAP-based interfaces developed for a variety of bioinformatics
applications allow programs running on a computer in one part of the world to use algorithms,
data and computing resources on servers in other parts of the world. The wide availability of
SOAP-based bioinformatics web services, along with open source bioinformatics
collections, has led to the next generation of bioinformatics tools, called integrated
bioinformatics platforms. Another fundamental breakthrough in genomics came with DNA microarray
technology, which can detect tens of thousands of genes using a small functionalized
system (Brown and Botstein 1999; Debouck and Goodfellow 1999; Heller 2002). The
information flowing from genomic laboratories using DNA microarray technology constitutes
hundreds of thousands of gene-specific measurements every day. The overall impact of
this revolutionary technology depends upon an integrated information system to analyze
and store the data, as well as computational systems to design each of these gene-specific
detections. An important future prospect in this area is enhancement of visualization tools
that can help us more clearly interpret the complex multidimensional biological networks
of genes and their relationships to phenotypes.
Conventional plant breeding is primarily based on phenotypic selection of superior
individuals among segregating progenies. Although significant strides have been made
in crop improvement through phenotypic selections for agronomically important traits,
considerable difficulties are often encountered during this process, primarily due to
genotype-environment interactions. In addition, testing procedures are often difficult,
unreliable or expensive due to the nature of the target traits or the target environment.
Marker-assisted selection (MAS) refers to using DNA markers that are tightly-linked to
target loci as a substitute for phenotype based screening. By determining the allele of a
DNA marker, plants that possess particular genes or QTLs may be identified based on
their genotype rather than their phenotype. Marker-assisted selection may greatly increase
efficiency and effectiveness for breeding compared to conventional breeding. The
fundamental advantages of MAS compared to conventional phenotypic selection are: (i)
greater efficiency and (ii) accelerated line development in breeding programs. For example,
time and labour savings may arise from the substitution of difficult or time-consuming
field trials (that need to be conducted at particular times of year or at specific locations,
or are technically complicated) with DNA marker tests. Furthermore, selection based on
DNA markers may be more reliable because environmental factors affect the results of field
GENOMICS AND MARKER-ASSISTED BREEDING FOR CROP IMPROVEMENT 241
trials. Another benefit from using MAS is that the total number of lines that need to be
tested may be reduced considerably. Since many lines can be discarded after MAS at an
early generation, this permits a more effective breeding design. Greater efficiency of target
trait selection may enable certain traits to be fast-tracked, thus specific genotypes can be
easily identified and selected. Moreover, background markers may also be used to
accelerate the recovery of recurrent parents during marker-assisted backcrossing.
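
The benefit of background selection can be put in perspective with the standard expectation for backcrossing without any marker selection: on average, each backcross halves the remaining donor genome. The sketch below computes that baseline only; it is an average over the population (individual plants vary around it, which is what background markers exploit), and the function names are hypothetical.

```python
def recurrent_parent_fraction(n_backcrosses):
    """Expected recurrent-parent genome fraction after F1 plus n backcrosses, no selection."""
    return 1 - 0.5 ** (n_backcrosses + 1)


def backcrosses_needed(target_fraction):
    """Smallest number of backcrosses whose expected recovery reaches target_fraction."""
    n = 0
    while recurrent_parent_fraction(n) < target_fraction:
        n += 1
    return n


for n in range(1, 7):
    print(f"BC{n}: {recurrent_parent_fraction(n):.4f}")
print(backcrosses_needed(0.99))
```

Selecting, in each generation, the individuals that carry the most recurrent-parent marker alleles is how marker-assisted backcrossing aims to reach a similar recovery in fewer generations.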
In general, the success of MAS depends on the following factors: (i) availability of a genetic
map with an adequate number of uniformly-spaced polymorphic markers to accurately
locate desired QTLs or major gene(s), (ii) close linkage between the QTL or a major gene
of interest and adjacent markers, (iii) adequate recombination between the markers and
rest of the genome, and (iv) ability to analyze a larger number of plants in a time and
cost effective manner. Three kinds of relationships between the markers and respective
genes could be distinguished: (1) the molecular marker located within the gene of interest,
which is the most favourable situation for MAS and in this case, it could be ideally referred
to as gene-assisted selection. While this kind of relationship is the most preferred one,
it is also difficult to find such markers. For example, microsatellite or SSR markers have
been designed using the available DNA sequence information for the opaque2 allele that
confers high lysine and tryptophan content in the maize kernel. This has offered an
efficient means of tracking the opaque2 allele in breeding for nutritionally superior maize
genotypes, since the marker is located within the gene sequence itself and cosegregates
with the target gene. (2) the marker in linkage disequilibrium (LD) with the gene of interest
throughout the population. LD (see glossary) is the tendency of certain combinations of
alleles to be inherited together. Population-wide LD can be found when markers and genes
of interest are physically close to each other. Selection using these markers can be called
LD-MAS. (3) the marker in linkage equilibrium (LE) with the gene of interest throughout
the population, which is the most difficult and challenging situation for applying MAS.
Genomic regions to be selected using MAS are often chromosome segments carrying QTLs
in case of polygenic traits. It is preferable either to have two polymorphic DNA markers
flanking the target gene or a QTL, or a marker within a QTL to eliminate the possibility
of genotypes presenting a double recombination between the two flanking markers. In
context of MAS, DNA-based markers can be effectively utilized for two basic purposes:
(i) tracing favourable allele(s) (dominant or recessive) across generations and (ii) identifying
the most suitable individual(s) among the segregating progeny, based on allelic composition
across a part or the entire genome.
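
The advantage of flanking markers over a single linked marker can be worked through numerically. The sketch below is a simplified model under explicit assumptions: gametes come from a heterozygote carrying the donor haplotype marker1-target-marker2, the recombination fractions `r1` and `r2` are hypothetical values, and there is no crossover interference. With one marker the target allele is lost whenever a recombination falls between marker and gene; with two flanking markers it is lost only by a double recombination.

```python
def miss_rate_single(r):
    """P(target allele absent | donor allele present at one marker r away)."""
    return r


def miss_rate_flanking(r1, r2):
    """P(target allele absent | donor alleles present at both flanking markers)."""
    no_recombination = (1 - r1) * (1 - r2)
    double_recombination = r1 * r2
    return double_recombination / (no_recombination + double_recombination)


r1 = r2 = 0.05  # hypothetical: 5% recombination on each side of the target
print(miss_rate_single(r1))
print(miss_rate_flanking(r1, r2))
```

Under these assumptions the error rate drops from 5% to roughly 0.3%, which is why a marker pair bracketing the gene or QTL is preferred.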
Based on the considerable developments in biotechnology, plant breeders developed
more efficient selection systems to replace traditional phenotypic-pedigree-based selection
systems. Conventional breeding is time consuming and highly dependent on
environmental conditions. Breeding a new variety takes between eight and twelve years, and
even then the release of an improved variety cannot be guaranteed. Hence, breeders are
extremely interested in new technologies that could make this procedure more efficient.
Molecular marker technology offers such a possibility by adopting a wide range of novel
approaches to improve the selection strategies in breeding. Markers can aid selection for
target alleles that are not easily assayed in individual plants, minimize linkage drag
around the target gene, and reduce the number of generations required to recover a very
high percentage of the recurrent parent genetic background. Marker-assisted selection
(MAS) is an indirect selection process in which a trait of interest is selected based on a marker
linked to it and not on the trait itself (Rosyara, 2006; Ribaut and Hoisington, 1998). For
example, if MAS is being used to select a crop for disease resistance, the level of disease
is not quantified; rather, a marker allele that is linked with the resistance gene is used to
infer the presence of resistance. The assumption is that the linked allele is associated
with the gene and/or quantitative trait locus (QTL) of interest. MAS can be useful for
traits that are difficult to measure, exhibit low heritability, and/or are expressed late in
What are they?
A marker is a gene or piece of DNA with an easily identified phenotype such that
cells or individuals with different alleles are distinguishable, for example a gene with
a known function or a single nucleotide change in DNA.
Alternatively, a marker is a readily detectable sequence of DNA or protein whose inheritance can be monitored.
To become a useful molecular marker, it must possess certain characteristics:
Polymorphic: A polymorphism is a detectable and heritable variation at a locus.
A marker is polymorphic if the most abundant allele comprises less than a threshold
percentage of all the alleles, usually 95%.
Reproducible: Should give similar results in different experiments irrespective
of the time and the place.
Preferably displays co-dominant inheritance (both forms are detectable in
The detection of marker must be fast and inexpensive.
Demonstrates measurable difference(s) in expression between trait types and/
or alleles of interest, early in the development of the organism.
Has no effect on the trait of interest that varies depending on the allele at the
MOLECULAR MARKERS FOR IMPROVEMENT OF QUALITY TRAITS IN CROPS 201
Low or null interaction among the markers allowing the use of many markers
at the same time in a segregating population
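
The polymorphism criterion in the list above translates directly into a frequency check. The sketch below is illustrative only: the allele calls are invented and `is_polymorphic` is a hypothetical helper that applies the usual 95% threshold to the most abundant allele.

```python
from collections import Counter


def is_polymorphic(alleles, threshold=0.95):
    """True if the most abundant allele accounts for less than `threshold` of all alleles."""
    counts = Counter(alleles)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(alleles) < threshold


# Invented allele calls at two loci across 100 sampled chromosomes
locus_1 = ["A"] * 90 + ["G"] * 10   # major allele at 90%
locus_2 = ["C"] * 99 + ["T"] * 1    # major allele at 99%
print(is_polymorphic(locus_1), is_polymorphic(locus_2))
```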
TYPES OF MARKERS
Seed color, for example kernel color in maize or hilum color in soybean seeds
Pubescence (small hair-like growths on stems and leaves)
Function-based, for example plant height associated with salt tolerance in rice
Limitations of the morphological markers
Morphological markers are associated with several general deficits that reduce their
The delay in morphological marker expression until late into the development
of the organism, for example flower color.
Dominance of the markers: homozygotes and heterozygotes are not
Deleterious effects
Pleiotropy
Confounding effect(s) of genes that are unrelated to the gene or the trait of interest
but that also affect the morphological marker (epistasis)
Rare polymorphism
Frequent confounding effect(s) of environmental factors that affect the
morphological characteristics of the organism.
Most phenotypic markers are undesirable in the final product (yellow color in
maize).
Sometimes dependent on the environment for expression, for example height
of the plants.
Non-DNA or protein molecular markers such as isozyme markers
Markert and Moller (1959) were the first to describe the differing forms of bands that
they were able to visualize with specific enzyme stains. They were the first to introduce
the term biochemical polymorphisms, often referred to as allozyme or isozyme markers. By
the early 1980s, biochemical markers had been employed as a general tool for mapping
QTL (Weller et al., 1988). Isozyme markers are still useful as they are a simple, inexpensive
means for detection of gene introgression and recombination, for comparative mapping,
and for determination of the genetic diversity and phylogenetic relationships among plant
species (Hart and Langston, 1977; Hoffman, 1999; Horandl et al., 2000; Yu et al., 2001).
Isozyme markers: Multiple forms of the same enzyme coded by different genes.
- Isozyme: one enzyme, more than one locus (gene duplication; gene families)
- Allozyme: one enzyme; one locus; two or more alleles in a population
Isozymes are proteins with the same enzymatic function but different structural,
chemical, or immunological characteristics (coded by different genes). To be useful
as markers, isoforms must be electrophoretically resolvable and detectable by in-gel assay
methods (Fig. 1).
Differences: amino acid composition/sequence
Differences visualized: gel electrophoresis, mass spectrometry, etc.
Restricted due to the limited number of enzyme systems available (about 30)