The development of molecular techniques for genetic analysis has greatly expanded our knowledge of crop genetics and our understanding of the structure and behavior of crop genomes. These techniques, in particular molecular markers, have been used to examine DNA sequence variation within and among crop species and to create new sources of genetic variation by introducing new and favorable traits from landraces and related crop species.

Markers can aid selection for target alleles that are not easily assayed in individual plants, minimize linkage drag around the target gene, and reduce the number of generations required to recover a very high percentage of the recurrent parent genetic background. Improvements in marker detection systems and in the techniques used to identify markers linked to useful traits have enabled great advances in recent years.

Though restriction fragment length polymorphism (RFLP) markers have been the basis for most of the work in crop plants, valuable markers have also been generated from random amplification of polymorphic DNA (RAPD) and amplified fragment length polymorphism (AFLP). Simple sequence repeat (SSR) or microsatellite markers have been developed more recently for major crop plants, and this marker system is predicted to lead to even more rapid advances in both marker development and implementation in breeding programs.

Identification of the markers linked to useful traits has been based on complete linkage maps and bulked segregant analysis. However, alternative methods, such as the construction of partial maps and combination of pedigree and marker information, have also proved useful in identifying marker/trait associations. A revision of current breeding methods by utilizing molecular markers in breeding programs is, therefore, crucial in the present scenario.


A genetic marker is a gene or DNA sequence with a known location on a chromosome and associated with a particular gene or trait. It can be described as a variation, which may arise due to mutation or alteration in the genomic loci, that can be observed. A genetic marker may be a short DNA sequence, such as a sequence surrounding a single base-pair change (single nucleotide polymorphism, SNP), or a long one, like minisatellites.

For many years, gene mapping was limited in most organisms by traditional genetic markers, which include genes that encode easily observable characteristics such as blood types or seed shapes. The scarcity of such characteristics in many organisms limited the mapping that could be done.[1]

Some commonly used types of genetic markers are

• RFLP (or Restriction fragment length polymorphism)

• AFLP (or Amplified fragment length polymorphism)

• RAPD (or Random amplification of polymorphic DNA)

• VNTR (or Variable number tandem repeat)

• Microsatellite polymorphism

• SNP (or Single nucleotide polymorphism)

• STR (or Short tandem repeat)

• SFP (or Single feature polymorphism)

• DArT (or Diversity Arrays Technology)

They can be further categorized as dominant or co-dominant. Dominant markers allow for analyzing many loci at one time, e.g. RAPD. A primer amplifying a dominant marker could amplify at many loci in one sample of DNA with one PCR reaction. Co-dominant markers analyze one locus at a time. A primer amplifying a co-dominant marker would yield one targeted product.
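The practical difference between the two classes shows up when bands are scored. The following is a minimal sketch in Python, assuming a single locus with two alleles that give bands of known size; the function names and band sizes are hypothetical and are not taken from any genotyping package.

```python
def score_codominant(bands, size_a, size_b):
    """Score one co-dominant locus from the set of observed band sizes (bp)."""
    has_a, has_b = size_a in bands, size_b in bands
    if has_a and has_b:
        return "AB"   # heterozygote: both allele-specific bands are visible
    if has_a:
        return "AA"
    if has_b:
        return "BB"
    return "no call"


def score_dominant(bands, size):
    """Score one dominant locus: only presence or absence of a band is seen."""
    # A visible band means the amplifiable allele is carried in one or two
    # copies, so homozygotes and heterozygotes cannot be told apart.
    return "band present" if size in bands else "band absent"


print(score_codominant({150, 162}, 150, 162))  # AB
print(score_dominant({480, 900}, 900))         # band present
```

The co-dominant scorer returns three distinct genotype classes, whereas the dominant scorer can only ever return two.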

Restriction fragment length polymorphism

In molecular biology, the term restriction fragment length polymorphism, or RFLP, (commonly pronounced “rif-lip”) refers to a difference between two or more samples of homologous DNA molecules arising from differing locations of restriction sites, and to a related laboratory technique by which these segments can be distinguished. In RFLP analysis the DNA sample is broken into pieces (digested) by restriction enzymes and the resulting restriction fragments are separated according to their lengths by gel electrophoresis. Although now largely obsolete, RFLP analysis was the first DNA profiling technique cheap enough to see widespread application. In addition to genetic fingerprinting, RFLP was an important tool in genome mapping, localization of genes for genetic disorders, determination of risk for disease, and paternity testing.
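The idea that a changed restriction site changes the fragment lengths can be illustrated with a small in silico digest. This is only a sketch: the two allele sequences are made up, the function name `digest` is hypothetical, and the default site models EcoRI (recognition sequence GAATTC, cut after the G) on a linear molecule.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting a linear sequence at every site.

    The defaults model EcoRI, which recognizes GAATTC and cuts after the G.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two hypothetical alleles that differ by one base in the second EcoRI site.
allele_1 = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
allele_2 = "A" * 10 + "GAATTC" + "C" * 20 + "GAGTTC" + "T" * 5

print(digest(allele_1))  # [11, 26, 10]
print(digest(allele_2))  # [11, 36]
```

The single base change removes one cut, so the two alleles give different sets of fragment lengths, which is what appears as different band patterns on the gel.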

Amplified fragment length polymorphism

Amplified Fragment Length Polymorphism PCR (or AFLP-PCR or just AFLP) is a PCR-based tool used in genetics research, DNA fingerprinting, and in the practice of genetic engineering. Developed in the early 1990s by Keygene[1], AFLP uses restriction enzymes to cut genomic DNA, followed by ligation of adaptors to the sticky ends of the restriction fragments. A subset of the restriction fragments is then amplified using primers complementary to the adaptor and part of the restriction site fragments (as described in detail below). The amplified fragments are visualized on denaturing polyacrylamide gels either through autoradiography or fluorescence methodologies.

AFLP-PCR is a highly sensitive method for detecting polymorphisms in DNA. The technique was originally described by Vos and Zabeau in 1993[2][3]. In detail, the procedure of this technique is divided into three steps: [1]

1. Digestion of total cellular DNA with one or more restriction enzymes and ligation of restriction half-site specific adaptors to all restriction fragments.

2. Selective amplification of some of these fragments with two PCR primers that have corresponding adaptor and restriction site specific sequences.

3. Electrophoretic separation of amplicons on a gel matrix, followed by visualisation of the band pattern.
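Step 2 is what keeps the band pattern readable: each selective base on a primer is expected to cut the number of amplified fragments by about a factor of four. The sketch below is a simplified illustration in Python, assuming equal base frequencies and fragments read on one strand only; the fragment sequences and function names are hypothetical.

```python
def selective_amplification(fragments, sel_left, sel_right):
    """Keep fragments whose first and last bases match the selective bases."""
    return [f for f in fragments
            if f.startswith(sel_left) and f.endswith(sel_right)]


def expected_fraction(n_left, n_right):
    """Expected share of fragments amplified, assuming equal base frequencies."""
    return 0.25 ** (n_left + n_right)


# Hypothetical restriction fragments (the sequence between the two adaptors).
fragments = ["ACGTTTGCA", "ACCCCCTG", "TTGACA", "ACGGGGGGCA"]
amplified = selective_amplification(fragments, "AC", "CA")

print(sorted(len(f) for f in amplified))  # [9, 10]
print(expected_fraction(3, 3))            # 0.000244140625
```

With three selective bases on each primer, roughly 1 fragment in 4096 would be amplified under these assumptions.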

A variation on AFLP is cDNA-AFLP, which is used to quantify differences in gene expression levels.

Applications of AFLP:

The AFLP technology has the capability to detect various polymorphisms in different genomic regions simultaneously. It is also highly sensitive and reproducible. As a result, AFLP has become widely used for the identification of genetic variation in strains or closely related species of plants, fungi, animals, and bacteria. The AFLP technology has been used in criminal and paternity tests, in population genetics to determine slight differences within populations, and in linkage studies to generate maps for quantitative trait locus (QTL) analysis.

There are many advantages to AFLP when compared to other marker technologies including randomly amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), and microsatellites. AFLP not only has higher reproducibility, resolution, and sensitivity at the whole genome level compared to other techniques[4], but it also has the capability to amplify between 50 and 100 fragments at one time. In addition, no prior sequence information is needed for amplification (Meudt & Clarke 2007)[5]. As a result, AFLP has become extremely beneficial in the study of taxa including bacteria, fungi, and plants, where much is still unknown about the genomic makeup of various organisms.


RAPD (pronounced "rapid") stands for Random Amplification of Polymorphic DNA. It is a type of PCR reaction, but the segments of DNA that are amplified are random. The scientist performing RAPD creates several arbitrary, short primers (8-12 nucleotides), then proceeds with the PCR using a large template of genomic DNA, hoping that fragments will amplify. By resolving the resulting patterns, a semi-unique profile can be gleaned from a RAPD reaction.

No knowledge of the DNA sequence for the targeted gene is required, as the primers will bind somewhere in the sequence, but it is not certain exactly where. This makes the method popular for comparing the DNA of biological systems that have not had the attention of the scientific community, or in a system in which relatively few DNA sequences are compared (it is not suitable for forming a DNA databank). Because it relies on a large, intact DNA template sequence, it has some limitations when used with degraded DNA samples. Its resolving power is much lower than that of targeted, species-specific DNA comparison methods, such as short tandem repeats. In recent years, RAPD has been used to characterize, and trace, the phylogeny of diverse plant and animal species.
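Because RAPD uses a single arbitrary primer, a product forms only where the primer anneals to both strands, facing inward, within an amplifiable distance. The sketch below searches a template for such pairs of sites. It is a toy model that assumes perfect matches only and a hypothetical maximum product length of 2000 bp; the primer and template are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq, sub):
    """Start positions of every (possibly overlapping) occurrence of sub."""
    hits, pos = [], seq.find(sub)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(sub, pos + 1)
    return hits


def rapd_products(template, primer, max_len=2000):
    """Sizes of products from one primer annealing on both strands."""
    forward = find_all(template, primer)           # primer extends rightwards
    reverse = find_all(template, revcomp(primer))  # primer extends leftwards
    sizes = [j + len(primer) - i
             for i in forward for j in reverse
             if i < j and j + len(primer) - i <= max_len]
    return sorted(sizes)


primer = "GGTCATAGCC"  # arbitrary 10-mer
sample_1 = "TT" + primer + "A" * 30 + revcomp(primer) + "TT"
sample_2 = "TT" + primer + "A" * 30 + "GGCTATGTCC" + "TT"  # one site mutated

print(rapd_products(sample_1, primer))  # [50]
print(rapd_products(sample_2, primer))  # []
```

A single base change in one priming site removes the band entirely, which is why RAPD bands are scored as present or absent.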


1. KeyGene Quantar Suite Versatile marker scoring software

2. Zabeau, M and P. Vos. 1993. Selective restriction fragment amplification: a general method for DNA fingerprinting. European Patent Office, publication 0 534 858 A1, bulletin 93/13.

3. Vos P, Hogers R, Bleeker M, et al. (November 1995). "AFLP: a new technique for DNA fingerprinting". Nucleic Acids Res. 23 (21): 4407–14. PMID 7501463. PMC 307397.

First you need to know a few key terms:

As you go through the subsequent discussion, you may need to jump back here to refresh your memory on various definitions.




A 'plasmid' is a small, circular piece of DNA that is often found in bacteria. This innocuous molecule might help the bacteria survive in the presence of an antibiotic, for example, due to the genes it carries. To scientists, however, plasmids are important because (i) we can isolate them in large quantities, (ii) we can cut and splice them, adding whatever DNA we choose, (iii) we can put them back into bacteria, where they'll replicate along with the bacteria's own DNA, and (iv) we can isolate them again - getting billions of copies of whatever DNA we inserted into the plasmid! Plasmids are limited to sizes of 2.5-20 kilobases (kb), in general.


The term 'BAC' is an acronym for 'Bacterial Artificial Chromosome', and in principle, it is used like a plasmid. We construct BACs that carry DNA from humans or mice or wherever, and we insert the BAC into a host bacterium. As with the plasmid, when we grow that bacterium, we replicate the BAC as well. Huge pieces of DNA can be easily replicated using BACs - usually on the order of 100-400 kilobases (kb). Using BACs, scientists have cloned (replicated) major chunks of human DNA. This, as you will see later, is critical to the Human Genome Project.


The 'vector' is generally the basic type of DNA molecule used to replicate your DNA, like a plasmid or a BAC.


The 'insert' is a piece of DNA we've purposely put into another (a 'vector') so that we can replicate it. Usually the 'insert' is the interesting part, consequently. In the case of the Human Genome Project or other sequencing projects, the insert is the part we want to sequence - the part we don't know. Usually we know the complete DNA sequence of the vector.

Shotgun Sequencing

Shotgun sequencing is a method for determining the sequence of a very large piece of DNA. The basic DNA sequencing reaction can only get the sequence of a few hundred nucleotides. For larger ones (like BAC DNA), we usually fragment the DNA and insert the resultant pieces into a convenient vector (a plasmid, usually) to replicate them. After we sequence the fragments, we try to deduce from them the sequence of the original BAC DNA.

Shotgun sequencing: assembly of random sequence fragments

To sequence a BAC, we take millions of copies of it and chop them all up randomly. We then insert those into plasmids and for each one we get, we grow lots of it in bacteria and sequence the insert. If we do this to enough fragments, eventually we'll be able to reconstruct the sequence of the original BAC based on the overlapping fragments we've sequenced!
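The reconstruction step can be sketched as a greedy merge of the two reads with the longest overlap, repeated until one sequence is left. This is a toy illustration only: it assumes error-free reads from one strand and no repeats, and the function names and the short "BAC" sequence are hypothetical. Real assemblers are far more elaborate.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that lie entirely inside another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining reads stay as contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


bac = "ATGGCGTACGTTAGCCTAGGCTAACGT"           # made-up "BAC" sequence
reads = [bac[8:20], bac[16:27], bac[0:12]]    # overlapping random fragments

print(greedy_assemble(reads) == [bac])  # True
```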

What is a primer?

A primer is a short synthetic oligonucleotide which is used in many molecular techniques from PCR to DNA sequencing. These primers are designed to have a sequence which is the reverse complement of a region of template or target DNA to which we wish the primer to anneal.
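As a small illustration of "reverse complement", the sketch below derives a primer that would anneal to the end of a template strand. The template sequence and the 18-base primer length are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse, so the result reads 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


template = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # 5' to 3', made up
target_region = template[-18:]                       # where the primer anneals
primer = reverse_complement(target_region)

print(reverse_complement("GATTACA"))  # TGTAATC
print(primer)
```

Taking the reverse complement twice returns the original sequence, which is a quick sanity check on any primer design helper.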

DNA sequencing:

DNA sequencing reactions all use a primer to initiate DNA synthesis. This primer will determine the starting point of the sequence being read, and the direction of the sequencing reaction.

Most DNA sequencing reactions use dideoxy nucleotides (ddNTP) to stop DNA synthesis at specific nucleotides. For example, if a ddCTP is incorporated into a growing strand of DNA, the lack of a free 3´ OH group prevents the next nucleotide from being added, and the chain terminates.

In automated sequencing we use a different fluorescent label attached to each of the four dideoxy nucleotides (ddA, ddC, ddG and ddT). Thus we can determine the terminal base in each fragment of DNA.
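The logic of reading a sequence from terminated fragments can be simulated in a few lines. In this sketch, each simulated molecule is extended until a labelled dideoxy nucleotide happens to be incorporated; sorting the resulting fragments by length and reading their terminal labels recovers the sequence. The termination probability, number of molecules and sequence are arbitrary assumptions chosen so that every fragment length is sampled.

```python
import random


def terminated_fragments(new_strand, n_molecules=5000, dd_fraction=0.1, seed=1):
    """Simulate chain termination; return a set of (length, terminal base)."""
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < dd_fraction:   # a ddNTP was incorporated here
                fragments.add((length, base))
                break                        # no 3' OH, so synthesis stops
    return fragments


def read_by_size(fragments):
    """Order fragments by length and read off the labelled terminal bases."""
    return "".join(base for _, base in sorted(fragments))


new_strand = "GATCCGTTAGCATGCA"  # strand synthesized from the primer, made up
print(read_by_size(terminated_fragments(new_strand)) == new_strand)
```

Sorting by length plays the role of electrophoresis, and the terminal base plays the role of the fluorescent label.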

The two animations below illustrate how DNA synthesis and dideoxy termination are used to sequence DNA.

Deoxycytidine triphosphate (dCTP)

Dideoxycytidine triphosphate (ddCTP)


In the past few years, functional genes, ESTs and genome sequences have facilitated development of molecular markers from the transcribed regions of the genome. Among the important molecular markers that can be developed from ESTs are single-nucleotide polymorphisms (SNPs) (Rafalski 2002) and simple sequence repeats (SSRs) (Varshney et al., 2005a). Putative functions for the markers derived from ESTs or genes can be deduced using homology searches (BLASTX) with protein databases (e.g. NR-PEP and SWISSPROT). Therefore, molecular markers generated from expressed sequence data are known as ‘functional markers’ (FMs) (Andersen and Lubberstedt 2003). FMs have been developed extensively for the plant species in which ESTs or gene sequence data are available (Gupta and Rustgi 2004). By screening the unigene (see glossary) consensus sequences from over 50 plant species, Rudd et al. (2005) demonstrated the feasibility of predicting molecular markers (e.g. SSRs and SNPs) that can be used to develop FMs for several species. FMs have some advantages over random markers that are generated from anonymous regions of the genome, because FMs are linked to the desired trait allele. Such markers are derived from the gene responsible for the trait of interest and target the functional polymorphism in the gene, so they allow selection in different genetic backgrounds without revalidating the marker–quantitative-trait-locus (QTL) allele relationship. An FM allows breeders to track specific alleles within pedigrees and populations, and to minimize linkage drag flanking the gene of interest. As markers become more abundant, breeders can develop strategies that are compatible with financial resources and breeding goals. Markers are increasingly being applied for selection of parental materials and for accelerated selection of loci controlling traits that are difficult to select phenotypically. Examples include pyramiding of genes for disease resistance, quality traits and traits that interact with the environment. Linked deleterious alleles are a potential problem as the number of selected loci increases, particularly if the donor parent is a related wild species. Therefore, closely linked markers are most desirable for reducing linkage drag, although doing so requires larger population sizes and more backcross generations.
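Mining SSRs from EST sequences, as mentioned above, usually starts with a simple pattern search. The sketch below looks for di- and trinucleotide motifs repeated at least five times using a regular expression with a backreference. The repeat threshold and the example sequence are assumptions made for illustration; published SSR-mining tools use their own, often motif-specific, criteria.

```python
import re

# A 2- or 3-base motif followed by at least four more copies of itself.
SSR_PATTERN = re.compile(r"([ACGT]{2,3})\1{4,}")


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each di/trinucleotide SSR."""
    hits = []
    for match in SSR_PATTERN.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:   # skip homopolymer runs such as AAAAAA...
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


est = "TTCC" + "GA" * 6 + "TTCCT" + "CAG" * 5 + "TT"  # made-up EST fragment
print(find_ssrs(est))  # [(4, 'GA', 6), (21, 'CAG', 5)]
```

Primers designed in the unique sequence flanking each hit would then be tested for length polymorphism among genotypes.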


Genomic resources for major crop species include high-density genetic maps, cytogenetic stocks, contig-based physical maps and large-insert libraries. These tools have facilitated isolation of genes via map-based cloning, localizing QTLs, sequencing and annotation of large genomic DNA fragments (Figure 1) in several plant species. Systematic whole genome sequencing provides critical information on gene and genome organization and function, which may revolutionize our understanding of crop production and our ability to manipulate the traits contributing to high crop productivity (Pereira 2000). Completion of the Arabidopsis thaliana genome sequence in 2000, followed by the nearly completed genome sequence for rice in 2002 (Goff et al., 2002; Yu et al., 2002), has caused much excitement among researchers. Rapidly following these landmark efforts, characterization of the genomes of other crops including maize (Zea mays L.), wheat (Triticum aestivum L.) and legumes such as soybean [Glycine max (L.) Merr.] and barrel medic (Medicago truncatula Gaertner; Ware et al., 2002; Lunde et al., 2003; Shoemaker et al., 2002; Young et al., 2003) was initiated. The accumulating information allows researchers to explore new paradigms to address fundamental and practical questions in a multidisciplinary manner. Although new research fields such as metabolomics have emerged as post-genomic era technologies (Phelps et al., 2002), challenges still lie ahead in answering how genomics will aid in crop improvement from a practical standpoint (Osterlund and Paterson 2002). Genome sequence comparisons of two rice cultivars, representing the indica and japonica subspecies (Goff et al. 2002; Yu et al. 2002; IRGSP 2005), have revealed many insertions and deletions in the genomes (IRGSP 2005; Yu et al. 2005; Feng et al., 2002). Transcriptome sampling is a complementary approach to genome sequencing, which results in a large collection of expressed sequence tags (ESTs) for important plant species. Comparative sequence analysis can be used to facilitate isolation of genes in species lacking ESTs.


Advances in genomics and bioinformatics show the true potential of biotechnology for crop improvement. The rapid development of genome technologies, especially automatic sequencing techniques, has produced a huge amount of data consisting essentially of nucleotide and protein sequences. Bioinformatics facilitates analysis of genomic and postgenomic data, and integration of data from the related fields of transcriptomics, proteomics and metabolomics. The vast amount of sequence data coming out of the Genome Projects can be managed and utilized with the help of bioinformatics tools, as they play an important role in candidate gene identification, gene finding, SNP detection, genotyping and genetic analysis. Research and discovery in life sciences were once limited to a single gene or protein, but developments in bioinformatics have facilitated a shift to high-throughput screening (thousands of samples every day) and high-content detection systems (thousands of data points per sample). Several bioinformatic tools and databases have been developed for DNA sequence analysis, marker discovery and analyzing information. Enhanced bioinformatic tools, genome databases and integration of information from different fields enable the identification of genes and gene products, and can elucidate the functional relationships between genotype and observed phenotype (Edwards and Batley 2004).

To store, characterize and mine such a large amount of data requires many databases and programs hosted on high-performance computers. To date, there are several databases, for example GenBank (Benson 2004), Uniprot (Apweiler et al., 2004), PDB (Berman 2000), KEGG (Kanehisa et al., 2000) and PubMed Medline, covering not only nucleotide and protein sequences but also their annotations and related research publications. The programs include those for sequence alignment and for prediction of genes, protein structures and regulatory elements, some of which are organized into packages such as EMBOSS (Rice et al., 2000), PHYLIP (Felsenstein 1989) and GCG Wisconsin. BLAST (basic local alignment search tool) is among the most popular tools and is widely used by biologists. It is an algorithm for searching large databases of protein or DNA sequences. The NCBI provides a web-based implementation that searches its massive collection of sequences and annotated data. A recent development, Simple Object Access Protocol (SOAP) (TR/soap) based interfaces for a variety of bioinformatics applications, allows programs running on a computer in one part of the world to use algorithms, data and computing resources on servers in other parts of the world. The wide availability of SOAP-based bioinformatics web services, along with open source bioinformatics collections, has led to next-generation bioinformatics tools called integrated bioinformatics platforms.

Another fundamental breakthrough in genomics came with DNA microarray technology, which can detect tens of thousands of genes using a small functionalized system (Brown and Botstein 1999; Debouck and Goodfellow 1999; Heller 2002). The information flowing from genomic laboratories using DNA microarray technology constitutes hundreds of thousands of gene-specific measurements every day. The overall impact of this revolutionary technology depends upon an integrated information system to analyze and store the data, as well as computational systems to design each of these gene-specific detections. An important future prospect in this area is enhancement of visualization tools that can help us more clearly interpret the complex multidimensional biological networks of genes and their relationships to phenotypes.
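To give a feel for what "local alignment" means in the BLAST discussion above, the sketch below computes a Smith-Waterman local alignment score by dynamic programming. This is not BLAST itself: BLAST is a faster heuristic that approximates this kind of exhaustive search. The scoring values (match +2, mismatch -1, gap -2) and the sequences are arbitrary assumptions for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


query = "TTTACGTACGTTT"
subject = "GGGACGTACGGG"
print(smith_waterman_score(query, subject))  # 14 (the shared ACGTACG, 7 x 2)
```

Only the shared region contributes to the score; the unrelated flanks are ignored, which is the property that makes local alignment suitable for database searches.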


Conventional plant breeding is primarily based on phenotypic selection of superior individuals among segregating progenies. Although significant strides have been made in crop improvement through phenotypic selection for agronomically important traits, considerable difficulties are often encountered during this process, primarily due to genotype-environment interactions. In addition, testing procedures are often difficult, unreliable or expensive due to the nature of the target traits or the target environment. Marker-assisted selection (MAS) refers to using DNA markers that are tightly linked to target loci as a substitute for phenotype-based screening. By determining the allele of a DNA marker, plants that possess particular genes or QTLs may be identified based on their genotype rather than their phenotype. Marker-assisted selection may greatly increase the efficiency and effectiveness of breeding compared to conventional breeding. The fundamental advantages of MAS compared to conventional phenotypic selection are: (i) greater efficiency and (ii) accelerated line development in breeding programs. For example, time and labour savings may arise from the substitution of difficult or time-consuming field trials (that need to be conducted at particular times of year or at specific locations, or are technically complicated) with DNA marker tests. Furthermore, selection based on DNA markers may be more reliable because field trials are influenced by environmental factors. Another benefit from using MAS is that the total number of lines that need to be tested may be reduced considerably. Since many lines can be discarded after MAS at an early generation, this permits a more effective breeding design. Greater efficiency of target trait selection may enable certain traits to be fast-tracked, since specific genotypes can be easily identified and selected. Moreover, background markers may also be used to accelerate the recovery of recurrent parents during marker-assisted backcrossing.
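As a baseline for what background markers can accelerate, the expected share of the recurrent parent genome without any marker selection follows a simple rule: each backcross halves the remaining donor contribution on average. The sketch below tabulates this expectation; the function name is hypothetical, and the figures are averages over unlinked loci, so individual plants vary around them.

```python
def expected_recurrent_fraction(n_backcrosses):
    """Average recurrent parent genome share after n backcrosses (BCn)."""
    # The F1 is 1/2 donor; each backcross halves the donor share again.
    return 1 - 0.5 ** (n_backcrosses + 1)


print([expected_recurrent_fraction(n) for n in (1, 2, 3)])  # [0.75, 0.875, 0.9375]
print(expected_recurrent_fraction(6))                       # 0.9921875
```

Background selection aims to pick, within each generation, the individuals that lie above this average, so that a given recovery level is reached in fewer generations.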

In general, the success of MAS depends on the following factors: (i) availability of a genetic map with an adequate number of uniformly spaced polymorphic markers to accurately locate desired QTLs or major gene(s), (ii) close linkage between the QTL or major gene of interest and adjacent markers, (iii) adequate recombination between the markers and the rest of the genome, and (iv) ability to analyze a large number of plants in a time- and cost-effective manner.

Three kinds of relationships between markers and their respective genes can be distinguished:

(1) The molecular marker is located within the gene of interest, which is the most favourable situation for MAS; in this case, it could ideally be referred to as gene-assisted selection. While this kind of relationship is the most preferred one, it is also difficult to find such markers. For example, microsatellite or SSR markers have been designed using the available DNA sequence information for the opaque2 allele that confers high lysine and tryptophan content in the maize kernel. This has offered an efficient means of tracking the opaque2 allele in breeding for nutritionally superior maize genotypes, since the marker is located within the gene sequence itself and cosegregates with the target gene.

(2) The marker is in linkage disequilibrium (LD) with the gene of interest throughout the population. LD (see glossary) is the tendency of certain combinations of alleles to be inherited together. Population-wide LD can be found when markers and genes of interest are physically close to each other. Selection using these markers can be called LD-MAS.

(3) The marker is in linkage equilibrium (LE) with the gene of interest throughout the population, which is the most difficult and challenging situation for applying MAS.

Genomic regions to be selected using MAS are often chromosome segments carrying QTLs in the case of polygenic traits. It is preferable either to have two polymorphic DNA markers flanking the target gene or QTL, so that the target is lost only in genotypes with a double recombination between the two flanking markers, or to have a marker within the QTL, which eliminates that possibility. In the context of MAS, DNA-based markers can be effectively utilized for two basic purposes: (i) tracing favourable allele(s) (dominant or recessive) across generations and (ii) identifying the most suitable individual(s) among the segregating progeny, based on allelic composition across a part or the entire genome.
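The benefit of flanking markers over a single linked marker can be put into numbers. The sketch below assumes one meiosis (as in a backcross), recombination fractions r1 and r2 between the target and each flanking marker, and no crossover interference; these assumptions and the function names are illustrative, not taken from the text.

```python
def error_single_marker(r):
    """Chance that a plant selected on one marker has lost the target allele."""
    return r


def error_flanking_markers(r1, r2):
    """Chance that a plant carrying both flanking donor marker alleles has lost
    the target allele, which requires a double recombination."""
    double_recombinant = r1 * r2
    non_recombinant = (1 - r1) * (1 - r2)
    return double_recombinant / (double_recombinant + non_recombinant)


print(error_single_marker(0.05))                     # 0.05
print(f"{error_flanking_markers(0.05, 0.05):.4f}")   # 0.0028
```

With markers 5% recombination away on each side, the selection error falls from 5% to under 0.3% under these assumptions.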

Based on the considerable developments in biotechnology, plant breeders have developed more efficient selection systems to replace traditional phenotypic-pedigree-based selection systems. Conventional breeding is time consuming and highly dependent on environmental conditions. Breeding a new variety takes between eight and twelve years, and even then the release of an improved variety cannot be guaranteed. Hence, breeders are extremely interested in new technologies that could make this procedure more efficient. Molecular marker technology offers such a possibility by adopting a wide range of novel approaches to improve the selection strategies in breeding. Markers can aid selection for target alleles that are not easily assayed in individual plants, minimize linkage drag around the target gene, and reduce the number of generations required to recover a very high percentage of the recurrent parent genetic background. Marker-assisted selection (MAS) is an indirect selection process where a trait of interest is selected based on a marker linked to it and not on the trait itself (Rosyara, 2006; Ribaut and Hoisington, 1998). For example, if MAS is being used to select a crop with disease resistance, the level of disease resistance is not quantified; rather, a marker allele that is linked with disease resistance is used to determine whether resistance is present. The assumption is that the linked allele is associated with the gene and/or quantitative trait locus (QTL) of interest. MAS can be useful for traits that are difficult to measure, exhibit low heritability, and/or are expressed late in


Molecular markers

What are they?

• A marker is a gene or piece of DNA with an easily identified phenotype such that cells or individuals with different alleles are distinguishable, for example a gene with a known function or a single nucleotide change in DNA.

• A readily detectable sequence of DNA or protein whose inheritance can be monitored.

To become a useful molecular marker, it must possess certain characteristics:

• Polymorphic: a polymorphism is a detectable and heritable variation at a locus. A marker is polymorphic if the most abundant allele comprises less than a threshold percentage of all the alleles, usually 95%.

• Reproducible: should give similar results in different experiments irrespective of the time and the place.

• Preferably displays co-dominant inheritance (both forms are detectable in

• The detection of the marker must be fast and inexpensive.

• Demonstrates measurable difference(s) in expression between trait types and/or alleles of interest, early in the development of the organism.

• Has no effect on the trait of interest that varies depending on the allele at the marker loci.

• Low or null interaction among the markers, allowing the use of many markers at the same time in a segregating population.
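The polymorphism criterion in the list above is easy to check from genotype data. The sketch below assumes a flat list of allele calls at one locus and uses the 95% threshold mentioned in the text; the function name and the example allele counts are hypothetical.

```python
from collections import Counter


def is_polymorphic(alleles, threshold=0.95):
    """True if the most abundant allele is below the threshold frequency."""
    counts = Counter(alleles)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(alleles) < threshold


locus_1 = ["A"] * 90 + ["B"] * 10   # most common allele at 90%
locus_2 = ["A"] * 96 + ["B"] * 4    # most common allele at 96%

print(is_polymorphic(locus_1))  # True
print(is_polymorphic(locus_2))  # False
```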


Morphological markers

• Seed color, for example kernel color in maize, hilum color in soybean seeds

• Pubescence (small hair-like growth on stems and leaves)

• Function-based, like plant height associated with salt tolerance in rice

Limitations of the morphological markers

Morphological markers are associated with several general deficits that reduce their usefulness, including:

• The delay in morphological marker expression until late into the development of the organism, for example flower color.

• Dominance of the markers: homozygotes and heterozygotes are not

• Deleterious effects

• Pleiotropy

• Confounding effect(s) of genes unrelated to the gene or trait of interest that nevertheless affect the morphological marker (epistasis)

• Rare polymorphism

• Frequent confounding effect(s) of environmental factors, which affect the morphological characteristics of the organism.

• Most phenotypic markers are undesirable in the final product (yellow color in maize).

• Sometimes dependent on the environment for expression, for example height of the plants.

Non-DNA or protein molecular markers such as isozyme markers

Markert and Moller (1959) were the first to describe the differing forms of bands that they were able to visualize with specific enzyme stains. They were the first to introduce the term biochemical polymorphisms, often referred to as allozyme or isozyme markers. By the early 1980s, biochemical markers had been employed as a general tool for mapping QTL (Weller et al., 1988). Isozyme markers are still useful as they are a simple, inexpensive means for detection of gene introgression and recombination, for comparative mapping, and for determination of the genetic diversity and phylogenetic relationships among plant species (Hart and Langston, 1977; Hoffman, 1999; Horandl et al., 2000; Yu et al., 2001).

Isozyme markers: multiple forms of the same enzyme coded by different genes.

- Isozyme: one enzyme, more than one locus (gene duplication; gene families)

- Allozyme: one enzyme; one locus; two or more alleles in a population

Isozymes are proteins with the same enzymatic function but different structural, chemical, or immunological characteristics (coded by different genes). To be useful as markers, isoforms must be electrophoretically resolvable and detectable by in-gel assay methods (Fig. 1).

• Differences: amino acid composition/sequence

• Differences visualized: gel electrophoresis, mass spectrometry, etc.

• Restricted due to the limited number of enzyme systems available (about 30)