Glossary of terms used frequently in genome sequencing.

This glossary was compiled using the following sources; please refer to them for additional information:

Annotation

The process of identifying regions of a genome sequence that are associated with specific functions and adding pertinent biological information to these sequences; for example, indicating which specific gene a sequence corresponds to.
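Annotation draws on many kinds of evidence, but one of the simplest computational steps is scanning a sequence for open reading frames (ORFs). An ORF is a stretch that begins with a start codon and runs to an in-frame stop codon, so it might code for a protein. The minimal Python sketch below is a hypothetical illustration of that single step only. It looks at the forward strand, uses the standard start codon ATG and stop codons TAA, TAG and TGA, and treats the function name `find_orfs` and the `min_codons` threshold as assumptions. Real gene annotation also has to handle the reverse strand, introns, and supporting evidence such as transcript data.

```python
# Hypothetical, simplified illustration of one annotation step:
# scanning the forward strand of a DNA sequence for open reading frames.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) 0-based, end-exclusive coordinates of forward-strand ORFs.

    An ORF here runs from an ATG start codon to the first in-frame stop codon
    (stop codon included). Only ORFs with at least `min_codons` codons,
    counting start and stop, are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three reading frames of the forward strand
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGCC")` returns `[(2, 14)]`: the ORF ATG AAA TTT TAG in the third reading frame.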

Assembly

The process of taking fragments of DNA sequences and putting them together by matching overlapping sequences to create a representation of the original DNA that was sequenced.
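The idea of matching overlapping sequences can be made concrete with a minimal greedy-overlap sketch in Python. It repeatedly merges the two fragments with the longest suffix/prefix overlap until no overlap of at least `min_len` bases remains. This is a simplified teaching version and not how production assemblers work. They typically build overlap or de Bruijn graphs and must cope with sequencing errors and repeats. The function names and the `min_len` default are assumptions.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (at least min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the resulting contigs; fragments that never overlap stay separate.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments that are fully contained in another fragment.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while True:
        best = (0, -1, -1)  # (overlap length, index of left fragment, index of right fragment)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # no remaining overlaps: each entry is a contig
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
```

For example, `greedy_assemble(["ACGTAC", "TACGGA", "GGATTC"])` returns `["ACGTACGGATTC"]`, because each fragment overlaps the next by three bases.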

Bases

The chemical units that make up DNA, also called nucleotides, known by their abbreviations: A (adenine), T (thymine), C (cytosine) and G (guanine).

Bases on the two strands pair with each other: A pairs only with T, and C only with G. These base pairs link the two strands in the helical structure of DNA.
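Because A pairs only with T and C only with G, the sequence of one strand fully determines the other. The two strands also run in opposite directions, so the partner strand is conventionally written as the reverse complement. A minimal Python sketch follows; the function name is illustrative.

```python
# Complementary base pairs: A with T, C with G (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own direction (complement, then reverse)."""
    return seq.translate(COMPLEMENT)[::-1]
```

For example, `reverse_complement("ACGTTG")` returns `"CAACGT"`.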

Base Pair

A unit of DNA consisting of two paired bases on complementary strands, commonly used to measure the size of genomes. The wheat genome has 16-17 billion base pairs, or pairs of DNA “letters” (A, T, C, and G).

Bioinformatics

The science of managing and analyzing biological data using advanced computing techniques.

BAC (Bacterial Artificial Chromosome)

An engineered DNA molecule used to clone DNA sequences in bacterial cells (for example, Escherichia coli). Segments of an organism's DNA, ranging from 100,000 to about 300,000 base pairs, can be inserted into BACs. The BACs, with their inserted DNA, are then taken up by bacterial cells. As the bacterial cells grow and divide, they amplify the BAC DNA, which can then be isolated and used in sequencing DNA.

BACs have proved very useful for physical mapping and sequencing of large genomes, such as the human, rice, mouse and bread wheat genomes.

BAC Library

Because large genomes are difficult to sequence as a whole, the DNA is broken into smaller segments that are inserted into BACs and amplified. A BAC library is the collection of all the BACs produced in this process, which together represent the entire genome of an organism.

BLAST

Basic Local Alignment Search Tool. A computer program that compares a query sequence (DNA or protein) against a database of sequences to find regions of local similarity.
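BLAST speeds up the search for local alignments with a seed-and-extend heuristic, which can miss some alignments that an exhaustive search would find. To show what a local alignment score measures, the sketch below swaps in the exact Smith-Waterman dynamic-programming algorithm instead of BLAST's heuristic. The scoring values (match +2, mismatch -1, gap -2) and the function name are illustrative assumptions, not BLAST's defaults.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between `a` and `b` (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere: this makes it "local".
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `local_alignment_score("TTACGTTT", "GGACGTGG")` returns 8, the score of the shared ACGT (four matches at +2 each).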

Cell

The smallest unit of life that can exist independently. All organisms are made up of one or more cells.

Chromosome

A long DNA molecule packaged into a compact structure by folding and association with specific proteins. Each species has a characteristic number of chromosomes. Bread wheat has 42 chromosomes: three sets of 7 chromosome pairs, each set derived from a different ancestral diploid species.

Comparative Genomics

The science of comparing the genome sequences of different species to discover similarities and differences in biology. For instance, genome scientists and breeders might compare the genomes of cultivated wheat varieties with those of wild species to understand evolution or to increase the diversity of cultivated varieties through crossing with wild species.

Contig

Short for “contiguous sequence”. A piece of DNA sequence that has been assembled from overlapping sequence fragments.

Diploid

A cell or organism that contains two copies of each chromosome.

DNA (deoxyribonucleic acid)

A molecule found in all living organisms that carries the genetic information.

The DNA molecule consists of two strands, or chains, of nucleotides held together by bonds between paired bases, forming a shape known as a double helix.

DNA Sequence

The order of genetic “letters,” or nucleotides, in a piece of DNA. For instance: ACGTACGTACGT

Draft sequence

A sequence that has been assembled into contigs but in which a proportion of the sequence is missing (i.e., there are gaps) and the complete order and orientation of the fragments are unknown.

Functional genomics

The study of how genomes function, including the identification and regulation of genes, their resulting proteins, and the role played by the proteins in biochemical processes.

Gene

A gene is the basic physical and functional unit of heredity (i.e., of the inherited properties of an organism that are passed from one generation to the next). Genes are made of nucleic acid: linear molecules consisting of a string of nucleotides of four kinds (in DNA, A, T, G and C). They provide all or part of the instructions needed to make molecules called proteins. In genomics, a gene is an ordered sequence of nucleotides located at a particular position on a chromosome.

Genetic Marker

An easily identifiable piece of genetic material, e.g., a gene or a portion of DNA, with a known location on a chromosome that can be tracked from one generation to the next.

Genome

All the genetic material in the chromosomes of a particular organism.

A genome contains the biological information for building, running, and maintaining an organism—and for passing life on to the next generation. Nearly every cell of an organism contains a complete copy of its genome.

Genome map

A map of the relative positions of landmarks within a genome, their chromosomal positions, and the distances between them. Landmarks might include short DNA sequences, regulatory sites that turn genes on or off, and genes.

Genetic map

A map of the relative positions of genes, genetic markers, and other features within a chromosome or genome determined on the basis of recombination frequency between markers.
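Recombination frequency between two markers is the fraction of offspring in which they were separated by recombination. Map distances in centimorgans (cM) are derived from it through a mapping function. The sketch below uses Haldane's mapping function, d = -(1/2) ln(1 - 2r) in Morgans, which is one common choice (Kosambi's function is another). That choice is an assumption made here rather than something this glossary specifies, and the function names are illustrative.

```python
import math


def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of offspring in which the two markers were separated by recombination."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Map distance in centimorgans from recombination frequency r, using Haldane's function."""
    if not 0 <= r < 0.5:
        raise ValueError("Haldane's function is defined for 0 <= r < 0.5")
    # d (Morgans) = -0.5 * ln(1 - 2r); multiply by 100 to get centimorgans.
    return -50.0 * math.log(1 - 2 * r)
```

For example, 10 recombinant offspring out of 100 give r = 0.1, which corresponds to about 11.2 cM.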

Genomics

The study of the structure and organization of genomes, their individual elements (e.g. genes), how they function, and how they are regulated.

Gap

In genomics, a region of the genome that is not represented in a map or by sequence.

Hexaploid

Containing six sets of chromosomes in each cell.

The bread wheat genome is hexaploid, containing three sets of 7 pairs of chromosomes.

High-throughput sequencing

A family of rapid sequencing methods that determine the order of the DNA bases in very large numbers of DNA fragments in parallel. With these methods, some small genomes can be sequenced in just a few days.

Kilobase (kb)

A unit of length for DNA fragments equal to 1,000 nucleotides (or 1,000 base pairs for double-stranded DNA).

Minimum tiling path (MTP)

In a clone-by-clone approach, a chromosome or genome is divided into BACs, which are then sequenced and assembled. The minimum tiling path (MTP) is an ordered list, or “map”, of the smallest set of overlapping BACs needed to provide complete coverage of the whole chromosome or genome.
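At its core, choosing an MTP is an interval-covering problem: given the positions of overlapping BACs along a chromosome, pick the fewest clones that leave no gap. The sketch below assumes clone positions are already known, for example from a physical map. It uses the classic greedy algorithm, which at each step takes the clone that starts within the covered region and extends it furthest. Clone names and coordinates are hypothetical. Real MTP selection works from fingerprint or sequence overlaps and may also demand a minimum overlap between neighbouring clones.

```python
def minimum_tiling_path(clones: dict[str, tuple[int, int]], length: int) -> list[str]:
    """Greedily pick the fewest clones whose intervals cover [0, length).

    `clones` maps a hypothetical clone name to its (start, end) position in
    base pairs, end exclusive. Raises ValueError if coverage has a gap.
    """
    ordered = sorted(clones.items(), key=lambda kv: kv[1][0])  # sort by start position
    path, covered, i = [], 0, 0
    while covered < length:
        best_name, best_end = None, covered
        # Among clones starting inside (or exactly at the end of) the covered
        # region, take the one that reaches furthest. A stricter version could
        # require a minimum overlap instead of allowing clones that just touch.
        while i < len(ordered) and ordered[i][1][0] <= covered:
            name, (start, end) = ordered[i]
            if end > best_end:
                best_name, best_end = name, end
            i += 1
        if best_name is None:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best_name)
        covered = best_end
    return path
```

For example, with clones `BAC1` (0-150), `BAC2` (100-250), `BAC3` (120-400) and `BAC4` (350-500) over a 500 bp region, the function returns `["BAC1", "BAC3", "BAC4"]`; `BAC2` is redundant because `BAC3` reaches further.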

Non-coding DNA

DNA in the genome that is not directly involved in making proteins or other molecules.

About 98% of the wheat genome consists of non-coding DNA. The functions of most non-coding regions are not yet known; recent evidence suggests that some of them are involved in controlling the activity of genes.

Nucleotides

The four chemical subunits of the DNA molecule, also called bases, known by their abbreviations A, T, C, and G.

Phenotype

The set of observable characteristics of an organism.

These characteristics can be controlled by genetics, by the environment, or a combination of both.

Positional cloning

A technique used to identify and isolate genes, usually those that are associated with a specific trait, based on their physical location on a chromosome. Traits are usually positioned first on the basis of proximity to genetic markers associated with chromosomal regions. Then, if a physical map covering the region is available, they are positioned relative to BACs across the region and subsequently to genes annotated in the BAC sequences.

Physical map

A map of the locations of identifiable landmarks on a chromosome or genome, with distances between markers measured in base pairs. A physical map often refers to a map of overlapping BAC clones from a library that shows the relative positions of the clones along chromosomes. High-resolution physical maps serve as a scaffold for genome sequence assembly.

Pseudomolecule

A representation of the entire sequence of a chromosome that is assembled from smaller sequence contigs. In most cases, the pseudomolecule is ordered using physical and genetic map information.
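The final step can be sketched as follows, assuming each contig already has a map position and an orientation. The code sorts contigs by position, reverse-complements those placed on the opposite strand, and joins them with runs of N to stand for gaps of unknown length. The fixed 100-N gap, the `PlacedContig` type and the function name are illustrative assumptions.

```python
from dataclasses import dataclass

# Complement table; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@dataclass
class PlacedContig:
    sequence: str
    map_position: float     # position from a genetic or physical map (hypothetical units)
    reverse: bool = False   # True if the map places the contig on the opposite strand


def build_pseudomolecule(contigs: list[PlacedContig], gap_size: int = 100) -> str:
    """Concatenate contigs in map order, joining them with runs of N for unknown gaps."""
    pieces = []
    for c in sorted(contigs, key=lambda c: c.map_position):
        # Reverse-complement contigs that are oriented against the chromosome.
        seq = c.sequence.translate(_COMPLEMENT)[::-1] if c.reverse else c.sequence
        pieces.append(seq)
    return ("N" * gap_size).join(pieces)
```

For example, a reversed contig `AAC` at position 1.0 and a contig `GGG` at position 5.0, joined with `gap_size=2`, give `"GTTNNGGG"`.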

Quantitative trait locus (QTL)

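A stretch of DNA containing, or linked to, genes that underlie a quantitative trait, that is, a trait such as yield that varies continuously rather than falling into distinct classes.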

Recombination

The exchange of DNA between non-sister chromatids of homologous chromosomes during meiosis (crossing over), which produces new combinations of genes and markers.

Reference sequence

The formally recognized, verified genome sequence of an organism that is used as a representative example of the genome for a particular species. A reference sequence is useful for assembling and comparing individual genomes of the same species (e.g., comparing elite varieties of wheat with the reference sequence for the purpose of understanding the inherited basis of key traits).

Sequence

The sequential order of nucleotides (genetic “letters”) in a piece of DNA. A short DNA sequence might be: ACGTACGTACGT

Sequencing

The determination of the sequential order of nucleotides in a piece of DNA or an entire genome.

Single Nucleotide Polymorphism (SNP)

A variation in a single base (A, T, C or G) found when comparing the same DNA sequence from two different individuals of the same species.
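Once two sequences have been aligned, SNPs can be listed by comparing them position by position. This sketch assumes the sequences are already aligned with no insertions or deletions. It skips positions where either base is ambiguous (for example, N), and the function name is hypothetical.

```python
VALID_BASES = set("ACGT")


def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, base in a, base in b) wherever two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1):
        # Report only differences between two unambiguous bases.
        if a != b and a in VALID_BASES and b in VALID_BASES:
            snps.append((pos, a, b))
    return snps
```

For example, `find_snps("ACGTACGTACGT", "ACGTACCTACGA")` returns `[(7, "G", "C"), (12, "T", "A")]`.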

Shotgun Sequencing (also called Whole-Genome Shotgun Sequencing)

A laboratory technique for determining the DNA sequence of an organism's genome. The method involves breaking the genome into a collection of small DNA fragments (typically 600 bp to 50 kb in size, depending on the sequencing technology) that are sequenced individually. A computer program looks for overlaps in the DNA sequences and uses them to place the individual fragments in their correct order to reconstitute the genome.

Trait

A physical or agronomic characteristic, such as high yield, resistance to pathogens, or tolerance to a stress.

Whole Genome Assembly

A whole genome assembly is the process of taking fragments of DNA sequences from an entire (whole) genome and, using high throughput technology, joining them by matching overlapping sequences to create a representation of the original DNA that was sequenced. This contrasts with sequence assemblies of individual chromosomes/chromosome arms.

Modification date: 16 August 2023 | Publication date: 06 February 2021 | By: ic