Sequencing technologies are unable to sequence the entire human genome at once. Thus, the genome must be broken into smaller chunks of DNA, sequenced and then put back together in the correct order using bioinformatics approaches. There are several methods of DNA sequencing, including clone-by-clone and whole-genome shotgun methods. For more information on whole-genome sequencing as it relates to the field of immuno-oncology, see our sister site, Learn ImmunoOncology.
Clone-by-clone
In this method, smaller sections of the genome are copied and inserted into bacteria. The bacteria can then be grown to produce identical copies, or “clones,” each containing approximately 150,000 base pairs of the genome to be sequenced. The inserted DNA in each clone is then broken down further into smaller, overlapping chunks of about 500 base pairs. These smaller inserts are sequenced, and the overlapping portions are used to reassemble the clone. This approach was used to sequence the first human genome using Sanger sequencing. It is time-consuming and costly, but it is reliable.
Whole-genome shotgun
As the name implies, “shotgun” sequencing is a method that breaks DNA into small random pieces for sequencing and reassembly. The pieces of DNA are also cloned into bacteria for growth, isolation and subsequent sequencing. Because the pieces are random, they contain overlapping sequences that aid in reassembly into the original DNA order. This approach was originally used in Sanger sequencing but is now also used in next-generation sequencing methods, which provide rapid genome sequencing at lower cost. It is suited only to shorter “reads” (ie, sequences of shorter DNA fragments that must be put back together again). Because the genome is reassembled from overlapping regions and the read lengths are short, the method is best utilized when a reference genome is available, and it requires sophisticated computational approaches to reassemble the sequence. It also can be challenging for genomes with many repetitive regions.
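To make the idea of random, overlapping pieces concrete, the following is a minimal sketch in Python that samples fixed-length fragments from a short made-up sequence. The function name, the toy sequence, the read length and the number of reads are all assumptions chosen for illustration; real fragmentation produces variable lengths and the true start positions are not known.

```python
import random

def shotgun_fragments(genome, read_length, n_reads, seed=0):
    """Sample n_reads random fixed-length substrings (toy 'reads') from genome.

    Returns (start, read) pairs. The start position is kept only so the
    overlaps can be displayed; a real experiment does not know it.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    last_start = len(genome) - read_length
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [(s, genome[s:s + read_length]) for s in starts]

# Hypothetical toy sequence, not real genomic data.
genome = "ATGCGTACGTTAGCCGATAGGCTTAACGGATCCGTA"
reads = shotgun_fragments(genome, read_length=10, n_reads=12)

# Print each read indented to its origin so the overlaps are visible.
print(genome)
for start, read in sorted(reads):
    print(" " * start + read)
```

Because the start positions are random, neighboring reads usually share some bases, and those shared bases are what an assembler later relies on.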
Assembly of sequencing reads
Because genomes are sequenced in varying lengths of DNA fragments, the resulting sequences must be put back together. This is referred to as “assembly,” or “reassembly.” Two common approaches are de novo assembly and assembly by reference mapping.
De novo assembly is performed by identifying overlapping regions in the DNA sequences, aligning the sequences and putting them back together to form the genome. This is done without a reference sequence for comparison. Mapping to a reference genome instead aligns the new sequencing data to an existing genome, which serves as a comparator.
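The overlap-and-merge idea behind de novo assembly can be sketched with a simple greedy strategy: repeatedly find the two sequences with the longest suffix-to-prefix overlap and join them. This is only an illustrative toy under stated assumptions (error-free reads, a single strand, no repeats, a hypothetical minimum overlap of 3 bases); production assemblers use far more sophisticated graph-based methods.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such match is shorter than min_len.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Greedily merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]

# Four toy reads taken from the made-up sequence ATTAGACCTGCCGGAATAC.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

The greedy choice is easy to follow but can go wrong when the same stretch of bases occurs in more than one place, which is one reason repetitive genomes are hard to assemble.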
Although de novo assembly can be challenging, this approach is the only one available for sequencing new organisms. Additionally, de novo assembly introduces less bias into the results than mapping to a reference genome. Mapping to a reference genome is easier and requires fewer contiguous reads, but new or unexpected sequences can be lost. The sequence results obtained by this method are only as good as the reference genome chosen; however, it can provide better identification of single nucleotide polymorphisms (SNPs). Multiple institutions and genomic sequencing companies have invested considerable time and effort into creating improved reference genomes. Single nucleotide polymorphisms are known to vary by race and ethnicity; thus, multiple reference genomes have been created for various races/ethnicities.
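The trade-offs of reference mapping can be illustrated with a small sketch that slides each read along a reference and keeps the placement with the fewest mismatches. The function name, the toy reference, the reads and the one-mismatch limit are assumptions for illustration only; real aligners use indexed, gap-aware algorithms rather than this brute-force scan.

```python
def map_read(read, reference, max_mismatches=1):
    """Find the best ungapped placement of read on reference.

    Returns (position, mismatches), where mismatches is a list of
    (reference_position, reference_base, read_base), or None if no
    placement has at most max_mismatches differences.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = [(pos + i, ref_base, read_base)
                      for i, (ref_base, read_base) in enumerate(zip(window, read))
                      if ref_base != read_base]
        if len(mismatches) <= max_mismatches and (
                best is None or len(mismatches) < len(best[1])):
            best = (pos, mismatches)
    return best

reference = "ATTAGACCTGCCGGAATAC"  # hypothetical toy reference

print(map_read("AGACCTGC", reference))  # (3, [])
print(map_read("CCTGACGG", reference))  # (6, [(10, 'C', 'A')])
print(map_read("GGGGGGGG", reference))  # None
```

The second read maps with a single difference from the reference, which is how a candidate SNP shows up in this toy setting. The third read has no acceptable placement and is simply dropped, which mirrors how new or unexpected sequences can be lost when mapping to a reference.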
Examples of next-generation sequencing platforms
Several companies focus on the development and marketing of next-generation sequencing machines (often referred to as “platforms”) for use in whole-genome (and other) sequencing. Illumina is considered by many to be the leader because of the number of users of its systems. Illumina offers multiple platforms depending on the need. The Illumina HiSeq is one of the more common sequencers found in laboratories, including major research institutions, companies providing next-generation sequencing services for clinics and labs, and pathology laboratories. It has high throughput and is capable of sequencing many genomes rapidly at reasonable cost. This instrument also can be used to look at copy number variation, mutations and other alterations, as well as RNA expression levels for transcriptomics. Targeted sequencing panels, which are much smaller than whole genomes, are popular in the clinic, where faster turnaround times are required for treatment of patients. In response, Illumina created the MiSeq, which can provide same-day sequencing results for very small panels. Illumina also produced multiple variations of its sequencers for specific disease areas, optimizing output, turnaround time and costs for specific use cases.
Thermo Fisher Scientific’s Ion Torrent or Ion Proton uses a completely different technology based on detection of pH differences and was once expected to provide better utility for clinical applications because it was easier to use, cost less and provided faster turnaround time. However, Illumina countered with new machines to fit these needs. Consequently, both are found in research and clinical laboratories.
Other technologies developed recently use different novel approaches. A few examples are provided below.
Oxford Nanopore Technologies introduced the MinION, which enables anyone to sequence on a desktop computer using a USB device. The DNA is passed through a protein nanopore membrane for sequencing and detection by creation of an ionic current that varies based on the nucleotide.
Pacific Biosciences introduced its single molecule, real-time technology with the longest reads to date, with average read lengths of more than 10,000 base pairs compared with more than 150 base pairs for short-read platforms. Single molecule, real-time technology uses a chip with single DNA molecules attached. Zero-mode waveguide technology isolates the signal from a single nucleotide as the DNA polymerase incorporates fluorescently labeled bases, allowing detection of each base. The error rate of this instrument is still higher than that of some of the prior technologies, but a lot of interest has been generated, and there is hope that speed and costs can be further optimized with the new approach.
Coverage breadth and depth
Coverage refers to the number of reads that show a specific nucleotide in the reconstructed DNA sequence. A read is a string of A, T, C, G bases that correspond to the reference DNA. There are millions of reads in a sequencing run. Increased coverage depth results in increased confidence in variant identification.
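As a rough illustration, depth can be tallied position by position once the location of each read on the sequence is known. The sketch below assumes each aligned read is described by a hypothetical (start, length) pair on a toy 20-base sequence, and it reports breadth using the common definition of the fraction of positions covered by at least one read.

```python
def depth_per_position(genome_length, alignments):
    """Count how many reads cover each position.

    alignments is a list of (start, length) pairs using 0-based starts.
    """
    depth = [0] * genome_length
    for start, length in alignments:
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth

def summarize(depth, min_depth=1):
    """Return (mean depth, breadth), where breadth is the fraction of
    positions covered by at least min_depth reads."""
    mean_depth = sum(depth) / len(depth)
    breadth = sum(d >= min_depth for d in depth) / len(depth)
    return mean_depth, breadth

# Hypothetical alignments on a 20-base toy sequence.
alignments = [(0, 8), (4, 8), (6, 8), (14, 4)]
depth = depth_per_position(20, alignments)
print(depth)             # [1, 1, 1, 1, 2, 2, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0]
print(summarize(depth))  # (1.4, 0.9)
```

In this toy case the last two positions are not covered by any read, so no variant could be called there, while positions 6 and 7 are supported by three reads each.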
For the human genome, a 10- to 30-times coverage depth is acceptable for detecting mutations, SNPs and rearrangements. A next-generation sequencing approach that provides a coverage depth of 30 times is considered to have high coverage. However, for a fixed amount of sequencing, as coverage depth increases, coverage breadth (the portion of the genome covered by reads) decreases (Figure 1).
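A back-of-the-envelope calculation shows what these depths imply. Average depth is commonly estimated as the number of reads multiplied by the read length and divided by the size of the region sequenced. The sketch below assumes a human genome of roughly 3.1 billion base pairs, 150-base-pair reads and a hypothetical 31-million-base targeted region, and it idealizes by treating every read as usable and evenly spread.

```python
import math

def mean_depth(n_reads, read_length, target_size):
    """Idealized average depth: total sequenced bases / target size."""
    return n_reads * read_length / target_size

def reads_needed(depth, read_length, target_size):
    """Number of reads required to reach a given average depth."""
    return math.ceil(depth * target_size / read_length)

GENOME_SIZE = 3_100_000_000  # assumption: approximate human genome size
READ_LENGTH = 150            # assumption: typical short-read length
TARGET_SIZE = 31_000_000     # hypothetical targeted region, for contrast

n_reads = reads_needed(30, READ_LENGTH, GENOME_SIZE)
print(n_reads)                                        # 620000000
print(mean_depth(n_reads, READ_LENGTH, GENOME_SIZE))  # 30.0
print(mean_depth(n_reads, READ_LENGTH, TARGET_SIZE))  # 3000.0
```

The last two lines show the trade-off in numbers: the same amount of sequencing gives 30-times depth across the whole genome or 3,000-times depth when confined to a region one-hundredth the size.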
Figure 1. Relationship between coverage breadth vs. coverage depth.
Source: Elaine Mardis, PhD