Welcome to the Sequencing 101 blog series — where we will provide introductions to sequencing technology, genomics, and much more.
If you’re not immersed in the field of DNA sequencing, it can be challenging to keep up with the rapid evolution among all the platforms and technologies on the market. Let’s start with a quick overview of how these different technologies came about — and how each is used today.
First generation sequencing — starting the era of genomics
DNA sequencing as we know it originated in the late 1970s, when Frederick Sanger at the MRC Laboratory of Molecular Biology in Cambridge developed a gel-based method that combined a DNA polymerase with a mixture of standard nucleotides (dNTPs) and chain-terminating nucleotides (ddNTPs). Because a ddNTP cannot be extended, mixing dNTPs with ddNTPs causes strand synthesis to terminate early at random positions. Four reactions are run, each with the chain-terminating version of only one base (A, T, G, or C). When the reactions are separated by gel electrophoresis, one reaction per lane, the fragments are sorted by length, allowing the DNA sequence to be read off base by base. This technique was revolutionary at the time, enabling sequencing of 500–1,000 bp fragments. However, since the original method relied on radioactive labeling and X-ray film, it was less than ideal for widespread use.
By the 1980s, Sanger’s original method had been automated by scientists at Caltech and commercialized by Applied Biosystems. Radioactive labels were replaced with dye-labeled nucleotides, and large slab gels were replaced with acrylic-fiber capillaries. Scientists could now simply feed prepared DNA into a machine and view the results of fluorescence-based reactions on an electropherogram. This technology, which was continuously improved over the years, served as the bedrock of the Human Genome Project. Today, automated Sanger sequencing is still in use, primarily in clinical labs where it is acceptable to have low throughput, higher per-sample costs, and sequencing reads 500–1,000 bp in length.
But even after the Human Genome Project, the cost of automated Sanger sequencing — also known as capillary electrophoresis — remained too high to enable the kind of large-scale sequencing projects envisioned by scientists. By the mid-2000s, remarkable efforts had been made to bring down the costs of sequencing. Driven largely by grants from the National Human Genome Research Institute (NHGRI), labs around the world tested out new methods for higher-throughput sequencing, using concepts as diverse as electronics, physics, and magnetics.
Second generation sequencing — short reads become fast and efficient
One key player in the advent of next-generation sequencing (NGS) was a UK-based company called Solexa, which was later acquired by Illumina. The key innovation of the Illumina platform was “bridge amplification,” which allows the formation of dense clusters of amplified fragments across the surface of a flow cell. Amplifying each original single molecule into a cluster of many copies is what makes it possible to detect a fluorescent signal as labeled nucleotides are added one at a time, with sequencing proceeding by synthesis. Over time, the number of clusters that could be read simultaneously grew tremendously, and Illumina instruments became one of the first commercially available massively parallel sequencing technologies. Other tools developed around the same time, such as the Ion Torrent platform, became part of the NGS landscape as well.
NGS platforms are the dominant type of sequencing technology used today. Their extreme capacity allows for sequencing at very low cost. They are limited, however, in read length: NGS platforms typically produce reads of ~50–500 bp. This makes them an excellent fit for resequencing projects, SNP calling, and targeted sequencing of very short amplicons.
Third generation sequencing — the rise of long reads
However, short reads are not suitable for all sequencing projects. Another approach that was supported by the so-called $1,000 genome grants from NHGRI was Single Molecule, Real-Time (SMRT) sequencing from PacBio. This technique uses miniaturized wells, known as zero-mode waveguides, in which a single polymerase incorporates labeled nucleotides and light emission is measured in real time. A different single-molecule approach to long-read sequencing, using pore-forming proteins and electrical detection, was adopted by Oxford Nanopore Technologies (ONT).
SMRT sequencing has a number of advantages. Most notable, perhaps, is its ability to produce reads that are tens of thousands of bases long. These long reads make it possible to span large structural variants and challenging repetitive regions. Such regions confound short-read sequencers because short snippets from different copies of a repeat cannot be differentiated from each other during assembly. Another advantage is low GC bias, which allows PacBio systems to sequence through extreme-GC and AT-rich regions that cannot be amplified during cluster generation on short-read platforms. A third advantage is the ability to detect DNA methylation while sequencing, since no amplification is done on the instrument.
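The repeat problem can be shown with a small Python sketch. Everything here is invented for illustration: the toy genome, the repeat sequence, and the `placements` helper are assumptions, and real aligners and assemblers are far more sophisticated than exact string matching.

```python
def placements(genome, read):
    """Return every start index where `read` matches `genome` exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Toy genome: the same 12-base repeat appears twice, separated by unique sequence.
repeat = "ACCGTTAGGCAT"
genome = "TTGCA" + repeat + "GGATC" + repeat + "CCTAG"

short_read = repeat[2:10]   # 8 bases that fall entirely inside the repeat
long_read = genome[2:20]    # 18 bases: unique flank + whole repeat + unique flank

print(placements(genome, short_read))  # [7, 24]: two equally good placements
print(placements(genome, long_read))   # [2]: anchored by the unique flanks
```

The short read fits both copies of the repeat equally well, so its origin is ambiguous. The long read carries unique sequence on both sides of the repeat, which pins it to a single location.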
Short-read sequencing produces reads 50–500 base pairs in length, which can lead to sequence gaps and incomplete assemblies, known as draft genomes. Highly accurate long-read sequencing from PacBio produces reads tens of kilobases in length, creating overlaps which allow for the generation of complete genome assemblies.
As scientists began to work with SMRT sequencing — sometimes known as third-generation sequencing — they realized that it had particular value for applications including de novo genome sequencing, phasing, detection of structural variants, epigenetic characterization, and sequencing of the transcriptome without the need for assembly. Technology improvements over time increased the throughput and accuracy of SMRT sequencing platforms, bringing their costs in line with NGS platforms for many types of projects. Now, SMRT sequencing has industry-leading accuracy thanks to its HiFi sequencing, and it is being used around the world to produce reference-grade genomes for microbes, plants, animals, and people.
Explore other posts in the Sequencing 101 series:
Introduction to PacBio sequencing and the Sequel II system
From DNA to discovery — the steps of SMRT sequencing
Why are long reads important for studying viral genomes?
Understanding accuracy in DNA sequencing
The value of sequencing full-length RNA transcripts
Ploidy, haplotypes, and phasing — how to get more from your sequencing data
DNA extraction — tips, kits, and protocols
Video: Sequencing 101: how long-read sequencing improves access to genetic information