Genome Biology Paper Highlights Affordability and Scale of PacBio-Based Finished Microbial Genomes
Monday, September 16, 2013
A new paper released in Genome Biology on September 13 from lead author Sergey Koren at the National Biodefense Analysis and Countermeasures Center offers a thorough overview of SMRT® Sequencing for microbes, from per-genome cost to potential for assembling complete genomes.
In “Reducing assembly complexity of microbial genomes with single-molecule sequencing,” Koren and co-authors consider how microbial genome assembly has evolved, from the Sanger days of manually finished genomes to short-read sequencers that offered lots of sequence data but virtually no finished genomes. Today, that evolution has continued with SMRT® Sequencing, which allows for rapid and complete genome assembly. Fewer than a third of genomes in the Genomes OnLine Database are closed, according to the authors, and fewer still are considered fully finished. “This has hampered large-scale, structural analyses of bacterial genomes, and focused research instead on isolated genes and single-nucleotide polymorphisms,” Koren et al. write. Without knowledge of the full genome, microbiologists have been missing key information about pathogenicity, function, evolution, and more.
Single-molecule sequencing, on the other hand, provides a straightforward path to finished genomes, offering scientists the possibility of comparative genomic studies of many organisms, according to the paper. Indeed, based on an analysis of the repeat complexity of nearly 2,300 microbes, Koren et al. estimate that automated pipelines utilizing SMRT Sequencing “could automatically close >70% of the complete bacteria and archaea in GenBank, without the need for pair libraries….” To see how this would affect your favorite organism, check out paper co-author Adam Phillippy’s useful gap-prediction tool. Note: These results were based on a previous chemistry; with PacBio’s current chemistry and its longer read lengths, the closure rate would be even higher.
The authors also looked closely at the cost implications of finishing microbial genomes in this manner. Relying on a single, PacBio-only library preparation (instead of the two or more preps required for hybrid assemblies) keeps costs down and accuracy up; “single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads,” Koren et al. write. They note that closing more than 70% of known microbial genomes could be accomplished for about $900 per genome, or two SMRT Cells per organism using the PacBio RS II.
As part of this project, the scientists sequenced six genomes with varying GC content and complexity using Illumina®, 454®, and PacBio sequencers. For nearly all of the genomes, the PacBio-only assemblies “outperformed the hybrid assemblies both in terms of continuity, error rate, and the assembly likelihood score,” the authors write. “The assemblies presented here have good likelihood and finished-grade consensus accuracy exceeding 99.9999%.”
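To put the quoted 99.9999% consensus accuracy in more familiar terms, the short sketch below converts an accuracy figure into a Phred-scaled quality value and into an expected number of consensus errors. It is an illustration only and is not taken from the paper: the function names are hypothetical, and the 5 Mb genome length is an assumed, roughly typical bacterial genome size rather than one of the six genomes sequenced.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a fractional accuracy (e.g. 0.999999) to a Phred-scaled quality value."""
    error_rate = 1.0 - accuracy
    if error_rate <= 0.0:
        raise ValueError("accuracy must be below 1.0 for a finite quality value")
    # Phred definition: QV = -10 * log10(probability of error)
    return -10.0 * math.log10(error_rate)


def expected_errors(accuracy: float, genome_length_bp: int) -> float:
    """Expected number of erroneous consensus bases at the given accuracy."""
    return (1.0 - accuracy) * genome_length_bp


# 99.9999% is the lower bound quoted from the paper.
consensus_accuracy = 0.999999
# ASSUMPTION: 5 Mb is an illustrative genome size, not a figure from the paper.
genome_length_bp = 5_000_000

qv = accuracy_to_phred(consensus_accuracy)
errors = expected_errors(consensus_accuracy, genome_length_bp)
print(f"QV ~ {qv:.1f}, expected errors in {genome_length_bp:,} bp ~ {errors:.1f}")
```

Under these assumptions, 99.9999% accuracy corresponds to roughly QV60, or about five erroneous bases across a 5 Mb genome; because the paper reports accuracy "exceeding" that figure, these numbers are a floor on quality rather than an exact value.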
Koren and his collaborators note that with an affordable, high-quality approach to fully sequencing and finishing microbial genomes, there is now opportunity to “increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pangenomes and chromosomal organization.”