A paper recently published in Nature Methods takes a detailed look at how our HGAP and Quiver tools generate high-quality genome assemblies through an automated, simplified workflow. (“Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data,” Chin et al., advance online publication.)
The authors include lead author Chen-Shan Chin and colleagues at Pacific Biosciences, along with collaborators at the Joint Genome Institute and the Eichler lab at the University of Washington. They used Single Molecule, Real-Time (SMRT®) Sequencing on three microorganisms and one human BAC, and compared the resulting PacBio-only assemblies against existing high-quality reference genomes.
The aim of the work was to evaluate how well PacBio’s automated hierarchical genome-assembly process (HGAP) and the consensus-calling algorithm Quiver produce high-quality genomes. Unlike previously published hybrid assemblies, which combined PacBio long reads with short reads from other platforms, this study relied solely on sequence generated by the PacBio instrument. “Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads, which we follow with assembly using off-the-shelf long-read assemblers,” the authors report.
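The seed-and-recruit idea in the quote above can be illustrated with a toy sketch. It assumes, as a simplification not taken from this post, that seeds are picked by walking reads from longest to shortest until they reach a target coverage; the 30× default and the helper names `seed_length_cutoff` and `majority_consensus` are hypothetical. It also replaces HGAP's real alignment-based preassembly with a column-wise majority vote over reads that are already aligned without gaps. Real PacBio errors are mostly insertions and deletions, so this only shows why stacking many noisy reads on a seed raises accuracy, not how HGAP actually does it.

```python
from collections import Counter


def seed_length_cutoff(read_lengths, genome_size, target_coverage=30):
    """Hypothetical helper: smallest read length such that reads at least that
    long add up to target_coverage * genome_size bases."""
    needed = target_coverage * genome_size
    total = 0
    # Accumulate bases from the longest read downward.
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= needed:
            return length
    # Not enough data to reach the target: treat every read as a potential seed.
    return min(read_lengths)


def majority_consensus(aligned_reads):
    """Toy 'preassembly': reads are equal-length strings already placed on a
    seed (no indels); keep the most common base in each column."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


if __name__ == "__main__":
    print(seed_length_cutoff([10, 8, 6, 4, 2], genome_size=10, target_coverage=2))
    print(majority_consensus(["ACGT", "ACCT", "ACGT"]))
```

In the usage example, the reads of length 10, 8 and 6 together reach the 20 bases required for 2× coverage of a 10-base "genome", so 6 becomes the cutoff, and the single mismatched base in the second read is outvoted by the other two.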
In the paper, the authors describe sequencing three well-studied microorganisms (E. coli K-12, Meiothermus ruber, and Pedobacter heparinus) and a human BAC from chromosome 15. For each sample, a single library was prepared and sequenced, the reads were assembled with HGAP, and Quiver was used to polish the consensus. Together, these tools yielded de novo genome sequences with accuracy better than 99.999%. The authors write, “In each case, SMRT sequencing of a single, large-insert template library to ~80×–100× coverage followed by HGAP analysis resulted in excellent de novo genome assemblies that are comparable in quality and contiguity to assemblies generated by Sanger sequencing or by hybrid approaches combining long- and short-read sequencing methods.”
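The accuracy and coverage figures above translate into simple arithmetic. The sketch below, using hypothetical helper names, converts 99.999% accuracy into a Phred-scaled quality value (QV 50, or at most one error per 100,000 bases) and estimates the error budget and the raw sequence needed for a genome of a given size. The roughly 4.6 Mb genome size in the example is an approximation for E. coli K-12, not a figure from this post.

```python
import math


def phred_qv(accuracy):
    """Phred-scaled quality value for a per-base accuracy (0.99999 -> ~50)."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def max_errors(genome_size, accuracy):
    """Upper bound on consensus errors implied by an accuracy figure."""
    return genome_size * (1.0 - accuracy)


def bases_needed(genome_size, coverage):
    """Raw sequence (in bases) required to reach a given fold coverage."""
    return genome_size * coverage


if __name__ == "__main__":
    genome = 4_600_000  # approximate E. coli K-12 genome size (assumption)
    print(f"QV: {phred_qv(0.99999):.1f}")
    print(f"max errors: {max_errors(genome, 0.99999):.0f}")
    for cov in (80, 100):
        print(f"{cov}x coverage: {bases_needed(genome, cov) / 1e6:.0f} Mb")
```

For a genome of about 4.6 Mb, 99.999% accuracy allows at most about 46 consensus errors, and 80× to 100× coverage corresponds to roughly 368 to 460 Mb of sequence from the single library.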
This approach offers a simplified workflow (scientists do not have to prepare the multiple libraries that hybrid sequencing methods require) and delivers high-quality results without relying on a reference genome. That makes de novo genome sequencing practical, which the authors note is particularly important in microbial studies. “Certain aspects of microbial diversity … can be overlooked [in resequencing projects] because the sequence reads that do not align to the chosen reference genome are removed from consideration. In many cases, it is those sequences that provide critical insights into what makes certain bacterial strains different from their reference strains,” the authors state. With the efficient, accurate approach described in this Nature Methods paper, scientists can instead perform de novo sequencing to get a more complete picture of the organism under investigation.