In Assembler Evaluation, Scientists Recommend Non-hybrid Approach to Bacterial Genomes
Tuesday, June 2, 2015
A new publication in Scientific Reports, a Nature Publishing Group journal, recommends using only the PacBio® system to sequence bacterial genomes for the best chance of generating an accurate and finished assembly.
The paper, “Completing bacterial genome assemblies: strategy and performance comparisons,” reviews several different long-read assembly methods for bacterial genomes. Authors Yu-Chieh Liao, Shu-Hung Lin, and Hsin-Hung Lin from the Institute of Population Health Sciences in Taiwan note that while several methods exist, efforts to evaluate and compare them have been insufficient. They set out to thoroughly assess these methods, which include hybrid assembly protocols as well as long-read-only protocols.
Long-read technology appealed to the authors because short-read sequencers have produced assemblies that “are often unfinished, fragmented draft genomes,” Liao et al. write. In their analysis of long-read assembly algorithms, Liao and colleagues collected data sets from various sources to enable a more direct comparison of the tools. Their assessment covered several hybrid assemblers (ALLPATHS-LG, Celera Assembler’s PacBio corrected reads (PBcR) pipeline, SPAdes, and SSPACE-LongRead) as well as two long-read-only assemblers (HGAP and the PBcR pipeline run with self-correction).
None of the hybrid-assembly methods consistently produced single-contig assemblies for bacterial genomes, the authors report. In one example, they show that data from four SMRT® Cells and short-read data could not be assembled into a finished contig using PBcR from Celera Assembler, while the data from the four SMRT Cells alone was successfully assembled into just one contig using the same algorithm without short-read data.
The non-hybrid approaches were more consistently successful. “The latest version of non-hybrid approaches rapidly produced accurate and complete bacterial [genomes],” the authors write. HGAP 3.0 was able to assemble an E. coli genome in less than two and a half hours, while the latest version of PBcR’s self-corrected pipeline took only 24 minutes. Quiver polishing added an extra hour to the PBcR method, but led to a consensus accuracy of 99.9997%.
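To put the 99.9997% consensus accuracy in perspective, the short Python sketch below converts it to a Phred-style quality value and to an expected number of consensus errors across a genome. The function names are hypothetical, and the 4.6 Mb genome length is an assumption (an approximate E. coli size, not a figure taken from the paper).

```python
import math


def phred_from_accuracy(accuracy_percent: float) -> float:
    """Convert a consensus accuracy (in percent) to a Phred-scaled quality value."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return -10.0 * math.log10(error_rate)


def expected_errors(accuracy_percent: float, genome_length_bp: int) -> float:
    """Estimate how many consensus bases would be wrong at the given accuracy."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return error_rate * genome_length_bp


if __name__ == "__main__":
    accuracy = 99.9997              # consensus accuracy reported after Quiver polishing
    genome_length = 4_600_000       # ASSUMPTION: approximate E. coli genome size in bp

    print(f"Phred-scaled quality: Q{phred_from_accuracy(accuracy):.1f}")
    print(f"Expected consensus errors: {expected_errors(accuracy, genome_length):.1f}")
```

Under that assumed genome size, the reported accuracy corresponds to roughly Q55, or on the order of a dozen erroneous bases across the whole assembly.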
“The non-hybrid approach relying on single-library preparation is the preferred way to de novo assemble and thereby complete bacterial genomes,” Liao et al. write. “We therefore recommended the practitioners to exclusively sequence their bacterial genomes using the PacBio RS II system.”
Don’t forget about our SMRTest Microbe Grant Program, which will enable one lucky winner to have a project sequenced on the PacBio system with up to four library preparations and a sequencing run using a maximum of one SMRT Cell 8Pac. Our grant program co-sponsor, the Institute for Genome Sciences, will provide sequencing and bioinformatics support. Entries are due by June 27.