Scientists from the University of Hong Kong recently reported the results of a head-to-head comparison of long-read and short-read platforms for sequencing and assembling a bacterial genome. They determined that only SMRT Sequencing was capable of generating highly accurate, complete assemblies. “Completing bacterial genomes should no longer be regarded as a luxury, but rather as a cost-effective necessity,” the team reports.
“PacBio But Not Illumina Technology Can Achieve Fast, Accurate and Complete Closure of the High GC, Complex Burkholderia pseudomallei Two-Chromosome Genome” was published in Frontiers in Microbiology by lead author Jade Teng, senior author Patrick Woo, and collaborators. For this project, the scientists compared the performance of the PacBio RS II Sequencing System with that of the Illumina HiSeq 1500. Their target was Burkholderia pseudomallei, which has at least 68% GC content as well as “highly repetitive regions and substantial genomic diversity,” the authors report.
After sequencing, the team attempted both hybrid and single-source assemblies. Working with Illumina data alone “resulted in a draft genome with more than 200 contigs,” they note, pointing out that the platform’s reliance on PCR amplification is inherently problematic for GC-rich genomes. Trying three different short-read assemblers did not improve the results. A hybrid assembly combining data from both platforms was also “not successful,” producing 74 contigs, the team reports.
Assembling only PacBio data, which was generated from a single SMRT Cell, led to a very different result. The approach “achieved complete closure of this two-chromosome B. pseudomallei genome without additional costly bench work and further sequencing, demonstrating its utility in the complete sequencing of bacterial genomes, particularly those that are well-known to be difficult-to-sequence,” the scientists write. The chromosome contigs of the assembly aligned to the organism’s reference genome with better than 99.9% accuracy. Importantly, the assembly accurately characterized “the number of CDSs and their distributions in each subsystem, four ribosomal operons, the highest number of core and virulence proteins (coverage of query protein sequence and amino acid identity ≥80%), and MLST gene loci,” the team adds.
The Illumina assembly, on the other hand, was unable to resolve these elements. “Extraordinarily high coverage of Illumina reads were observed in several collapsed repeat regions, including regions containing varying copies of mobile element proteins and ribosomal operon,” Teng et al. report. “We reasoned that Illumina sequencing was not able to resolve these repeat regions as their sequence reads were not long enough to span different kinds of repeats with unique flanking sequences.”
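The coverage signal the authors describe can be illustrated with a small sketch. When several copies of a repeat collapse into one stretch of an assembly, reads from every copy pile up there, so read depth rises to roughly the copy number times the genome-wide baseline. The Python below is only an illustration of that reasoning, not the method used in the paper: the function name, the window size, the fold threshold, and the toy depth values are all hypothetical.

```python
from statistics import median


def flag_collapsed_repeats(depths, window=100, fold=1.8):
    """Flag windows whose mean read depth is well above the baseline.

    depths: per-base read depth along one contig (list of numbers).
    window: window size in bases (hypothetical default).
    fold:   how many times the median depth counts as "elevated"
            (hypothetical default).
    Returns a list of (start, end, estimated_copies) tuples.
    """
    if not depths:
        return []
    baseline = median(depths)  # typical single-copy depth
    if baseline <= 0:
        return []
    flagged = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        if mean_depth >= fold * baseline:
            # Depth ratio approximates how many copies were merged.
            copies = round(mean_depth / baseline)
            flagged.append((start, start + len(chunk), copies))
    return flagged


# Toy data (invented for illustration): 50x baseline depth with a
# 100-base stretch at 200x, as if four repeat copies had collapsed.
toy_depths = [50] * 300 + [200] * 100 + [50] * 300
print(flag_collapsed_repeats(toy_depths))  # [(300, 400, 4)]
```

A depth ratio like this only suggests how many copies were merged. It cannot place each copy in the genome, which requires reads long enough to span the repeat together with its unique flanking sequence.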
The scientists also included an assessment of project cost. “To completely sequence a bacterial genome using Sanger sequencing or the second generation sequencing platforms, the main bulk of the cost, labor and time is spent in the gap-filling phase,” they write. “It has been estimated that when using these second generation sequencing platforms, around 95% of the money and time are spent in completing the last 1% of the bacterial genome.” But the calculation is very different for SMRT Sequencing. “Although the cost per base is more expensive for the PacBio RS II platform compared to short-read sequencing technology, no additional manual work after de novo assembly is required,” the team concludes, “and the benefit of obtaining an accurate number of individual replicons and an intact assembly of repetitive regions and mobile genetic elements justify the initial cost.”
August 15, 2017 | General