TB Study Finds Some Previously Reported Virulence Variants Were Sequencing Errors
Thursday, May 18, 2017
A publication in BMC Genomics upends some of the conventional wisdom about variants that may cause virulence in Mycobacterium tuberculosis. Scientists at San Diego State University used SMRT Sequencing to produce a complete assembly of the pathogen, finding that earlier assemblies encountered problems due to GC bias and repetitive DNA.
“SMRT genome assembly corrects reference errors, resolving the genetic basis of virulence in Mycobacterium tuberculosis” comes from Afif Elghraoui, Samuel Modlin, and Faramarz Valafar. The team used long-read PacBio sequencing on H37Ra, an attenuated strain of M. tuberculosis that is often compared to a virulent strain to highlight sources of pathogenicity. The same strain was previously sequenced with Sanger technology, and that assembly was published in 2008.
Sequencing required just two SMRT Cells to achieve an average of 217-fold coverage, and assembly produced a single contig. The scientists later went back to the data and found that reads from only one of the SMRT Cells yielded the same sequence. A comparison of the new assembly to the previous one, as well as to a reference assembly of the virulent M. tuberculosis strain, showed that the Sanger assembly overstated the genetic differences between the two strains.
“Our assembly reveals that the number of H37Ra-specific variants is less than half of what the Sanger-based H37Ra reference sequence indicates, undermining and, in some cases, invalidating the conclusions of several studies,” the authors report. Many of the previous sequencing errors were found in genes known to be repetitive and GC-rich. “Our results constrain the set of genomic differences possibly affecting virulence by more than half, which focuses laboratory investigation on pertinent targets and demonstrates the power of SMRT sequencing for producing high-quality reference genomes,” they add.
Elghraoui et al. note that SMRT Sequencing offers significant advantages in accuracy and read length. “The random error profile of this technology allows for consensus accuracy to increase as a function of sequencing depth,” they write, reporting a QV greater than 60 for their assembly. In addition, the long reads “allowed us to easily and unambiguously capture known structural variants in H37Ra, as well as two novel to the strain.”
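To put those numbers in context, here is a minimal Python sketch of the standard Phred relationship between a quality value and error probability, plus a toy majority-vote model of why randomly distributed errors wash out as depth grows. The function names, the 15% per-read error rate, and the majority-vote model itself are illustrative assumptions, not the consensus method used in the paper.

```python
import math


def qv_to_error_prob(qv: float) -> float:
    """Phred scale: QV = -10 * log10(p_error), so p_error = 10^(-QV/10)."""
    return 10 ** (-qv / 10)


def error_prob_to_qv(p_error: float) -> float:
    """Inverse of qv_to_error_prob."""
    return -10 * math.log10(p_error)


def majority_vote_error(per_read_error: float, depth: int) -> float:
    """Toy model (an assumption, not the paper's algorithm): probability that
    more than half of `depth` independent reads are wrong at one position.
    Treats every wrong read as agreeing on the same wrong base, so it is a
    pessimistic bound. Use an odd depth to avoid ties."""
    return sum(
        math.comb(depth, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (depth - k)
        for k in range(depth // 2 + 1, depth + 1)
    )


if __name__ == "__main__":
    # QV 60 corresponds to roughly one expected error per million bases.
    print(f"QV 60 -> error probability {qv_to_error_prob(60):.1e}")

    # Hypothetical per-read error rate, chosen only for illustration.
    raw_error = 0.15
    for depth in (5, 15, 25, 51):
        p = majority_vote_error(raw_error, depth)
        print(f"depth {depth:>2}: consensus error ~ {p:.2e} (QV ~ {error_prob_to_qv(p):.0f})")
```

The point of the sketch is the trend: when errors are independent of sequence context, adding reads drives the consensus error down steeply. Errors that recur at the same positions, such as those tied to GC-rich or repetitive regions, would not cancel out this way.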
These results lead the authors to “advise caution when analyzing GC-rich and repetitive sequences among reference genomes, not to mention draft genomes.” They add: “As de novo assembly can be routinely performed for microbes using single-molecule sequencing, we strongly recommend this for mycobacteria.”
Microbiology fans can find the PacBio team at the upcoming ASM conference in booth #1328.