Full-Length 16S Sequencing Offers Better Phylogenetic Resolution
Wednesday, April 20, 2016
Scientists from the Joint Genome Institute and other institutions recently reported a new SMRT Sequencing approach to microbial profiling using full-length sequencing of the 16S rRNA gene. In a benchmarking study, they demonstrate that this method allows for more accurate taxonomic classification than is possible with typical short-read sequencing methods.
Lead author Esther Singer, senior author Tanja Woyke, and collaborators at USDA-ARS, the University of British Columbia, and other research groups published “High-resolution phylogenetic microbial community profiling” in The ISME Journal earlier this year. The scientists note that while 16S phylogenetic analysis has traditionally been performed with gold-standard Sanger sequencing, the need for a more cost-effective solution drove the field to short-read sequencing technologies, which have produced most of the 16S sequences in GenBank. That shift, however, came at the cost of quality. “Reference sequences with low read accuracy, chimeric sequences and partial rRNA gene sequences with reduced phylogenetic resolution generated on short-read sequencing platforms such as 454 and Illumina remain problematic, resulting in incorrect or less accurate classification of environmental sequences,” the authors report.
The team thought long reads from SMRT Sequencing could provide an appealing alternative. In this project, they generated full-length 16S sequences from microbial communities using a PacBio instrument and compared the results to those from a short-read platform. They first validated the approach on a mock community of 26 bacterial and archaeal species, including E. coli and strains of Salmonella and Clostridium, generating full-length 16S sequences that they call PhyloTags.
Next they went to the field, using PacBio and short-read sequencing to analyze microbial communities from a lake in British Columbia, with water samples taken at eight different depths. They determined that partial sequences from the 16S gene — the information generated by sequencers that can’t cover the full gene in a single read — were less likely to resolve phylogeny and were more likely to lead to incorrect matches, particularly in more complex microbial communities. As many as 4% of short-read results “were taxonomically unresolved at the phylum level, whereas all PhyloTags were classified into distinct bacterial phyla,” the scientists report. In an analysis of unclustered sequence data, they note that short-read sequence results were “more often either impossible or incorrect, significantly altering community profiles across all taxonomic levels.” They also found that certain phyla were more likely to be misclassified when only partial gene coverage was available. “PhyloTag sequencing … offers the highest contig accuracy without discrimination against GC-rich or -poor regions, which further reduces bias in amplicon-based profiling,” the authors write.
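To make the point about partial 16S sequences concrete, here is a minimal toy sketch in Python. It is not the authors' pipeline and uses no real 16S data: the two sequences, the taxon names, and the read window are all made up for illustration. It only shows how two sequences can look identical across the region a short read covers while still differing over their full length.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Hypothetical, synthetic sequences (NOT real 16S genes): the middle
# 12 bases are shared, and each flank carries one differing base.
taxon_a = "ACGTTGCA" + "GGCCTTAAGGCC" + "TTGACCGT"
taxon_b = "ACGATGCA" + "GGCCTTAAGGCC" + "TTGTCCGT"

# Hypothetical region covered by a short read (positions 8..19).
window = slice(8, 20)

full_identity = identity(taxon_a, taxon_b)
partial_identity = identity(taxon_a[window], taxon_b[window])

print(f"partial-read identity: {partial_identity:.3f}")
print(f"full-length identity:  {full_identity:.3f}")
```

In this toy case the partial window reports the two sequences as identical (1.000), while the full-length comparison picks up both differences (26 of 28 positions match, about 0.929). Real classification depends on alignment, reference databases, and sequencing error, none of which this sketch models.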
“A resurgence of [full-length] sequences used as ‘gold standards’ has the potential to yet again transform microbial community studies, increasing the accuracy of taxonomic assignments for known and novel branches in the tree of life on previously unobtainable scales,” Singer et al. report.