June 1, 2021  |  

Profiling metagenomic communities using circular consensus and Single Molecule, Real-Time Sequencing

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR amplification. Whole-sample shotgun experiments require a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT) Sequencing reads in the 1-2 kb range, with >99% consensus accuracy, can be efficiently generated from low amounts of input DNA: as little as 10 ng of input DNA sequenced in 4 SMRT Cells can generate >100,000 such reads. While throughput is low compared to second-generation sequencing, the reads are a true random sampling of the underlying community. Long read lengths mean that a high number of the reads harbor full genes or even full operons for downstream analysis. Here we present the results of circular-consensus sequencing on a mock metagenomic community with an abundance range of multiple orders of magnitude, and compare the results with both 16S and shotgun assembly methods. We show that even with relatively low sequencing depth, the long-read, assembly-free, random sampling allows us to elucidate meaningful information from the very low-abundance community members. For example, given the above low-input sequencing approach, a community member at 1/1,000 relative abundance would generate about 100 sequence fragments of 1-2 kb having 99% consensus accuracy, each with a high probability of containing a gene fragment useful for taxonomic classification or functional insight.
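The closing arithmetic of this abstract can be checked with a short sketch. It assumes, as a simplification not stated in the abstract, that each read is an independent random draw and that a community member's share of reads equals its relative abundance (differences in genome size are ignored). The function names are illustrative only.

```python
from math import exp, log1p


def expected_reads(total_reads: int, relative_abundance: float) -> float:
    """Expected number of reads from one community member (assumed model)."""
    return total_reads * relative_abundance


def prob_at_least_one_read(total_reads: int, relative_abundance: float) -> float:
    """Probability of sampling the member at least once: 1 - (1 - p)^N."""
    if relative_abundance >= 1.0:
        return 1.0
    # log1p keeps the computation stable when the abundance is very small.
    return 1.0 - exp(total_reads * log1p(-relative_abundance))


if __name__ == "__main__":
    n = 100_000  # read count quoted in the abstract for 10 ng in 4 SMRT Cells
    for p in (1e-3, 1e-4, 1e-5):
        print(p, expected_reads(n, p), round(prob_at_least_one_read(n, p), 3))
```

Under these assumptions, a member at 1/1,000 relative abundance is expected to contribute about 100 of 100,000 reads, which matches the figure quoted in the abstract, while a member at 1/100,000 would be expected to contribute about one read.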


June 1, 2021  |  

Profiling the microbiome in fecal microbiota transplantation using circular consensus and Single Molecule, Real-Time Sequencing

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR. Whole-sample shotgun experiments generally use short-read sequencing, which results in data processing difficulties. For example, reads less than 500 bp in length will rarely cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, but also requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT®) Sequencing reads in the 1-3 kb range, with >99% accuracy, can be efficiently generated from low amounts of input DNA. For example, 10 ng of input DNA sequenced in 4 SMRT Cells on the PacBio RS II would generate >100,000 such reads. While throughput is lower than with short-read sequencing methods, the reads are a true random sampling of the underlying community since SMRT Sequencing has been shown to have very low sequence-context bias. With reads >1 kb at >99% accuracy, it is reasonable to expect a high percentage of reads to include gene fragments useful for analysis without the need for de novo assembly. Here we present the results of circular consensus sequencing for an individual’s microbiome, before and after undergoing fecal microbiota transplantation (FMT) in order to treat a chronic Clostridium difficile infection. We show that even with relatively low sequencing depth, the long-read, assembly-free, random sampling allows us to profile low abundance community members at the species level. 
We also show that using shotgun sampling with long reads allows a level of functional insight not possible with classic targeted 16S, or short read sequencing, due to entire genes being covered in single reads.


June 1, 2021  |  

Low-input long-read sequencing for complete microbial genomes and metagenomic community analysis

Microbial genome sequencing can be done quickly, easily, and efficiently with the PacBio sequencing instruments, resulting in complete de novo assemblies. Alternative protocols have been developed to reduce the amount of purified DNA required for SMRT Sequencing, to broaden applicability to lower-abundance samples. If 50-100 ng of microbial DNA is available, a 10-20 kb SMRTbell library can be made. The resulting library can be loaded onto multiple SMRT Cells, yielding more than enough data for complete assembly of microbial genomes using the SMRT Portal assembly program HGAP, plus base modification analysis. The entire process can be done in less than 3 days by standard laboratory personnel. This approach is particularly important for analysis of metagenomic communities, in which genomic DNA is often limited. From these samples, full-length 16S amplicons can be generated, prepped with the standard SMRTbell library prep protocol, and sequenced. Alternatively, a 2 kb sheared library, made from a few ng of input DNA, can also be used to elucidate the microbial composition of a community, and may provide information about biochemical pathways present in the sample. In both these cases, 1-2 kb reads with >99.9% accuracy can be obtained from Circular Consensus Sequencing.


June 1, 2021  |  

Minimization of chimera formation and substitution errors in full-length 16S PCR amplification

The constituents and intra-communal interactions of microbial populations have garnered increasing interest in areas such as water remediation, agriculture and human health. One popular, efficient method of profiling communities is to amplify and sequence the evolutionarily conserved 16S rRNA sequence. Currently, most targeted amplification focuses on short, hypervariable regions of the 16S sequence. Distinguishing information not spanned by the targeted region is lost and species-level classification is often not possible. SMRT Sequencing easily spans the entire 1.5 kb 16S gene, and in combination with highly-accurate single-molecule sequences, can improve the identification of individual species in a metapopulation. However, when amplifying a mixture of sequences with close similarities, the products may contain chimeras, or recombinant molecules, at rates as high as 20-30%. These PCR artifacts make it difficult to identify novel species, and reduce the amount of productive sequences. We investigated multiple factors that have been hypothesized to contribute to chimera formation, such as template damage, denaturing time before and during cycling, polymerase extension time, and reaction volume. Of the factors tested, we found two major related contributors to chimera formation: the amount of input template into the PCR reaction and the number of PCR cycles. Sequence errors generated during amplification and sequencing can also confound the analysis of complex populations. Circular Consensus Sequencing (CCS) can generate single-molecule reads with >99% accuracy, and the SMRT Analysis software provides filtering of these reads to >99.99% accuracies. Remaining substitution errors in these highly-filtered reads are likely dominated by mis-incorporations during amplification. Therefore, we compared the impact of several commercially-available high-fidelity PCR kits with full-length 16S amplification. 
We show results of our experiments and describe an optimized protocol for full-length 16S amplification for SMRT Sequencing. These optimizations have broader implications for other applications that use PCR amplification to phase variations across targeted regions and to generate highly accurate reference sequences.


June 1, 2021  |  

Minimization of chimera formation and substitution errors in full-length 16S PCR amplification

The constituents and intra-communal interactions of microbial populations have garnered increasing interest in areas such as water remediation, agriculture and human health. Amplification and sequencing of the evolutionarily conserved 16S rRNA gene is an efficient method of profiling communities. Currently, most targeted amplification focuses on short, hypervariable regions of the 16S sequence. Distinguishing information not spanned by the targeted region is lost, and species-level classification is often not possible. PacBio SMRT Sequencing easily spans the entire 1.5 kb 16S gene in a single read, producing highly accurate single-molecule sequences that can improve the identification of individual species in a metapopulation. However, this process still relies upon PCR amplification from a mixture of similar sequences, which may result in chimeras, or recombinant molecules, at rates upwards of 20%. These PCR artifacts make it difficult to identify novel species, and reduce the amount of informative sequences. We investigated multiple factors that may contribute to chimera formation, such as template damage, denaturation time before and during thermocycling, polymerase extension time, and reaction volume. We found two related factors that contribute to chimera formation: the amount of input template into the PCR reaction, and the number of PCR cycles. A second problem that can confound analysis is sequence errors generated during amplification and sequencing. With the updated algorithm for circular consensus sequencing (CCS2), single-molecule reads can be filtered to 99.99% predicted accuracy. Substitution errors in these highly filtered reads may be dominated by mis-incorporations during amplification. Sequence differences in full-length 16S amplicons from several commercial high-fidelity PCR kits were compared. We show results of our experiments and describe our optimized protocol for full-length 16S amplification for SMRT Sequencing. 
These optimizations have broader implications for other applications that use PCR amplification to phase variations across targeted regions and generate highly accurate reference sequences.


June 1, 2021  |  

Workflow for processing high-throughput, Single Molecule, Real-Time Sequencing data for analyzing the microbiome of patients undergoing fecal microbiota transplantation

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR. Whole-sample shotgun experiments generally use short-read sequencing, which results in data processing difficulties. For example, reads less than 500 bp in length will rarely cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, but also requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT) Sequencing reads in the 1-3 kb range, with >99% accuracy, can be generated using the previous generation PacBio RS II or, in much higher throughput, using the new Sequel System. While throughput is lower than with short-read sequencing methods, the reads are a true random sampling of the underlying community since SMRT Sequencing has been shown to have very low sequence-context bias. With single-molecule reads >1 kb at >99% consensus accuracy, it is reasonable to expect a high percentage of reads to include genes or gene fragments useful for analysis without the need for de novo assembly. Here we present the results of circular consensus sequencing for an individual’s microbiome, before and after undergoing fecal microbiota transplantation (FMT) in order to treat a chronic Clostridium difficile infection. We show that even with relatively low sequencing depth, the long-read, assembly-free, random sampling allows us to profile low abundance community members at the species level. 
We also show that using shotgun sampling with long reads allows a level of functional insight not possible with classic targeted 16S, or short read sequencing, due to entire genes being covered in single reads.


June 1, 2021  |  

WGS SMRT Sequencing of patient samples from a fecal microbiota transplant trial

Fecal samples were obtained from human subjects in the first blinded, placebo-controlled trial to evaluate the efficacy and safety of fecal microbiota transplant (FMT) for treatment of recurrent C. difficile infection. Samples included pre- and post-FMT, post-placebo transplant, and the donor control; samples were taken at 2 and 8 weeks post-FMT. Sequencing was done on the PacBio Sequel System, with the goal of obtaining high quality sequences covering whole genes or gene clusters, which will be used to better understand the relationship between the composition and functional capabilities of intestinal microbiomes and patient health. Methods: Samples were randomly sheared to 2-3 kb fragments, a sufficient length to cover most genes, and SMRTbell libraries were prepared using standard protocols. Libraries were run on the Sequel System, which has a throughput of hundreds of thousands of reads per SMRT Cell, adequate yield to sample the complex microbiomes of post-transplant and donor samples. Results: Here we characterize samples, describe library prep methods and detail Sequel System operation, including run conditions. Descriptive statistics of data output (primary analysis) are presented, along with SMRT Analysis reports on circular consensus sequence (CCS) reads generated using an updated algorithm (CCS2). Final sequencing yields are filtered at various levels of predicted accuracy from 90% to 99.9%. Previous studies done using the PacBio RS II System demonstrated the ability to profile at the species level, and in some cases the strain level, and provided functional insight. Conclusions: These results demonstrate that the Sequel System is well-suited for characterization of complex microbial communities, with the ability for high-throughput generation of extremely accurate single-molecule sequences, each several kilobases in length. 
The entire process from shearing and library prep through sequencing and CCS analysis can be completed in less than 48 hours.
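The accuracy thresholds mentioned in this abstract (90% to 99.9%) can be related to Phred-scaled quality values with the standard formula Q = -10 log10(1 - accuracy). The sketch below is not the SMRT Analysis implementation; the read names, accuracies, and function names are hypothetical and only illustrate filtering at several thresholds.

```python
from math import log10


def phred_from_accuracy(accuracy: float) -> float:
    """Convert a predicted accuracy in [0, 1) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * log10(1.0 - accuracy)


def filter_reads(reads, min_accuracy):
    """Keep (name, predicted_accuracy) pairs at or above the threshold."""
    return [(name, acc) for name, acc in reads if acc >= min_accuracy]


if __name__ == "__main__":
    # Hypothetical reads; names and accuracies are made up for illustration.
    reads = [("read_a", 0.92), ("read_b", 0.995), ("read_c", 0.9995)]
    for threshold in (0.90, 0.99, 0.999):
        kept = filter_reads(reads, threshold)
        print(threshold, round(phred_from_accuracy(threshold)), len(kept))
```

On this scale, 90%, 99%, and 99.9% predicted accuracy correspond to approximately Q10, Q20, and Q30, so raising the threshold trades read yield for per-read accuracy.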


June 1, 2021  |  

Profiling complex population genomes with highly accurate single molecule reads: cow rumen microbiomes

Determining compositions and functional capabilities of complex populations is often challenging, especially for sequencing technologies with short reads that do not uniquely identify organisms or genes. Long-read sequencing improves the resolution of these mixed communities, but adoption for this application has been limited due to concerns about throughput, cost and accuracy. The recently introduced PacBio Sequel System generates hundreds of thousands of long and highly accurate single-molecule reads per SMRT Cell. We investigated how the Sequel System might increase understanding of metagenomic communities. In the past, focus was largely on taxonomic classification with 16S rRNA sequencing. Recent expansion to WGS sequencing enables functional profiling as well, with the ultimate goal of complete genome assemblies. Here we compare the complex microbiomes in 5 cow rumen samples, for which Illumina WGS sequence data was also available. To maximize the PacBio single-molecule sequence accuracy, libraries of 2 to 3 kb were generated, allowing many polymerase passes per molecule. The resulting reads were filtered at predicted single-molecule accuracy levels up to 99.99%. Community compositions of the 5 samples were compared with Illumina WGS assemblies from the same set of samples, indicating rare organisms were often missed with Illumina. Assembly from PacBio CCS reads yielded a contig >100 kb in length with 6-fold coverage. Mapping of Illumina reads to the 101 kb contig verified the PacBio assembly and contig sequence. These results illustrate ways in which long accurate reads benefit analysis of complex communities.


June 1, 2021  |  

Profiling complex communities with highly accurate single molecule reads: cow rumen microbiomes

Determining compositions and functional capabilities of complex populations is often challenging, especially for sequencing technologies with short reads that do not uniquely identify organisms or genes. Long-read sequencing improves the resolution of these mixed communities, but adoption for this application has been limited due to concerns about throughput, cost and accuracy. The recently introduced PacBio Sequel System generates hundreds of thousands of long and highly accurate single-molecule reads per SMRT Cell. We investigated how the Sequel System might increase understanding of metagenomic communities. In the past, focus was largely on taxonomic classification with 16S rRNA sequencing. Recent expansion to WGS sequencing enables functional profiling as well, with the ultimate goal of complete genome assemblies. Here we compare the complex microbiomes in 5 cow rumen samples, for which Illumina WGS sequence data was also available. To maximize the PacBio single-molecule sequence accuracy, libraries of 2 to 3 kb were generated, allowing many polymerase passes per molecule. The resulting reads were filtered at predicted single-molecule accuracy levels up to 99.99%. Community compositions of the 5 samples were compared with Illumina WGS assemblies from the same set of samples, indicating rare organisms were often missed with Illumina. Assembly from PacBio CCS reads yielded a contig >100 kb in length with 6-fold coverage. Mapping of Illumina reads to the 101 kb contig verified the PacBio assembly and contig sequence. Scaffolding with reads from a PacBio unsheared library produced a complete genome of 2.4 Mb. These results illustrate ways in which long accurate reads benefit analysis of complex communities.


June 1, 2021  |  

Using the PacBio Sequel System to taxonomically and functionally classify metagenomic samples in a trial of patients undergoing fecal microbiota transplantation

Whole-sample shotgun sequencing can provide a more detailed view of a metagenomic community than 16S sequencing, but its use in multi-sample experiments is limited by throughput, cost and analysis complexity. While short-read sequencing technologies offer higher throughput, read lengths of less than 500 bp will rarely cover a gene of interest, and necessitate assembly before further analysis. Assembling large fragments requires sampling each community member at a high depth, significantly increasing the amount of sequencing needed, and limiting the analysis of rare community members. Assembly methods also risk incorrectly combining sequences from different community members.


June 1, 2021  |  

Applying Sequel to Genomic Datasets

De novo assembly is a large part of JGI’s analysis portfolio. Repetitive DNA sequences are abundant in a wide range of organisms we sequence and pose a significant technical challenge for assembly. We are interested in long read technologies capable of spanning genomic repeats to produce better assemblies. We currently have three RS II and two Sequel PacBio machines. RS II machines are primarily used for fungal and microbial genome assembly as well as synthetic biology validation. Between microbes and fungi we produce hundreds of PacBio libraries a year, and for throughput reasons the vast majority of these are >10 kb AMPure libraries. Throughput for RS II is about 1 Gb per SMRT Cell. This is ideal for microbial-sized genomes but can be costly and labor intensive for larger projects, which require multiple cells. JGI was an early access site for Sequel and began testing with real samples in January 2016. Since then we’ve had the opportunity to sequence microbes, fungi, metagenomes, and plants. Here we present our experience over the last 18 months using the Sequel platform and provide comparisons with RS II results.
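The point about per-cell throughput can be illustrated with a rough estimate of how many SMRT Cells a project needs. This is a minimal sketch: only the figure of about 1 Gb per RS II SMRT Cell comes from the abstract, while the genome sizes and the 100-fold target coverage are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def smrt_cells_needed(genome_size_bp: int, target_coverage: int,
                      yield_per_cell_bp: int = 1_000_000_000) -> int:
    """Estimate cells required: ceil(genome size x coverage / yield per cell)."""
    total_bases = genome_size_bp * target_coverage
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_bases // yield_per_cell_bp)


if __name__ == "__main__":
    # Hypothetical genome sizes and coverage target, not taken from the abstract.
    for label, size_bp in (("microbe", 5_000_000),
                           ("fungus", 40_000_000),
                           ("plant", 500_000_000)):
        print(label, smrt_cells_needed(size_bp, target_coverage=100))
```

With these assumed inputs, a 5 Mb genome fits in a single cell, whereas a 500 Mb genome would need 50 cells, which is the kind of scaling the abstract describes as costly and labor intensive.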


June 1, 2021  |  

Full-length cDNA sequencing of prokaryotic transcriptome and metatranscriptome samples

Next-generation sequencing has become a useful tool for studying transcriptomes. However, these methods typically rely on sequencing short fragments of cDNA, then attempting to assemble the pieces into full-length transcripts. Here, we describe a method that uses PacBio long reads to sequence full-length cDNAs from individual transcriptomes and metatranscriptome samples. We have adapted the PacBio Iso-Seq protocol for use with prokaryotic samples by incorporating RNA polyadenylation and rRNA-depletion steps. In conjunction with SMRT Sequencing, which has average read lengths of 10-15 kb, we are able to sequence entire transcripts, including polycistronic RNAs, in a single read. Here, we show full-length bacterial transcriptomes with the ability to visualize transcription of operons. In the area of metatranscriptomics, long reads reveal unambiguous gene sequences without the need for post-sequencing transcript assembly. We also show full-length bacterial transcripts sequenced after being treated with NEB’s Cappable-Seq, which is an alternative method for depleting rRNA and enriching for full-length transcripts with intact 5’ ends. Combining Cappable-Seq with PacBio long reads allows for the detection of transcription start sites, with the additional benefit of sequencing entire transcripts.


June 1, 2021  |  

High-resolution evaluation of gut microbiota associated with intestinal maturation in early preterm neonates

Leaky gut, or intestinal barrier immaturity with elevated intestinal permeability, is the proximate cause of susceptibility to necrotizing enterocolitis in preterm neonates. We recently revealed that intestinal barrier maturation was associated with exclusive breastfeeding, less antibiotic exposure, and, most importantly, altered composition of the gut microbiota. However, sequencing short regions of the 16S rRNA gene amplicon failed to identify the specific bacterial groups associated with improved or aberrant intestinal permeability. In this study, we performed high-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution for a cohort of 66 preterm neonates born at 24-33 weeks of gestation who had stool collected daily for 21 postnatal days. We assessed their intestinal permeability by measuring the non-metabolized sugar probes lactulose and rhamnose in urine during the first 7-10 days of life. We observed that intestinal barrier maturation was positively correlated with changes in specific amplicon sequence variants of species of Clostridiales and Bifidobacterium, while leaky gut was associated with specific strains of Escherichia coli. These results are promising in that they support the use of stool microbial biomarkers for the rapid, non-invasive, and cost-effective assessment of intestinal maturation in neonates.


June 1, 2021  |  

Comparison of sequencing approaches applied to complex soil metagenomes to resolve proteins of interest

Background: Long-read sequencing presents several potential advantages for providing more complete gene profiling of metagenomic samples. Long reads can capture multiple genes in a single read, and longer reads typically result in assemblies with better contiguity, especially for higher abundance organisms. However, a major challenge with using long reads has been the higher cost per base, which may lead to insufficient coverage of low-abundance species. Additionally, lower single-pass accuracy can make gene discovery for low-abundance organisms difficult. Methods: To evaluate the pros and cons of long reads for metagenomics, we directly compared PacBio and Illumina sequencing on a soil-derived sample, which included spike-in controls of known concentrations of pure referenced samples. For PacBio sequencing, a 10 kb library was sequenced on the Sequel System with 3.0 chemistry. Highly accurate long reads (HiFi reads) with Q20 and higher were generated for downstream analyses using PacBio Circular Consensus Sequencing (CCS) mode. Results were assessed according to the following criteria: DNA extraction capacity, bioinformatics pipeline status, % of proteins with ambiguous AA’s, total unique error-free genes/$1000, total proteins observed in spike-ins/$1000, proteins of interest/$1000, median length of contigs with proteins, and assembly requirements. Results: Both methods had areas of superior performance. For Illumina, DNA extraction capacity was higher, the bioinformatics pipeline is well-tested, and there was a lower proportion of proteins with ambiguous AA’s. On the other hand, with PacBio, twice as many unique error-free genes, twice as many total proteins from spike-ins, and ~6 times more proteins of interest were found per $1000 cost. PacBio data produced on average 5 times longer contigs capturing proteins of interest. Additionally, assembly was not required for gene or protein finding with PacBio data, whereas it was required with Illumina data. 
Conclusions: In this comparison of PacBio Sequel System with Illumina NextSeq on a complex microbiome, we conclude that the sequencing system of choice may vary, depending on the goals and resources for the project. PacBio sequencing requires a longer DNA extraction method, and the bioinformatics pipeline may require development. On the other hand, the Sequel System generates hundreds of thousands of long HiFi reads per SMRT Cell, producing more genes, more proteins, and longer contigs, thereby offering more information about the metagenomic samples for a lower cost.


June 1, 2021  |  

Unbiased characterization of metagenome composition and function using HiFi sequencing on the PacBio Sequel II System

Recent work comparing metagenomic sequencing methods indicates that a comprehensive picture of the taxonomic and functional diversity of complex communities will be difficult to achieve with short-read technology alone. While the lower cost of short reads has enabled greater sequencing depth, the greater contiguity of long-read assemblies and lack of GC bias in SMRT Sequencing have enabled better gene finding. However, since long-read assembly requires high coverage for error correction, the benefits of unbiased coverage have in the past been lost for low abundance species. SMRT Sequencing performance improvements and the introduction of the Sequel II System have enabled a new, high throughput data type uniquely suited to metagenome characterization: HiFi reads. HiFi reads combine high accuracy with read lengths up to 15 kb, eliminating the need for assembly for most microbiome applications, including functional profiling, gene discovery, and metabolic pathway reconstruction. Here we present the application of the HiFi data type to enable a new method of analyzing metagenomes that does not require assembly.

