Many scientists are using PacBio Single Molecule, Real-Time (SMRT) Sequencing to explore the genomes and transcriptomes of a wide variety of marine species and ecosystems. These studies are already adding to our understanding of how marine species adapt and evolve, contributing to conservation efforts, and informing how we can optimize food production through efficient aquaculture.
Discover the benefits of HiFi reads and learn how highly accurate long-read sequencing provides a single technology solution across a range of applications.
Highly accurate long reads – HiFi reads – with single-molecule resolution make Single Molecule, Real-Time (SMRT) Sequencing ideal for full-length 16S rRNA sequencing, shotgun metagenomic profiling, and metagenome assembly.
Transformative Drug Research and Development via Accurate and Scalable Long-Read Sequencing Services
In this video Shawn Levy, Discovery Life Sciences’ Chief Scientific Officer, along with Cheryl Heiner, PacBio Principal Scientist, discuss the advantages of HudsonAlpha Discovery’s specialized sequencing services for PacBio HiFi…
In this talk, speakers provide an understanding of HiFi sequencing methods for resolving viral diversity in complex systems, examples of how HiFi sequencing can phase entire viral genes or genomes, revealing…
Optimizing for Information: What Richer Data and Better Assemblies Reveal About Metagenome Structure and Function
In this talk, speakers provide an overview of PacBio-recommended tools for metagenome sequencing analysis, where to download example test data, the typical performance for HiFi metagenome sequencing of fecal samples,…
Background: Microbial ecology is reshaping our understanding of the natural world by revealing the large phylogenetic and functional diversity of microbial life. However, the vast majority of these microorganisms remain poorly understood, as most cultivated representatives belong to just four phylogenetic groups and more than half of all identified phyla remain uncultivated. Characterization of this microbial ‘dark matter’ will thus greatly benefit from new metagenomic methods for in situ analysis, for example sensitive, high-throughput methods that characterize community composition and structure by sequencing conserved marker genes. Methods: Here we use Single Molecule, Real-Time (SMRT) Sequencing of full-length 16S rRNA amplicons to phylogenetically profile microbial communities below the genus level. We test this method on a mock community of known composition, as well as on a previously studied microbial community from a lake known to predominantly contain poorly characterized phyla. These results are compared to traditional 16S tag sequencing from short-read technologies and to subsets of the full-length data corresponding to the same regions of the 16S gene. Results: We explore the benefits of using full-length amplicons for estimating community structure and diversity. In addition, we investigate the possible effects of context-specific and GC-content biases, known to affect short-read sequencing technologies, on the predicted community structure. We characterize the potential benefits of profiling metagenomic communities with full-length 16S rRNA genes from SMRT Sequencing relative to standard methods.
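To make the resolution argument in the abstract above concrete, here is a minimal sketch of how a diversity estimate changes when species-level assignments are collapsed to genus level. The abstract does not name a diversity metric; the Shannon index is an assumed, illustrative choice, and the taxon names and read counts are invented for the example, not data from the study.

```python
import math
from collections import defaultdict


def shannon_index(counts):
    """Shannon diversity H = sum(p_i * ln(1 / p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return sum((c / total) * math.log(total / c) for c in counts if c > 0)


def collapse_to_genus(species_counts):
    """Sum read counts per genus, assuming names of the form 'Genus species'."""
    genus_counts = defaultdict(int)
    for name, count in species_counts.items():
        genus_counts[name.split()[0]] += count
    return dict(genus_counts)


# Hypothetical read counts per taxon (illustrative only).
species_counts = {"Genusa alpha": 30, "Genusa beta": 30, "Genusb gamma": 40}
genus_counts = collapse_to_genus(species_counts)

h_species = shannon_index(species_counts.values())
h_genus = shannon_index(genus_counts.values())
print(f"species-level H = {h_species:.3f}, genus-level H = {h_genus:.3f}")
```

In this toy case the two species of the same genus are merged at genus level, so the genus-level estimate is lower than the species-level one.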
SMRT Sequencing and assembly of the Human Microbiome Project Mock Community sample – a feasibility project.
While the utility of Single Molecule, Real-Time (SMRT) Sequencing for de novo assembly and finishing of bacterial isolates is well established, this technology has not yet been widely applied to shotgun sequencing of microbial communities. In order to demonstrate the feasibility of this approach, we sequenced genomic DNA from the Microbial Mock Community B of the Human Microbiome Project
An interactive workflow for the analysis of contigs from the metagenomic shotgun assembly of SMRT Sequencing data.
The data throughput of next-generation sequencing allows whole microbial communities to be analyzed using a shotgun sequencing approach. Because a key task in taking advantage of these data is clustering reads that belong to the same member of a community, single-molecule long reads of up to 30 kb from SMRT Sequencing provide a unique capability for identifying those relationships and pave the way towards finished assemblies of community members. Long reads become even more valuable as samples get more complex, with lower inter-species variation, a larger number of closely related species, or high intra-species variation. Here we present a collection of tools tailored to PacBio data for the analysis of these fragmented metagenomic assemblies, allowing improvements in the assembly results and greater insight into the communities themselves. Supervised classification is applied to a large set of sequence characteristics, e.g., GC content, raw-read coverage, k-mer frequency, and gene prediction information, allowing the clustering of contigs from single or highly related species. A unique feature of SMRT Sequencing data is the availability of base modification / methylation information, which can be used to further analyze clustered contigs expected to comprise a single species or very closely related species. Here we show that base modification information can be used to further study variation, based on differences in the methylated DNA motifs involved in the restriction modification system. Application of these techniques is demonstrated on a monkey intestinal microbiome sample and an in silico mix of real sequencing data from distinct bacterial samples.
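Two of the sequence characteristics named in the abstract above, GC content and k-mer frequency, can be computed per contig as in the sketch below. This is not the PacBio workflow: the function names, the choice of k = 4, and the handling of ambiguous bases are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_frequencies(seq, k=4):
    """Normalized frequency of every possible A/C/G/T k-mer.

    Windows containing any other symbol (e.g. N) are skipped.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def contig_features(seq, k=4):
    """Feature vector for one contig: GC content followed by sorted k-mer frequencies."""
    freqs = kmer_frequencies(seq, k)
    return [gc_content(seq)] + [freqs[kmer] for kmer in sorted(freqs)]


# Toy contig for illustration; real contigs would be read from a FASTA file.
features = contig_features("ACGTACGTAC")
print(len(features))
```

A vector like this, one per contig, could then be combined with coverage and gene prediction information and passed to whichever classifier is used for clustering.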
The assembly of metagenomes is dramatically improved by the long read lengths of SMRT Sequencing. This is demonstrated in an experimental design to sequence a mock community from the Human Microbiome Project and assemble the data using the hierarchical genome assembly process (HGAP) at Pacific Biosciences. Results of this analysis are promising and display much improved contiguity in the assembly of the mock community compared to publicly available short-read data sets and assemblies. Additionally, the use of base modification information to make further associations between contigs provides additional data to improve assemblies and to distinguish between members within a microbial community. The epigenetic approach is a novel validation method unique to SMRT Sequencing. In addition to whole-genome shotgun sequencing, SMRT Sequencing also offers improved classification resolution and reliability for metagenomic and microbiome samples through full-length sequencing of 16S rRNA (~1500 bases long). Microbial communities can be detected at the species level in some cases, rather than being limited to the genus-level classification imposed by short-read technologies. The performance of SMRT Sequencing for these metagenomic samples achieved >99% predicted concordance to reference sequences in cecum, soil, water, and mock control investigations for bacterial 16S. Community samples are estimated to contain from 2.3 to 15 times as many species, with abundance levels as low as 0.05%, compared to the identification of phyla groups.
A workflow for the analysis of contigs from the metagenomic shotgun assembly of SMRT Sequencing data
The throughput of SMRT Sequencing and long reads allows microbial communities to be analyzed using a shotgun sequencing approach. Key to leveraging these data is the ability to cluster sequences belonging to the same member of a community. Long reads of up to 40 kb provide a unique capability for identifying those relationships and pave the way towards finished assemblies of community members. Long reads are highly valuable when samples are more complex and contain lower inter-species variation, such as a larger number of closely related species, or high intra-species variation. Here, we present a collection of tools tailored for the analysis of PacBio metagenomic assemblies. These tools allow for improvements in the assembly results and greater insight into the complexity of the study communities. Supervised classification is applied to a large set of sequence characteristics (e.g., GC content, raw read coverage, k-mer frequency, and gene prediction information) to cluster contigs from single or highly related species. Assembling the raw data associated with these contigs in isolation is shown to improve assembly statistics. A unique feature of SMRT Sequencing is the ability to leverage simultaneously collected base modification / methylation data to aid the clustering of contigs expected to comprise a single species or very closely related species. We demonstrate the added value of base modification information to distinguish and study variation within metagenomic samples based on differences in the methylated DNA motifs involved in the restriction modification system. Application of these techniques is demonstrated on a mock community and a monkey intestinal microbiome sample.
Microbial genome sequencing can be done quickly, easily, and efficiently with the PacBio sequencing instruments, resulting in complete de novo assemblies. Alternative protocols have been developed to reduce the amount of purified DNA required for SMRT Sequencing, to broaden applicability to lower-abundance samples. If 50-100 ng of microbial DNA is available, a 10-20 kb SMRTbell library can be made. A 2 kb SMRTbell library only requires a few ng of gDNA when carrier DNA is added to the library. The resulting libraries can be loaded onto multiple SMRT Cells, yielding more than enough data for complete assembly of microbial genomes using the SMRT Portal assembly program HGAP, plus base-modification analysis. The entire process can be done in less than 3 days by standard laboratory personnel. This approach is particularly important for the analysis of metagenomic communities, in which genomic DNA is often limited. From these samples, full-length 16S amplicons can be generated, prepped with the standard SMRTbell library prep protocol, and sequenced. Alternatively, a 2 kb sheared library, made from a few ng of input DNA, can also be used to elucidate the microbial composition of a community, and may provide information about biochemical pathways present in the sample. In both these cases, 1-2 kb reads with >99% accuracy can be obtained from Circular Consensus Sequencing.
Despite apparent carbon limitation, anoxic deep subsurface brines at the Soudan Underground Iron Mine harbor active microbial communities. To characterize these assemblages, we performed shotgun metagenomics of native and enriched samples. Following enrichment on poised electrodes and long read sequencing, we recovered from the metagenome the closed, circular genome of a novel Desulfuromonas sp. with remarkable genomic features that were not fully resolved by short read assembly alone. This organism was essentially absent in unenriched Soudan communities, indicating that electrodes are highly selective for putative metal reducers. Native community metagenomes suggest that carbon cycling is driven by methyl-C1 metabolism, in particular methylotrophic methanogenesis. Our results highlight the promising potential for long reads in metagenomic surveys of low-diversity environments.
Profiling metagenomic communities using circular consensus and Single Molecule, Real-Time Sequencing.
There are many sequencing-based approaches to understanding complex metagenomic communities, ranging from targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR amplification. Whole-sample shotgun experiments generally use short-read, second-generation sequencing, which results in data processing difficulties. For example, reads less than 1 kb in length will likely not cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members but also requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT) Sequencing reads in the 1-2 kb range with >99% accuracy can be efficiently generated from low amounts of input DNA. For example, 10 ng of input DNA sequenced in 4 SMRT Cells would generate >100,000 such reads. While throughput is low compared to second-generation sequencing, the reads are a true random sampling of the underlying community, since SMRT Sequencing has been shown to have no sequence-context bias. Long read lengths mean that it would be reasonable to expect a high number of the reads to include gene fragments useful for analysis.
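The throughput figures quoted above (10 ng of input DNA, 4 SMRT Cells, >100,000 reads of 1-2 kb) imply some simple back-of-the-envelope numbers, sketched below. The function and its parameter names are hypothetical, and because the abstract states the read count as a lower bound, the results should be read as lower bounds as well.

```python
def ccs_yield_estimate(total_reads, smrt_cells, input_ng, min_read_bp, max_read_bp):
    """Derive per-cell, per-nanogram, and total-base figures from quoted run totals."""
    return {
        "reads_per_cell": total_reads / smrt_cells,
        "reads_per_ng": total_reads / input_ng,
        "min_total_bases": total_reads * min_read_bp,
        "max_total_bases": total_reads * max_read_bp,
    }


# Figures taken from the abstract; 100,000 is the stated lower bound on reads.
estimate = ccs_yield_estimate(
    total_reads=100_000,
    smrt_cells=4,
    input_ng=10,
    min_read_bp=1_000,
    max_read_bp=2_000,
)

for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the run corresponds to at least 25,000 reads per SMRT Cell and roughly 100 to 200 Mb of high-accuracy sequence in total.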
High-throughput sequencing of the complete 16S rRNA gene has become a valuable tool for characterizing microbial communities. However, the short reads produced by second-generation sequencing cannot provide taxonomic classification below the genus level. In this study, we demonstrate the capability of PacBio’s Single Molecule, Real-Time (SMRT) Sequencing to generate community profiles using mock microbial community samples from BEI Resources. We also evaluate multiplexing capabilities using PacBio barcodes on pooled samples comprising heterogeneous 16S amplicon populations representing soil, fecal, and mock communities.