June 1, 2021  |  

Harnessing kinetic information in Single-Molecule, Real-Time Sequencing.

Single-Molecule Real-Time (SMRT) DNA sequencing is unique in that nucleotide incorporation events are monitored in real time, yielding a wealth of kinetic information in addition to the primary DNA sequence. The observed dynamics of the DNA polymerase add a further dimension of sequence-dependent information that can be used to learn more about the molecule under study. First, the primary sequence itself can be determined more accurately. The kinetic data can be used to corroborate or overturn consensus calls, and can even enable base calling in problematic sequence contexts. Second, using the kinetic information, we can detect and discriminate numerous chemical base modifications as a by-product of ordinary sequencing. Examples of applying these capabilities include (i) the characterization of the epigenome of microorganisms by directly sequencing the three common prokaryotic epigenetic base modifications, 4-methylcytosine, 5-methylcytosine and 6-methyladenine; (ii) the characterization of known and novel methyltransferase activities; (iii) the direct sequencing and differentiation of the four eukaryotic epigenetic forms of cytosine (5-methyl, 5-hydroxymethyl, 5-formyl, and 5-carboxylcytosine), with first applications to map them with single base-pair and DNA strand resolution across mammalian genomes; (iv) the direct sequencing and identification of numerous modified DNA bases arising from DNA damage; and (v) an exploration of the mitochondrial genome for known and novel base modifications. We will show our progress towards a generic, open-source algorithm for exploiting kinetic information for any of these purposes.


June 1, 2021  |  

New discoveries from closing Salmonella genomes using Pacific Biosciences continuous long reads.

The newer hierarchical genome assembly process (HGAP) performs de novo assembly using data from a single PacBio long insert library. To assess the benefits of this method, DNA from several Salmonella enterica serovars was isolated from a pure culture. Genome sequencing was performed using Pacific Biosciences RS sequencing technology. The HGAP process enabled us to close sixteen Salmonella enterica subsp. enterica genomes and their associated mobile elements. The ten serotypes are Salmonella enterica subsp. enterica serovar Enteritidis (S. Enteritidis), S. Bareilly, S. Heidelberg, S. Cubana, S. Javiana, S. Typhimurium, S. Newport, S. Montevideo, S. Agona, and S. Tennessee. In addition, we were able to detect novel methyltransferases (MTases) using the Pacific Biosciences kinetic score distributions, which show that each serovar appears to have a novel methylation pattern. For example, while all Salmonella serovars examined so far have methylase activity specific for 5′-GATC-3′/3′-CTAG-5′ and 5′-CAGAG-3′/3′-GTCTC-5′ (underlined base indicates a modification), S. Heidelberg is uniquely specific for 5′-ACCANCC-3′/3′-TGGTNGG-5′, and S. Typhimurium is uniquely specific for 5′-GATCAG-3′/3′-CTAGTC-5′ sites, among the samples examined so far. We believe that this may be due to the unique environments and phages that these serotypes have been exposed to. Furthermore, our analysis identified and closed a variety of plasmids, such as mobilization plasmids, antimicrobial resistance plasmids and IncX plasmids carrying a Type IV secretion system (T4SS). The VirB/D4 T4SS apparatus is important in that it assists with rapid dissemination of antibiotic resistance and virulence determinants. Presently, only limited information exists regarding the genotypic characterization of drug resistance in S. Heidelberg isolates derived from various host species. Here, we characterize two S. Heidelberg isolates from two different outbreaks. 
Both isolates contain an IncX plasmid of approximately 35 kb carrying the genes virB1, virB2, virB3/4, virB5, virB6, virB7, virB8, virB9, virB10, virB11, virD2, and virD4, which are associated with the T4SS. In addition, the outbreak isolate associated with ground turkey carries a 4,473 bp mobilization plasmid and an incompatibility group (Inc) I1 antimicrobial resistance plasmid encoding resistance to gentamicin (aacC2), beta-lactam (bl2b_tem), streptomycin (aadAI) and tetracycline (tetA, tetR), while the outbreak isolate associated with chicken breast carries an IncI1 plasmid encoding resistance to gentamicin (aacC2), streptomycin (aadAI) and sulfisoxazole (sul1). Using this new technology, we explored the genetic elements present in resistant pathogens, which will help achieve a better understanding of the evolution of Salmonella.
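
As an illustration of how recognition motifs such as 5′-GATC-3′ or 5′-ACCANCC-3′ can be located on both strands of a closed genome, here is a minimal Python sketch. It is not the authors' pipeline and it does not use kinetic data; the function names and the toy sequences are hypothetical, and only the IUPAC codes needed for the motifs quoted above are handled.

```python
import re

# Subset of IUPAC nucleotide codes mapped to regex character classes
# (assumption: only A, C, G, T and N are needed for the motifs above).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return 0-based start positions of motif in seq, allowing overlaps."""
    # A lookahead keeps the match zero-width so overlapping sites are found.
    pattern = "(?=" + "".join(IUPAC[base] for base in motif) + ")"
    return [m.start() for m in re.finditer(pattern, seq)]


def find_motif_both_strands(seq, motif):
    """Find motif sites on the given strand ('+') and the opposite strand ('-').

    Positions are always reported in forward-strand coordinates.
    """
    return {
        "+": find_motif(seq, motif),
        "-": find_motif(seq, reverse_complement(motif)),
    }


if __name__ == "__main__":
    toy = "TTGATCAGACCATCCGG"  # hypothetical sequence, not real Salmonella data
    for motif in ("GATC", "GATCAG", "ACCANCC"):
        print(motif, find_motif_both_strands(toy, motif))
```

Searching the forward sequence for the reverse complement of the motif is equivalent to searching the opposite strand, which is why a palindromic motif such as GATC reports the same positions on both strands.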


June 1, 2021  |  

Profiling metagenomic communities using circular consensus and Single Molecule, Real-Time Sequencing.

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR amplification. Whole-sample shotgun experiments generally use short-read, second-generation sequencing, which results in data processing difficulties. For example, reads less than 1 kb in length will likely not cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, it also requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT) Sequencing reads in the 1-2 kb range, with >99% accuracy, can be efficiently generated from low amounts of input DNA. For example, 10 ng of input DNA sequenced in 4 SMRT Cells would generate >100,000 such reads. While throughput is low compared to second-generation sequencing, the reads are a true random sampling of the underlying community, since SMRT Sequencing has been shown to have no sequence-context bias. Long read lengths mean that it is reasonable to expect a high number of the reads to include gene fragments useful for analysis.


June 1, 2021  |  

Profiling metagenomic communities using circular consensus and Single Molecule, Real-Time Sequencing

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR amplification. Whole-sample shotgun experiments require a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT) Sequencing reads in the 1-2 kb range, with >99% consensus accuracy, can be efficiently generated from low amounts of input DNA; for example, as little as 10 ng of input DNA sequenced in 4 SMRT Cells can generate >100,000 such reads. While throughput is low compared to second-generation sequencing, the reads are a true random sampling of the underlying community. Long read lengths translate to a high number of reads harboring full genes or even full operons for downstream analysis. Here we present the results of circular-consensus sequencing on a mock metagenomic community with an abundance range of multiple orders of magnitude, and compare the results with both 16S and shotgun assembly methods. We show that even with relatively low sequencing depth, the long-read, assembly-free, random sampling allows us to elucidate meaningful information from the very low-abundance community members. For example, given the above low-input sequencing approach, a community member at 1/1,000 relative abundance would generate 100 1-2 kb sequence fragments having 99% consensus accuracy, with a high probability of containing a gene fragment useful for taxonomic classification or functional insight.
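
The arithmetic behind the 1/1,000 example can be sketched in a few lines of Python. This is an illustrative model only: it assumes reads are sampled in proportion to relative abundance (as the abstract's "true random sampling" suggests) and that read counts per organism are approximately Poisson distributed; the function names are hypothetical.

```python
import math


def expected_reads(total_reads, relative_abundance):
    """Expected number of reads from a community member under random sampling."""
    return total_reads * relative_abundance


def prob_at_least(k, mean):
    """P(X >= k) for a Poisson-distributed read count with the given mean."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    total = 100_000  # reads from 4 SMRT Cells, as quoted in the abstract
    for abundance in (1 / 1_000, 1 / 10_000, 1 / 100_000):
        mean = expected_reads(total, abundance)
        print(f"abundance {abundance:.0e}: expected {mean:.1f} reads, "
              f"P(at least 10 reads) = {prob_at_least(10, mean):.3f}")
```

Under these assumptions a member at 1/1,000 abundance is expected to contribute about 100 of the 100,000 reads, matching the figure in the abstract, while rarer members quickly drop to a handful of reads.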


June 1, 2021  |  

Profiling the microbiome in fecal microbiota transplantation using circular consensus and Single Molecule, Real-Time Sequencing

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR. Whole-sample shotgun experiments generally use short-read sequencing, which results in data processing difficulties. For example, reads less than 500 bp in length will rarely cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, it also requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT®) Sequencing reads in the 1-3 kb range, with >99% accuracy, can be efficiently generated from low amounts of input DNA. For example, 10 ng of input DNA sequenced in 4 SMRT Cells on the PacBio RS II would generate >100,000 such reads. While throughput is lower than that of short-read sequencing methods, the reads are a true random sampling of the underlying community, since SMRT Sequencing has been shown to have very low sequence-context bias. With reads >1 kb at >99% accuracy, it is reasonable to expect a high percentage of reads to include gene fragments useful for analysis without the need for de novo assembly. Here we present the results of circular consensus sequencing for an individual’s microbiome, before and after undergoing fecal microbiota transplantation (FMT) in order to treat a chronic Clostridium difficile infection. We show that even with relatively low sequencing depth, the long-read, assembly-free, random sampling allows us to profile low abundance community members at the species level. 
We also show that using shotgun sampling with long reads allows a level of functional insight not possible with classic targeted 16S, or short read sequencing, due to entire genes being covered in single reads.


June 1, 2021  |  

Low-input long-read sequencing for complete microbial genomes and metagenomic community analysis

Microbial genome sequencing can be done quickly, easily, and efficiently with the PacBio sequencing instruments, resulting in complete de novo assemblies. Alternative protocols have been developed to reduce the amount of purified DNA required for SMRT Sequencing, to broaden applicability to lower-abundance samples. If 50-100 ng of microbial DNA is available, a 10-20 kb SMRTbell library can be made. The resulting library can be loaded onto multiple SMRT Cells, yielding more than enough data for complete assembly of microbial genomes using the SMRT Portal assembly program HGAP, plus base modification analysis. The entire process can be done in less than 3 days by standard laboratory personnel. This approach is particularly important for analysis of metagenomic communities, in which genomic DNA is often limited. From these samples, full-length 16S amplicons can be generated, prepped with the standard SMRTbell library prep protocol, and sequenced. Alternatively, a 2 kb sheared library, made from a few ng of input DNA, can also be used to elucidate the microbial composition of a community, and may provide information about biochemical pathways present in the sample. In both these cases, 1-2 kb reads with >99.9% accuracy can be obtained from Circular Consensus Sequencing.


June 1, 2021  |  

Workflow for processing high-throughput, Single Molecule, Real-Time Sequencing data for analyzing the microbiome of patients undergoing fecal microbiota transplantation

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR. Whole-sample shotgun experiments generally use short-read sequencing, which results in data processing difficulties. For example, reads less than 500 bp in length will rarely cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, it also requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT) Sequencing reads in the 1-3 kb range, with >99% accuracy, can be generated using the previous-generation PacBio RS II or, at much higher throughput, using the new Sequel System. While throughput is lower than that of short-read sequencing methods, the reads are a true random sampling of the underlying community, since SMRT Sequencing has been shown to have very low sequence-context bias. With single-molecule reads >1 kb at >99% consensus accuracy, it is reasonable to expect a high percentage of reads to include genes or gene fragments useful for analysis without the need for de novo assembly. Here we present the results of circular consensus sequencing for an individual’s microbiome, before and after undergoing fecal microbiota transplantation (FMT) in order to treat a chronic Clostridium difficile infection. We show that even with relatively low sequencing depth, the long-read, assembly-free, random sampling allows us to profile low abundance community members at the species level. 
We also show that using shotgun sampling with long reads allows a level of functional insight not possible with classic targeted 16S, or short read sequencing, due to entire genes being covered in single reads.


June 1, 2021  |  

Profiling complex population genomes with highly accurate single molecule reads: cow rumen microbiomes

Determining compositions and functional capabilities of complex populations is often challenging, especially for sequencing technologies with short reads that do not uniquely identify organisms or genes. Long-read sequencing improves the resolution of these mixed communities, but adoption for this application has been limited due to concerns about throughput, cost and accuracy. The recently introduced PacBio Sequel System generates hundreds of thousands of long and highly accurate single-molecule reads per SMRT Cell. We investigated how the Sequel System might increase understanding of metagenomic communities. In the past, focus was largely on taxonomic classification with 16S rRNA sequencing. Recent expansion to WGS sequencing enables functional profiling as well, with the ultimate goal of complete genome assemblies. Here we compare the complex microbiomes in 5 cow rumen samples, for which Illumina WGS sequence data was also available. To maximize the PacBio single-molecule sequence accuracy, libraries of 2 to 3 kb were generated, allowing many polymerase passes per molecule. The resulting reads were filtered at predicted single-molecule accuracy levels up to 99.99%. Community compositions of the 5 samples were compared with Illumina WGS assemblies from the same set of samples, indicating rare organisms were often missed with Illumina. Assembly from PacBio CCS reads yielded a contig >100 kb in length with 6-fold coverage. Mapping of Illumina reads to the 101 kb contig verified the PacBio assembly and contig sequence. These results illustrate ways in which long accurate reads benefit analysis of complex communities.
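
Predicted accuracy thresholds such as 99%, 99.9% and 99.99% are often easier to compare on the Phred scale, where Q = -10·log10(1 - accuracy). The following Python sketch shows that conversion and a simple accuracy filter. The read records are hypothetical (name, predicted accuracy) pairs, not a real PacBio file format, and the helper names are illustrative only.

```python
import math


def accuracy_to_phred(accuracy):
    """Convert a predicted accuracy (0..1) to a Phred-scaled quality value."""
    if accuracy >= 1.0:
        return math.inf  # no predicted errors at all
    return -10.0 * math.log10(1.0 - accuracy)


def filter_reads(reads, min_accuracy):
    """Keep (name, predicted_accuracy) records at or above the threshold."""
    return [read for read in reads if read[1] >= min_accuracy]


if __name__ == "__main__":
    # Hypothetical reads with predicted single-molecule accuracies.
    reads = [("read1", 0.995), ("read2", 0.9999), ("read3", 0.98)]
    for threshold in (0.99, 0.999, 0.9999):
        kept = filter_reads(reads, threshold)
        print(f"accuracy >= {threshold}: about Q{accuracy_to_phred(threshold):.0f}, "
              f"{len(kept)} of {len(reads)} reads kept")
```

Raising the threshold trades read count for per-read accuracy, which is the trade-off implied by filtering "up to 99.99%" in the abstract.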


June 1, 2021  |  

Profiling complex communities with highly accurate single molecule reads: cow rumen microbiomes

Determining compositions and functional capabilities of complex populations is often challenging, especially for sequencing technologies with short reads that do not uniquely identify organisms or genes. Long-read sequencing improves the resolution of these mixed communities, but adoption for this application has been limited due to concerns about throughput, cost and accuracy. The recently introduced PacBio Sequel System generates hundreds of thousands of long and highly accurate single-molecule reads per SMRT Cell. We investigated how the Sequel System might increase understanding of metagenomic communities. In the past, focus was largely on taxonomic classification with 16S rRNA sequencing. Recent expansion to WGS sequencing enables functional profiling as well, with the ultimate goal of complete genome assemblies. Here we compare the complex microbiomes in 5 cow rumen samples, for which Illumina WGS sequence data was also available. To maximize the PacBio single-molecule sequence accuracy, libraries of 2 to 3 kb were generated, allowing many polymerase passes per molecule. The resulting reads were filtered at predicted single-molecule accuracy levels up to 99.99%. Community compositions of the 5 samples were compared with Illumina WGS assemblies from the same set of samples, indicating rare organisms were often missed with Illumina. Assembly from PacBio CCS reads yielded a contig >100 kb in length with 6-fold coverage. Mapping of Illumina reads to the 101 kb contig verified the PacBio assembly and contig sequence. Scaffolding with reads from a PacBio unsheared library produced a complete genome of 2.4 Mb. These results illustrate ways in which long accurate reads benefit analysis of complex communities.


June 1, 2021  |  

Using the PacBio Sequel System to taxonomically and functionally classify metagenomic samples in a trial of patients undergoing fecal microbiota transplantation

Whole-sample shotgun sequencing can provide a more detailed view of a metagenomic community than 16S sequencing, but its use in multi-sample experiments is limited by throughput, cost and analysis complexity. While short-read sequencing technologies offer higher throughput, read lengths of less than 500 bp will rarely cover a gene of interest, and necessitate assembly before further analysis. Assembling large fragments requires sampling each community member at a high depth, significantly increasing the amount of sequencing needed and limiting the analysis of rare community members. Assembly methods also risk incorrectly combining sequences from different community members.


June 1, 2021  |  

SMRT-Cappable-seq reveals the complex operome of bacteria

SMRT-Cappable-seq combines the isolation of full-length prokaryotic primary transcripts with long-read sequencing technology. It is the first experimental methodology to sequence entire prokaryotic transcripts. It identifies the transcription start site and termination site, thereby directly defining operon structures genome-wide in prokaryotes. Applied to E. coli, SMRT-Cappable-seq identifies a total of ~2300 operons, among which ~900 are novel. Importantly, our results reveal pervasive read-through of previously experimentally validated transcription termination sites. Termination read-through represents a powerful strategy to control gene expression. Taken together, these data provide a first glance at the complexity of the ‘operome’ in bacteria and present an invaluable resource for understanding gene regulation and function in bacteria.
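
To make the idea of defining an operon from a transcript's start and termination sites concrete, here is a minimal Python sketch. It is not the SMRT-Cappable-seq analysis itself: the gene names, coordinates and the rule "a gene belongs to the operon if it lies entirely within the transcript on the same strand" are simplifying assumptions chosen for illustration.

```python
def genes_in_transcript(transcript, genes):
    """List genes fully contained in a transcript on the same strand.

    transcript: (tss, tts, strand), the transcription start and termination sites.
    genes: iterable of (name, start, end, strand) with start <= end.
    """
    tss, tts, strand = transcript
    # On the minus strand the start site has the larger coordinate,
    # so normalise to a (low, high) interval before comparing.
    low, high = min(tss, tts), max(tss, tts)
    return [
        name
        for name, start, end, gene_strand in genes
        if gene_strand == strand and low <= start and end <= high
    ]


if __name__ == "__main__":
    # Hypothetical annotation, not real E. coli coordinates.
    genes = [
        ("geneA", 100, 400, "+"),
        ("geneB", 450, 900, "+"),
        ("geneC", 950, 1500, "+"),
        ("geneD", 300, 800, "-"),
    ]
    terminated = (90, 920, "+")     # transcript ending at a terminator after geneB
    read_through = (90, 1520, "+")  # transcript reading through that terminator
    print(genes_in_transcript(terminated, genes))
    print(genes_in_transcript(read_through, genes))
```

Two full-length reads sharing a start site but ending at different termination sites yield different gene sets, which is how read-through of a terminator shows up as an extended operon variant in this toy model.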


June 1, 2021  |  

Full-length cDNA sequencing of prokaryotic transcriptome and metatranscriptome samples

Next-generation sequencing has become a useful tool for studying transcriptomes. However, these methods typically rely on sequencing short fragments of cDNA, then attempting to assemble the pieces into full-length transcripts. Here, we describe a method that uses PacBio long reads to sequence full-length cDNAs from individual transcriptomes and metatranscriptome samples. We have adapted the PacBio Iso-Seq protocol for use with prokaryotic samples by incorporating RNA polyadenylation and rRNA-depletion steps. In conjunction with SMRT Sequencing, which has average read lengths of 10-15 kb, we are able to sequence entire transcripts, including polycistronic RNAs, in a single read. We show full-length bacterial transcriptomes with the ability to visualize transcription of operons. In the area of metatranscriptomics, long reads reveal unambiguous gene sequences without the need for post-sequencing transcript assembly. We also show full-length bacterial transcripts sequenced after treatment with NEB’s Cappable-Seq, an alternative method for depleting rRNA and enriching for full-length transcripts with intact 5’ ends. Combining Cappable-Seq with PacBio long reads allows for the detection of transcription start sites, with the additional benefit of sequencing entire transcripts.


June 1, 2021  |  

A complete solution for full-length transcript sequencing using the PacBio Sequel II System

Long-read mRNA sequencing methods such as PacBio’s Iso-Seq method offer high-throughput transcriptome profiling in prokaryotic and eukaryotic cells. By avoiding the transcript assembly problem and instead sequencing full-length cDNA, Iso-Seq has emerged as the most reliable technology for annotating isoforms and, in turn, improving proteome predictions in a wide variety of organisms. Improvements in library preparation, sequencing throughput, and bioinformatics have enabled the Iso-Seq method to be a complete solution for transcript characterization. The Iso-Seq Express kit is a one-day library prep requiring 60-300 ng of total RNA. The PacBio Sequel II System produces 4-5 million full-length reads, sufficient to profile a whole human transcriptome. Finally, the SQANTI2 software is a powerful tool for categorizing complex isoforms against reference annotations, while also incorporating orthogonal information such as CAGE peak data, public RNA-seq junction data, and ORF predictions.


June 1, 2021  |  

Comparative metagenome-assembled genome analysis of “Candidatus Lachnocurva vaginae”, formerly known as Bacterial Vaginosis Associated bacterium – 1 (BVAB1)

Bacterial Vaginosis Associated bacterium 1 (BVAB1) is an as-yet uncultured bacterial species found in the human vagina that belongs to the family Lachnospiraceae within the order Clostridiales. As its name suggests, this bacterium is often associated with bacterial vaginosis (BV), a common vaginal disorder that has been shown to increase a woman’s risk for HIV, Chlamydia trachomatis, and Neisseria gonorrhoeae infections as well as preterm birth. Further, BVAB1 is associated with the persistence of BV following metronidazole treatment, increased vaginal inflammation, and adverse obstetrics outcomes. There is no available complete genome sequence of BVAB1, which has made it difficult to mechanistically understand its role in disease. We present here a circularized metagenome-assembled genome (cMAG) of BVAB1 as well as a comparative analysis including an additional six metagenome-assembled genomes (MAGs) of this species. These sequences were derived from cervicovaginal samples of seven separate women. The cMAG is 1.649 Mb in size and encodes 1,578 genes. We propose to rename BVAB1 to “Candidatus Lachnocurva vaginae” based on phylogenetic analyses, and provide genomic evidence that this candidate species may metabolize D-lactate, produce trimethylamine (one of the chemicals responsible for BV-associated odor), and be motile. The cMAG and the six MAGs are valuable resources that will further contribute to our understanding of the heterogeneous etiology of bacterial vaginosis.

