June 1, 2021

Long-read, single-molecule applications for protein engineering.

The long read lengths of PacBio’s SMRT Sequencing enable detection of linked mutations across multiple kilobases of sequence. This feature is particularly useful in protein engineering, where large numbers of similar constructs are routinely generated to explore the effects of mutations on function and stability. We have developed a PCR-based barcoded sequencing method that generates high-quality, full-length sequence data for batches of constructs built in a common backbone. Individual barcodes are coupled to primers targeting a common region of the vector of interest. The amplified products are pooled into a single DNA library, and the sequencing reads are clustered by barcode to generate a multi-molecule consensus sequence for each construct in the pool. As a proof-of-concept dataset, we generated a library of 384 randomly mutated variants of the Phi29 DNA polymerase, a 575-amino-acid protein encoded by a 1.7 kb gene. These variants were amplified with a set of barcoded primers, and the resulting library was sequenced on a single SMRT Cell. The resulting consensus sequences were fully concordant with independent Sanger sequencing, giving a 100% accurate reconstruction of the set of clones.
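The barcode-clustering and consensus steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the method's actual implementation. It makes three hypothetical simplifications:

- Each read starts with an exact-match barcode of fixed length.
- The remaining insert sequences are already aligned to equal length.
- A simple column-wise majority vote stands in for a real multi-molecule consensus caller.

```python
from collections import Counter, defaultdict

def cluster_by_barcode(reads, barcode_len):
    """Group reads by their leading barcode; map barcode -> list of insert sequences."""
    clusters = defaultdict(list)
    for read in reads:
        # Hypothetical layout: fixed-length barcode at the 5' end, exact match only.
        clusters[read[:barcode_len]].append(read[barcode_len:])
    return dict(clusters)

def majority_consensus(inserts):
    """Column-wise majority vote over pre-aligned, equal-length sequences."""
    if not inserts:
        raise ValueError("need at least one sequence")
    if len({len(s) for s in inserts}) != 1:
        raise ValueError("sequences must be pre-aligned to equal length")
    # zip(*inserts) yields one tuple per alignment column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*inserts))

def consensus_per_construct(reads, barcode_len):
    """One consensus sequence per barcode (i.e. per construct in the pool)."""
    return {bc: majority_consensus(group)
            for bc, group in cluster_by_barcode(reads, barcode_len).items()}

# Toy reads: three copies of one construct (one with an error) and one of another.
reads = ["AAAA" + "ACGT", "AAAA" + "ACGA", "AAAA" + "ACGT", "CCCC" + "TTGA"]
print(consensus_per_construct(reads, barcode_len=4))  # {'AAAA': 'ACGT', 'CCCC': 'TTGA'}
```

The single read error in the first construct is outvoted by the other two reads. This is the intuition behind multi-molecule consensus, although real tools also handle insertions and deletions.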


June 1, 2021

SMRT Sequencing solutions for investigative studies to understand evolutionary processes.

Single Molecule, Real-Time (SMRT) Sequencing holds promise for addressing new frontiers in understanding the molecular mechanisms of evolution and gaining insight into adaptive strategies. With read lengths exceeding 10 kb, we are able to sequence high-quality, closed microbial genomes with their associated plasmids. We can also investigate complex genome features such as long, highly repetitive, low-complexity regions and multiple tandem-duplication events.

Two improvements matter here: observed consensus accuracy of 99.9999% (QV60), and a significant reduction of gap regions in reference genomes (up to and beyond 50%). Together they allow researchers to:

- study coding sequences with high confidence;
- investigate potential regulatory mechanisms in noncoding regions;
- make inferences about evolutionary strategies that are otherwise missed because of the coverage biases associated with short-read sequencing technologies.

Additional benefits of SMRT Sequencing include the ability to detect epigenomic modifications simultaneously with sequencing, and to obtain full-length cDNA transcripts that eliminate the need for assembly. Direct, real-time sequencing of native DNA has led to the identification of numerous base modifications and motifs, and genome-wide profiles have linked these to specific methyltransferase activities.

Our new offering, the Iso-Seq Application, allows accurate differentiation between transcript isoforms that are difficult to resolve with short-read technologies. PacBio reads span entire transcripts, so both the 5’ and 3’ primers used for cDNA library generation and the poly(A) tail are observed in a single read. As a result, exon configuration and intron retention events can be analyzed without ambiguity. This technological advance is useful for characterizing transcript diversity and improving gene structure annotations in reference genomes.

We review the solutions available with SMRT Sequencing, from targeted sequencing efforts to reference genomes (>100 Mb). These include strategies for identifying microsatellites and conducting phylogenetic comparisons with targeted gene families. We highlight how best to leverage our long reads, which have exceeded 20 kb in length, for research investigations, as well as currently available bioinformatics strategies for analysis. The benefits for these applications are further realized by consistently size-selecting the input sample with the BluePippin™ device from Sage Science, as demonstrated in our genome improvement projects. Using the latest P5-C3 chemistry on model organisms, these efforts have yielded an observed contig N50 of ~6 Mb, with the longest contig exceeding 12.5 Mb, and an average base quality of QV50.
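The QV figures above follow the Phred scale, QV = -10 log10(error rate). An accuracy of 99.9999% (error rate 10^-6) therefore corresponds to QV60, and 99.999% to QV50. The contig N50 is the length of the contig at which the cumulative assembly length, counting from the longest contig down, first reaches half of the total.

A minimal Python sketch of both calculations follows. The function names are illustrative, not taken from any PacBio tool.

```python
import math

def accuracy_to_qv(accuracy):
    """Phred-scaled quality value: QV = -10 * log10(1 - accuracy)."""
    error_rate = 1.0 - accuracy
    if error_rate <= 0.0:
        raise ValueError("accuracy must be below 1.0")
    return -10.0 * math.log10(error_rate)

def n50(contig_lengths):
    """Length of the contig at which the cumulative length (longest first) reaches half the total."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")

print(round(accuracy_to_qv(0.999999)), round(accuracy_to_qv(0.99999)))  # 60 50
print(n50([10, 8, 5, 3, 2]))  # 8
```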


June 1, 2021

Barcoding strategies for multiplexing of samples using a long-read sequencing technology.

We have developed barcoding reagents and workflows for multiplexing amplicons or fragmented native genomic DNA prior to Single Molecule, Real-Time (SMRT) Sequencing. The long reads of PacBio’s SMRT Sequencing enable detection of linked mutations across multiple kilobases (kb) of sequence. This feature is particularly useful for mutational analysis or SNP confirmation, where large numbers of samples are routinely generated. To validate this workflow, 384 amplicons of 1.7 kb, each derived from a variant of the Phi29 DNA polymerase gene, were barcoded during amplification, pooled, and sequenced on a single SMRT Cell. To demonstrate that the method applies to longer inserts, a library of 96 clones of 5 kb derived from the E. coli genome was also sequenced.
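Assigning each read back to its sample is the core of any multiplexing workflow. The sketch below is a simplified, hypothetical demultiplexer:

- It assumes the barcode sits at the 5' end of each read.
- It tolerates a small number of substitutions (Hamming distance).
- It leaves ambiguous reads unassigned.

Production demultiplexers typically use alignment-based scoring that also tolerates insertions and deletions; this sketch does not. The barcode sequences shown are made up.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(read, barcodes, max_mismatches=1):
    """Return the sample whose barcode best matches the read's 5' end, or None if no unique match."""
    best_name, best_dist, tied = None, None, False
    for name, bc in barcodes.items():
        prefix = read[:len(bc)]
        if len(prefix) < len(bc):
            continue  # read too short to contain this barcode
        d = hamming(prefix, bc)
        if best_dist is None or d < best_dist:
            best_name, best_dist, tied = name, d, False
        elif d == best_dist:
            tied = True  # two barcodes are equally close: ambiguous
    if best_dist is None or tied or best_dist > max_mismatches:
        return None
    return best_name

def demultiplex(reads, barcodes, max_mismatches=1):
    """Bin reads by sample; unmatched or ambiguous reads go to 'unassigned'."""
    bins = {name: [] for name in barcodes}
    bins["unassigned"] = []
    for read in reads:
        name = assign_barcode(read, barcodes, max_mismatches)
        bins[name if name is not None else "unassigned"].append(read)
    return bins

barcodes = {"sample1": "ACGTAC", "sample2": "TTGCAA"}  # hypothetical barcodes
print(demultiplex(["ACGTACGGG", "ACGAACGGG", "GGGGGGCC"], barcodes))
```

Rejecting ties rather than picking one arbitrarily trades a little yield for fewer cross-sample assignments. That is usually the safer default when samples are closely related variants.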


June 1, 2021

Profiling metagenomic communities using circular consensus and Single Molecule, Real-Time Sequencing.

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR amplification. Whole-sample shotgun experiments generally use short-read, second-generation sequencing, which creates data processing difficulties. For example, reads shorter than 1 kb will likely not cover a complete gene or region of interest and will require assembly. This not only introduces the possibility of incorrectly combining sequences from different community members, but also requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly.

Circular consensus reads from Single Molecule, Real-Time (SMRT) Sequencing, 1-2 kb long with >99% accuracy, can be generated efficiently from low amounts of input DNA. For example, 10 ng of input DNA sequenced on 4 SMRT Cells would generate >100,000 such reads. While throughput is low compared to second-generation sequencing, the reads are a true random sampling of the underlying community, since SMRT Sequencing has been shown to have no sequence-context bias. Given these read lengths, a high number of reads can reasonably be expected to include gene fragments useful for analysis.


June 1, 2021

Full-length sequencing of HLA class I genes of more than 1000 samples provides deep insights into sequence variability

Aim: The vast majority of donor typing relies on sequencing exons 2 and 3 of the HLA class I genes (HLA-A, -B, -C). With this approach, certain allele combinations do not yield the anticipated “high resolution” (G-code) typing, because exon-phasing information is lacking. To resolve ambiguous typing results for a haplotype frequency project, we established a whole-gene sequencing approach for HLA class I. This approach also allows us to estimate the degree of sequence variability outside the commonly sequenced exons.

Methods:

- Primers were designed flanking the UTRs, yielding amplicons of similar length (4.2-4.4 kb).
- Using a 4-primer approach, secondary primers containing barcodes were combined with the gene-specific primers to obtain barcoded full-gene amplicons in a single amplification step.
- Amplicons were pooled, purified, and ligated to SMRTbell adapters (which provide the annealing sites for the sequencing primers) following standard protocols from Pacific Biosciences.
- Taking advantage of the SMRT chemistry, pools of 48-72 amplicons were sequenced full length and phased in single runs on a Pacific Biosciences RS II instrument.
- Demultiplexing was performed with SMRT Portal, and sequence analysis with NGSengine software (GenDx).

Results: We successfully performed full-length gene sequencing of 1003 samples harboring ambiguous typings of either HLA-A (n=46), HLA-B (n=304), or HLA-C (n=653). Despite the high per-read raw error rate typical of SMRT sequencing (~15%), the consensus sequences proved highly reliable. All consensus sequences for exons 2 and 3 were in full accordance with the corresponding MiSeq-derived sequences. Unambiguous allelic resolution was achieved for all samples. We observed novel intronic, exonic, and UTR sequence variations for many of the alleles covered by our data set. These included sequences from 600 individuals with the HLA-C*07:01/C*07:02 genotype, revealing the extent of sequence variation outside exons 2 and 3.

Conclusion: Here we present a whole-gene amplification and sequencing approach for HLA class I genes. Its maturity was demonstrated by sequencing more than 1000 samples, achieving fully phased allelic sequences. Extensive sequencing of one common allele combination hints at as-yet-undiscovered diversity of the HLA system outside the commonly analyzed exons.
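A ~15% per-read error rate can still yield a reliable consensus, and a simple majority-vote model shows why. The sketch assumes, as a simplification, that errors are independent across reads and that the consensus at a position fails whenever at least half of the reads are wrong.

This is pessimistic, because wrong reads rarely agree on the same wrong base. It also ignores the indel-dominated error profile that real consensus tools model. It is an illustration of the trend, not the consensus algorithm used in this work.

```python
from math import comb

def majority_error(per_read_error, n_reads):
    """P(at least half of n_reads independent reads are wrong at one position)."""
    if n_reads < 1:
        raise ValueError("need at least one read")
    threshold = (n_reads + 1) // 2  # ceil(n/2): ties are counted as failures
    return sum(
        comb(n_reads, k) * per_read_error**k * (1.0 - per_read_error) ** (n_reads - k)
        for k in range(threshold, n_reads + 1)
    )

# Upper-bound-style estimate of per-position consensus error at increasing depth.
for depth in (1, 5, 15, 31):
    print(f"{depth:>2} reads: P(consensus error) <= {majority_error(0.15, depth):.2e}")
```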


June 1, 2021

Phased full-length SMRT Sequencing of HLA DPB1

Aim: In contrast to exon-based HLA typing approaches, whole-gene genotyping depends crucially on full-length sequences submitted to the IMGT/HLA Database. Currently, full-length sequences are available for only 7 of 520 HLA-DPB1 alleles. We therefore developed a fully phased whole-gene sequencing approach for DPB1 to facilitate further exploration of the allelic structure at this locus.

Methods:

- Primers were designed flanking the UTRs of DPB1, yielding a 12 kb amplicon.
- Using a 4-primer approach, secondary primers containing barcodes were combined with the gene-specific primers to obtain barcoded full-gene amplicons in a single amplification step.
- Amplicons were pooled, purified, and ligated to SMRTbell adapters (which provide the annealing sites for the sequencing primers) following standard protocols from Pacific Biosciences.
- Taking advantage of the SMRT chemistry, pools of 48 amplicons were sequenced full length in single runs on a Pacific Biosciences RS II instrument.
- Demultiplexing was performed with SMRT Portal, and sequence analysis with NGSengine software (GenDx).

Results: We analyzed a set of 48 randomly picked samples. With 3 exceptions due to PCR failure, all genotype assignments agreed with standard genotyping results based on exons 2 and 3. Allelic proportions at heterozygous positions were balanced (range 0.4-0.6) for all samples, suggesting unbiased amplification. Despite the high per-read raw error rate typical of SMRT sequencing (~15%), the consensus sequences proved highly reliable. All consensus sequences for exons 2 and 3 were in full accordance with the corresponding MiSeq-derived sequences. We describe novel intronic sequence variation in the 7 alleles genomically defined so far, as well as 7 full-length DPB1 alleles with previously unknown intronic regions. One of these alleles (HLA-DPB1*131:01) is classified as rare.
Conclusion: Here we present a whole gene amplification and sequencing workflow for DPB1 alleles utilizing single molecule real-time (SMRT) sequencing from Pacific Biosciences. Validation of consensus sequences against known exonic sequences highlights the reliability of this technology. This workflow will facilitate amending the IMGT/HLA Database for DPB1.
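The allele-balance check reported above (allelic proportions of 0.4-0.6 at heterozygous positions) comes down to two steps. First, compute at each heterozygous site the fraction of reads supporting one allele. Then flag sites that fall outside that window.

The sketch below is minimal. The input format (per-site read counts for the two alleles) is an assumption for illustration, and the toy counts are hypothetical, not data from this study.

```python
def allele_fraction(count_a, count_b):
    """Fraction of reads at a heterozygous site that support allele A."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no reads cover this site")
    return count_a / total

def imbalanced_sites(site_counts, low=0.4, high=0.6):
    """Return positions whose allele-A fraction falls outside [low, high]."""
    return sorted(
        pos for pos, (a, b) in site_counts.items()
        if not low <= allele_fraction(a, b) <= high
    )

# Hypothetical per-site read counts: {position: (allele A reads, allele B reads)}
toy_counts = {1021: (52, 48), 3310: (45, 55), 7402: (30, 70)}
print(imbalanced_sites(toy_counts))  # [7402]
```

A skewed fraction at many sites in one sample would point to preferential amplification of one allele. This is the bias that the balanced 0.4-0.6 range argues against.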


June 1, 2021

Profiling metagenomic communities using circular consensus and Single Molecule, Real-Time Sequencing

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR amplification. Whole-sample shotgun experiments require a high depth of coverage, so rare community members may not be represented in the resulting assembly.

Circular consensus reads from Single Molecule, Real-Time (SMRT) Sequencing, 1-2 kb long with >99% consensus accuracy, can be generated efficiently from low amounts of input DNA. As little as 10 ng of input DNA sequenced on 4 SMRT Cells can generate >100,000 such reads. While throughput is low compared to second-generation sequencing, the reads are a true random sampling of the underlying community. Long read lengths translate into a high proportion of reads harboring full genes or even full operons for downstream analysis.

Here we present the results of circular consensus sequencing of a mock metagenomic community with abundances spanning multiple orders of magnitude, and compare them with both 16S and shotgun assembly methods. We show that even at relatively low sequencing depth, long-read, assembly-free random sampling yields meaningful information about the very low-abundance community members. For example, with the low-input approach described above, a community member at 1/1,000 relative abundance would generate about 100 sequence fragments of 1-2 kb with 99% consensus accuracy. This gives a high probability of capturing a gene fragment useful for taxonomic classification or functional insight.
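The 1/1,000 example above is simple binomial arithmetic. With N reads sampled at random, a member at relative abundance p contributes on average N·p reads, with a standard deviation of about sqrt(N·p·(1 - p)).

The short sketch below spans an abundance range of several orders of magnitude. It assumes, as a simplification, that abundance means the member's fraction of the input DNA, and it uses the ~100,000-read yield quoted above.

```python
import math

def expected_reads(total_reads, abundance):
    """Mean and standard deviation of one member's read count under a binomial model."""
    mean = total_reads * abundance
    sd = math.sqrt(total_reads * abundance * (1.0 - abundance))
    return mean, sd

for p in (1e-1, 1e-2, 1e-3, 1e-4, 1e-5):
    mean, sd = expected_reads(100_000, p)
    print(f"abundance {p:g}: ~{mean:.0f} reads (sd ~{sd:.1f})")
```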


June 1, 2021

Profiling the microbiome in fecal microbiota transplantation using circular consensus and Single Molecule, Real-Time Sequencing

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR. Whole-sample shotgun experiments generally use short-read sequencing, which creates data processing difficulties. For example, reads shorter than 500 bp will rarely cover a complete gene or region of interest and will require assembly. This not only introduces the possibility of incorrectly combining sequences from different community members, but also requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly.

Circular consensus reads from Single Molecule, Real-Time (SMRT®) Sequencing, 1-3 kb long with >99% accuracy, can be generated efficiently from low amounts of input DNA. For example, 10 ng of input DNA sequenced on 4 SMRT Cells on the PacBio RS II would generate >100,000 such reads. While throughput is lower than with short-read sequencing methods, the reads are a true random sampling of the underlying community, since SMRT Sequencing has been shown to have very low sequence-context bias. With reads >1 kb at >99% accuracy, it is reasonable to expect a high percentage of reads to include gene fragments useful for analysis without the need for de novo assembly.

Here we present the results of circular consensus sequencing of an individual’s microbiome before and after fecal microbiota transplantation (FMT) to treat a chronic Clostridium difficile infection. We show that even at relatively low sequencing depth, long-read, assembly-free random sampling allows us to profile low-abundance community members at the species level.
We also show that shotgun sampling with long reads provides a level of functional insight not possible with classic targeted 16S or short-read sequencing, because entire genes are covered in single reads.
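The functional-insight argument rests on whole genes falling inside single reads. The sketch below shows that containment check under stated assumptions: read alignments and gene annotations are given as half-open [start, end) coordinates on the same reference sequence. The intervals are hypothetical toy values.

```python
def genes_within_single_reads(read_spans, gene_spans):
    """Return names of genes fully contained in at least one read alignment."""
    contained = set()
    for gene, (g_start, g_end) in gene_spans.items():
        if any(r_start <= g_start and g_end <= r_end for r_start, r_end in read_spans):
            contained.add(gene)
    return contained

# Toy example: a ~1.5 kb read spans a 1.1 kb gene; a ~400 bp read cannot span gene_b.
reads = [(0, 1500), (2000, 2400)]
genes = {"gene_a": (100, 1200), "gene_b": (1900, 3000)}
print(genes_within_single_reads(reads, genes))  # {'gene_a'}
```

The nested loop is fine for illustration. For whole metagenomes, an interval tree or a sorted sweep would be the usual choice.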


June 1, 2021

Workflow for processing high-throughput, Single Molecule, Real-Time Sequencing data for analyzing the microbiome of patients undergoing fecal microbiota transplantation

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR. Whole-sample shotgun experiments generally use short-read sequencing, which creates data processing difficulties. For example, reads shorter than 500 bp will rarely cover a complete gene or region of interest and will require assembly. This not only introduces the possibility of incorrectly combining sequences from different community members, but also requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly.

Circular consensus reads from Single Molecule, Real-Time (SMRT) Sequencing, 1-3 kb long with >99% accuracy, can be generated on the previous-generation PacBio RS II or, at much higher throughput, on the new Sequel System. While throughput is lower than with short-read sequencing methods, the reads are a true random sampling of the underlying community, since SMRT Sequencing has been shown to have very low sequence-context bias. With single-molecule reads >1 kb at >99% consensus accuracy, it is reasonable to expect a high percentage of reads to include genes or gene fragments useful for analysis without the need for de novo assembly.

Here we present the results of circular consensus sequencing of an individual’s microbiome before and after fecal microbiota transplantation (FMT) to treat a chronic Clostridium difficile infection. We show that even at relatively low sequencing depth, long-read, assembly-free random sampling allows us to profile low-abundance community members at the species level.
We also show that shotgun sampling with long reads provides a level of functional insight not possible with classic targeted 16S or short-read sequencing, because entire genes are covered in single reads.
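A first processing step in such a workflow is typically to keep only long, high-accuracy circular consensus reads. The sketch below is a hypothetical filter, not the authors' pipeline:

- It parses four-line FASTQ records with Phred+33 quality encoding and no blank lines.
- It estimates each read's accuracy as one minus the mean per-base error probability 10^(-Q/10).
- It applies the >1 kb and >99% thresholds mentioned above.

Real CCS output may also carry a precomputed read-quality value that could be used instead.

```python
def parse_fastq(lines):
    """Yield (name, sequence, phred_scores) from 4-line FASTQ records (Phred+33)."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _, qual = lines[i:i + 4]
        yield header[1:], seq, [ord(c) - 33 for c in qual]

def mean_accuracy(phred_scores):
    """1 - mean per-base error probability, with error = 10^(-Q/10)."""
    if not phred_scores:
        raise ValueError("empty quality string")
    errors = [10 ** (-q / 10) for q in phred_scores]
    return 1.0 - sum(errors) / len(errors)

def keep_read(seq, phred_scores, min_length=1000, min_accuracy=0.99):
    """Length and accuracy thresholds follow the >1 kb, >99% figures in the text."""
    return len(seq) >= min_length and mean_accuracy(phred_scores) >= min_accuracy

def filter_reads(lines, **thresholds):
    """Return the names of reads that pass both thresholds."""
    return [name for name, seq, quals in parse_fastq(lines)
            if keep_read(seq, quals, **thresholds)]

# Toy records: '?' encodes Q30 and '0' encodes Q15 under Phred+33.
records = (["@long_q30", "A" * 1200, "+", "?" * 1200]
           + ["@short_q30", "A" * 500, "+", "?" * 500]
           + ["@long_q15", "A" * 1200, "+", "0" * 1200])
print(filter_reads(records))  # ['long_q30']
```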


June 1, 2021

“SMRTer Confirmation”: Scalable clinical read-through variant confirmation using the Pacific Biosciences SMRT Sequencing platform

Next-generation sequencing (NGS) has significantly improved the cost and turnaround time of diagnostic genetic tests. The ACMG recommends variant confirmation by an orthogonal method unless sufficiently high sensitivity and specificity can be demonstrated using NGS alone. Most NGS laboratories make extensive use of Sanger sequencing for secondary confirmation of single nucleotide variants (SNVs) and indels. This represents a large fraction of the cost and time required to deliver high-quality genetic testing data to clinicians and patients. Despite its established data quality, Sanger sequencing is not a high-throughput method by today’s standards, from either an assay or an analysis standpoint: it can involve manual review of Sanger traces and is not amenable to multiplexing.

Toward a scalable confirmation solution, Invitae has developed a fully automated, LIMS-tracked assay and informatics pipeline that uses the Pacific Biosciences SMRT Sequencing platform. Invitae’s pipeline generates PCR amplicons that encompass the variant(s) of interest. These amplicons are converted to closed DNA structures (SMRTbells) and sequenced in pools of 96 per SMRT Cell. Each amplicon is appended with a 16-nt barcode that encodes the patient and variant IDs. Per-sample demultiplexing, alignment, variant calling, and confirmation resolution are handled by an automated pipeline.

The confirmation process was validated by analyzing 243 clinical SNVs and indels in parallel with the gold-standard Sanger sequencing method. Amplicons were sequenced and analyzed in technical replicates to demonstrate reproducibility. In this study, the PacBio-based confirmation pipeline demonstrated high reproducibility (97.5%). It outperformed Sanger in the fraction of primary NGS variants confirmed (PacBio = 93.4% and 94.7% across two replicates; Sanger = 84.8%), while showing 100% concordance of confirmation status among overlapping confirmation calls.
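This abstract reports two summary metrics: the fraction of variants confirmed, and the concordance of confirmation status among variants called by both methods. Both are straightforward to compute once each method has produced a per-variant status.

The sketch below is minimal. The status labels and the toy calls are hypothetical, and `None` stands for a variant with no usable result.

```python
def fraction_confirmed(calls):
    """Share of all submitted variants whose status is 'confirmed'."""
    if not calls:
        raise ValueError("no variants")
    return sum(status == "confirmed" for status in calls.values()) / len(calls)

def status_concordance(calls_a, calls_b):
    """Agreement of confirmation status over variants with a result in both call sets."""
    shared = [v for v in calls_a
              if calls_a[v] is not None and calls_b.get(v) is not None]
    if not shared:
        raise ValueError("no overlapping calls")
    return sum(calls_a[v] == calls_b[v] for v in shared) / len(shared)

# Hypothetical per-variant statuses from two confirmation methods.
pacbio = {"var1": "confirmed", "var2": "confirmed", "var3": None, "var4": "not_confirmed"}
sanger = {"var1": "confirmed", "var2": None, "var3": "confirmed", "var4": "not_confirmed"}
print(fraction_confirmed(pacbio), status_concordance(pacbio, sanger))  # 0.5 1.0
```

Applying `status_concordance` to two technical replicates of the same method is one plausible way to express a reproducibility figure. The abstract does not state exactly how its 97.5% was computed.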


June 1, 2021  |  

Microbiome profiling at the strain level using rRNA amplicons

Strain-level microbiome profiling is needed for a full understanding of how microbial communities influence human health. Microbiome profiling of rRNA gene amplicons is a well-understood method that is rapid and inexpensive, but standard 16S rRNA gene methods generally cannot differentiate closely related strains. Whole-genome (shotgun) microbiome profiling is considered a higher-resolution alternative, but it comes with lower throughput and significantly higher sequencing costs and analysis burden. Both methods also face challenges with microbial lysis, DNA preparation, and taxonomic analysis. Specialized, microbiome-focused protocols were developed to achieve strain-level taxonomic differentiation with a rapid, high-throughput rRNA gene assay. The protocol combines improvements in lysis and DNA preparation with a unique, high-information-content amplicon and an associated novel database, enabling taxonomic differentiation of closely related microbial strains.

