June 1, 2021  |  

Simplified sequencing of full-length isoforms in cancer on the PacBio Sequel platform

Tremendous flexibility is maintained in the human proteome via alternative splicing, and cancer genomes often subvert this flexibility to promote survival. Identification and annotation of cancer-specific mRNA isoforms is critical to understanding how mutations in the genome affect the biology of cancer cells. While microarrays and other NGS-based methods have become useful for studying transcriptomes, these technologies yield short, fragmented transcripts that remain a challenge for accurate, complete reconstruction of splice variants. In cancer proteomics studies, the identification of biomarkers from mass spectrometry data is often limited by incomplete gene isoform expression information to support protein-to-transcript mapping. The Iso-Seq protocol developed at PacBio offers the only solution for direct sequencing of full-length, single-molecule cDNA sequences needed to discover biomarkers for early detection and cancer stratification, to fully characterize gene fusion events, and to elucidate drug resistance mechanisms. Knowledge of the complete isoform repertoire is also key for accurate quantification of isoform abundance. As most transcripts range from 1 to 10 kb, fully intact RNA molecules can be sequenced using SMRT® Sequencing without requiring fragmentation or post-sequencing assembly. However, some cancer research applications have presented a challenge for the Iso-Seq protocol, due to the combination of limited sample input and the need to deeply sequence heterogeneous samples. Here we report the optimization of the Iso-Seq library preparation protocol for the PacBio Sequel platform and its application to cancer cell lines and tumor samples. We demonstrate how loading enhancements on the higher-throughput Sequel instrument have decreased the need for size fractionation steps, reducing sample input requirements while simultaneously simplifying the sample preparation workflow and increasing the number of full-length transcripts per SMRT Cell.


June 1, 2021  |  

Multiplexing strategies for microbial whole genome sequencing using the Sequel System

For microbial sequencing on the PacBio Sequel System, the current yield per SMRT Cell exceeds typical project requirements. Multiplexing offers a viable solution, greatly increasing throughput and efficiency while reducing cost per genome. This approach is achieved by incorporating a unique barcode for each microbial sample into the SMRTbell adapters and using a streamlined library preparation process. To demonstrate performance, 12 unique barcodes were assigned to B. subtilis and sequenced on a single SMRT Cell. To further demonstrate the applicability of this method, we multiplexed the genomes of 16 strains of H. pylori. Each DNA was sheared to 10 kb, end-repaired, and ligated with a barcoded adapter in a single-tube reaction. The barcoded samples were pooled in equimolar quantities and a single SMRTbell library was prepared. Successful de novo microbial assemblies were achieved from all multiplexes tested (12- and 16-plex) using data generated from a single SMRTbell library, run on a single SMRT Cell 1M with the PacBio Sequel System, and analyzed with standard SMRT Analysis assembly methods. Here, we describe a protocol that facilitated the multiplexing up to 12-plex of microbial genomes in one SMRT Cell 1M on the Sequel System that produced near-complete microbial de novo assemblies of <10 contigs for genomes <5 Mb in size.
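The equimolar pooling step described above can be illustrated with a minimal sketch. It assumes the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names, sample names, and example values are hypothetical and are not taken from the protocol.

```python
# Approximate average mass of one base pair of double-stranded DNA (assumption).
AVG_BP_MASS_G_PER_MOL = 660.0


def molarity_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_length_bp)


def equimolar_volumes(samples, fmol_per_sample):
    """Return the volume (uL) of each sample that contributes the same molar amount.

    samples maps a sample name to (concentration in ng/uL, mean fragment length in bp).
    """
    volumes = {}
    for name, (conc, length) in samples.items():
        # 1 nM is numerically equal to 1 fmol/uL.
        volumes[name] = fmol_per_sample / molarity_nM(conc, length)
    return volumes


if __name__ == "__main__":
    # Hypothetical barcoded samples sheared to roughly 10 kb.
    pool = {"bc01": (66.0, 10_000), "bc02": (33.0, 10_000)}
    for name, vol in equimolar_volumes(pool, fmol_per_sample=50.0).items():
        print(f"{name}: {vol:.2f} uL")
```

Because the more dilute sample needs proportionally more volume, each barcode contributes the same number of molecules to the pooled library.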


June 1, 2021  |  

From RNA to full-length transcripts: The PacBio Iso-Seq method for transcriptome analysis and genome annotation

A single gene may encode a surprising number of proteins, each with a distinct biological function. This is especially true in complex eukaryotes. Short-read RNA sequencing (RNA-seq) works by physically shearing transcript isoforms into smaller pieces and bioinformatically reassembling them, leaving opportunity for misassembly or incomplete capture of the full diversity of isoforms from genes of interest. The PacBio Isoform Sequencing (Iso-Seq™) method employs long reads to sequence transcript isoforms from the 5’ end to their poly-A tails, eliminating the need for transcript reconstruction and inference. These long reads result in complete, unambiguous information about alternatively spliced exons, transcriptional start sites, and polyadenylation sites. This allows for the characterization of the full complement of isoforms within targeted genes, or across an entire transcriptome. Here we present improved genome annotations for two avian models of vocal learning, Anna’s hummingbird (Calypte anna) and zebra finch (Taeniopygia guttata), using the Iso-Seq method. We present graphical user interface and command line analysis workflows for the data sets. From brain total RNA, we characterize more than 15,000 isoforms in each species, 9% and 5% of which were previously unannotated in hummingbird and zebra finch, respectively. We highlight one example where capturing full-length transcripts identifies additional exons and UTRs.


June 1, 2021  |  

Targeted enrichment without amplification and SMRT Sequencing of repeat-expansion disease causative genomic regions

Targeted sequencing has proven to be an economical means of obtaining sequence information for one or more defined regions of a larger genome. However, most target enrichment methods are reliant upon some form of amplification. Amplification removes the epigenetic marks present in native DNA, and some genomic regions, such as those with extreme GC content and repetitive sequences, are recalcitrant to faithful amplification. Yet, a large number of genetic disorders are caused by expansions of repeat sequences. Furthermore, for some disorders, methylation status has been shown to be a key factor in the mechanism of disease. We have developed a novel, amplification-free enrichment technique that employs the CRISPR/Cas9 system for specific targeting of individual human genes. This method, in conjunction with SMRT Sequencing’s long reads, high consensus accuracy, and uniform coverage, allows the sequencing of complex genomic regions that cannot be investigated with other technologies. Using human genomic DNA samples and this strategy, we have successfully targeted the loci of a number of repeat expansion disorders (HTT, FMR1, ATXN10, C9orf72). With these data, we demonstrate the ability to isolate hundreds of individual on-target molecules and to sequence accurately through long repeat stretches, regardless of extreme GC content, on a single PacBio RS II SMRT Cell or Sequel SMRT Cell 1M. The method is compatible with multiplexing of multiple targets and multiple samples in a single reaction. This technique also preserves native DNA molecules for sequencing, allowing for the possibility of direct detection and characterization of epigenetic signatures. We demonstrate detection of 5-mC in human promoter sequences and CpG islands.


June 1, 2021  |  

Targeted sequencing using a long-read sequencing technology

Targeted sequencing employing PCR amplification is a fundamental approach to studying human genetic disease. PacBio’s Sequel System and supporting products provide an end-to-end solution for amplicon sequencing, offering better performance than Sanger technology in accuracy, read length, throughput, and breadth of informative data. Sample multiplexing is supported with three barcoding options, providing the flexibility to incorporate unique sample identifiers during target amplification or library preparation. Multiplexing is key to realizing the full capacity of the 1 million individual reactions per Sequel SMRT Cell. Two analysis workflows generate high-accuracy results across a wide range of amplicon sizes, covering 250 bp to 3 kb and 3 kb to >10 kb. The Circular Consensus Sequencing workflow results in high accuracy through intra-molecular consensus generation, while high accuracy for the Long Amplicon Analysis workflow is achieved by clustering of individual long reads from multiple reactions. Here we present workflows and results for single-molecule sequencing of amplicons for human genetic analysis.


June 1, 2021  |  

Multiplexed complete microbial genomes on the Sequel System

Microbes play an important role in nearly every part of our world, as they affect human health, our environment, and agriculture, and aid in waste management. Complete closed genome sequences, which have become the gold standard with PacBio long-read sequencing, can be key to understanding microbial functional characteristics. However, input requirements, consumables costs, and the labor required to prepare and sequence a microbial genome have in the past put PacBio sequencing out of reach for some larger projects. We have developed a multiplexed library prep approach that is simple, fast, and cost-effective, and can produce 4 to 16 closed bacterial genomes from one Sequel SMRT Cell. Additionally, we are introducing a streamlined analysis pipeline for processing multiplexed genome sequence data through de novo HGAP assembly, making the entire process easy for lab personnel to perform. Here we present the entire workflow from shearing through assembly, with times for each step. We show HGAP assembly results with single or very few contigs from bacteria with genomes of different sizes, sequenced with or without size selection. These data illustrate the benefits and potential of the PacBio multiplexed library prep and the Sequel System for sequencing large numbers of microbial genomes.


June 1, 2021  |  

Full-length transcript profiling with the Iso-Seq method for improved genome annotations

Incomplete annotation of genomes represents a major impediment to understanding biological processes, functional differences between species, and evolutionary mechanisms. Often, genes that are large, embedded within duplicated genomic regions, or associated with repeats are difficult to study by short-read expression profiling and assembly. In addition, most genes in eukaryotic organisms produce alternatively spliced isoforms, broadening the diversity of proteins encoded by the genome, which are difficult to resolve with short-read methods. Short-read RNA sequencing (RNA-seq) works by physically shearing transcript isoforms into smaller pieces and bioinformatically reassembling them, leaving opportunity for misassembly or incomplete capture of the full diversity of isoforms from genes of interest. In contrast, Single Molecule, Real-Time (SMRT) Sequencing directly sequences full-length transcripts without the need for assembly and imputation. Here we apply the Iso-Seq method (long-read RNA sequencing) to detect full-length isoforms and the new IsoPhase algorithm to retrieve allele-specific isoform information for two avian models of vocal learning, Anna’s hummingbird (Calypte anna) and zebra finch (Taeniopygia guttata).


June 1, 2021  |  

High-throughput SMRT Sequencing of clinically relevant targets

Targeted sequencing with Sanger as well as short-read-based high-throughput sequencing methods is standard practice in clinical genetic testing. However, many applications beyond SNP detection have remained somewhat obstructed due to technological challenges. With the advent of long reads and high consensus accuracy, SMRT Sequencing overcomes many of the technical hurdles faced by Sanger and NGS approaches, opening a broad range of untapped clinical sequencing opportunities. Flexible multiplexing options, a highly adaptable sample preparation method, and two newly improved, well-developed analysis methods that generate highly accurate sequencing results make SMRT Sequencing an adept method for clinical-grade targeted sequencing. The Circular Consensus Sequencing (CCS) analysis pipeline produces QV 30 data from each single intra-molecular multi-pass polymerase read, making it a reliable solution for detecting minor variant alleles with frequencies as low as 1%. Long Amplicon Analysis (LAA) makes use of insert-spanning full-length subreads originating from multiple individual copies of the target to generate highly accurate and phased consensus sequences (>QV 50), offering a unique advantage for imputation-free allele segregation and haplotype phasing. Here we present workflows and results for a range of SMRT Sequencing clinical applications. Specifically, we illustrate how the flexible multiplexing options, simple sample preparation methods, and new developments in data analysis tools offered by PacBio in support of Sequel System 5.1 can come together in a variety of experimental designs to enable applications as diverse as high-throughput HLA typing, mitochondrial DNA sequencing, and viral vector integrity profiling of recombinant adeno-associated viral genomes (rAAV).
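The QV figures quoted above can be related to per-base accuracy with the standard Phred relationship, accuracy = 1 - 10^(-QV/10). The sketch below assumes the quoted QV values follow that Phred scale; the function names are illustrative only.

```python
import math


def qv_to_accuracy(qv):
    """Phred-scaled quality value -> probability that a base call is correct."""
    return 1.0 - 10 ** (-qv / 10.0)


def accuracy_to_qv(accuracy):
    """Inverse conversion; accuracy must be strictly below 1.0."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    for qv in (30, 40, 50):
        print(f"QV {qv}: {qv_to_accuracy(qv) * 100:.3f}% accurate")
```

Under this scale, every 10 QV points correspond to a tenfold reduction in expected error rate.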


June 1, 2021  |  

Single chromosomal genome assemblies on the Sequel System with Circulomics high molecular weight DNA extraction for microbes

Background: The Nanobind technology from Circulomics provides an elegant HMW DNA extraction solution for genome sequencing of Gram-positive and -negative microbes. Nanobind is a nanostructured magnetic disk that can be used for rapid extraction of high molecular weight (HMW) DNA from diverse sample types including cultured cells, blood, plant nuclei, and bacteria. Processing can be completed in <1 hour for most sample types and can be performed manually or automated with common instruments. Methods: We have validated several critical steps for generating high-quality microbial genome assemblies in a streamlined microbial multiplexing workflow. This new workflow enables high-volume, cost-effective sequencing of up to 16 microbes totaling 30 Mb in genome size on a single SMRT Cell 1M using a target shear size of 10 kb. We also evaluated this method on a pool of four “class III” microbes that contain >7 kb repeats. Fragment size was increased to ~14 kb, with some fragments >30 kb. Results: Here we present a demonstration of these capabilities using isolates relevant to high-throughput sequencing applications, including common foodborne pathogens (Shigella, Listeria, Salmonella), and species often seen in hospital settings (Klebsiella, Staphylococcus). For nearly all microbes, including difficult-to-assemble class III microbes, we achieved complete de novo microbial assemblies of ≤5 chromosomal contigs with minimum quality scores of 40 (99.99% accuracy) using data from multiplexed SMRTbell libraries. Each library was sequenced on a single SMRT Cell 1M with the PacBio Sequel System and analyzed with streamlined SMRT Analysis assembly methods. Conclusions: We achieved high-quality, closed microbial genomes using a combination of Circulomics Nanobind extraction and PacBio SMRT Sequencing, along with a newly streamlined workflow that includes automated demultiplexing and push-button assembly.
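As a rough planning aid, the limits stated above (up to 16 microbes totaling 30 Mb per SMRT Cell 1M) can be checked before pooling. This is only a sketch: the function name, the genome sizes in the example, and the idea of a pre-pooling check are assumptions, not part of the described workflow.

```python
MAX_SAMPLES = 16      # limit quoted in the abstract
MAX_TOTAL_MB = 30.0   # combined genome size limit quoted in the abstract


def check_pool(genome_sizes_mb):
    """Return a list of problems with a proposed pool; an empty list means it fits.

    genome_sizes_mb maps a sample name to its expected genome size in Mb.
    """
    problems = []
    if len(genome_sizes_mb) > MAX_SAMPLES:
        problems.append(
            f"{len(genome_sizes_mb)} samples exceeds the {MAX_SAMPLES}-sample limit"
        )
    total = sum(genome_sizes_mb.values())
    if total > MAX_TOTAL_MB:
        problems.append(f"total {total:.1f} Mb exceeds the {MAX_TOTAL_MB:.0f} Mb limit")
    return problems


if __name__ == "__main__":
    # Hypothetical genome sizes, for illustration only.
    plan = {"sample_A": 4.8, "sample_B": 2.9, "sample_C": 5.5}
    print(check_pool(plan) or "pool fits within the stated limits")
```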


June 1, 2021  |  

A simple segue from Sanger to high-throughput SMRT Sequencing with a M13 barcoding system

High-throughput NGS methods are increasingly utilized in the clinical genomics market. However, short-read sequencing data remains challenged by mapping inaccuracies in low-complexity regions or regions of high homology and may not provide adequate coverage within GC-rich regions of the genome. Thus, Sanger sequencing remains popular in many clinical sequencing labs as the gold standard approach for orthogonal validation of variants and for interrogating regions poorly covered by second-generation sequencing. Sanger sequencing can be less than ideal, as it can be costly for high-volume assays and projects. Additionally, Sanger sequencing generates read lengths shorter than the region of interest, which limits its ability to accurately phase allelic variants. High-throughput SMRT Sequencing overcomes the challenges of both the first- and second-generation sequencing methods. PacBio’s long read capability allows sequencing of full-length amplicons


June 1, 2021  |  

No-amp targeted SMRT sequencing using a CRISPR-Cas9 enrichment method

Targeted sequencing of genomic DNA requires an enrichment method to generate detectable amounts of sequencing products. Genomic regions with extreme composition bias and repetitive sequences can pose a significant enrichment challenge. Many genetic diseases caused by repeat element expansions are representative of these challenging enrichment targets. PCR amplification, used either alone or in combination with a hybridization capture method, is a common approach for target enrichment. While PCR amplification can be used successfully with genomic regions of moderate to high complexity, it is the low-complexity regions and regions containing repetitive elements sometimes of indeterminate lengths due to repeat expansions that can lead to poor or failed PCR enrichment. We have developed an enrichment method for targeted SMRT Sequencing on the PacBio Sequel System using the CRISPR-Cas9 system that requires no PCR amplification. Briefly, a preformed SMRTbell library containing the target region of interest is cleaved with Cas9 through direct interaction with a sequence-specific guide RNA. After ligation with new poly(A) hairpin adapters, the asymmetric SMRTbell templates are enriched by magnetic bead separation. This method, paired with SMRT Sequencing’s long reads, high consensus accuracy, and uniform coverage, allows sequencing of genomic regions regardless of challenging sequence context that cannot be investigated with other technologies. The method is amenable to analyzing multiple samples and/or targets in a single reaction. In addition, this method also preserves epigenetic modifications allowing for the detection and characterization of DNA methylation which has been shown to be a key factor in the disease mechanism for some repeat expansion diseases. Here we present results of our latest No-Amp Targeted Sequencing procedure applied to the characterization of CAG triplet repeat expansions in the HTT gene responsible for Huntington’s Disease.


June 1, 2021  |  

Unbiased characterization of metagenome composition and function using HiFi sequencing on the PacBio Sequel II System

Recent work comparing metagenomic sequencing methods indicates that a comprehensive picture of the taxonomic and functional diversity of complex communities will be difficult to achieve with short-read technology alone. While the lower cost of short reads has enabled greater sequencing depth, the greater contiguity of long-read assemblies and lack of GC bias in SMRT Sequencing have enabled better gene finding. However, since long-read assembly requires high coverage for error correction, the benefits of unbiased coverage have in the past been lost for low-abundance species. SMRT Sequencing performance improvements and the introduction of the Sequel II System have enabled a new, high-throughput data type uniquely suited to metagenome characterization: HiFi reads. HiFi reads combine high accuracy with read lengths up to 15 kb, eliminating the need for assembly for most microbiome applications, including functional profiling, gene discovery, and metabolic pathway reconstruction. Here we present the application of the HiFi data type to enable a new method of analyzing metagenomes that does not require assembly.


June 1, 2021  |  

TLA & long-read sequencing: Efficient targeted sequencing and phasing of the CFTR gene

Background: The sequencing and haplotype phasing of entire gene sequences improves the understanding of the genetic basis of disease and drug response. One example is cystic fibrosis (CF). Cystic fibrosis transmembrane conductance regulator (CFTR) modulator therapies have revolutionized CF treatment, but only in a minority of CF subjects. Observed heterogeneity in CFTR modulator efficacy is related to the range of CFTR mutations; revertant mutations can modify the response to CFTR modulators, and other intronic variations in the ~200 kb CFTR gene have been linked to disease severity. Heterogeneity in the CFTR gene may also be linked to differential responses to CFTR modulators. The Targeted Locus Amplification (TLA) technology from Cergentis can be used to selectively amplify, sequence and phase the entire CFTR gene. With PacBio long-read SMRT Sequencing, TLA amplicons are sequenced intact and long-range phasing information of all fragments in entire amplicons is retrieved. Experimental Design and Methods: The TLA process produces amplicons consisting of 5-10 proximity ligated DNA fragments. TLA was performed on cell line and genomic DNA from Coriell GM12878, which has few heterozygous SNVs in CFTR, and the IB3 cell line, with known haplotypes but heterozygous for the delta508 mutation. All sample types were prepared with high and low density TLA primer sets, targeting coverage of >100 kb of the CFTR gene. Conclusion: We have demonstrated the power and utility of TLA with long-read SMRT Sequencing as a valuable research tool in sequencing and phasing across very long regions of the human genome. This process can be done in an efficient manner, multiplexing multiple genes and samples per SMRT Cell in a process amenable to high-throughput sequencing.


June 1, 2021  |  

New advances in SMRT Sequencing facilitate multiplexing for de novo and structural variant studies

The latest advancements in Sequel II SMRT Sequencing have increased average read lengths by up to 50% compared to Sequel II chemistry 1.0. This allows multiplexing of 2-3 small organisms (<500 Mb), such as insects and worms, for producing reference-quality assemblies; calling structural variants for up to 2 samples with ~3 Gb genomes; analysis of 48 microbial genomes; and metagenomic profiling of up to 8 communities in a single SMRT Cell 8M. With the improved processivity of the new Sequel II sequencing polymerase, more SMRTbell molecules reach rolling circle mode, resulting in longer overall read lengths and thus allowing efficient detection of barcodes (up to 80%) in the SMRTbell templates. Multiplexing of genomes larger than microbial organisms is now achievable. In collaboration with the Wellcome Sanger Institute, we have developed a workflow for multiplexing two individual Anopheles coluzzii using as little as 150 ng genomic DNA per individual. The resulting assemblies had high contiguity (contig N50s over 3 Mb) and completeness (>98% of conserved genes) for both individuals. For microbial multiplexing, we multiplexed 48 microbes with varying complexities and sizes ranging from 1.6 to 8.0 Mb in a single SMRT Cell 8M. Using a new end-to-end analysis (Microbial Assembly Analysis, SMRT Link 8.0), assemblies resulted in complete circularized genomes (>200-fold coverage) and efficient detection of >3-200 kb plasmids. Finally, the long read lengths (>90 kb) allow detection of barcodes in large-insert SMRTbell templates (>15 kb), thus facilitating multiplexing of two human samples in 1 SMRT Cell 8M for detecting SVs, indels, and CNVs. Here, we present results and describe workflows for multiplexing samples for specific applications for SMRT Sequencing.
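The contig N50 metric cited above is conventionally defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that standard definition follows; the function name and example lengths are illustrative and are not drawn from the assemblies described.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (standard definition)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Made-up contig lengths in bp, for illustration only.
    print(n50([4_200_000, 3_100_000, 900_000, 250_000]))
```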

