Learn how Single Molecule, Real-Time (SMRT) Sequencing and the Sequel IIe System will accelerate your research by delivering highly accurate long reads to provide the most comprehensive view of genomes, transcriptomes and epigenomes.
The study of genomics has revolutionized our understanding of science, but the field of transcriptomics grew with the need to explore the functional impacts of genetic variation. While different tissues in an organism may share the same genomic DNA, they can differ greatly in what regions are transcribed into RNA and in their patterns of RNA processing. Reviewing the history of transcriptomics makes the advantages of RNA sequencing with a full-length transcript approach clearer.
With Single Molecule, Real-Time (SMRT) Sequencing and the Sequel Systems, you can easily and affordably sequence complete transcript isoforms in genes of interest or across the entire transcriptome. The Iso-Seq method allows users to generate full-length cDNA sequences up to 10 kb in length — with no assembly required — to confidently characterize full-length transcript isoforms.
With PacBio single-cell RNA sequencing using the Iso-Seq method, you can now distinguish between alternative transcript isoforms at the single-cell level. The highly accurate long reads (HiFi reads) can span the entire 5′ to 3′ end of a transcript, allowing a high-resolution view of isoform diversity and revealing cell-to-cell heterogeneity without the need for assembly.
Discover the benefits of HiFi reads and learn how highly accurate long-read sequencing provides a single technology solution across a range of applications.
Single Molecule Real Time (SMRT) sequencing sensitively detects polyclonal and compound BCR-ABL in patients who relapse on kinase inhibitor therapy.
Secondary kinase domain (KD) mutations are the most well-recognized mechanism of resistance to tyrosine kinase inhibitors (TKIs) in chronic myeloid leukemia (CML) and other cancers. In some cases, multiple drug-resistant KD mutations can coexist in an individual patient (“polyclonality”). Alternatively, more than one mutation can occur in tandem on a single allele (“compound mutations”) following response and relapse to sequentially administered TKI therapy. Distinguishing between these two scenarios can inform the clinical choice of subsequent TKI treatment. There is currently no clinically adaptable methodology that offers the ability to distinguish polyclonal from compound mutations. Due to the size of the BCR-ABL KD where TKI-resistant mutations are detected, next-generation platforms are unable to generate reads of sufficient length to determine if two mutations separated by 500 nucleotides reside on the same allele. Pacific Biosciences RS Single Molecule Real-Time (SMRT) circular consensus sequencing technology is a novel third-generation deep sequencing technology capable of rapidly and reliably achieving average read lengths of ~1000 bp and frequently beyond 3000 bp, allowing sequencing of the entire ABL KD on a single strand of DNA. We sought to address the ability of SMRT sequencing technology to distinguish polyclonal from compound mutations using clinical samples obtained from patients who have relapsed on BCR-ABL TKI treatment.
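The distinction between polyclonal and compound mutations can be illustrated with a minimal Python sketch. It is not the method used in the study: it assumes each read has already been reduced to the set of KD mutations observed on that single molecule, and the mutation labels (`mutA`, `mutB`), the function name, and the `min_reads` support threshold are all hypothetical.

```python
from collections import Counter

def classify_sample(reads, min_reads=2):
    """Label a sample from per-read mutation sets (hypothetical input format).

    reads: iterable of sets; each set holds the mutation labels seen on one
           single-molecule read spanning the whole kinase domain.
    min_reads: assumed minimum number of reads needed to trust an allele.
    """
    # Count how many reads support each distinct mutant allele
    # (an allele here is the exact combination of mutations on one read).
    alleles = Counter(frozenset(r) for r in reads if r)
    supported = [a for a, n in alleles.items() if n >= min_reads]

    labels = set()
    # Compound: two or more mutations observed together on the same molecule.
    if any(len(a) >= 2 for a in supported):
        labels.add("compound")
    # Polyclonal: two or more distinct mutant alleles coexist in the sample.
    if len(supported) >= 2:
        labels.add("polyclonal")
    return labels

# Two mutations always seen on the same read: compound only.
compound_only = classify_sample([{"mutA", "mutB"}] * 3 + [{"mutA"}])
# Two mutations never seen on the same read: polyclonal only.
polyclonal_only = classify_sample([{"mutA"}] * 3 + [{"mutB"}] * 3)
```

Short reads that cover only one of the two positions cannot populate these per-read sets, which is why read length is the limiting factor described above.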
Single Molecule, Real-Time (SMRT) Sequencing holds promise for addressing new frontiers to understand molecular mechanisms in evolution and gain insight into adaptive strategies. With read lengths exceeding 10 kb, we are able to sequence high-quality, closed microbial genomes with associated plasmids, and investigate large genome complexities, such as long, highly repetitive, low-complexity regions and multiple tandem-duplication events. Improved genome quality, observed at 99.9999% (QV60) consensus accuracy, and significant reduction of gap regions in reference genomes (up to and beyond 50%) allow researchers to better understand coding sequences with high confidence, investigate potential regulatory mechanisms in noncoding regions, and make inferences about evolutionary strategies that are otherwise missed because of the coverage biases associated with short-read sequencing technologies. Additional benefits afforded by SMRT Sequencing include the simultaneous capability to detect epigenomic modifications and obtain full-length cDNA transcripts, which removes the need for assembly. Direct sequencing of DNA in real time has resulted in the identification of numerous base modifications and motifs, which genome-wide profiles have linked to specific methyltransferase activities. Our new offering, the Iso-Seq Application, allows for the accurate differentiation between transcript isoforms that are difficult to resolve with short-read technologies. PacBio reads easily span transcripts such that both 5′/3′ primers for cDNA library generation and the poly-A tail are observed. As such, exon configuration and intron retention events can be analyzed without ambiguity. This technological advance is useful for characterizing transcript diversity and improving gene structure annotations in reference genomes. We review solutions available with SMRT Sequencing, from targeted sequencing efforts to obtaining reference genomes (>100 Mb). 
This includes strategies for identifying microsatellites and conducting phylogenetic comparisons with targeted gene families. We highlight how to best leverage our long reads that have exceeded 20 kb in length for research investigations, as well as currently available bioinformatics strategies for analysis. Benefits for these applications are further realized with consistent use of size selection of input sample using the BluePippin™ device from Sage Science as demonstrated in our genome improvement projects. Using the latest P5-C3 chemistry on model organisms, these efforts have yielded an observed contig N50 of ~6 Mb, with the longest contig exceeding 12.5 Mb and an average base quality of QV50.
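The two assembly metrics quoted above, contig N50 and Phred-scaled quality (QV), follow standard definitions that a short Python sketch can make concrete. The function names and the toy contig lengths are illustrative assumptions, not part of any PacBio tool.

```python
def contig_n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly

def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy:
    accuracy = 1 - 10^(-QV/10)."""
    return 1.0 - 10 ** (-qv / 10.0)

# Toy example: five contigs totalling 20 units; the two longest (6 + 5)
# already cover half of the assembly, so N50 is 5.
toy_n50 = contig_n50([2, 3, 4, 5, 6])

# QV60 corresponds to one expected error per million bases (99.9999%),
# and QV50 to one per hundred thousand (99.999%).
acc_qv60 = qv_to_accuracy(60)
acc_qv50 = qv_to_accuracy(50)
```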
Single Molecule, Real-Time (SMRT) Sequencing provides efficient, streamlined solutions to address new frontiers in plant genomes and transcriptomes. Inherent challenges presented by highly repetitive, low-complexity regions and duplication events are directly addressed with multi-kilobase read lengths exceeding 8.5 kb on average, with many exceeding 20 kb. Differentiating between transcript isoforms that are difficult to resolve with short-read technologies is also now possible. We present solutions available for both reference genome and transcriptome research that best leverage long reads in several plant projects including algae, Arabidopsis, rice, and spinach using only the PacBio platform. Benefits for these applications are further realized with consistent use of size-selection of input sample using the BluePippin™ device from Sage Science. We will share highlights from our genome projects using the latest P5-C3 chemistry to generate high-quality reference genomes with the highest contiguity, contig N50 exceeding 1 Mb, and average base quality of QV50. Additionally, the value of long, intact reads to provide a no-assembly approach to investigate transcript isoforms using our Iso-Seq protocol will be presented for full transcriptome characterization and targeted surveys of genes with complex structures. PacBio provides the most comprehensive assembly with annotation when combining offerings for both genome and transcriptome research efforts. For more focused investigation, PacBio also offers researchers opportunities to easily investigate and survey genes with complex structures.
Second-generation sequencing has brought about tremendous insights into the genetic underpinnings of biology. However, there are many functionally important and medically relevant regions of genomes that are currently difficult or impossible to sequence, resulting in incomplete and fragmented views of genomes. Two main causes are (i) limitations to read DNA of extreme sequence content (GC-rich or AT-rich regions, low complexity sequence contexts) and (ii) insufficient read lengths which leave various forms of structural variation unresolved and result in mapping ambiguities.
Single Molecule, Real-Time sequencing of full-length cDNA transcripts uncovers novel alternatively spliced isoforms.
In higher eukaryotic organisms, the majority of multi-exon genes are alternatively spliced. Different mRNA isoforms from the same gene can produce proteins that have distinct properties such as structure, function, or subcellular localization. Thus, the importance of understanding the full complement of transcript isoforms with potential phenotypic impact cannot be overstated. While microarrays and other NGS-based methods have become useful for studying transcriptomes, these technologies yield short, fragmented transcripts that remain a challenge for accurate, complete reconstruction of splice variants. The Iso-Seq protocol developed at PacBio offers the only solution for direct sequencing of full-length, single-molecule cDNA sequences to survey transcriptome isoform diversity useful for gene discovery and annotation. Knowledge of the complete isoform repertoire is also key for accurate quantification of isoform abundance. As most transcripts range from 1 to 10 kb, fully intact RNA molecules can be sequenced using SMRT Sequencing (avg. read length: 10-15 kb) without requiring fragmentation or post-sequencing assembly. Our open-source computational pipeline delivers high-quality, non-redundant sequences for unambiguous identification of alternative splicing events, alternative transcriptional start sites, polyA tails, and gene fusion events. The standard Iso-Seq protocol workflow available for all researchers is presented using a deep dataset of full-length cDNA sequences from the MCF-7 cancer cell line, and multiple tissues (brain, heart, and liver). Detected novel transcripts approaching 10 kb and alternative splicing events are highlighted. Even in extensively profiled samples, the method uncovered large numbers of novel alternatively spliced isoforms and previously unannotated genes.
SMRT Sequencing of DNA and RNA samples extracted from formalin-fixed and paraffin embedded tissues using adaptive focused acoustics by Covaris.
Recent advances in next-generation sequencing have led to an increased use of formalin-fixed and paraffin-embedded (FFPE) tissues for medical samples in disease and scientific research. Single Molecule, Real-Time (SMRT) Sequencing offers a unique advantage for direct analysis of FFPE samples without amplification. However, obtaining ample long-read information from FFPE samples has been a challenge due to the quality and quantity of the extracted DNA. FFPE samples often contain damaged sites, including breaks in the backbone and missing or altered nucleotide bases, which directly impact sequencing and target enrichment. Additionally, the quality and quantity of the recovered DNA vary depending on the extraction methods used. We have evaluated the Covaris® Adaptive Focused Acoustics (AFA) system as a method for obtaining high molecular weight DNA suitable for SMRTbell™ template preparation and subsequent PacBio RS II sequencing. To test the Covaris system, we extracted DNA from normal kidney FFPE scrolls acquired from the Cooperative Human Tissue Network (CHTN), University of Pennsylvania. Damaged sites in the extracted DNA were repaired using a DNA Damage Repair step, and the treated DNA was constructed into SMRTbell libraries for sequencing on the PacBio System. Using the same repaired DNA, we also tested the efficiency of PCR in amplifying targets of up to 10 kb. The resulting amplicons were also constructed into SMRTbell templates for full-length sequencing on the PacBio System. We found the Adaptive Focused Acoustics (AFA) system by Covaris to be effective. This system is easy and simple to use, and the resulting DNA is compatible with SMRTbell library preparation for targeted and whole genome SMRT Sequencing. The data presented here demonstrates feasibility of SMRT Sequencing with FFPE samples.
The majority of human genes are alternatively spliced, making it possible for most genes to generate multiple proteins. The process of alternative splicing is highly regulated in a developmental-stage and tissue-specific manner. Perturbations in the regulation of these events can lead to disease in humans. Alternative splicing has been shown to play a role in human cancer, muscular dystrophy, Alzheimer’s, and many other diseases. Understanding these diseases requires knowing the full complement of mRNA isoforms. Microarrays and high-throughput cDNA sequencing have become highly successful tools for studying transcriptomes; however, these technologies only provide small fragments of transcripts, and building complete transcript isoforms has been very challenging. We have developed the Iso-Seq technique, which is capable of sequencing full-length, single-molecule cDNA sequences. The method employs SMRT Sequencing to generate reads from individual molecules with average read lengths of more than 10 kb and some as long as 40 kb. As most transcripts are from 1 to 10 kb, we can sequence through entire RNA molecules, requiring no fragmentation or post-sequencing assembly. Jointly with the sequencing method, we developed a computational pipeline that polishes these full-length transcript sequences into high-quality, non-redundant transcript consensus sequences. Iso-Seq sequencing enables unambiguous identification of alternative splicing events, alternative transcriptional start and poly-A sites, and transcripts from gene fusion events. Knowledge of the complete set of isoforms from a sample of interest is key for accurate quantification of isoform abundance when using any technology for transcriptome studies. Here we characterize the full-length transcriptome of normal human tissues, paired tumor/normal samples from breast cancer, and a brain sample from a patient with Alzheimer’s using deep Iso-Seq sequencing. 
We highlight numerous discoveries of novel alternatively spliced isoforms, gene fusion events, and previously unannotated genes that will improve our understanding of human diseases.
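The idea of reducing full-length reads to a non-redundant set of isoforms can be sketched in Python as follows. This is only an illustration of grouping reads by shared splice junctions; it is not the Iso-Seq pipeline itself, and the input format (exon coordinates per aligned read), the function names, and the example coordinates are all assumptions.

```python
from collections import defaultdict

def intron_chain(exons):
    """Return the ordered splice junctions of one aligned full-length read.

    exons: list of (start, end) exon blocks on the reference.
    Each junction is (end of one exon, start of the next exon).
    """
    exons = sorted(exons)
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))

def group_by_intron_chain(reads):
    """Group read names that share exactly the same splice junctions.

    reads: dict mapping read name -> list of (start, end) exon blocks.
    Reads in one group differ at most in their 5' and 3' ends.
    """
    groups = defaultdict(list)
    for name, exons in reads.items():
        groups[intron_chain(exons)].append(name)
    return dict(groups)

# Hypothetical reads: r1 and r2 share all junctions; r3 skips the middle exon.
example_reads = {
    "r1": [(0, 100), (200, 300), (400, 500)],
    "r2": [(10, 100), (200, 300), (400, 480)],
    "r3": [(0, 100), (400, 500)],
}
isoform_groups = group_by_intron_chain(example_reads)
```

Because each read covers the transcript end to end, the exon-skipping isoform (`r3`) is separated from the others directly, without inferring exon connectivity from fragments.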
PacBio bioinformatician Elizabeth Tseng reviews bioinformatics strategies for using PacBio long-read data from the Iso-Seq method to sequence full-length transcripts without assembly.
A method for the identification of variants in Alzheimer’s disease candidate genes and transcripts using hybridization capture combined with long-read sequencing
Alzheimer’s disease (AD) is a devastating neurodegenerative disease that is genetically complex. Although great progress has been made in identifying fully penetrant mutations in genes such as APP, PSEN1 and PSEN2 that cause early-onset AD, these still represent a very small percentage of AD cases. Large-scale, genome-wide association studies (GWAS) have identified at least 20 additional genetic risk loci for the more common form of late-onset AD. However, the identified SNPs are typically not the actual risk variants, but are in linkage disequilibrium with the presumed causative variant (Van Cauwenberghe C, et al., The genetic landscape of Alzheimer disease: clinical implications and perspectives. Genet Med 2015;18:421-430). Long-read sequencing together with hybrid-capture targeting technologies provides a powerful combination to target candidate genes/transcripts of interest. Shearing the genomic DNA to ~5 kb fragments and then capturing with probes that span the whole gene(s) of interest can provide uniform coverage across the entire region, identifying variants and allowing for phasing into two haplotypes. Furthermore, capturing full-length cDNA from the same sample using the same capture probes can also provide an understanding of isoforms that are generated and allow them to be assigned to their corresponding haplotype. Here we present a method for capturing genomic DNA and cDNA from an AD sample using a panel of probes targeting approximately 20 late-onset AD candidate genes, including CLU, ABCA7, CD33, TREM2, TOMM40, PSEN2, APH1 and BIN1. By combining xGen® Lockdown® probes with SMRT Sequencing, we provide completely sequenced candidate genes as well as their corresponding transcripts. In addition, we are also able to evaluate structural variants that, due to their size, repetitive nature, or low sequence complexity, have been unsequenceable using short-read technologies.
Over 40% of male and ~16% of female carriers of an FMR1 premutation allele (55-200 CGG repeats) are at risk for developing Fragile X-associated Tremor/Ataxia Syndrome (FXTAS), an adult-onset neurodegenerative disorder, while about 20% of female carriers will develop Fragile X-associated Primary Ovarian Insufficiency (FXPOI), in addition to a number of adult-onset clinical problems (FMR1-associated disorders). Marked elevation in FMR1 mRNA levels has been observed with premutation alleles, and the resulting RNA toxicity is believed to be the leading molecular mechanism for these disorders. The FMR1 gene, like many housekeeping genes, undergoes alternative splicing. Using long-read isoform sequencing (SMRT) and qRT-PCR, we have recently reported that, although the relative abundance of all FMR1 mRNA isoforms is significantly increased in the premutation group compared to controls, there is a disproportionate increase, relative to the overall increase in mRNA, in the abundance of isoforms spliced at both exons 12 and 14. In total, we confirmed the existence of 16 out of 24 predicted isoforms in our samples. However, it is unknown which isoforms, when overexpressed, may contribute to the premutation pathology. To address this question, we have further defined the distribution pattern of FMR1 transcript isoforms in different tissues, including heart, muscle, brain and testis derived from FXTAS premutation carriers and age-matched controls. Preliminary data indicate the presence of a transcriptional signature of the FMR1 gene, which clusters more by individual than by tissue type. We identified isoforms beyond the 16 reported in our previous study, including a group with particular splice patterns that were observed only in premutation carriers and not in controls. 
Our findings suggest that the characterization of expression levels of the different FMR1 isoforms is fundamental for understanding the regulation of the FMR1 gene as well as for elucidating the mechanism(s) by which “toxic gain of function” of the FMR1 mRNA may play a role in FXTAS and/or in the other FMR1-associated conditions. In addition to the elevated levels of FMR1 isoforms, the altered abundance/ratio of the corresponding FMRP isomers may affect the overall function of FMRP in premutations.