Learn how Single Molecule, Real-Time (SMRT) Sequencing and the Sequel IIe System will accelerate your research by delivering highly accurate long reads to provide the most comprehensive view of genomes, transcriptomes and epigenomes.
The Environmental and Agronomical Genomics 2021 symposium is jointly organised by France Génomique (FG) and the GDR Génomique Environnementale (GE). The symposium will be the opportunity to…
Join us for this webinar to understand the evolution of sequencing technologies and where they stand today, the experiences of three plant biologists incorporating sequencing into their work, and how…
Explore a list of PacBio certified service providers.
Interested in pangenomes? Explore this guide to learn how they provide a more complete picture of the core genes of a given species and how that can provide better biological understanding.
The study of genomics has revolutionized our understanding of science, but the field of transcriptomics grew with the need to explore the functional impacts of genetic variation. While different tissues in an organism may share the same genomic DNA, they can differ greatly in what regions are transcribed into RNA and in their patterns of RNA processing. Reviewing the history of transcriptomics makes the advantages of RNA sequencing using a full-length transcript approach clearer.
With Single Molecule, Real-Time (SMRT) Sequencing and the Sequel Systems, you can easily and affordably sequence complete transcript isoforms in genes of interest or across the entire transcriptome. The Iso-Seq method allows users to generate full-length cDNA sequences up to 10 kb in length — with no assembly required — to confidently characterize full-length transcript isoforms.
PacBio HiFi reads provide both long read lengths (up to 25 kb) and high accuracy (>99.9%) to quickly and affordably generate contiguous, complete, and correct de novo genome assemblies of even the most complex genomes.
AGBT 2013 Presentation Slides: Cold Spring Harbor Laboratory’s Michael Schatz presented strategies for de novo assembly of crop genomes with PacBio technology.
Isoform sequencing: Unveiling the complex landscape of the eukaryotic transcriptome on the PacBio RS II.
Alternative splicing of RNA is an important mechanism that increases protein diversity and is pervasive in the most complex biological functions. While advances in RNA sequencing methods have accelerated our understanding of the transcriptome, isoform discovery remains computationally challenging due to short read lengths. Here, we describe the Isoform Sequencing (Iso-Seq) method using long reads generated by the PacBio RS II. We sequenced rat heart and lung RNA using the Clontech® SMARTer® cDNA preparation kit followed by size selection using agarose gel. Additionally, we tested the BluePippin™ device from Sage Science for efficiently extracting longer transcripts ≥ 3 kb. Post-sequencing, we developed a novel isoform-level clustering algorithm to generate high-quality transcript consensus sequences. We show that our method recovered alternative splice forms as well as alternative stop sites, antisense transcription, and retained introns. To conclude, the Iso-Seq method provides a new opportunity for researchers to study the complex eukaryotic transcriptome even in the absence of reference genomes or annotated transcripts.
PacBio RS II sequencing chemistries provide read lengths beyond 20 kb with high consensus accuracy. The long read lengths of P4-C2 chemistry and demonstrated consensus accuracy of 99.999% are ideal for applications such as de novo assembly, targeted sequencing and isoform sequencing. The recently launched P5-C3 chemistry generates even longer reads with N50 often >10,000 bp, making it the best choice for scaffolding and spanning structural rearrangements. With these chemistry advances, PacBio’s read length performance is now primarily determined by the SMRTbell library itself. Size selection of a high-quality, sheared 20 kb library using the BluePippin™ System has been demonstrated to increase the N50 read length by as much as 5 kb with C3 chemistry. BluePippin size selection or a more stringent AMPure® PB selection cutoff can be used to recover long fragments from degraded genomic material. The selection of chemistries, P4-C2 versus P5-C3, is highly dependent on the final size distribution of the SMRTbell library and experimental goals. PacBio’s long read lengths also allow for the sequencing of full-length cDNA libraries at single-molecule resolution. However, longer transcripts are difficult to detect due to lower abundance, amplification bias, and preferential loading of smaller SMRTbell constructs. Without size selection, most sequenced transcripts are 1-1.5 kb. Size selection dramatically increases the number of transcripts >1.5 kb, and is essential for >3 kb transcripts.
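The N50 read length quoted above can be computed directly from a list of lengths. The following is a minimal sketch using the common definition of N50 (the length at which reads of that length or longer contain at least half of all bases); the function name and the example lengths are hypothetical and are not taken from any PacBio tool or dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases in the collection.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical lengths in bp, for illustration only.
unselected = [1_500, 3_000, 4_000, 8_000, 12_000, 20_000]

# Crude stand-in for size selection: drop everything shorter than 7 kb.
size_selected = [x for x in unselected if x >= 7_000]

print(n50(unselected), n50(size_selected))
```

Removing the short fragments shifts the N50 upward, which is the general effect that size selection is described as having on read length distributions.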
Single Molecule, Real-Time (SMRT) Sequencing holds promise for addressing new frontiers in large genome complexities, such as long, highly repetitive, low-complexity regions and duplication events, and for differentiating between transcript isoforms that are difficult to resolve with short-read technologies. We present solutions available for both reference genome improvement (>100 Mb) and transcriptome research to best leverage long reads that have exceeded 20 kb in length. Benefits for these applications are further realized with consistent size selection of the input sample using the BluePippin™ device from Sage Science. Highlights from our genome assembly projects using the latest P5-C3 chemistry on model organisms will be shared. Assembly contig N50 values have exceeded 6 Mb, and we observed a longest contig exceeding 12.5 Mb with an average base quality of QV50. Additionally, the value of long, intact reads in providing a no-assembly approach to investigating transcript isoforms with our Iso-Seq Application will be presented.
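For readers unfamiliar with quality values, the sketch below converts between a QV and a per-base accuracy, assuming QV follows the standard Phred scale (QV = -10 · log10 of the error probability). Under that assumption, QV50 corresponds to one expected error per 100,000 bases, or 99.999% accuracy. The function names are illustrative only.

```python
import math


def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy (0..1)."""
    error_probability = 10 ** (-qv / 10)
    return 1.0 - error_probability


def accuracy_to_qv(accuracy):
    """Convert per-base accuracy (0..1, exclusive of 1) to a Phred-scaled QV."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1.0 - accuracy)


# QV50 under the Phred assumption: roughly 0.99999 accuracy.
print(qv_to_accuracy(50))
# 99% accuracy corresponds to roughly QV20.
print(accuracy_to_qv(0.99))
```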
A comparison of assemblers and strategies for complex, large-genome sequencing with PacBio long reads.
PacBio sequencing holds promise for addressing large-genome complexities, such as long, highly repetitive, low-complexity regions and duplication events that are difficult to resolve with short-read technologies. Several strategies, with varying outcomes, are available for de novo sequencing and assembling of larger genomes. Using a diploid fungal genome, estimated to be ~80 Mb in size, as the basis dataset for comparison, we highlight assembly options when using only PacBio sequencing or a combined strategy leveraging data sets from multiple sequencing technologies. Data generated from SMRT Sequencing was subjected to assembly using different large-genome assemblers, and comparisons of the results will be shown. These include results generated with HGAP, Celera Assembler, MIRA, PBJelly, and other assembly tools currently in development. Improvements observed include a near 50% reduction in the number of contigs coupled with at least a doubling of contig N50 size in genome assemblies incorporating SMRT Sequencing data. We further show how incorporating long reads also highlights new challenges and missed insights of short-read assemblies arising from heterozygosity inherent in multiploid genomes.
Isoform sequencing: Unveiling the complex landscape in eukaryotic transcriptome on the PacBio RS II.
Advances in RNA sequencing have accelerated our understanding of the transcriptome; however, isoform discovery remains challenging due to short read lengths. The Iso-Seq Application provides a new alternative to sequence full-length cDNA libraries using long reads from the PacBio RS II. Identification of long and often rare isoforms is demonstrated with rat heart and lung RNA prepared using the Clontech® SMARTer® cDNA preparation kit, followed by agarose-gel size selection in fractions of 1-2 kb, 2-3 kb and 3-6 kb. For each tissue, 1.8 and 1.2 million reads were obtained from 32 and 26 SMRT Cells, respectively. Filtering for reads with both adapters and polyA tail signals yielded >50% putative full-length transcripts. To improve consensus accuracy, we developed an isoform-level clustering algorithm, ICE (Iterative Clustering for Error Correction), and polished full-length consensus sequences from ICE using Quiver. This method generated full-length transcripts up to 4.5 kb with ≥ 99% post-correction accuracy. Compared with known rat genes, the Iso-Seq method recovered not only the majority of currently annotated isoforms, but also several unannotated novel isoforms with identified homologs in the RefSeq database. Additionally, alternative stop sites, extended UTRs, and retained introns were detected.
Third-generation single-molecule sequencing technologies from Pacific Biosciences, Moleculo, Oxford Nanopore, and other companies are revolutionizing genomics by enabling the sequencing of long, individual molecules of DNA and RNA. One major advantage of these technologies over current short-read sequencing is the ability to sequence much longer molecules, thousands or tens of thousands of nucleotides instead of mere hundreds. This capacity gives researchers substantially greater power to probe into microbial, plant, and animal genomes, but it remains unclear how best to use these data. To answer this, we systematically evaluated the human genome and 25 other important genomes across the tree of life, ranging in size from 1 Mbp to 3 Gbp, to determine how long the reads need to be and how much coverage is necessary to completely assemble their chromosomes with single-molecule sequencing. We also present a novel error correction and assembly algorithm using a combination of PacBio and pre-assembled Illumina sequencing. This new algorithm greatly outperforms other published hybrid algorithms.
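As background for the coverage question raised above, the sketch below applies the textbook definition of mean coverage (total sequenced bases divided by genome size). It is not the evaluation method used in the study, which is not described here. The 30x target and 10 kb mean read length are hypothetical values chosen only for illustration; the genome sizes match the 1 Mbp and 3 Gbp endpoints mentioned in the abstract.

```python
import math


def mean_coverage(num_reads, mean_read_length, genome_size):
    """Mean depth of coverage: total sequenced bases / genome size."""
    return num_reads * mean_read_length / genome_size


def reads_needed(target_coverage, mean_read_length, genome_size):
    """Smallest number of reads whose total bases reach the target coverage."""
    return math.ceil(target_coverage * genome_size / mean_read_length)


# Hypothetical sequencing parameters (assumptions, not from the study).
TARGET_COVERAGE = 30
MEAN_READ_LENGTH = 10_000  # bp

for label, genome_size in [("1 Mbp genome", 1_000_000),
                           ("3 Gbp genome", 3_000_000_000)]:
    n = reads_needed(TARGET_COVERAGE, MEAN_READ_LENGTH, genome_size)
    print(f"{label}: {n:,} reads for {TARGET_COVERAGE}x coverage")
```

Mean coverage alone does not say whether repeats can be spanned, which is why read length is treated as a separate variable in the question the abstract poses.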