PacBio RS II sequencing chemistries provide read lengths beyond 20 kb with high consensus accuracy. The long read lengths of P4-C2 chemistry and demonstrated consensus accuracy of 99.999% are ideal for applications such as de novo assembly, targeted sequencing and isoform sequencing. The recently launched P5-C3 chemistry generates even longer reads with N50 often >10,000 bp, making it the best choice for scaffolding and spanning structural rearrangements. With these chemistry advances, PacBio’s read length performance is now primarily determined by the SMRTbell library itself. Size selection of a high-quality, sheared 20 kb library using the BluePippin™ System has been demonstrated to increase the N50 read length by as much as 5 kb with C3 chemistry. BluePippin size selection or a more stringent AMPure® PB selection cutoff can be used to recover long fragments from degraded genomic material. The selection of chemistries, P4-C2 versus P5-C3, is highly dependent on the final size distribution of the SMRTbell library and experimental goals. PacBio’s long read lengths also allow for the sequencing of full-length cDNA libraries at single-molecule resolution. However, longer transcripts are difficult to detect due to lower abundance, amplification bias, and preferential loading of smaller SMRTbell constructs. Without size selection, most sequenced transcripts are 1-1.5 kb. Size selection dramatically increases the number of transcripts >1.5 kb, and is essential for >3 kb transcripts.
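The N50 read length quoted above can be computed directly from a list of read lengths. The following is a minimal sketch, not PacBio's own tooling: the function name `read_n50` and the toy read lengths are hypothetical, and the 7,000 bp cutoff only illustrates how removing short fragments shifts the N50 upward.

```python
def read_n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical read lengths in bp (illustrative values, not measured data).
unselected = [2000, 2000, 3000, 3000, 8000, 12000]
# Toy stand-in for size selection: keep only reads of at least 7,000 bp.
selected = [n for n in unselected if n >= 7000]

print(read_n50(unselected))
print(read_n50(selected))
```

Because N50 is weighted by bases rather than by read count, a small number of long reads can dominate it, which is why removing short fragments has a large effect.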
Highly sensitive, non-invasive detection of colorectal cancer mutations using single molecule, third generation sequencing.
Colorectal cancer (CRC) represents one of the most prevalent and lethal malignant neoplasms and every individual of age 50 and above should undergo regular CRC screening. Currently, the most effective procedure to detect adenomas, the precursors to CRC, is colonoscopy, which reduces CRC incidence by 80%. However, it is an invasive approach that is unpleasant for the patient, expensive, and poses some risk of complications such as colon perforation. A non-invasive screening approach with detection rates comparable to those of colonoscopy has not yet been established. The current study applies Pacific Biosciences third generation, single molecule sequencing to the inspection of CRC-driving mutations. Our approach combines the screening power and the extremely high accuracy of circular consensus (CCS) third generation sequencing with the non-invasiveness of using stool DNA to detect CRC-associated mutations present at extremely low frequencies and establishes a foundation for a non-invasive, highly sensitive assay to screen the population for CRC and early stage adenomas. We performed a series of experiments using a pool of fifteen amplicons covering the genes most frequently mutated in CRC (APC, Beta Catenin, KRAS, BRAF, and TP53), ensuring a theoretical screening coverage of over 97% for both CRC and adenomas. The assay was able to detect mutations in DNA isolated from stool samples from patients diagnosed with CRC at frequencies below 0.5 % with no false positives. The mutations were then confirmed by sequencing DNA isolated from the excised tumor samples. Our assay should be sensitive enough to allow the early identification of adenomatous polyps using stool DNA as analyte. In conclusion, we have developed an assay to detect mutations in the genes associated with CRC and adenomas using Pacific Biosciences RS Single Molecule, Real Time Circular Consensus Sequencing (SMRT-CCS). 
With no systematic bias and a much higher raw base-calling quality (CCS) compared to other sequencing methods, the assay was able to detect mutations in stool DNA at frequencies below 0.5 % with no false positives. This level of sensitivity should be sufficient to allow the detection of most adenomatous polyps using stool DNA as analyte, a feature that would make our approach the first non-invasive assay with a sensitivity comparable to that of colonoscopy and a strong candidate for the non-invasive preventive CRC screening of the general population.
An improved circular consensus algorithm with an application to detection of HIV-1 Drug-Resistance Associated Mutations (DRAMs)
Scientists who require confident resolution of heterogeneous populations across complex regions have been unable to transition to short-read sequencing methods. They continue to depend on Sanger Sequencing despite its cost and time inefficiencies. Here we present a redesigned algorithm, dubbed CCS2, that allows the generation of circular consensus sequences (CCS) from individual SMRT Sequencing reads. With this algorithm, it is possible to reach arbitrarily high quality across longer insert lengths at a lower cost and higher throughput than Sanger Sequencing. We apply CCS2 to the characterization of the HIV-1 K103N drug-resistance associated mutation, which is clinically important and challenging to resolve because of its regional sequence context. A mutation was introduced into the 3rd codon position of amino acid 103 (A>C substitution) of the RT gene on a pNL4-3 backbone by site-directed mutagenesis. Regions spanning ~1,300 bp were PCR amplified from both the non-mutated and mutant (K103N) plasmids, and were sequenced individually and as a 50:50 mixture. Sequencing data were analyzed using the new CCS2 algorithm, which uses a fully-generative probabilistic model of our SMRT Sequencing process to polish consensus sequences to arbitrarily high accuracy. This result, previously demonstrated for multi-molecule consensus sequences with the Quiver algorithm, is made possible by incorporating per-Zero Mode Waveguide (ZMW) characteristics, thus accounting for the intrinsic changes in the sequencing process that are unique to each ZMW. With CCS2, we are able to achieve a per-read empirical quality of QV30 with 19X coverage. This yields ~5000 1.3 kb consensus sequences with a collective empirical quality of ~QV40. Additionally, we demonstrate a 0% miscall rate in both unmixed samples, and estimate a 48:52% frequency for the K103N mutation in the mixed sample, consistent with data produced by orthogonal platforms.
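The QV figures in this abstract are Phred-scaled quality values, where QV = -10 log10(p) and p is the per-base error probability. The sketch below only converts between the two scales; the helper names are hypothetical and it is unrelated to the CCS2 implementation itself.

```python
import math


def qv_to_error_prob(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_prob_to_qv(p):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


# Under this definition QV30 is 1 error per 1,000 bases and QV40 is 1 per 10,000.
for qv in (30, 40):
    p = qv_to_error_prob(qv)
    # Expected errors in a 1,300 bp consensus sequence at this quality.
    print(qv, p, 1300 * p)
```

At QV30 a 1.3 kb consensus sequence is expected to carry roughly one error, which is why the per-read and collective quality figures are reported separately.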
Highly sensitive and cost-effective detection of somatic cancer variants using single-molecule, real-time sequencing
Next-Generation Sequencing (NGS) technologies allow for molecular profiling of cancer samples with high sensitivity and speed at reduced cost. For efficient profiling of cancer samples, it is important that the NGS methods used are not only robust, but capable of accurately detecting low-frequency somatic mutations. Single Molecule, Real-Time (SMRT) Sequencing offers several advantages, including the ability to sequence single molecules with very high accuracy (>QV40) using the circular consensus sequencing (CCS) approach. The availability of genetically defined, human genomic reference standards provides an industry standard for the development and quality control of molecular assays for studying cancer variants. Here we characterize SMRT Sequencing for the detection of low-frequency somatic variants using the Quantitative Multiplex DNA Reference Standards from Horizon Discovery, combined with amplification of the variants using the Multiplicom Tumor Hotspot MASTR Plus assay. First, we sequenced a reference standard containing precise allelic frequencies from 1% to 24.5% for major oncology targets verified using digital PCR. This reference material recapitulates the complexity of tumor composition and serves as a well-characterized control. The control sample was amplified using the Multiplicom Tumor Hotspot MASTR Plus assay that targets 252 amplicons (121-254 bp) from 26 relevant cancer genes, which includes all 11 variants in the control sample. Next, we sequenced control samples prepared by SeraCare Life Sciences, which contained a defined mutation at allelic frequencies from 10% down to 0.1%. The wild type and mutant amplicons were serially diluted, sequenced and analyzed using SMRT Sequencing to identify the variants and determine the observed frequency. The random error profile and high-accuracy CCS reads make it possible to accurately detect low-frequency somatic variants.
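How many CCS reads are needed to observe a variant at a given allelic frequency can be estimated with a simple binomial sampling model. This is a sketch under strong assumptions: reads are treated as independent draws, sequencing error is ignored, and the function name, read depths, and the threshold of five supporting reads are all hypothetical choices, not values from the study.

```python
from math import comb


def prob_at_least(k, n, f):
    """P(X >= k) for X ~ Binomial(n, f): the chance of seeing at least k
    variant-supporting reads among n reads when the true frequency is f."""
    below = sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(k))
    return 1.0 - below


# Probability of at least 5 supporting reads for a 0.1% variant
# at several hypothetical per-amplicon read depths.
for depth in (1000, 5000, 20000):
    print(depth, round(prob_at_least(5, depth, 0.001), 4))
```

The model describes sampling only; a real caller must also separate true variants from residual errors, which is where the random error profile mentioned above matters.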
T-cells play a central part in the immune response in humans and related species. T-cell receptors (TCRs), heterodimers located on the T-cell surface, specifically bind foreign antigens displayed on the MHC complex of antigen-presenting cells. The wide spectrum of potential antigens is addressed by the diversity of TCRs created by V(D)J recombination. Profiling this repertoire of TCRs could be useful for applications including, but not limited to, diagnosis, monitoring response to treatments, and examining T-cell development and diversification.
Detection of somatic mutations, especially in heterogeneous tumor samples where variants may be present at a low level, is challenging. Single Molecule, Real-Time (SMRT) Sequencing is ideal for minor variant detection because of its ability to sequence single molecules with very high accuracy (>QV40) using the circular consensus sequencing (CCS) approach.
Targeted sequencing with Sanger as well as short-read-based high-throughput sequencing methods is standard practice in clinical genetic testing. However, many applications beyond SNP detection have remained somewhat obstructed due to technological challenges. With the advent of long reads and high consensus accuracy, SMRT Sequencing overcomes many of the technical hurdles faced by Sanger and NGS approaches, opening a broad range of untapped clinical sequencing opportunities. Flexible multiplexing options, a highly adaptable sample preparation method, and two well-developed, newly improved analysis methods that generate highly accurate sequencing results make SMRT Sequencing an adept method for clinical-grade targeted sequencing. The Circular Consensus Sequencing (CCS) analysis pipeline produces QV 30 data from each single intra-molecular multi-pass polymerase read, making it a reliable solution for detecting minor variant alleles with frequencies as low as 1 %. Long Amplicon Analysis (LAA) makes use of insert-spanning full-length subreads originating from multiple individual copies of the target to generate highly accurate and phased consensus sequences (>QV50), offering a unique advantage for imputation-free allele segregation and haplotype phasing. Here we present workflows and results for a range of SMRT Sequencing clinical applications. Specifically, we illustrate how the flexible multiplexing options, simple sample preparation methods and new developments in data analysis tools offered by PacBio in support of Sequel System 5.1 can come together in a variety of experimental designs to enable applications as diverse as high-throughput HLA typing, mitochondrial DNA sequencing and viral vector integrity profiling of recombinant adeno-associated viral genomes (rAAV).
High-throughput NGS methods are increasingly utilized in the clinical genomics market. However, short-read sequencing data continues to remain challenged by mapping inaccuracies in low complexity regions or regions of high homology and may not provide adequate coverage within GC-rich regions of the genome. Thus, the use of Sanger sequencing remains popular in many clinical sequencing labs as the gold standard approach for orthogonal validation of variants and to interrogate missed regions poorly covered by second-generation sequencing. The use of Sanger sequencing can be less than ideal, as it can be costly for high volume assays and projects. Additionally, Sanger sequencing generates read lengths shorter than the region of interest, which limits its ability to accurately phase allelic variants. High-throughput SMRT Sequencing overcomes the challenges of both the first- and second-generation sequencing methods. PacBio’s long read capability allows sequencing of full-length amplicons
Structural variant detection with long read sequencing reveals driver and passenger mutations in a melanoma cell line
Past large-scale cancer genome sequencing efforts, including The Cancer Genome Atlas and the International Cancer Genome Consortium, have utilized short-read sequencing, which is well-suited for detecting single nucleotide variants (SNVs) but far less reliable for detecting variants larger than 20 base pairs, including insertions, deletions, duplications, inversions and translocations. Recent same-sample comparisons of short- and long-read human reference genome data have revealed that short-read resequencing typically uncovers only ~4,000 structural variants (SVs, ≥50 bp) per genome and is biased towards deletions, whereas sequencing with PacBio long-reads consistently finds ~20,000 SVs, evenly balanced between insertions and deletions. This discovery has important implications for cancer research, as it is clear that SVs are both common and biologically important in many cancer subtypes, including colorectal, breast and ovarian cancer. Without confident and comprehensive detection of structural variants, it is unlikely we have a sufficiently complete picture of all the genomic changes that impact cancer development, disease progression, treatment response, drug resistance, and relapse. To begin to address this unmet need, we have sequenced the COLO829 tumor and matched normal lymphoblastoid cell lines to 49- and 51-fold coverage, respectively, with PacBio SMRT Sequencing, with the goal of developing a high-confidence structural variant call set that can be used to empirically evaluate cost-effective experimental designs for larger scale studies and develop structural variation calling software suitable for cancer genomics. Structural variant calling revealed over 21,000 deletions and 19,500 insertions larger than 20 bp, nearly four times the number of events detected with short-read sequencing. The vast majority of events are shared between the tumor and normal, with about 100 putative somatic deletions and 400 insertions, primarily in microsatellites.
A further 40 rearrangements were detected, nearly exclusively in the tumor. One rearrangement is shared between the tumor and normal, t(5;X), which disrupts the mismatch repair gene MSH3 and is likely a driver mutation. Generating high-confidence call sets that cover the entire size-spectrum of somatic variants from a range of cancer model systems is the first step in determining what will be the best approach for addressing an ongoing blind spot in our current understanding of cancer genomes. Here the application of PacBio sequencing to a melanoma cancer cell line revealed thousands of previously overlooked variants, including a mutation likely involved in tumorigenesis.
Sequencing the previously unsequenceable using amplification-free targeted enrichment powered by CRISPR/Cas9
Genomic regions with extreme base composition bias and repetitive sequences have long proven challenging for targeted enrichment methods, as they rely upon some form of amplification. Similarly, most DNA sequencing technologies struggle to faithfully sequence regions of low complexity. This has especially been true for repeat expansion disorders such as Fragile X syndrome, Huntington’s disease and various ataxias, where the repetitive elements range from several hundreds of bases to tens of kilobases. We have developed a robust, amplification-free targeted enrichment technique, called No-Amp Targeted Sequencing, that employs the CRISPR/Cas9 system. In conjunction with Single Molecule, Real-Time (SMRT) Sequencing, which delivers long reads spanning the entire repeat expansion, high consensus accuracy, and uniform coverage, these previously inaccessible regions are now accessible. This method is completely amplification-free, therefore removing any PCR errors and biases from the experiment. Furthermore, this technique also preserves native DNA molecules, allowing for direct detection and characterization of epigenetic signatures. The No-Amp method is a two-day protocol, compatible with multiplexing of multiple targets and samples in a single reaction, using as little as 1 µg of genomic DNA input per sample. We have successfully targeted a number of repeat expansion disorder loci (HTT, FMR1, ATXN10, C9orf72) with alleles as long as >2700 repeat units (>13 kb). Using the No-Amp method we have isolated hundreds of individual on-target molecules, allowing for reliable repeat size estimation, mosaicism detection and identification of interruption sequences – all aspects of repeat expansion disorders which are important for better understanding the underlying disease mechanisms.
Amplification-free targeted enrichment powered by CRISPR-Cas9 and long-read Single Molecule Real-Time (SMRT) Sequencing can efficiently and accurately sequence challenging repeat expansion disorders
Genomic regions with extreme base composition bias and repetitive sequences have long proven challenging for targeted enrichment methods, as they rely upon some form of amplification. Similarly, most DNA sequencing technologies struggle to faithfully sequence regions of low complexity. This has been especially problematic for repeat expansion disorders such as Fragile X syndrome, Huntington disease and various ataxias, where the repetitive elements range from several hundreds of bases to tens of kilobases. We have developed a robust, amplification-free targeted enrichment technique, called No-Amp Targeted Sequencing, that employs the CRISPR-Cas9 system. In conjunction with SMRT Sequencing, which delivers long reads spanning the entire repeat expansion, high consensus accuracy, and uniform coverage, these previously inaccessible regions are now accessible. This method is completely amplification-free, therefore removing any PCR errors and biases from the experiment. Furthermore, this technique also preserves native DNA molecules, allowing for direct detection and characterization of epigenetic signatures. The No-Amp method is a two-day protocol that is compatible with multiplexing of multiple targets and multiple samples in a single reaction, using as little as 1 µg of genomic DNA input per sample. We have successfully targeted a number of repeat expansion disorder loci including HTT, FMR1, and C9orf72, and have built an ataxia panel which consists of 15 different disease-causing repeat expansion regions. Using the No-Amp method we have isolated hundreds of individual on-target molecules, allowing for reliable repeat size estimation, mosaicism detection and identification of interruption sequences with alleles as long as >2700 repeat units (>13 kb). In addition to multiplexing several targets, we have also multiplexed at least 20 samples in one experiment, making the No-Amp Targeted Sequencing method a cost-effective option.
Combining the CRISPR-Cas9 enrichment method with Single Molecule, Real-Time Sequencing provided us with base-level resolution of previously inaccessible regions of the genome, like disease-causing repeat expansions. No-Amp Targeted Sequencing captures, in one experiment, many aspects of repeat expansion disorders which are important for better understanding the underlying disease mechanisms.
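Repeat size estimation from a read that spans the whole expansion can be illustrated by counting consecutive copies of the repeat unit. This is a minimal sketch with a hypothetical function name and a made-up read. It uses exact string matching, so an interruption or a sequencing error ends a run, and it is not the analysis method used in the study.

```python
import re


def longest_repeat_run(read, motif):
    """Return the largest number of consecutive exact copies of motif in read."""
    if not motif:
        raise ValueError("motif must be non-empty")
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(read)]
    return max(runs, default=0)


# Made-up read: flank, 5 CAG units, an interruption, 2 more CAG units, flank.
read = "ACGT" + "CAG" * 5 + "TTA" + "CAG" * 2 + "GG"
print(longest_repeat_run(read, "CAG"))
```

Applying such a count to each on-target molecule gives a distribution of repeat sizes per sample, which is the kind of data that repeat size estimation and mosaicism detection rely on.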
This webinar, presented by Nisha Pillai, provides an overview of amplicon sequencing to target specific regions of a genome using PacBio Single Molecule, Real-Time (SMRT) Sequencing. This session provides an…
In this webinar, Jonas Korlach, PacBio Chief Scientific Officer, and Dave Corney, Associate Principal Scientist, Next Generation Sequencing from GENEWIZ, describe the recent release of Sequel System 6.0, which has…
In this webinar, Lori Aro and Cheryl Heiner of PacBio describe how high-throughput amplicon sequencing using Single Molecule, Real-Time (SMRT) Sequencing and the Sequel System allows for the easy and…
In this AGBT presentation, Marty Badgett shares a look at the latest results from circular consensus sequencing (CCS) mode for highly accurate reads and data from our soon-to-be-released Sequel II…