Learn how Single Molecule, Real-Time (SMRT) Sequencing and the Sequel IIe System will accelerate your research by delivering highly accurate long reads to provide the most comprehensive view of genomes, transcriptomes and epigenomes.
Resolving Highly Diverse HLA and CYP2D6 Alleles Using HiFi Sequencing for Long-Range Amplicon Data with a New Clustering Algorithm
Targeted amplification of difficult pharmacogenetic loci with PacBio HiFi reads can resolve complex alleles in a single direct assay without imputation.
Learn why it is critically important to understand accuracy in DNA sequencing to distinguish important biological information from sequencing errors.
To bring personalized medicine to all patients, cancer researchers need more reliable and comprehensive views of somatic variants of all sizes that drive cancer biology.
With Single Molecule, Real-Time (SMRT) Sequencing and the Sequel System, you can easily and cost-effectively generate highly accurate long reads (HiFi reads, >99% single-molecule accuracy) from genes or regions of interest ranging in size from several hundred base pairs to 20 kb. Target all types of variation across relevant genomic regions, including low-complexity regions such as repeat expansions, promoters, and flanking regions of transposable elements.
Discover the benefits of HiFi reads and learn how highly accurate long-read sequencing provides a single technology solution across a range of applications.
Single Molecule Real Time (SMRT) sequencing sensitively detects polyclonal and compound BCR-ABL in patients who relapse on kinase inhibitor therapy.
Secondary kinase domain (KD) mutations are the best-recognized mechanism of resistance to tyrosine kinase inhibitors (TKIs) in chronic myeloid leukemia (CML) and other cancers. In some cases, multiple drug-resistant KD mutations can coexist in an individual patient (“polyclonality”). Alternatively, more than one mutation can occur in tandem on a single allele (“compound mutations”) following response and relapse to sequentially administered TKI therapy. Distinguishing between these two scenarios can inform the clinical choice of subsequent TKI treatment. There is currently no clinically adaptable methodology that can distinguish polyclonal from compound mutations. Because of the size of the BCR-ABL KD where TKI-resistant mutations are detected, next-generation platforms are unable to generate reads of sufficient length to determine whether two mutations separated by 500 nucleotides reside on the same allele. Pacific Biosciences RS Single Molecule, Real-Time (SMRT) circular consensus sequencing technology is a novel third-generation deep sequencing technology capable of rapidly and reliably achieving average read lengths of ~1000 bp and frequently beyond 3000 bp, allowing sequencing of the entire ABL KD on a single strand of DNA. We sought to address the ability of SMRT sequencing technology to distinguish polyclonal from compound mutations using clinical samples obtained from patients who have relapsed on BCR-ABL TKI treatment.
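The distinction drawn above comes down to whether two mutations are seen on the same read or only on different reads. The following is a minimal Python sketch of that idea, not the authors' pipeline. It assumes each read spans the whole kinase domain and has already been reduced to a set of mutation labels; the function names, the example labels, and the `min_reads` threshold are illustrative assumptions.

```python
from collections import Counter

def count_read_classes(reads, mut_a, mut_b):
    """Count reads carrying both mutations, only one, or neither.

    `reads` is a list of sets; each set holds the mutation labels called on
    one single-molecule read (hypothetical, pre-called input).
    """
    counts = Counter()
    for muts in reads:
        has_a, has_b = mut_a in muts, mut_b in muts
        if has_a and has_b:
            counts["both"] += 1
        elif has_a:
            counts["a_only"] += 1
        elif has_b:
            counts["b_only"] += 1
        else:
            counts["neither"] += 1
    return counts

def interpret(counts, min_reads=3):
    """Label the pattern; `min_reads` is an arbitrary illustrative cutoff."""
    compound = counts["both"] >= min_reads
    polyclonal = (counts["a_only"] >= min_reads
                  and counts["b_only"] >= min_reads)
    if compound and polyclonal:
        return "compound and polyclonal"
    if compound:
        return "compound"
    if polyclonal:
        return "polyclonal"
    return "undetermined"

# Toy data: both mutations on the same molecule vs. on separate molecules.
same_allele = [{"T315I", "E255K"}] * 5 + [set()] * 5
separate_clones = [{"T315I"}] * 4 + [{"E255K"}] * 4

print(interpret(count_read_classes(same_allele, "T315I", "E255K")))      # compound
print(interpret(count_read_classes(separate_clones, "T315I", "E255K")))  # polyclonal
```

Short reads cannot fill the "both" class when the two sites are farther apart than the read length, which is why read length matters for this question.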
In today’s clinical diagnostic laboratories, the detection of disease-causing mutations is done through either genotyping or Sanger sequencing. Whether done singly or in a multiplex assay, genotyping works only if the exact molecular change is known. Sanger sequencing is the gold standard method that captures both known and novel molecular changes in the disease gene of interest. Most clinical Sanger sequencing assays involve PCR-amplifying the coding sequences of the disease target gene followed by bi-directional sequencing of the amplified products. Therefore, for every patient sample, one generates multiple amplicons singly and each amplicon leads to two separate sequencing reactions. Single Molecule, Real-Time (SMRT) sequencing offers several advantages over Sanger sequencing, including long read lengths, first-in-first-out processing, fast time to result, high levels of multiplexing and substantially reduced costs. For our first proof-of-concept experiment, we queried 3 known disease-associated mutations in de-identified clinical samples. We started with 3 autosomal recessive diseases found at an increased frequency in the Ashkenazi Jewish population: Tay-Sachs disease, Niemann-Pick disease and Canavan disease. The mutated gene in Tay-Sachs is HEXA, in Niemann-Pick is SMPD1 and in Canavan is ASPA. Coding exons were amplified in multiple (6-13) amplicons for each gene from both non-carriers and carriers. Amplicons were purified, concentrations normalized, and combined prior to SMRTbell™ library prep. A single SMRTbell library was sequenced for each gene from each patient using standard Pacific Biosciences C2 chemistry and protocols. Average read lengths of 4,000 bp across samples allowed for high-quality Circular Consensus Sequences (CCS) across all amplicons (all less than 1 kb). This high-quality CCS data permitted the clean partitioning of reads from a patient in the presence of heterozygous events.
Using non-carrier sequencing as a control, we were able to correctly identify the known events in carrier genes. This suggests the potential utility of SMRT sequencing in a clinical setting, enabling a cost-effective method of replacing targeted mutation detection with sequencing of the entire gene.
Allele-level sequencing and phasing of full-length HLA class I and II genes using SMRT Sequencing technology
The three classes of genes that comprise the MHC gene family are actively involved in determining donor-recipient compatibility for organ transplant, as well as susceptibility to autoimmune diseases via cross-reacting immunization. Specifically, class I genes HLA-A, -B, -C, and class II genes HLA-DR, -DQ and -DP are considered medically important for genetic analysis to determine histocompatibility. They are highly polymorphic and have thousands of alleles implicated in disease resistance and susceptibility. The importance of full-length HLA gene sequencing for genotyping, detection of null alleles, and phasing is now widely acknowledged. While DNA-sequencing-based HLA genotyping has become routine, only 7% of the HLA genes have been characterized by allele-level sequencing, while 93% are still defined by partial sequences. The gold-standard Sanger sequencing technology is being quickly replaced by second-generation, high-throughput sequencing methods because Sanger sequencing cannot generate unambiguous phased reads from heterozygous alleles. However, although these short, high-throughput, clonal sequencing methods are better at heterozygous allele detection, they are inadequate at generating full-length haploid gene sequences. Thus, full-length gene sequencing from an enhancer-promoter region to a 3’ UTR that includes phasing information without the need for imputation still remains a technological challenge. The best way to overcome these challenges is to sequence these genes with a technology that is clonal in nature and has the longest possible read lengths. We have employed Single Molecule, Real-Time (SMRT) sequencing technology from Pacific Biosciences for sequencing full-length HLA class I and II genes.
A novel analytical pipeline for de novo haplotype phasing and amplicon analysis using SMRT Sequencing technology.
While the identification of individual SNPs has been readily available for some time, the ability to accurately phase SNPs and structural variation across a haplotype has been a challenge. With individual reads of an average length of 9 kb (P5-C3), and individual reads beyond 30 kb in length, SMRT Sequencing technology allows the identification of mutation combinations such as microdeletions, insertions, and substitutions without any predetermined reference sequence. Long-amplicon analysis is a novel protocol that identifies and reports the abundance of differing clusters of sequencing reads within a single library. Graphs generated via hierarchical clustering of individual sequencing reads are used to generate Markov models representing the consensus sequence of individual clusters found to be significantly different. Long-amplicon analysis is capable of differentiating between underlying sequences that are 99.9% similar, which is suitable for haplotyping and differentiating pseudogenes from coding transcripts. This protocol allows for the identification of structural variation in the MUC5AC gene sequence, despite the presence of a gap in the current genome assembly, and can also be used for HLA haplotyping. Clustering can also be applied to identify full-length transcripts for the purpose of estimating consensus sequences and enumerating isoform types. Long-amplicon analysis allows for the elucidation of complex regions otherwise missed by other sequencing technologies, which may contribute to the diagnosis and understanding of otherwise complex diseases.
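To make the cluster-then-consensus idea concrete, here is a deliberately simplified Python sketch. It is not the long-amplicon analysis protocol: it swaps in greedy clustering by Hamming distance and a per-column majority vote in place of the hierarchical clustering and Markov models described above, and it assumes reads of equal length with substitution errors only. All names and the toy sequences are illustrative.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def greedy_cluster(reads, max_dist):
    """Put each read in the first cluster whose seed read is within max_dist."""
    clusters = []  # each cluster is a list of reads; element 0 is the seed
    for read in reads:
        for cluster in clusters:
            if hamming(read, cluster[0]) <= max_dist:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters

def consensus(cluster):
    """Majority base at each column of an already aligned cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*cluster))

# Two toy alleles differing at three positions, each read with at most one error.
allele_a = "ACGTACGTAC"
allele_b = "ACGTTTTTAC"
reads = [allele_a, "ACGTACGTAA", allele_a, allele_b, allele_b, "CCGTTTTTAC"]

clusters = greedy_cluster(reads, max_dist=1)
for cluster in clusters:
    print(len(cluster), consensus(cluster))
```

Reporting the size of each cluster alongside its consensus mirrors the abundance reporting mentioned in the text; real reads also contain insertions and deletions, so they need alignment before any column-wise vote.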
Genomic DNA sequences of HLA class I alleles generated using multiplexed barcodes and SMRT DNA Sequencing technology.
Allelic-level resolution HLA typing is known to improve survival prognoses post Unrelated Donor (UD) Haematopoietic Stem Cell Transplantation (HSCT). Currently, many commonly used HLA typing methodologies are limited either due to the fact that ambiguity cannot be resolved or that they are not amenable to high-throughput laboratories. Pacific Biosciences’ Single Molecule Real-Time (SMRT) DNA sequencing technology enables sequencing of single molecules in isolation and has read-length capabilities to enable whole gene sequencing for HLA. DNA barcode technology labels samples with unique identifiers that can be traced throughout the sequencing process. The use of DNA barcodes means that multiple samples can be sequenced in a single experiment but data can still be attributed to the correct sample. Here we describe the results of experiments that use DNA barcodes to facilitate sequencing of multiple samples for full-length HLA class I genes (known as multiplexing).
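The barcode bookkeeping described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a real demultiplexer: it looks only at the leading bases of each read, requires an exact match, and uses made-up 4-base barcodes and sample names. Production tools also score barcodes at both ends of a read and tolerate sequencing errors.

```python
def demultiplex(reads, barcodes, barcode_len):
    """Assign reads to samples by exact match of the leading barcode.

    `barcodes` maps barcode sequence -> sample name (hypothetical values).
    Returns (reads per sample with the barcode trimmed, unassigned reads).
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return by_sample, unassigned

# Toy example with 4-base barcodes.
barcodes = {"AAAA": "sample_1", "CCCC": "sample_2"}
reads = ["AAAAGATTACA", "CCCCGATTTCA", "GGGGGATTACA", "AAAAGATTACA"]

by_sample, unassigned = demultiplex(reads, barcodes, barcode_len=4)
print(by_sample)
print(unassigned)
```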
HLA sequencing using SMRT Technology – High resolution and high throughput HLA genotyping in a clinical setting
Sequence-based typing (SBT) is considered the gold standard method for HLA typing. Current SBT methods are rather laborious and are prone to phase ambiguity problems and genotyping uncertainties. As a result, the NGS community is rapidly seeking to remedy these challenges, to produce high-resolution and high-throughput HLA sequencing conducive to a clinical setting. Today, second-generation NGS technologies are limited in their ability to yield the full-length HLA sequences required for adequate phasing and identification of novel alleles. Here we present the use of Single Molecule, Real-Time (SMRT) sequencing as a means of determining full-length/long HLA sequences. Moreover, we reveal the scalability of this method through multiplexing approaches and determine HLA genotyping calls through the use of third-party GenDx NGSengine® software.
Long Amplicon Analysis: Highly accurate, full-length, phased, allele-resolved gene sequences from multiplexed SMRT Sequencing data.
The correct phasing of genetic variations is a key challenge for many applications of DNA sequencing. Allele-level resolution is strongly preferred for histocompatibility sequencing where recombined genes can exhibit different compatibilities than their parents. In other contexts, gene complementation can provide protection if deleterious mutations are found on only one allele of a gene. These problems are especially pronounced in immunological domains given the high levels of genetic diversity and recombination seen in regions like the Major Histocompatibility Complex. A new tool for analyzing Single Molecule, Real-Time (SMRT) Sequencing data – Long Amplicon Analysis (LAA) – can generate highly accurate, phased and full-length consensus sequences for multiple genes in a single sequencing run.
Fully phased allele-level sequencing of highly polymorphic HLA genes is greatly facilitated by SMRT Sequencing technology. In the present work, we have evaluated multiple DNA barcoding strategies for multiplexing several loci from multiple individuals, using three different tagging methods. Specifically, MHC class I genes HLA-A, -B, and -C were indexed via DNA barcodes by either tailed primers or barcoded SMRTbell adapters. Eight different 16-bp barcode sequences were used in symmetric and asymmetric pairing. Eight DNA barcoded adapters in symmetric pairing were independently ligated to a pool of HLA-A, -B and -C for eight different individuals, one at a time, and pooled for sequencing on a single SMRT Cell. Amplicons generated from barcoded primers were pooled upfront for library generation. Eight symmetric barcoded primers were generated for HLA class I genes. These primers facilitated multiplexing of 8 samples and also allowed generation of unique asymmetric pairings for simultaneous amplification from 28 reference genomic DNA samples. The data generated from all 3 methods were analyzed using the LAA protocol in SMRT Analysis v2.3. Consensus sequences generated were typed using GenDx NGSengine HLA-typing software.
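The sample counts above are consistent with simple combinatorics: eight barcodes give eight symmetric pairs (the same barcode on both ends) and, if each asymmetric pairing is an unordered pair of two different barcodes, 28 asymmetric pairs. The abstract does not state how the 28 pairings were chosen, so treating them as all unordered pairs is an assumption; the barcode names below are placeholders.

```python
from itertools import combinations

barcodes = [f"BC{i}" for i in range(1, 9)]  # eight placeholder barcode names

# Symmetric pairing: the same barcode on both ends of the amplicon.
symmetric = [(b, b) for b in barcodes]

# Asymmetric pairing: two different barcodes, order ignored.
asymmetric = list(combinations(barcodes, 2))

print(len(symmetric), len(asymmetric))  # 8 28
```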
Multiplexing human HLA class I & II genotyping with DNA barcode adapters for high throughput research.
Human MHC class I genes HLA-A, -B, -C, and class II genes HLA-DR, -DP and -DQ, play a critical role in the immune system as major factors responsible for organ transplant rejection. They have a direct or linkage-based association with several diseases, including cancer and autoimmune diseases, and are important targets for clinical and drug sensitivity research. HLA genes are also highly polymorphic and their diversity originates from exonic combinations as well as recombination events. A large number of new alleles are expected to be encountered if these genes are sequenced through the UTRs. Thus, allele-level resolution is strongly preferred when sequencing HLA genes. Pacific Biosciences has developed a method to sequence the HLA genes in their entirety within the span of a single read, taking advantage of the long read lengths (average >10 kb) facilitated by SMRT technology. A highly accurate consensus sequence (99.999% accuracy, or QV50, demonstrated) is generated for each allele in a de novo fashion by our SMRT Analysis software. In the present work, we have combined this imputation-free, fully phased, allele-specific consensus sequence generation workflow and a newly developed DNA-barcode-tagged SMRTbell sample preparation approach to multiplex 96 individual samples for sequencing all of the HLA class I and II genes. Commercially available NGS-go reagents were used to amplify full-length HLA class I genes and relevant exons of class II genes for high-resolution HLA sequencing. The 96 samples included 72 that are part of the UCLA reference panel and had pre-typing information available for 2 fields, based on gold standard SBT methods. SMRTbell adapters with 16 bp barcode tags were ligated to long amplicons in symmetric pairing. PacBio sequencing was highly effective in generating accurate, phased sequences of full-length alleles of HLA genes.
In this work, we demonstrate the scalability of HLA sequencing using off-the-shelf assays for research applications to find biological significance in full-length sequencing.
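The QV50 consensus accuracy quoted above uses the standard Phred scale, where QV = -10 · log10(error probability). A short Python sketch of the conversion follows; the function names are illustrative.

```python
import math

def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy."""
    return 1.0 - 10 ** (-qv / 10)

def accuracy_to_qv(accuracy):
    """Convert per-base accuracy (0 < accuracy < 1) to a Phred quality value."""
    return -10 * math.log10(1.0 - accuracy)

print(round(qv_to_accuracy(50), 5))        # 0.99999
print(round(accuracy_to_qv(0.99999), 3))   # 50.0
print(round(qv_to_accuracy(20), 2))        # 0.99
```

On this scale, QV50 corresponds to about one error per 100,000 bases, and the 99% single-molecule accuracy threshold cited earlier for HiFi reads corresponds to QV20.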