June 1, 2021  |  

Single Molecule Real Time (SMRT) sequencing sensitively detects polyclonal and compound BCR-ABL in patients who relapse on kinase inhibitor therapy.

Secondary kinase domain (KD) mutations are the best-recognized mechanism of resistance to tyrosine kinase inhibitors (TKIs) in chronic myeloid leukemia (CML) and other cancers. In some cases, multiple drug-resistant KD mutations can coexist in an individual patient (“polyclonality”). Alternatively, more than one mutation can occur in tandem on a single allele (“compound mutations”) following response and relapse to sequentially administered TKI therapy. Distinguishing between these two scenarios can inform the clinical choice of subsequent TKI treatment. There is currently no clinically adaptable methodology that can distinguish polyclonal from compound mutations. Because of the size of the BCR-ABL KD region in which TKI-resistant mutations are detected, next-generation platforms cannot generate reads long enough to determine whether two mutations separated by 500 nucleotides reside on the same allele. Pacific Biosciences RS Single Molecule Real-Time (SMRT) circular consensus sequencing is a third-generation deep sequencing technology capable of rapidly and reliably achieving average read lengths of ~1000 bp, and frequently beyond 3000 bp, allowing the entire ABL KD to be sequenced on a single strand of DNA. We sought to assess the ability of SMRT sequencing to distinguish polyclonal from compound mutations using clinical samples obtained from patients who have relapsed on BCR-ABL TKI treatment.
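
The distinction between compound and polyclonal mutations can be illustrated with a minimal sketch. It assumes that variant calling has already produced, for each full-length read, the set of KD mutations seen on that single molecule. The function name, the data layout, and the two mutation names in the toy data are illustrative assumptions, not part of the study. Reads carrying both mutations point to a compound allele, while mutations that only appear on separate reads point to polyclonality.

```python
from collections import Counter

def summarize_linkage(reads, mut_a, mut_b):
    """Count reads carrying both, one, or neither of two mutations.

    `reads` is an iterable of sets; each set holds the mutation names
    called on one full-length read (that is, one molecule).
    """
    counts = Counter()
    for muts in reads:
        has_a, has_b = mut_a in muts, mut_b in muts
        if has_a and has_b:
            counts["both"] += 1      # same molecule: evidence for a compound mutation
        elif has_a:
            counts["a_only"] += 1    # separate molecules: evidence for polyclonality
        elif has_b:
            counts["b_only"] += 1
        else:
            counts["neither"] += 1
    return counts

# Hypothetical toy data: each set is one read spanning the whole kinase domain.
reads = [{"T315I", "E255K"}, {"T315I"}, {"E255K"}, {"T315I", "E255K"}, set()]
counts = summarize_linkage(reads, "T315I", "E255K")
print(dict(counts))
```

A short-read platform would only see the marginal frequency of each mutation, which is the same for many different mixtures of these four read classes.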


June 1, 2021  |  

SMRT Sequencing of whole mitochondrial genomes and its utility in association studies of metabolic disease.

In this study we demonstrate the utility of Single Molecule, Real-Time (SMRT) sequencing to detect variants and to recapitulate whole mitochondrial genomes in an association study of metabolic syndrome (MetS) using samples from a well-studied cohort from Micronesia. The Micronesian island of Kosrae is a rare genetic isolate that offers significant advantages for genetic studies of human disease. Kosrae has some of the highest rates of MetS (41%), obesity (52%), and diabetes (17%) globally and has a homogeneous environment, making this an excellent population in which to study these significant health problems. We are conducting family-based association analyses aimed at identifying specific mitochondrial variants that contribute to obesity and other co-morbid conditions. We sequenced whole mitochondrial genomes from 10 Kosraean individuals who represent more than 25% of the mitochondrial genetic diversity of the entire Kosraean population. Using Pacific Biosciences C2 chemistry, SMRTbell libraries were constructed from pooled, full-length, unsheared 5 kb PCR amplicons tiling the entire 16.6 kb mtDNA genome. Average read lengths for each sample were between 2,500 and 3,000 bp, with 5% of reads between 6,000 and 8,000 bases, depending on movie lengths. The data generated in this study serve as proof of principle that SMRT Sequencing data can be used to identify high-quality variants and complete mitochondrial genome sequences. These data will be leveraged to identify causative variants for metabolic syndrome and associated disorders.


June 1, 2021  |  

Single Molecule Real-Time (SMRT) Sequencing of genes implicated in autosomal recessive diseases.

In today’s clinical diagnostic laboratories, disease-causing mutations are detected either through genotyping or Sanger sequencing. Whether done singly or in a multiplex assay, genotyping works only if the exact molecular change is known. Sanger sequencing is the gold standard method that captures both known and novel molecular changes in the disease gene of interest. Most clinical Sanger sequencing assays involve PCR-amplifying the coding sequences of the disease target gene followed by bi-directional sequencing of the amplified products. Therefore, for every patient sample, one generates multiple amplicons singly, and each amplicon leads to two separate sequencing reactions. Single Molecule, Real-Time (SMRT) sequencing offers several advantages over Sanger sequencing, including long read lengths, first-in-first-out processing, fast time to result, high levels of multiplexing and substantially reduced costs. For our first proof-of-concept experiment, we queried 3 known disease-associated mutations in de-identified clinical samples. We started with 3 autosomal recessive diseases found at an increased frequency in the Ashkenazi Jewish population: Tay-Sachs disease, Niemann-Pick disease and Canavan disease. The mutated gene in Tay-Sachs is HEXA, in Niemann-Pick is SMPD1 and in Canavan is ASPA. Coding exons were amplified in multiple (6-13) amplicons for each gene from both non-carriers and carriers. Amplicons were purified, concentrations normalized, and combined prior to SMRTbell™ Library prep. A single SMRTbell library was sequenced for each gene from each patient using standard Pacific Biosciences C2 chemistry and protocols. Average read lengths of 4,000 bp across samples allowed for high-quality Circular Consensus Sequences (CCS) across all amplicons (all less than 1 kb). These high-quality CCS data permitted the clean partitioning of reads from a patient in the presence of heterozygous events.
Using non-carrier sequencing as a control, we were able to correctly identify the known events in carrier genes. This suggests the potential utility of SMRT sequencing in a clinical setting, enabling a cost-effective method of replacing targeted mutation detection with sequencing of the entire gene.


June 1, 2021  |  

Allele-level sequencing and phasing of full-length HLA class I and II genes using SMRT Sequencing technology

The three classes of genes that comprise the MHC gene family are actively involved in determining donor-recipient compatibility for organ transplant, as well as susceptibility to autoimmune diseases via cross-reacting immunization. Specifically, class I genes HLA-A, -B, -C, and class II genes HLA-DR, -DQ and -DP are considered medically important for genetic analysis to determine histocompatibility. They are highly polymorphic and have thousands of alleles implicated in disease resistance and susceptibility. The importance of full-length HLA gene sequencing for genotyping, detection of null alleles, and phasing is now widely acknowledged. While DNA-sequencing-based HLA genotyping has become routine, only 7% of the HLA genes have been characterized by allele-level sequencing, while 93% are still defined by partial sequences. Sanger sequencing, the gold standard, is being quickly replaced by second-generation, high-throughput sequencing methods because it cannot generate unambiguous phased reads from heterozygous alleles. However, although these short, high-throughput, clonal sequencing methods are better at heterozygous allele detection, they are inadequate at generating full-length haploid gene sequences. Thus, full-length gene sequencing from the enhancer-promoter region to the 3’ UTR that includes phasing information without the need for imputation remains a technological challenge. The best way to overcome these challenges is to sequence these genes with a technology that is clonal in nature and has the longest possible read lengths. We have employed Single Molecule Real-Time (SMRT) sequencing technology from Pacific Biosciences for sequencing full-length HLA class I and II genes.


June 1, 2021  |  

Next generation sequencing of full-length HIV-1 env during primary infection.

Background: The use of next generation sequencing (NGS) to examine circulating HIV env variants has been limited due to env’s length (2.6 kb), extensive indel polymorphism, GC deficiency, and long homopolymeric regions. We developed and standardized protocols for isolation, RT-PCR amplification, single molecule real-time (SMRT) sequencing, and haplotype analysis of circulating HIV-1 env variants to evaluate viral diversity in primary infection. Methodology: HIV RNA was extracted from 7 blood plasma samples (1 mL) collected from 5 subjects (one individual sampled and sequenced at 3 time points) in the San Diego Primary Infection Cohort between 3 and 33 months after their estimated date of infection (EDI). Median viral load per sample was 50,118 HIV RNA copies/mL (range: 22,387-446,683). Full-length (3.2 kb) env amplicons were constructed into SMRTbell templates without shearing, and sequenced on the PacBio RS II using P4/C2 chemistry and 180-minute movie collection without stage start. To examine viral diversity in each sample, we determined haplotypes by clustering circular consensus sequences (CCS) and reconstructing a cluster consensus sequence using a partial order alignment approach. We measured sample diversity both as the mean pairwise distance among reads and as the fraction of reads containing indel polymorphisms. Results: We collected a median of 8,775 CCS reads per SMRT Cell (range: 4,243-12,234). A median of 7 haplotypes per subject (range: 1-55) were inferred at baseline. For the one subject with longitudinal samples analyzed, we observed an increasing number of distinct haplotypes (8 to 55 haplotypes over the course of 30 months), and an increasing mean pairwise distance among reads (from 0.8% to 1.6%, Tamura-Nei 93). We also observed significant indel polymorphism, with 16% of reads from one sample later in infection (33 months post-EDI) exhibiting deletions of more than 10% of env with respect to the reference strain, HXB2.
Conclusions: This study developed a standardized NGS procedure (PacBio SMRT) to deep sequence full-length HIV RNA env variants from the circulating viral population, achieving good coverage, confirming low env diversity during primary infection that increased over time, and revealing significant indel polymorphism that highlights structural variation as important to env evolution. The long, accurate reads greatly simplified downstream bioinformatics analyses, especially haplotype phasing, increasing our confidence in the results. The sequencing methodology and analysis tools developed here could be successfully applied to any area for which full-length HIV env analysis would be useful.
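
The diversity measure named in the methodology, mean pairwise distance among reads, can be sketched as follows. This is a simplified illustration, not the study's analysis code: it uses the plain proportion of differing sites (p-distance) instead of the Tamura-Nei 93 model reported in the results, and it assumes the reads have already been aligned to equal length with `-` marking gaps. The function names and toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            diffs += 1
    return diffs / compared if compared else 0.0

def mean_pairwise_distance(seqs):
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy alignment of three reads.
aligned = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
mpd = mean_pairwise_distance(aligned)
print(f"mean pairwise distance: {mpd:.4f}")
```

The number of pairs grows quadratically with read count, so a real analysis of thousands of CCS reads would typically subsample reads or work on haplotype clusters weighted by abundance.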


June 1, 2021  |  

Characterization of NNRTI mutations in HIV-1 RT using Single Molecule, Real-Time (SMRT) Sequencing.

Background: Genotypic testing of chronic viral infections is an important part of patient therapy and requires assays capable of detecting the entire spectrum of viral mutations. Single Molecule, Real-Time (SMRT) sequencing offers several advantages over other sequencing technologies, including superior resolution of mixed populations and long read lengths capable of spanning entire viral protein coding regions. We examined the detection sensitivity of SMRT sequencing using a mixture of HIV-1 RT gene coding regions containing single NNRTI mutations. Methodology: SMRTbell templates were prepared from PCR products generated from a prospective reference material being developed by the BC Centre for Excellence in HIV/AIDS. The material contained a mixture of fifteen infectious viruses, each carrying a single NNRTI resistance mutation (viz. V90I, K101E, K103N, V108I, E138A/G/K/Q, V179D, Y181C, Y188C, G190A/S, M230L and P236L) built upon the HIV-1LAI molecular clone. Templates were sequenced on the PacBio RS II to obtain single-molecule long reads using P4/C2 chemistry and 180-minute movie collection without stage start. The relative abundances of the mutant viruses were then estimated using codon-aware analysis methods. Results: Sequencing of these templates produced average read lengths of 5.0 kb, comprising 40,000-fold coverage across the entire amplicon per SMRT Cell. All the expected mutations in the mixture of mutant viruses were accurately identified. Estimated frequencies of NNRTI variants ranged from 0.5% to 12.5%. Conclusions: Codon analysis revealed a number of variants across the amplicon with highly consistent results across SMRT Cells. From a single SMRT Cell, variants were accurately and reliably detected down to 0.5% with simple analyses. Long polymerase reads and high-accuracy reads make it possible to call variants from just a few molecules.
SMRT Sequencing can identify species comprising a mixed viral population, with granularity and low cost of consumables allowing for smaller multiplexing of samples and first-in-first-out processing.
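
A codon-level frequency estimate of the kind described above can be sketched in a few lines. The abstract does not specify the codon-aware method, so this is only an assumed, simplified version: reads are taken to be ungapped, in frame, and starting at codon 1 of the coding region, and the function name and toy reads are hypothetical. The codon table is the standard genetic code.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINOS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

def residue_frequencies(reads, codon_number):
    """Return the fraction of reads encoding each amino acid at one codon.

    Assumes every read is in frame and starts at codon 1 (1-based numbering).
    Reads that do not cover the codon, or contain ambiguous bases, are skipped.
    """
    start = 3 * (codon_number - 1)
    counts = Counter()
    for read in reads:
        codon = read[start:start + 3].upper()
        if codon in CODON_TABLE:
            counts[CODON_TABLE[codon]] += 1
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}

# Hypothetical toy data: 199 reads with AAA (Lys) and 1 read with AAC (Asn) at codon 2.
reads = ["ATGAAA"] * 199 + ["ATGAAC"]
freqs = residue_frequencies(reads, 2)
print(freqs)
```

Counting at the codon level means that synonymous changes (for example AAC versus AAT, both Asn) are pooled into one resistance call instead of being reported as separate nucleotide variants.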


June 1, 2021  |  

A novel analytical pipeline for de novo haplotype phasing and amplicon analysis using SMRT Sequencing technology.

While the identification of individual SNPs has been readily available for some time, the ability to accurately phase SNPs and structural variation across a haplotype has been a challenge. With individual reads of an average length of 9 kb (P5-C3), and individual reads beyond 30 kb in length, SMRT Sequencing technology allows the identification of mutation combinations such as microdeletions, insertions, and substitutions without any predetermined reference sequence. Long-amplicon analysis is a novel protocol that identifies and reports the abundance of differing clusters of sequencing reads within a single library. Graphs generated via hierarchical clustering of individual sequencing reads are used to generate Markov models representing the consensus sequence of individual clusters found to be significantly different. Long-amplicon analysis is capable of differentiating between underlying sequences that are 99.9% similar, which is suitable for haplotyping and differentiating pseudogenes from coding transcripts. This protocol allows for the identification of structural variation in the MUC5AC gene sequence, despite the presence of a gap in the current genome assembly, and can also be used for HLA haplotyping. Clustering can also be applied to identify full-length transcripts for the purpose of estimating consensus sequences and enumerating isoform types. Long-amplicon analysis allows for the elucidation of complex regions otherwise missed by other sequencing technologies, which may contribute to the diagnosis and understanding of otherwise complex diseases.


June 1, 2021  |  

Long-read, single-molecule applications for protein engineering.

The long read lengths of PacBio’s SMRT Sequencing enable detection of linked mutations across multiple kilobases of sequence. This feature is particularly useful in the context of protein engineering, where large numbers of similar constructs are generated routinely to explore the effects of mutations on function and stability. We have developed a PCR-based barcoded sequencing method to generate high-quality, full-length sequence data for batches of constructs generated in a common backbone. Individual barcodes are coupled to primers targeting a common region of the vector of interest. The amplified products are pooled into a single DNA library, and sequencing data are clustered by barcode to generate multi-molecule consensus sequences for each construct present in the pool. As a proof-of-concept dataset, we have generated a library of 384 randomly mutated variants of the Phi29 DNA polymerase, a 575 amino acid protein encoded by a 1.7 kb gene. These variants were amplified with a set of barcoded primers, and the resulting library was sequenced on a single SMRT Cell. The data produced sequences that were completely concordant with independent Sanger sequencing, for a 100% accurate reconstruction of the set of clones.
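
The barcode-then-consensus idea can be sketched as below. This is a deliberately simplified illustration under stated assumptions, not the method used in the study: barcodes are matched exactly at the start of each read, and the trimmed reads for a construct are assumed to have equal length with no indels, so a column-wise majority vote stands in for a proper alignment-based consensus. All names and sequences are hypothetical.

```python
from collections import Counter, defaultdict

def demultiplex(reads, barcodes):
    """Group reads by exact barcode prefix.

    `barcodes` maps barcode sequence -> construct name. The barcode is
    trimmed from each assigned read; unassigned reads are dropped.
    """
    groups = defaultdict(list)
    for read in reads:
        for barcode, name in barcodes.items():
            if read.startswith(barcode):
                groups[name].append(read[len(barcode):])
                break
    return groups

def majority_consensus(seqs):
    """Column-wise majority vote over equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("expected a non-empty set of equal-length sequences")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

# Hypothetical toy pool: two constructs, one read with a sequencing error,
# and one read whose barcode is not recognized.
barcodes = {"AAGG": "clone_1", "CCTT": "clone_2"}
reads = ["AAGGACGT", "AAGGACGT", "AAGGACCT", "CCTTTTGA", "CCTTTTGA", "GGGGACGT"]
groups = demultiplex(reads, barcodes)
consensus = {name: majority_consensus(seqs) for name, seqs in groups.items()}
print(consensus)
```

Because each consensus pools several molecules of the same construct, an isolated error in one read (the third read above) is outvoted by the others.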


June 1, 2021  |  

Accurately surveying uncultured microbial species with SMRT Sequencing

Background: Microbial ecology is reshaping our understanding of the natural world by revealing the large phylogenetic and functional diversity of microbial life. However, the vast majority of these microorganisms remain poorly understood, as most cultivated representatives belong to just four phylogenetic groups and more than half of all identified phyla remain uncultivated. Characterization of this microbial ‘dark matter’ will thus greatly benefit from new metagenomic methods for in situ analysis, such as sensitive, high-throughput methods that characterize community composition and structure by sequencing conserved marker genes. Methods: Here we utilize Single Molecule Real-Time (SMRT) sequencing of full-length 16S rRNA amplicons to phylogenetically profile microbial communities to below the genus level. We test this method on a mock community of known composition, as well as a previously studied microbial community from a lake known to predominantly contain poorly characterized phyla. These results are compared to traditional 16S tag sequencing from short-read technologies and to subsets of the full-length data corresponding to the same regions of the 16S gene. Results: We explore the benefits of using full-length amplicons for estimating community structure and diversity. In addition, we investigate the possible effects of context-specific and GC-content biases known to affect short-read sequencing technologies on the predicted community structure. We characterize the potential benefits of profiling metagenomic communities with full-length 16S rRNA genes from SMRT sequencing relative to standard methods.


June 1, 2021  |  

Developments in PacBio metagenome sequencing: Shotgun whole genomes and full-length 16S.

The assembly of metagenomes is dramatically improved by the long read lengths of SMRT Sequencing. This is demonstrated in an experimental design to sequence a mock community from the Human Microbiome Project and assemble the data using the hierarchical genome assembly process (HGAP) at Pacific Biosciences. Results of this analysis are promising, and display much improved contiguity in the assembly of the mock community as compared to publicly available short-read data sets and assemblies. Additionally, base modification information can be used to make further associations between contigs, which helps to improve assemblies and to distinguish between members within a microbial community. The epigenetic approach is a novel validation method unique to SMRT Sequencing. In addition to whole-genome shotgun sequencing, SMRT Sequencing also offers improved classification resolution and reliability of metagenomic and microbiome samples by the full-length sequencing of 16S rRNA (~1500 bases long). Microbial communities can be detected at the species level in some cases, rather than being limited to the genus-level classification imposed by short-read technologies. SMRT Sequencing of these metagenomic samples achieved >99% predicted concordance to reference sequences in cecum, soil, water, and mock control investigations for bacterial 16S. Community samples are estimated to contain from 2.3 to 15 times as many species, with abundance levels as low as 0.05%, compared to the identification of phyla groups.


June 1, 2021  |  

Genomic DNA sequences of HLA class I alleles generated using multiplexed barcodes and SMRT DNA Sequencing technology.

Allelic-level resolution HLA typing is known to improve survival prognoses post Unrelated Donor (UD) Haematopoietic Stem Cell Transplantation (HSCT). Currently, many commonly used HLA typing methodologies are limited, either because ambiguity cannot be resolved or because they are not amenable to high-throughput laboratories. Pacific Biosciences’ Single Molecule Real-Time (SMRT) DNA sequencing technology enables sequencing of single molecules in isolation and has read-length capabilities to enable whole gene sequencing for HLA. DNA barcode technology labels samples with unique identifiers that can be traced throughout the sequencing process. The use of DNA barcodes means that multiple samples can be sequenced in a single experiment while data can still be attributed to the correct sample. Here we describe the results of experiments that use DNA barcodes to sequence full-length HLA class I genes from multiple samples in a single run (known as multiplexing).


June 1, 2021  |  

HLA sequencing using SMRT Technology – High resolution and high throughput HLA genotyping in a clinical setting

Sequence-based typing (SBT) is considered the gold standard method for HLA typing. Current SBT methods are rather laborious and are prone to phase ambiguity problems and genotyping uncertainties. As a result, the NGS community is rapidly seeking to remedy these challenges, to produce high-resolution and high-throughput HLA sequencing conducive to a clinical setting. Today, second-generation NGS technologies are limited in their ability to yield the full-length HLA sequences required for adequate phasing and identification of novel alleles. Here we present the use of single molecule real-time (SMRT) sequencing as a means of determining full-length/long HLA sequences. Moreover, we reveal the scalability of this method through multiplexing approaches and determine HLA genotyping calls through the use of third-party GenDx NGSengine® software.


June 1, 2021  |  

Long Amplicon Analysis: Highly accurate, full-length, phased, allele-resolved gene sequences from multiplexed SMRT Sequencing data.

The correct phasing of genetic variations is a key challenge for many applications of DNA sequencing. Allele-level resolution is strongly preferred for histocompatibility sequencing where recombined genes can exhibit different compatibilities than their parents. In other contexts, gene complementation can provide protection if deleterious mutations are found on only one allele of a gene. These problems are especially pronounced in immunological domains given the high levels of genetic diversity and recombination seen in regions like the Major Histocompatibility Complex. A new tool for analyzing Single Molecule, Real-Time (SMRT) Sequencing data – Long Amplicon Analysis (LAA) – can generate highly accurate, phased and full-length consensus sequences for multiple genes in a single sequencing run.


June 1, 2021  |  

Evaluation of multiplexing strategies for HLA genotyping using PacBio Sequencing technology.

Fully phased allele-level sequencing of highly polymorphic HLA genes is greatly facilitated by SMRT Sequencing technology. In the present work, we have evaluated multiple DNA barcoding strategies for multiplexing several loci from multiple individuals, using three different tagging methods. Specifically, MHC class I genes HLA-A, -B, and -C were indexed with DNA barcodes by either tailed primers or barcoded SMRTbell adapters. Eight different 16-bp barcode sequences were used in symmetric and asymmetric pairing. Eight DNA barcoded adapters in symmetric pairing were independently ligated to a pool of HLA-A, -B and -C for eight different individuals, one at a time, and pooled for sequencing on a single SMRT Cell. Amplicons generated from barcoded primers were pooled upfront for library generation. Eight symmetric barcoded primers were generated for HLA class I genes. These primers facilitated multiplexing of 8 samples and also allowed generation of unique asymmetric pairings for simultaneous amplification from 28 reference genomic DNA samples. The data generated from all 3 methods were analyzed using the LAA protocol in SMRT Analysis v2.3. The consensus sequences generated were typed using GenDx NGSengine HLA-typing software.
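
The sample counts above follow from simple combinatorics, which the short sketch below works through. It assumes that an asymmetric pairing is an unordered pair of two different barcodes (that is, the order of the two barcodes on a molecule is not informative); the barcode labels are placeholders, not the actual 16-bp sequences.

```python
from itertools import combinations

# Placeholder labels standing in for the eight 16-bp barcode sequences.
barcodes = [f"BC{i}" for i in range(1, 9)]

# Symmetric pairing: the same barcode on both ends of the amplicon.
symmetric = [(b, b) for b in barcodes]

# Asymmetric pairing: two different barcodes, order ignored.
asymmetric = list(combinations(barcodes, 2))

print(len(symmetric), "symmetric pairings")    # one per barcode
print(len(asymmetric), "asymmetric pairings")  # 8 choose 2
```

With eight barcodes there are 8 symmetric pairings and 8 × 7 / 2 = 28 asymmetric pairings, which matches the 8 and 28 samples described in the abstract.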


June 1, 2021  |  

Impact of DNA quality on PacBio RS II read lengths.

Maximizing the read length of next generation sequencing (NGS) facilitates de novo genome assembly. Currently, the PacBio RS II system leads the industry with respect to maximum possible NGS read lengths. Amplicon Express specializes in preparation of high molecular weight, NGS-grade genomic DNA for a variety of applications, including next generation sequencing. This study was performed to evaluate the effects of gDNA quality on PacBio RS II read length.

