June 1, 2021  |  

Single Molecule Real Time (SMRT) sequencing sensitively detects polyclonal and compound BCR-ABL in patients who relapse on kinase inhibitor therapy.

Secondary kinase domain (KD) mutations are the most well-recognized mechanism of resistance to tyrosine kinase inhibitors (TKIs) in chronic myeloid leukemia (CML) and other cancers. In some cases, multiple drug resistant KD mutations can coexist in an individual patient (“polyclonality”). Alternatively, more than one mutation can occur in tandem on a single allele (“compound mutations”) following response and relapse to sequentially administered TKI therapy. Distinguishing between these two scenarios can inform the clinical choice of subsequent TKI treatment. There is currently no clinically adaptable methodology that offers the ability to distinguish polyclonal from compound mutations. Due to the size of the BCR-ABL KD where TKI-resistant mutations are detected, next-generation platforms are unable to generate reads of sufficient length to determine if two mutations separated by 500 nucleotides reside on the same allele. Pacific Biosciences RS Single Molecule Real-Time (SMRT) circular consensus sequencing technology is a novel third generation deep sequencing technology capable of rapidly and reliably achieving average read lengths of ~1000 bp and frequently beyond 3000 bp, allowing sequencing of the entire ABL KD on single strand of DNA. We sought to address the ability of SMRT sequencing technology to distinguish polyclonal from compound mutations using clinical samples obtained from patients who have relapsed on BCR-ABL TKI treatment.


June 1, 2021  |  

SMRT Sequencing of whole mitochondrial genomes and its utility in association studies of metabolic disease.

In this study we demonstrate the utility of Single-Molecule Real Time SMRT sequencing to detect variants and to recapitulate whole mitochondrial genomes in an association study of Metabolic syndrome using samples from a well-studied cohort from Micronesia. The Micronesian island of Kosrae is a rare genetic isolate that offers significant advantages for genetic studies of human disease. Kosrae suffers from one of the highest rates of MetS (41%), obesity (52%), and diabetes (17%) globally and has a homogeneous environment making this an excellent population in which to study these significant health problems. We are conducting family-based association analyses aimed at identifying specific mitochondrial variants that contribute to obesity and other co-morbid conditions. We sequenced whole mitochondrial genomes from 10 Kosraen individuals who represent greater than 25 % of the mitochondrial genetic diversity for the entire Kosraen population. Using Pacific Biosciences C2 chemistry, SMRTbell libraries were constructed from pooled, full-length, unsheared 5 kb PCR amplicons, tiling the entire 16.6 kb mtDNA genome. Average read lengths for each sample were between 2500-3000 bp, with 5% of reads between 6,000-8,000 bases, depending on movie lengths. The data generated in this study serve as proof of principle that SMRT Sequencing data can be utilized for identification of high-quality variants and complete mitochondrial genome sequences. These data will be leveraged to identify causative variants for Metabolic syndrome and associated disorders.


June 1, 2021  |  

Harnessing kinetic information in Single-Molecule, Real-Time Sequencing.

Single-Molecule Real-Time (SMRT) DNA sequencing is unique in that nucleotide incorporation events are monitored in real time, leading to a wealth of kinetic information in addition to the extraction of the primary DNA sequence. The dynamics of the DNA polymerase that is observed adds an additional dimension of sequence-dependent information, and can be used to learn more about the molecule under study. First, the primary sequence itself can be determined more accurately. The kinetic data can be used to corroborate or overturn consensus calls and even enable calling bases in problematic sequence contexts. Second, using the kinetic information, we can detect and discriminate numerous chemical base modifications as a by-product of ordinary sequencing. Examples of applying these capabilities include (i) the characterization of the epigenome of microorganisms by directly sequencing the three common prokaryotic epigenetic base modifications of 4-methylcytosine, 5- methylcytosine and 6-methyladenine; (ii) the characterization of known and novel methyltransferase activities; (iii) the direct sequencing and differentiation of the four eukaryotic epigenetic forms of cytosine (5-methyl, 5-hydroxymethyl, 5-formyl, and 5-carboxylcytosine) with first applications to map them with single base-pair and DNA strand resolution across mammalian genomes; (iv) the direct sequencing and identification of numerous modified DNA bases arising from DNA damage; and (v) an exploration of the mitochondrial genome for known and novel base modifications. We will show our progress towards a generic, open-source algorithm for exploiting kinetic information for any of these purposes.


June 1, 2021  |  

Sequencing and de novo assembly of the 17q21.31 disease associated region using long reads generated by Pacific Biosciences SMRT Sequencing technology.

Assessment of genome-wide variation revealed regions of the genome with complex, structurally diverse haplotypes that are insufficiently represented in the human reference genome. The 17q21.31 region is one of the most dynamic and complex regions of the human genome. Different haplotypes exist, in direct and inverted orientation, showing evidence of positive selection and predisposing to microdeletion associated with mental retardation. Sequencing of different haplotypes is extremely important to characterize the spectrum of structural variation at this locus. However, de novo assembly with second-generation sequencing reads is still problematic. Using PacBio technology we have sequenced and de novo assembled a tiling path of eight BAC clones (~1.6 Mb region) across this medically relevant region from the library of a hydatidiform mole. Complete hydatidiform moles arise from the fertilization of an enucleated egg from a single sperm and therefore carry a haploid complement of the human genome, eliminating allelic variation that may confound mapping and assembly. The PacBio RS system enables single molecule real time sequencing, featuring long reads and fast turnaround times. With deep sequencing, PacBio reads were able to generate a very uniform sequencing coverage with close to 100% coverage of most of the target interval regions covered. Due to long read lengths, the PacBio RS data could be accurately assembled.


June 1, 2021  |  

Direct sequencing and identification of damaged DNA bases.

DNA is under constant stress from both endogenous and exogenous sources. DNA base modifications resulting from various types of DNA damage are wide-spread and play important roles in affecting physiological states and disease phenotypes. Examples include oxidative damage (8- oxoguanine, 8-oxoadenine; aging, Alzheimer’s, Parkinson’s), alkylation (1-methyladenine, 6-O- methylguanine; cancer), adduct formation (benzo[a]pyrene diol epoxide (BPDE), pyrimidine dimers; smoking, industrial chemical exposure, chemical UV light exposure, cancer), and ionizing radiation damage (5-hydroxycytosine, 5- hydroxyuracil, 5-hydroxymethyluracil; cancer). Currently, these and other products of DNA damage cannot be sequenced with existing sequencing methods. In contrast, single molecule, real-time (SMRT) DNA sequencing can report on modified DNA bases through an analysis of the DNA polymerase kinetics that is affected by a modified base in the template. We demonstrate the DNA strand-resolved sequencing of over 8 different DNA-damage associated base modifications, with base pair resolution and single DNA molecule sensitivity. We also report on the application of this sequencing capability to biological samples and the development of a generic, open-source algorithm to analyze kinetic information from SMRT sequencing.


June 1, 2021  |  

Complete HIV-1 genomes from single molecules: Diversity estimates in two linked transmission pairs using clustering and mutual information.

We sequenced complete HIV-1 genomes from single molecules using Single Molecule, Real- Time (SMRT) Sequencing and derive de novo full-length genome sequences. SMRT sequencing yields long-read sequencing results from individual DNA molecules with a rapid time-to-result. These attributes make it a useful tool for continuous monitoring of viral populations. The single-molecule nature of the sequencing method allows us to estimate variant subspecies and relative abundances by counting methods. We detail mathematical techniques used in viral variant subspecies identification including clustering distance metrics and mutual information. Sequencing was performed in order to better understand the relationships between the specific sequences of transmitted viruses in linked transmission pairs. Samples representing HIV transmission pairs were selected from the Zambia Emory HIV Research Project (Lusaka, Zambia) and sequenced. We examine Single Genome Amplification (SGA) prepped samples and samples containing complex mixtures of genomes. Whole genome consensus estimates for each of the samples were made. Genome reads were clustered using a simple distance metric on aligned reads. Appropriate thresholds were chosen to yield distinct clusters of HIV genomes within samples. Mutual information between columns in the genome alignments was used to measure dependence. In silico mixtures of reads from the SGA samples were made to simulate samples containing exactly controlled complex mixtures of genomes and our clustering methods were applied to these complex mixtures. SMRT Sequencing data contained multiple full-length (greater than 9 kb) continuous reads for each sample. Simple whole genome consensus estimates easily identified transmission pairs. The clustering of the genome reads showed diversity differences between the samples, allowing us to characterize the diversity of the individual quasi-species comprising the patient viral populations across the full genome. Mutual information identified possible dependencies of different positions across the full HIV-1 genome. The SGA consensus genomes agreed with prior Sanger sequencing. Our clustering methods correctly segregated reads to their correct originating genome for the synthetic SGA mixtures. The results open up the potential for reference-agnostic and cost effective full genome sequencing of HIV-1.


June 1, 2021  |  

Single Molecule Real-Time (SMRT) Sequencing of genes implicated in autosomal recessive diseases.

In today’s clinical diagnostic laboratories, the detection of the disease causing mutations is either done through genotyping or Sanger sequencing. Whether done singly or in a multiplex assay, genotyping works only if the exact molecular change is known. Sanger sequencing is the gold standard method that captures both known and novel molecular changes in the disease gene of interest. Most clinical Sanger sequencing assays involve PCR-amplifying the coding sequences of the disease target gene followed by bi-directional sequencing of the amplified products. Therefore for every patient sample, one generates multiple amplicons singly and each amplicon leads to two separate sequencing reactions. Single Molecule, Real-Time (SMRT) sequencing offers several advantages to Sanger sequencing including long read lengths, first-in-first-out processing, fast time to result, high-levels of multiplexing and substantially reduced costs. For our first proof-of-concept experiment, we queried 3 known disease-associated mutations in de-identified clinical samples. We started off with 3 autosomal recessive diseases found at an increased frequency in the Ashkenazi Jewish population: Tay Sachs disease, Niemann-Pick disease and Canavan disease. The mutated gene in Tays Sachs is HEXA, Niemann-Pick is SMPD1 and Canavan is ASPA. Coding exons were amplified in multiple (6-13) amplicons for each gene from both non-carrier and carriers. Amplicons were purified, concentrations normalized, and combined prior to SMRTbell™ Library prep. A single SMRTbell library was sequenced for each gene from each patient using standard Pacific Biosciences C2 chemistry and protocols. Average read lengths of 4,000 bp across samples allowed for high-quality Circular Consensus Sequences (CCS) across all amplicons (all less than 1 kb). This high quality CCS data permitted the clean partitioning of reads from a patient in the presence of heterozygous events. Using non-carrier sequencing as a control, we were able to correctly identify the known events in carrier genes. This suggests the potential utility of SMRT sequencing in a clinical setting, enabling a cost-effective method of replacing targeted mutation detection with sequencing of the entire gene.


June 1, 2021  |  

Advances in sequence consensus and clustering algorithms for effective de novo assembly and haplotyping applications.

One of the major applications of DNA sequencing technology is to bring together information that is distant in sequence space so that understanding genome structure and function becomes easier on a large scale. The Single Molecule Real Time (SMRT) Sequencing platform provides direct sequencing data that can span several thousand bases to tens of thousands of bases in a high-throughput fashion. In contrast to solving genomic puzzles by patching together smaller piece of information, long sequence reads can decrease potential computation complexity by reducing combinatorial factors significantly. We demonstrate algorithmic approaches to construct accurate consensus when the differences between reads are dominated by insertions and deletions. High-performance implementations of such algorithms allow more efficient de novo assembly with a pre-assembly step that generates highly accurate, consensus-based reads which can be used as input for existing genome assemblers. In contrast to recent hybrid assembly approach, only a single ~10 kb or longer SMRTbell library is necessary for the hierarchical genome assembly process (HGAP). Meanwhile, with a sensitive read-clustering algorithm with the consensus algorithms, one is able to discern haplotypes that differ by less than 1% different from each other over a large region. One of the related applications is to generate accurate haplotype sequences for HLA loci. Long sequence reads that can cover the whole 3 kb to 4 kb diploid genomic regions will simplify the haplotyping process. These algorithms can also be applied to resolve individual populations within mixed pools of DNA molecules that are similar to each, e.g., by sequencing viral quasi-species samples.


June 1, 2021  |  

Automated, non-hybrid de novo genome assemblies and epigenomes of bacterial pathogens.

Understanding the genetic basis of infectious diseases is critical to enacting effective treatments, and several large-scale sequencing initiatives are underway to collect this information. Sequencing bacterial samples is typically performed by mapping sequence reads against genomes of known reference strains. While such resequencing informs on the spectrum of single-nucleotide differences relative to the chosen reference, it can miss numerous other forms of variation known to influence pathogenicity: structural variations (duplications, inversions), acquisition of mobile elements (phages, plasmids), homonucleotide length variation causing phase variation, and epigenetic marks (methylation, phosphorothioation) that influence gene expression to switch bacteria from non- pathogenic to pathogenic states. Therefore, sequencing methods which provide complete, de novo genome assemblies and epigenomes are necessary to fully characterize infectious disease agents in an unbiased, hypothesis-free manner. Hybrid assembly methods have been described that combine long sequence reads from SMRT DNA Sequencing with short reads (SMRT CCS (circular consensus) or second-generation reads), wherein the short reads are used to error-correct the long reads which are then used for assembly. We have developed a new paradigm for microbial de novo assemblies in which SMRT sequencing reads from a single long insert library are used exclusively to close the genome through a hierarchical genome assembly process, thereby obviating the need for a second sample preparation, sequencing run, and data set. We have applied this method to achieve closed de novo genomes with accuracies exceeding QV50 (>99.999%) for numerous disease outbreak samples, including E. coli, Salmonella, Campylobacter, Listeria, Neisseria, and H. pylori. The kinetic information from the same SMRT Sequencing reads is utilized to determine epigenomes. Approximately 70% of all methyltransferase specificities we have determined to date represent previously unknown bacterial epigenetic signatures. With relatively short sequencing run times and automated analysis pipelines, it is possible to go from an unknown DNA sample to its complete de novo genome and epigenome in about a day.


June 1, 2021  |  

Single Molecule, Real-Time Sequencing for base modification detection in eukaryotic organisms: Coprinopsis cinerea.

Single Molecule Real-Time (SMRT) DNA sequencing provides a wealth of kinetic information beyond the extraction of the primary DNA sequence, and this kinetic information can provide for the direct detection of modified bases present in genomic DNA. This method has been demonstrated for base modification detection in prokaryotes at base and strand resolutions. In eukaryotes, the common base modifications known to exist are the cytosine variants including methyl, hydroxymethyl, formyl and carboxyl forms. Each of these modifications exhibits different signatures in SMRT kinetic data, allowing for unprecedented possibilities to differentiate between them in direct sequencing data. We present early results of directly sequencing different base modifications in eukaryotic genomic DNA using this method.


June 1, 2021  |  

Automated, non-hybrid de novo genome assemblies and epigenomes of bacterial pathogens

Understanding the genetic basis of infectious diseases is critical to enacting effective treatments, and several large-scale sequencing initiatives are underway to collect this information. Sequencing bacterial samples is typically performed by mapping sequence reads against genomes of known reference strains. While such resequencing informs on the spectrum of single nucleotide differences relative to the chosen reference, it can miss numerous other forms of variation known to influence pathogenicity: structural variations (duplications, inversions), acquisition of mobile elements (phages, plasmids), homonucleotide length variation causing phase variation, and epigenetic marks (methylation, phosphorothioation) that influence gene expression to switch bacteria from non-pathogenic to pathogenic states. Therefore, sequencing methods which provide complete, de novo genome assemblies and epigenomes are necessary to fully characterize infectious disease agents in an unbiased, hypothesis-free manner. Hybrid assembly methods have been described that combine long sequence reads from SMRT DNA sequencing with short, high-accuracy reads (SMRT (circular consensus sequencing) CCS or second-generation reads) to generate long, highly accurate reads that are then used for assembly. We have developed a new paradigm for microbial de novo assemblies in which long SMRT sequencing reads (average readlengths >5,000 bases) are used exclusively to close the genome through a hierarchical genome assembly process, thereby obviating the need for a second sample preparation, sequencing run and data set. We have applied this method to achieve closed de novo genomes with accuracies exceeding QV50 (>99.999%) to numerous disease outbreak samples, including E. coli, Salmonella, Campylobacter, Listeria, Neisseria, and H. pylori. The kinetic information from the same SMRT sequencing reads is utilized to determine epigenomes. Approximately 70% of all methyltransferase specificities we have determined to date represent previously unknown bacterial epigenetic signatures. The process has been automated and requires less than 1 day from an unknown DNA sample to its complete de novo genome and epigenome.


June 1, 2021  |  

Rapid sequencing of HIV-1 genomes as single molecules from simple and complex samples.

Background: To better understand the relationships among HIV-1 viruses in linked transmission pairs, we sequenced several samples representing HIV transmission pairs from the Zambia Emory HIV Research Project (Lusaka, Zambia) using Single Molecule, Real-Time (SMRT) Sequencing. Methods: Single molecules were sequenced as full-length (9.6 kb) amplicons directly from PCR products without shearing. This resulted in multiple, fully-phased, complete HIV-1 genomes for each patient. We examined Single Genome Amplification (SGA) prepped samples, as well as samples containing complex mixtures of genomes. We detail mathematical techniques used in viral variant subspecies identification, including clustering distance metrics and mutual information, which were used to derive multiple de novo full-length genome sequences for each patient. Whole genome consensus estimates for each sample were made. Genome reads were clustered using a simple distance metric on aligned reads. Appropriate thresholds were chosen to yield distinct clusters of HIV-1 genomes within samples. Mutual information between columns in the genome alignments was used to measure dependence. In silico mixtures of reads from the SGA samples were made to simulate samples containing exactly controlled complex mixtures of genomes and our clustering methods were applied to these complex mixtures. Results: SMRT Sequencing data contained multiple full-length (>9 kb) continuous reads for each sample. Simple whole-genome consensus estimates easily identified transmission pairs. Clustering of genome reads showed diversity differences between samples, allowing characterization of the quasi-species diversity comprising the patient viral populations across the full genome. Mutual information identified possible dependencies of different positions across the full HIV-1 genome. The SGA consensus genomes agreed with prior Sanger sequencing. Our clustering methods correctly segregated reads to their correct originating genome for the synthetic SGA mixtures. Conclusions: SMRT Sequencing yields long-read sequencing results from individual DNA molecules with a rapid time-to-result. These attributes make it a useful tool for continuous monitoring of viral populations. The single-molecule nature of the sequencing method allows us to estimate variant subspecies and relative abundances by counting methods. The results open up the potential for reference-agnostic and cost effective full genome sequencing of HIV-1.


June 1, 2021  |  

Allele-level sequencing and phasing of full-length HLA class I and II genes using SMRT Sequencing technology

The three classes of genes that comprise the MHC gene family are actively involved in determining donor-recipient compatibility for organ transplant, as well as susceptibility to autoimmune diseases via cross-reacting immunization. Specifically, Class I genes HLA-A, -B, -C, and class II genes HLA-DR, -DQ and -DP are considered medically important for genetic analysis to determine histocompatibility. They are highly polymorphic and have thousands of alleles implicated in disease resistance and susceptibility. The importance of full-length HLA gene sequencing for genotyping, detection of null alleles, and phasing is now widely acknowledged. While DNA-sequencing-based HLA genotyping has become routine, only 7% of the HLA genes have been characterized by allele-level sequencing, while 93% are still defined by partial sequences. The gold-standard Sanger sequencing technology is being quickly replaced by second-generation, high- throughput sequencing methods due to its inability to generate unambiguous phased reads from heterozygous alleles. However, although these short, high-throughput, clonal sequencing methods are better at heterozygous allele detection, they are inadequate at generating full-length haploid gene sequences. Thus, full-length gene sequencing from an enhancer-promoter region to a 3’UTR that includes phasing information without the need for imputation still remains a technological challenge. The best way to overcome these challenges is to sequence these genes with a technology that is clonal in nature and has the longest possible read lengths. We have employed Single Molecule Real-Time (SMRT) sequencing technology from Pacific Biosciences for sequencing full-length HLA class I and II genes.


June 1, 2021  |  

Isoform sequencing: Unveiling the complex landscape of the eukaryotic transcriptome on the PacBio RS II.

Alternative splicing of RNA is an important mechanism that increases protein diversity and is pervasive in the most complex biological functions. While advances in RNA sequencing methods have accelerated our understanding of the transcriptome, isoform discovery remains computationally challenging due to short read lengths. Here, we describe the Isoform Sequencing (Iso-Seq) method using long reads generated by the PacBio RS II. We sequenced rat heart and lung RNA using the Clontech® SMARTer® cDNA preparation kit followed by size selection using agarose gel. Additionally, we tested the BluePippin™ device from Sage Science for efficiently extracting longer transcripts = 3 kb. Post-sequencing, we developed a novel isoform-level clustering algorithm to generate high-quality transcript consensus sequences. We show that our method recovered alternative splice forms as well as alternative stop sites, antisense transcription, and retained introns. To conclude, the Iso-Seq method provides a new opportunity for researchers to study the complex eukaryotic transcriptome even in the absence of reference genomes or annotated transcripts.


June 1, 2021  |  

Getting the most out of your PacBio libraries with size selection.

PacBio RS II sequencing chemistries provide read lengths beyond 20 kb with high consensus accuracy. The long read lengths of P4-C2 chemistry and demonstrated consensus accuracy of 99.999% are ideal for applications such as de novo assembly, targeted sequencing and isoform sequencing. The recently launched P5-C3 chemistry generates even longer reads with N50 often >10,000 bp, making it the best choice for scaffolding and spanning structural rearrangements. With these chemistry advances, PacBio’s read length performance is now primarily determined by the SMRTbell library itself. Size selection of a high-quality, sheared 20 kb library using the BluePippin™ System has been demonstrated to increase the N50 read length by as much as 5 kb with C3 chemistry. BluePippin size selection or a more stringent AMPure® PB selection cutoff can be used to recover long fragments from degraded genomic material. The selection of chemistries, P4-C2 versus P5-C3, is highly dependent on the final size distribution of the SMRTbell library and experimental goals. PacBio’s long read lengths also allow for the sequencing of full-length cDNA libraries at single-molecule resolution. However, longer transcripts are difficult to detect due to lower abundance, amplification bias, and preferential loading of smaller SMRTbell constructs. Without size selection, most sequenced transcripts are 1-1.5 kb. Size selection dramatically increases the number of transcripts >1.5 kb, and is essential for >3 kb transcripts.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.