June 1, 2021  |  

New advances in SMRT Sequencing facilitate multiplexing for de novo and structural variant studies

The latest advancements in Sequel II SMRT Sequencing have increased average read lengths by up to 50% compared to Sequel II chemistry 1.0. This allows multiplexing of 2-3 small organisms (<500 Mb), such as insects and worms, for producing reference-quality assemblies; calling structural variants for up to 2 samples with ~3 Gb genomes; analysis of 48 microbial genomes; and metagenomic profiling of up to 8 communities in a single SMRT Cell 8M. With the improved processivity of the new Sequel II sequencing polymerase, more SMRTbell molecules reach rolling circle mode, resulting in longer overall read lengths and thus allowing efficient detection of barcodes (up to 80%) in the SMRTbell templates. Multiplexing of genomes larger than those of microbial organisms is now achievable. In collaboration with the Wellcome Sanger Institute, we have developed a workflow for multiplexing two individual Anopheles coluzzii using as little as 150 ng of genomic DNA per individual. The resulting assemblies had high contiguity (contig N50s over 3 Mb) and completeness (>98% of conserved genes) for both individuals. For microbial multiplexing, we multiplexed 48 microbes with varying complexities and genome sizes ranging from 1.6 to 8.0 Mb in a single SMRT Cell 8M. Using a new end-to-end analysis (Microbial Assembly Analysis, SMRT Link 8.0), assemblies resulted in complete circularized genomes (>200-fold coverage) and efficient detection of >3-200 kb plasmids. Finally, the long read lengths (>90 kb) allow detection of barcodes in large-insert SMRTbell templates (>15 kb), thus facilitating multiplexing of two human samples in 1 SMRT Cell 8M for detecting SVs, indels, and CNVs. Here, we present results and describe workflows for multiplexing samples for specific SMRT Sequencing applications.
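For readers unfamiliar with the contiguity metric quoted above, the following is a minimal sketch of how a contig N50 is conventionally computed: the length of the contig at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly size. The function name `n50` and the example lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_contigs = [4_000_000, 3_000_000, 2_000_000, 1_000_000]
example_n50 = n50(example_contigs)
```

In this made-up example the two longest contigs together cover at least half of the 10 Mb total, so the N50 is the length of the second contig.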


June 1, 2021  |  

Copy-number variant detection with PacBio long reads

Long-read sequencing of diverse humans has revealed more than 20,000 insertion, deletion, and inversion structural variants spanning more than 12 Mb in a healthy human genome. Most of these variants are too large to detect with short reads and too small for array comparative genome hybridization (aCGH). While the standard approaches to calling structural variants with long reads thrive in the 50 bp to 10 kb size range, they tend to miss exactly the large (>50 kb) copy-number variants that are called more readily with aCGH. Standard algorithms rely either on reference-based mapping of reads that fully span a variant or on de novo assembly. Copy-number variants, however, are often too large to be spanned by a single read and frequently involve segmentally duplicated sequence that is not yet included in most de novo assemblies. To comprehensively detect large variants in human genomes, we extended pbsv, a structural variant caller for long reads, to call copy-number variants (CNVs) from read-clipping and read-depth signatures. In human germline benchmark samples, we detect more than 300 CNVs spanning around 10 Mb, and we call hundreds of additional events in rearranged cancer samples. Together with insertion, deletion, inversion, duplication, and translocation calling from spanning reads, this allows pbsv to comprehensively detect large variants from a single data type.
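To illustrate the general idea of a read-depth signature, here is a deliberately simplified sketch. It is not the pbsv algorithm, whose details are not given in the abstract; it only shows how binned depth, normalized to the median, can be rounded to an integer copy number and how adjacent bins with the same non-reference copy number can be merged into segments. The function name, the bin size, and the depth values are all hypothetical.

```python
from statistics import median


def call_depth_segments(depths, bin_size, ploidy=2):
    """Return (start, end, copy_number) for runs of bins whose estimated
    copy number differs from the expected ploidy.

    Toy illustration only: real callers also model GC bias, mappability,
    and read-clipping evidence at the segment boundaries.
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    # Estimate an integer copy number per bin relative to the median depth.
    copy_numbers = [round(ploidy * d / baseline) for d in depths]
    segments = []
    start = 0
    for i in range(1, len(copy_numbers) + 1):
        # Close the current run at the end of the data or when the state changes.
        if i == len(copy_numbers) or copy_numbers[i] != copy_numbers[start]:
            cn = copy_numbers[start]
            if cn != ploidy:
                segments.append((start * bin_size, i * bin_size, cn))
            start = i
    return segments


# Hypothetical mean depths for nine consecutive 10 kb bins.
example_depths = [30, 31, 29, 15, 14, 30, 45, 46, 30]
example_segments = call_depth_segments(example_depths, bin_size=10_000)
```

With these made-up depths the sketch reports one single-copy segment (a candidate deletion) and one three-copy segment (a candidate duplication).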


June 1, 2021  |  

Comprehensive variant detection in a human genome with highly accurate long reads

Introduction: Long-read sequencing has revealed more than 20,000 structural variants spanning over 12 Mb in a healthy human genome. Short-read sequencing fails to detect most structural variants but has remained the more effective approach for small variants, due to 10-15% error rates in long reads, and copy-number variants (CNVs), due to lack of effective long-read variant callers. The development of PacBio highly accurate long reads (HiFi reads) with read lengths of 10-25 kb and quality >99% presents the opportunity to capture all classes of variation with one approach.

Methods: We sequence the Genome in a Bottle benchmark sample HG002 and an individual with a presumed Mendelian disease with HiFi reads. We call SNVs and indels with DeepVariant and extend the structural variant caller pbsv to call CNVs using read depth and clipping signatures.

Results: For 18-fold coverage with 13 kb HiFi reads, variant calling in HG002 achieves an F1 score of 99.7% for SNVs, 96.6% for indels, and 96.4% for structural variants. Additionally, we detect more than 300 CNVs spanning around 10 Mb. For the Mendelian disease case, HiFi reads reveal thousands of variants that were overlooked by short-read sequencing, including a candidate causative structural variant.

Conclusions: These results illustrate the ability of HiFi reads to comprehensively detect variants, including those associated with human disease.
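The F1 scores reported above combine precision and recall into a single number. As a reminder of the standard definition, the sketch below computes all three from counts of true positives, false positives, and false negatives against a benchmark. The function name and the counts are illustrative assumptions, not values from the study.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from benchmark comparison counts."""
    if true_pos <= 0:
        raise ValueError("at least one true positive is required")
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    f1 = 2 * true_pos / (2 * true_pos + false_pos + false_neg)
    return precision, recall, f1


# Hypothetical counts, for illustration only.
example_precision, example_recall, example_f1 = precision_recall_f1(90, 10, 10)
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, so a caller cannot score well by maximizing recall at the expense of precision or the reverse.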


June 1, 2021  |  

A workflow for the comprehensive detection and prioritization of variants in human genomes with PacBio HiFi reads

PacBio HiFi reads (minimum 99% accuracy, 15-25 kb read length) have emerged as a powerful data type for comprehensive variant detection in human genomes. The HiFi read length extends confident mapping and variant calling to repetitive regions of the genome that are not accessible with short reads. Read length also improves detection of structural variants (SVs), with recall exceeding that of short reads by over 30%. High read quality allows for accurate single nucleotide variant and small indel detection, with precision and recall matching that of short reads. While many tools have been developed to take advantage of these qualities of HiFi reads, there is no end-to-end workflow for the filtering and prioritization of variants uniquely detected with long reads for rare and undiagnosed disease research. We have developed a flexible, modular workflow and web portal for variant analysis from HiFi reads and applied it to a set of rare disease cases unsolved by short-read whole genome sequencing. We expect that broad application of long-read variant detection workflows will solve many more rare disease cases. We have made these tools available at https://github.com/williamrowell/pbRUGD-workflow, and we hope they serve as a starting point for developing a robust analysis framework for long-read variant detection for rare diseases.


June 1, 2021  |  

Targeting Clinically Significant Dark Regions of the Human Genome with High-Accuracy, Long-Read Sequencing

Introduction: There are many clinically important genes in “dark” regions of the human genome. These regions are characterized as dark due to a paucity of NGS coverage as a result of short-read sequencing or mapping difficulties. Low NGS sequencing yield can arise in these regions due to the presence of various repeat elements or biased base composition while inaccurate mapping is attributable to segmental duplications. Long-read sequencing coupled with an optimized, robust enrichment method has the potential to illuminate these dark regions.

Materials and Methods: Using PacBio highly accurate long-read (HiFi) Sequencing, coupled with a long-PCR targeted enrichment method, we investigated two important dark region genes that are challenging to accurately type with short-read sequencing due to associated pseudogenes: CYP21A2, responsible for congenital adrenal hyperplasia, and GBA, responsible for Gaucher disease. For each gene, our aim was to cover regions of pathogenic mutations in a single contiguous sequence or set of sequences that can be assayed in a single reaction. CYP21A2 and an associated pseudogene CYP21A1P were co-amplified in a single long-range PCR reaction generating a 10.2 kb and 8.9 kb amplicon, respectively. Similarly, GBA and an associated pseudogene GBAP1 were co-amplified in a single long-range PCR reaction generating a 12.6 kb and 16.0 kb amplicon, respectively. Seven Coriell samples for the CYP21A2 target region and 13 Coriell samples for the GBA target region containing known pathogenic mutations were studied in replicate. SMRTbell libraries were generated from pooled amplicons for each target gene and sequenced on a PacBio Sequel II System. Accounting for replicates, each library contained a multiplex of 24 samples. A new PacBio sequence clustering algorithm, pbAA, designed for rapid analysis of HiFi reads from amplicons was used in variant typing.

Results: All pathogenic CYP21A2 and GBA variants were accurately called in the test samples. These variants included whole-gene deletions, gene duplication, gene fusions, and recombinant exons. Additionally, phasing of complex heterozygotes was achieved.

Conclusion: We demonstrate that long-read HiFi Sequencing provides new opportunities for sequencing clinically relevant but previously dark regions of the human genome that are underrepresented in short-read sequencing. Accurate long reads provide important phasing information, identify structural variations, and avoid potential confusion with pseudogenes. SMRT Sequencing of these regions enables a better understanding of the relationship between genetic factors and personal health and has the potential to ultimately help guide health-related decisions.


June 1, 2021  |  

Amplification-free targeted enrichment powered by CRISPR-Cas9 and long-read Single Molecule Real-Time (SMRT) Sequencing can efficiently and accurately sequence challenging repeat expansion disorders

Genomic regions with extreme base composition bias and repetitive sequences have long proven challenging for targeted enrichment methods, as they rely upon some form of amplification. Similarly, most DNA sequencing technologies struggle to faithfully sequence regions of low complexity. This has been especially trying for repeat expansion disorders such as Fragile-X disease, Huntington disease, and various ataxias, where the repetitive elements range from several hundred bases to tens of kilobases. We have developed a robust, amplification-free targeted enrichment technique, called No-Amp Targeted Sequencing, that employs the CRISPR-Cas9 system. In conjunction with SMRT Sequencing, which delivers long reads spanning the entire repeat expansion, high consensus accuracy, and uniform coverage, these previously inaccessible regions are now accessible. This method is completely amplification-free, therefore removing any PCR errors and biases from the experiment. Furthermore, this technique preserves native DNA molecules, allowing for direct detection and characterization of epigenetic signatures. The No-Amp method is a two-day protocol that is compatible with multiplexing of multiple targets and multiple samples in a single reaction, using as little as 1 µg of genomic DNA input per sample. We have successfully targeted a number of repeat expansion disorder loci, including HTT, FMR1, and C9orf72, and have built an ataxia panel that consists of 15 different disease-causing repeat expansion regions. Using the No-Amp method we have isolated hundreds of individual on-target molecules, allowing for reliable repeat size estimation, mosaicism detection, and identification of interruption sequences in alleles as long as >2700 repeat units (>13 kb). In addition to multiplexing several targets, we have also multiplexed at least 20 samples in one experiment, making the No-Amp Targeted Sequencing method a cost-effective option.
Combining the CRISPR-Cas9 enrichment method with Single Molecule, Real-Time Sequencing provided us with base-level resolution of previously inaccessible regions of the genome, like disease-causing repeat expansions. No-Amp Targeted Sequencing captures, in one experiment, many aspects of repeat expansion disorders which are important for better understanding the underlying disease mechanisms.
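As a rough illustration of what repeat size estimation and interruption detection mean at the level of a single read, the sketch below finds the longest uninterrupted run of a repeat unit in a sequence. It is a toy example under stated assumptions, not the analysis used in the study: the function name is hypothetical, the read is invented, and real data would require alignment to flanking sequence and tolerance for sequencing errors.

```python
import re


def longest_repeat_run(sequence, unit):
    """Return the largest number of consecutive, uninterrupted copies of
    `unit` found anywhere in `sequence` (case-insensitive)."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    # Match one or more back-to-back copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    best = 0
    for match in pattern.finditer(sequence.upper()):
        best = max(best, len(match.group(0)) // len(unit))
    return best


# Invented read: three CAG units, a CAA interruption, then two more CAG units.
example_read = "TTCAGCAGCAGCAACAGCAGTT"
example_run = longest_repeat_run(example_read, "CAG")
```

The interruption splits the repeat tract into two runs, so the longest pure run here is three units, even though five CAG units are present in total.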


June 1, 2021  |  

Full-Length Sequencing of CYP2D6 Variants with PacBio HiFi Reads

CYP2D6 is a highly polymorphic gene with more than 130 named variants, including deletions, duplications, single nucleotide polymorphisms, and other types of variation (Butler, 2018; Black et al., 2011). These variants affect the rate at which individuals metabolize approximately 25% of common prescription drugs (Owen et al., 2019). PacBio SMRT sequencing is a proven tool for the interrogation of CYP2D6 variants (Qiao et al., 2016; Buermans et al., 2017). Now, with HiFi sequencing, we have developed a streamlined end-to-end workflow for the more accurate detection of highly polymorphic CYP2D6 loci. This study also evaluates the advantage of HiFi reads for the sequencing of full-length CYP2D6 genes with variants previously annotated by other technologies.

Twenty-two Coriell pharmacogenomic samples containing variant CYP2D6 alleles were amplified using long-range PCR. The primer pairs for the amplification of upstream CYP2D6 gene duplications and the downstream CYP2D6 genes were adapted from a publication in Pharmacogenomics (Qiao et al., 2019). A 2-step PCR method was used to add a unique barcode to each sample, allowing pooling of multiple samples for SMRTbell library prep. The resulting SMRTbell library was then sequenced on the PacBio Sequel II/IIe system for 20 hours. HiFi reads (>QV20) were demultiplexed in SMRT Link and clustered into haplotypes. The consensus reads of each haplotype were produced using the “pbaa” amplicon analysis tool and then mapped to the human reference genome GRCh38 for the assignment of CYP2D6 types.

More than 700,000 full-length HiFi reads were generated with an average read length of 8.2 kb and a mean accuracy of 99.9%. Nearly all (>99%) demultiplexed reads were on target to the CYP2D6 locus. Genotyping of the CYP2D6 region with PacBio HiFi reads identified all expected upstream duplications and downstream CYP2D6 alleles, including single nucleotide variants, except for the *5 allele, which is a complete deletion. For 21 of 22 samples, the types from HiFi reads matched the diplotypes identified from microarrays and qPCR, while providing full resolution of each allele. One sample was mistyped by microarray as *1/*41; HiFi sequencing produced the correct type of *33/*41. In addition, for 4/21 samples HiFi sequencing identified duplications missed by microarray or real-time PCR.

The PCR and sequencing assay we have presented here for the detection of CYP2D6 variants is robust and specific. Assignment of new alleles or duplications on pharmacogenomic samples from HiFi reads suggests that PacBio sequencing technology can reveal new diplotypes that were not characterized accurately by other technologies. This study demonstrates that HiFi sequencing provides much higher resolution than either microarray or real-time PCR for the detection of polymorphic genes, while maintaining sensitivity and accuracy.


June 1, 2021  |  

Resolving Complex Pathogenic Alleles using HiFi Long-range Amplicon Data and a New Clustering Algorithm

Many genetic diseases are mapped to structurally complex loci. These regions contain highly similar paralogous alleles (>99% identity) that span kilobases within the human genome. Comprehensive screening for pathogenic variants amongst paralogous sequences is incomplete and labor intensive using short reads or optical mapping. In contrast, long-range targeted amplification and PacBio HiFi sequencing fully and directly resolve and phase a wide range of pathogenic variants without assembly or inference. To capitalize on the accuracy of HiFi amplicon data, we designed a new amplicon analysis tool, pbAA. pbAA uses a new sequence clustering algorithm to rapidly deconvolve (separate) a mixture of haplotypes, enabling precise diplotyping and disease allele classification. In this experiment, we analyzed two sets of gene-pseudogene systems, GBA and CYP21A2, that are the second and eighth most common carrier disease alleles, respectively. Samples tested were selected from the Coriell catalog and are known to have pathogenic variants that are troublesome to test for with standard short-read assays. Co-amplified long-range PCR amplicons were generated for GBA (12 kb)/GBAP1 (15 kb), responsible for Gaucher disease, as well as CYP21A2 (10 kb)/CYP21A1P (8 kb), responsible for congenital adrenal hyperplasia. We obtained 7 samples to test the CYP21A2 region and 13 separate samples for GBA. HiFi reads were then generated from the amplicon libraries on both Sequel and Sequel II Systems, with replicated samples, to achieve a 24-sample multiplex for each target. Consensus amplicons were produced using pbAA, and variants were determined using minimap2 alignments along with a custom SQL database for characterizing and reporting results. From these data we were able to accurately call all pathogenic variants in the test samples for all replicates, including whole-gene deletions, gene duplication, gene fusions, recombinant exons, and phased complex heterozygotes.
In one trio affected by adrenal hyperplasia, three large structural variants were correctly and independently attributed to the parents and proband, including a duplication of CYP21A1P and a CYP21A1P-CYP21A2 gene fusion in the mother and a CYP21A2 deletion in the father. This experiment demonstrates how PacBio HiFi data, analyzed with pbAA, simplifies targeted disease allele identification.


May 17, 2021  |  

Genomic Answers for Kids

Short-read genome-wide sequencing for molecular diagnosis has revolutionized pediatric rare disease care in the past decade. However, most families remain without specific knowledge of the cause of their child’s illness….

