The highly polymorphic CYP2D6 gene affects the metabolism of 25% of the most commonly prescribed drugs. Accurate identification of variant CYP2D6 alleles in individuals is therefore necessary for personalized medicine. PacBio HiFi sequencing produces long, accurate reads that can identify variant regions. Here, we describe an end-to-end workflow for characterizing full-length CYP2D6 by HiFi sequencing.
Targeting Clinically Significant Dark Regions of the Human Genome with High-Accuracy, Long-Read Sequencing
There are many clinically important genes in “dark” regions of the human genome. These regions are called dark because of a paucity of NGS coverage resulting from short-read sequencing or mapping difficulties. Low NGS sequencing yield can arise in these regions from various repeat elements or biased base composition, while inaccurate mapping can result from segmental duplications. Long-read sequencing coupled with an optimized, robust enrichment method has the potential to illuminate these dark regions.
Resolving Complex Pathogenic Alleles using HiFi Long-Range Amplicon Data and a New Clustering Algorithm
Many genetic diseases are mapped to structurally complex loci. These regions contain highly similar paralogous alleles (>99% identity) that span kilobases within the human genome. Comprehensive screening for pathogenic variants is incomplete and labor intensive using short reads or optical mapping. In contrast, long-range amplification and PacBio HiFi sequencing fully and directly resolve and phase a wide range of pathogenic variants without inference. To capitalize on the accuracy of HiFi data, we designed a new amplicon analysis tool, pbAA. pbAA can rapidly deconvolve a mixture of haplotypes, enabling precise diplotyping and disease allele classification.
The killer immunoglobulin-like receptor (KIR) genes belong to the immunoglobulin superfamily and are widely studied because of the critical role they play in coordinating the innate immune response to infection and disease. Highly accurate, contiguous long reads, like those generated by SMRT Sequencing, when combined with target-enrichment protocols, provide a straightforward strategy for generating complete de novo assembled KIR haplotypes. We have explored two different methods to capture the KIR region: one using fosmid clones and one using NimbleGen capture.
Increased sequencing throughput creates a need for multiplexing in several applications. Here we detail different barcoding strategies for microbial sequencing, targeted sequencing, Iso-Seq full-length isoform sequencing, and Roche NimbleGen’s target enrichment method.
Highly contiguous de novo human genome assembly and long-range haplotype phasing using SMRT Sequencing
The long reads, random error, and unbiased sampling of SMRT Sequencing enable high-quality de novo assembly of the human genome. PacBio long reads are capable of resolving genomic variation at all size scales, including SNPs, insertions, deletions, inversions, translocations, and repeat expansions, all of which are both important for understanding the genetic basis of human disease and difficult to access with other technologies. To demonstrate this, we report a new high-quality, diploid-aware de novo assembly of Craig Venter’s well-studied genome.
Though a role for structural variants in human disease has long been recognized, it has remained difficult to identify intermediate-sized variants (50 bp to 5 kb), which are too small to detect with array comparative genomic hybridization, but too large to reliably discover with short-read DNA sequencing. Recent studies have demonstrated that PacBio Single Molecule, Real-Time (SMRT) sequencing fills this technology gap. SMRT sequencing detects tens of thousands of structural variants in the human genome, approximately five times the sensitivity of short-read DNA sequencing.
Targeted sequencing with Sanger and short-read, high-throughput sequencing methods is standard practice in clinical genetic testing. However, many applications beyond SNP detection have remained obstructed by technological challenges. With long reads and high consensus accuracy, SMRT Sequencing overcomes many of the technical hurdles faced by Sanger and NGS approaches, opening a broad range of untapped clinical sequencing opportunities. Flexible multiplexing options, a highly adaptable sample preparation method, and two well-developed, newly improved analysis methods that generate highly accurate sequencing results make SMRT Sequencing well suited to clinical-grade targeted sequencing. The Circular Consensus Sequencing (CCS) analysis pipeline produces QV30 data from each single intra-molecular multi-pass polymerase read, making it a reliable solution for detecting minor variant alleles with frequencies as low as 1%. Long Amplicon Analysis (LAA) uses insert-spanning, full-length subreads originating from multiple individual copies of the target to generate highly accurate, phased consensus sequences (>QV50), offering a unique advantage for imputation-free allele segregation and haplotype phasing. Here we present workflows and results for a range of SMRT Sequencing clinical applications. Specifically, we illustrate how the flexible multiplexing options, simple sample preparation methods, and new developments in data analysis tools offered by PacBio in support of Sequel System 5.1 can come together in a variety of experimental designs to enable applications as diverse as high-throughput HLA typing, mitochondrial DNA sequencing, and viral vector integrity profiling of recombinant adeno-associated viral (rAAV) genomes.
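The QV figures quoted above are Phred-scaled quality values, where QV = -10 log10(p) for a per-base error probability p. The following is a minimal sketch of that conversion; the function names are illustrative only and are not part of any PacBio tool.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_probability_to_qv(p: float) -> float:
    """Convert a per-base error probability (0 < p <= 1) to a Phred-scaled QV."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Quality values mentioned in the abstract: CCS reads (QV30), LAA consensus (QV50).
    for qv in (30, 50):
        p = qv_to_error_probability(qv)
        print(f"QV{qv}: about 1 error per {round(1 / p):,} bases")
```

Under this definition, QV30 corresponds to an expected error rate of 1 in 1,000 bases and QV50 to 1 in 100,000, which helps explain why QV30 reads are described as suitable for detecting alleles present at 1% frequency.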
Sequencing the previously unsequenceable using amplification-free targeted enrichment powered by CRISPR/Cas9
Genomic regions with extreme base composition bias and repetitive sequences have long proven challenging for targeted enrichment methods, as they rely upon some form of amplification. Similarly, most DNA sequencing technologies struggle to faithfully sequence regions of low complexity. This has especially been true for repeat expansion disorders such as Fragile X syndrome, Huntington’s disease, and various ataxias, where the repetitive elements range from several hundred bases to tens of kilobases. We have developed a robust, amplification-free targeted enrichment technique, called No-Amp Targeted Sequencing, that employs the CRISPR/Cas9 system. In conjunction with Single Molecule, Real-Time (SMRT) Sequencing, which delivers long reads spanning the entire repeat expansion, high consensus accuracy, and uniform coverage, it makes these previously inaccessible regions accessible. Because the method is completely amplification-free, it removes PCR errors and biases from the experiment. It also preserves native DNA molecules, allowing direct detection and characterization of epigenetic signatures. The No-Amp method is a two-day protocol, compatible with multiplexing of multiple targets and samples in a single reaction, using as little as 1 µg of genomic DNA input per sample. We have successfully targeted a number of repeat expansion disorder loci (HTT, FMR1, ATXN10, C9orf72), with alleles longer than 2700 repeat units (>13 kb). Using the No-Amp method, we have isolated hundreds of individual on-target molecules, allowing reliable repeat size estimation, mosaicism detection, and identification of interruption sequences, all aspects of repeat expansion disorders that are important for better understanding the underlying disease mechanisms.
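To make the idea of repeat size estimation from a single spanning read concrete, here is a minimal sketch that counts the longest uninterrupted run of a repeat motif in a read. It is not the No-Amp analysis method: the function names, the example read, and the choice of a strict exact-match run are all assumptions made for illustration, and real reads would also need handling of sequencing errors and strand orientation.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the number of motif copies in the longest uninterrupted tandem run."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    best = 0
    for match in pattern.finditer(sequence.upper()):
        best = max(best, len(match.group(0)) // len(motif))
    return best


def repeat_span_bp(repeat_units: int, motif_length: int) -> int:
    """Length in base pairs of a pure repeat tract with the given unit count."""
    return repeat_units * motif_length


if __name__ == "__main__":
    # Hypothetical read: flank, 20 CAG units, one CAA interruption, 3 CAG units, flank.
    read = "ACGT" + "CAG" * 20 + "CAA" + "CAG" * 3 + "TTGA"
    print("longest pure CAG run:", longest_repeat_run(read, "CAG"))
    print("tract length for 2700 units of 5 bp:", repeat_span_bp(2700, 5))
```

In this toy read the CAA interruption splits the tract, so the longest pure run is shorter than the total number of CAG units, which is one reason interruption sequences matter for sizing. As a rough check on the figures above, 2700 units of a 5 bp motif would span 13,500 bp, consistent with ">13 kb", although the abstract does not state which locus that allele belongs to.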