June 1, 2021

Resolving Complex Pathogenic Alleles using HiFi Long-range Amplicon Data and a New Clustering Algorithm

Author(s): Harting, John and Heiner, Cheryl and McLaughlin, Ian and Kronenberg, Zev

Many genetic diseases map to structurally complex loci. These regions contain highly similar paralogous alleles (>99% identity) that span kilobases within the human genome. Screening for pathogenic variants among paralogous sequences with short reads or optical mapping is incomplete and labor intensive. In contrast, long-range targeted amplification combined with PacBio HiFi sequencing fully and directly resolves and phases a wide range of pathogenic variants without assembly or inference. To capitalize on the accuracy of HiFi amplicon data, we designed a new amplicon analysis tool, pbAA. pbAA uses a new sequence clustering algorithm to rapidly deconvolve (separate) a mixture of haplotypes, enabling precise diplotyping and disease allele classification. In this experiment, we analyzed two gene-pseudogene systems, GBA and CYP, which are the second and eighth most common carrier disease alleles, respectively. Test samples known to carry pathogenic variants that are troublesome to detect with standard short-read assays were selected from the Coriell catalog. Co-amplified long-range PCR amplicons were generated for GBA (12 kb)/GBAP1 (15 kb), responsible for Gaucher disease, and for CYP21A2 (10 kb)/CYP21A1P (8 kb), responsible for congenital adrenal hyperplasia. We obtained 7 samples to test the CYP21A2 region and 13 separate samples for GBA. HiFi reads were then generated from the amplicon libraries on both Sequel and Sequel II Systems, with replicated samples, to achieve a 24-sample multiplex for each target. Consensus amplicons were produced using pbAA, and variants were determined using minimap2 alignments along with a custom SQL database for characterizing and reporting results. From these data we accurately called all pathogenic variants in the test samples for all replicates, including whole-gene deletions, gene duplications, gene fusions, recombinant exons, and phased complex heterozygotes.
In one trio affected by adrenal hyperplasia, three large structural variants were correctly and independently attributed to the parents and proband, including a duplication of CYP21A1P and a CYP21A1P-CYP21A2 gene fusion in the mother, and a CYP21A2 deletion in the father. This experiment demonstrates how PacBio HiFi data, analyzed with pbAA, simplifies targeted disease allele identification.
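
To give a rough sense of what deconvolving a mixture of haplotypes means, the toy sketch below groups reads by the bases they carry at positions that vary across the pool, then builds a majority-vote consensus per group. It is not pbAA's algorithm, which the abstract does not describe. It assumes equal-length, already aligned reads with substitution-only differences, and the function names, thresholds, and sequences are all hypothetical.

```python
from collections import Counter


def variable_columns(reads, min_minor_frac=0.2):
    """Positions where the second most common base is too frequent to be noise."""
    n = len(reads)
    cols = []
    for i in range(len(reads[0])):
        top_two = Counter(r[i] for r in reads).most_common(2)
        if len(top_two) == 2 and top_two[1][1] / n >= min_minor_frac:
            cols.append(i)
    return cols


def cluster_reads(reads, min_minor_frac=0.2, min_cluster_frac=0.1):
    """Group reads by their bases at the variable positions; drop tiny groups."""
    cols = variable_columns(reads, min_minor_frac)
    groups = {}
    for r in reads:
        key = "".join(r[i] for i in cols)
        groups.setdefault(key, []).append(r)
    kept = {k: v for k, v in groups.items()
            if len(v) / len(reads) >= min_cluster_frac}
    return cols, kept


def consensus(reads):
    """Per-position majority vote across the reads of one cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


# Hypothetical data: two haplotypes differing at positions 3 and 8,
# plus one read of haplotype A carrying a sequencing error at position 0.
hap_a = "ACGTACGTAC"
hap_b = "ACGAACGTTC"
reads = [hap_a] * 5 + [hap_b] * 5 + ["TCGTACGTAC"]

cols, clusters = cluster_reads(reads)
for key, members in sorted(clusters.items()):
    print(key, len(members), consensus(members))
```

In this sketch the isolated error at position 0 falls below the minor-allele threshold, so it does not split a cluster and is voted out of the consensus. Real amplicon data also contain indels, PCR chimeras, and uneven haplotype coverage, none of which this toy handles.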

Organization: PacBio
Year: 2021

