¹ Pacific Biosciences (PacBio), Computational Research, Menlo Park, United States
² Pacific Biosciences (PacBio), Bioinformatic Engineering, Menlo Park, United States
³ Pacific Biosciences (PacBio), Market Development, Menlo Park, United States
⁴ Pacific Biosciences (PacBio), Segment Marketing, Menlo Park, United States
⁵ Pacific Biosciences (PacBio), Precision Health Segment Marketing, Menlo Park, United States
⁶ Pacific Biosciences (PacBio), Human Genomics Segment Marketing, Menlo Park, United States
The CYP2D6 locus is well known for its importance to pharmacogenetics as well as for its high diversity and complex genomic setting. Resolving individual alleles at this locus using short-read sequencing technologies requires inference-based methods due to ambiguous mapping in the presence of highly homologous pseudogenes. In contrast, long-range sequencing with PacBio HiFi reads directly resolves and phases a wide range of complicated and difficult genetic loci without inference. We present a novel bioinformatics workflow using PacBio HiFi reads which enables rapid and precise diplotyping and star(*)-allele classification of CYP2D6.
In this work, we designed a set of primers to amplify full-length CYP2D6 genes and their flanking sequence. A multi-primer approach was used to separately amplify primary CYP2D6 genes, duplicate genes, hybrid genes, and fully deleted *5 alleles. We applied this targeted strategy to 22 Coriell samples and sequenced the amplicons on a PacBio Sequel II System. To generate resolved CYP2D6 *-allele diplotypes, we describe a two-step process: 1) clustering PacBio HiFi reads and generating a consensus sequence for each cluster, and 2) directly comparing the phased variant set of each consensus sequence to the star-alleles defined in PharmVar. The workflow also reports additional information on fusion alleles to help classify hybrid categories.
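The two-step process can be illustrated with a minimal Python sketch. It assumes the reads have already been clustered by haplotype and aligned to a common coordinate system containing substitutions only. It uses a simple per-position majority vote in place of the workflow's actual consensus step, then matches each haplotype's variant set against an allele table. The reference sequence, variant positions, and allele names (`toyA`, `toyB`, `toyC`) are hypothetical and chosen for illustration. Real definitions come from PharmVar, and real CYP2D6 calling must also handle indels, copy-number changes, and hybrid genes.

```python
from collections import Counter

# Toy reference haplotype (hypothetical; NOT real CYP2D6 sequence).
REFERENCE = "ACGTACGTACGTACGT"

# Hypothetical star-allele table: name -> defining (pos, ref, alt) variants.
# Positions and bases are invented; real definitions come from PharmVar.
STAR_ALLELES = {
    "*1": frozenset(),  # reference allele: no defining variants
    "toyA": frozenset({(3, "T", "C")}),
    "toyB": frozenset({(3, "T", "C"), (10, "G", "A")}),
    "toyC": frozenset({(6, "G", "T")}),
}


def consensus(reads):
    """Step 1 (simplified): majority-vote consensus of pre-aligned reads."""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    return "".join(
        Counter(r[i] for r in reads).most_common(1)[0][0] for i in range(length)
    )


def call_variants(seq, reference=REFERENCE):
    """Return the phased SNV set of one consensus haplotype vs the reference."""
    if len(seq) != len(reference):
        raise ValueError("consensus and reference lengths differ")
    return frozenset(
        (i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, seq))
        if ref != alt
    )


def assign_star_allele(variants, table=STAR_ALLELES):
    """Step 2: pick the most specific allele whose defining variants are all
    present in the haplotype. Returns (allele_name, unexplained_variants)."""
    candidates = [name for name, defn in table.items() if defn <= variants]
    if not candidates:
        return None, variants
    best = max(candidates, key=lambda name: len(table[name]))
    return best, variants - table[best]


def diplotype(hap1_reads, hap2_reads):
    """Run both steps on two haplotype clusters and join the calls."""
    calls = []
    for reads in (hap1_reads, hap2_reads):
        allele, _unexplained = assign_star_allele(call_variants(consensus(reads)))
        calls.append(allele)
    return f"{calls[0]}/{calls[1]}"


if __name__ == "__main__":
    # Haplotype 1 carries both toyB variants; one read has a sequencing error.
    hap1 = ["ACGCACGTACATACGT", "ACGCACGTACATACGT", "TCGCACGTACATACGT"]
    # Haplotype 2 matches the reference.
    hap2 = [REFERENCE] * 3
    print(diplotype(hap1, hap2))  # prints toyB/*1
```

The sketch picks the most specific allele whose defining variants are all present and returns any leftover variants separately. This is one simple way to express "direct comparison" against defined star-alleles. How the actual workflow handles partial matches, novel suballeles, or ties is not specified here.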
Direct CYP2D6 *-allele typing with this workflow produced results concordant with those from orthogonal technologies. Where results differed, the discrepancies reflected the higher resolution of PacBio HiFi sequencing and the improved calls from our method. These improvements included more accurate CNV calls, *5 deletion calling, and high-resolution subtyping of all alleles.