CYP2D6 is a highly polymorphic gene with more than 130 named variants, including deletions, duplications, single nucleotide polymorphisms, and other types of variation (Butler, 2018; Black et al., 2011). These variants affect how quickly individuals metabolize approximately 25% of commonly prescribed drugs (Owen et al., 2019). PacBio SMRT sequencing is a proven tool for interrogating CYP2D6 variants (Qiao et al., 2016; Buermans et al., 2017). Building on HiFi sequencing, we have developed a streamlined end-to-end workflow for more accurate genotyping of the highly polymorphic CYP2D6 locus. This study also evaluates the advantages of HiFi reads for sequencing full-length CYP2D6 genes carrying variants previously annotated by other technologies.
Twenty-two Coriell pharmacogenomic samples containing variant CYP2D6 alleles were amplified by long-range PCR. Primer pairs for amplifying upstream CYP2D6 gene duplications and the downstream CYP2D6 genes were adapted from Qiao et al. (2019), published in Pharmacogenomics. A two-step PCR method added a unique barcode to each sample, allowing multiple samples to be pooled for SMRTbell library preparation. The resulting SMRTbell library was sequenced on the PacBio Sequel II/IIe system for 20 hours. HiFi reads (>QV20) were demultiplexed in SMRT Link and clustered into haplotypes. A consensus sequence for each haplotype was generated with pbaa (PacBio Amplicon Analysis). Each consensus was then mapped to the human reference genome GRCh38 to assign CYP2D6 types.
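The read-quality cutoff above uses the Phred scale, where QV = -10·log10(error rate). QV20 therefore corresponds to 99% predicted accuracy, and a mean accuracy of 99.9% corresponds to roughly QV30. The minimal sketch below shows this conversion and a simple threshold filter. It assumes that each read is available as a (read_id, predicted_accuracy) pair, such as the per-read `rq` value that PacBio writes into HiFi BAM records. The function names are hypothetical, and this is not how SMRT Link implements its filtering.

```python
import math
from typing import Iterable, List, Tuple


def qv_to_accuracy(qv: float) -> float:
    """Phred scale: predicted accuracy = 1 - 10^(-QV/10)."""
    return 1.0 - 10 ** (-qv / 10)


def accuracy_to_qv(accuracy: float) -> float:
    """Inverse Phred transform: QV = -10 * log10(1 - accuracy)."""
    error = 1.0 - accuracy
    if error <= 0:
        return math.inf  # a predicted accuracy of 1.0 has no finite QV
    return -10 * math.log10(error)


def filter_by_qv(reads: Iterable[Tuple[str, float]], min_qv: float = 20.0) -> List[str]:
    """Return IDs of reads whose predicted accuracy is at or above min_qv.

    Each read is a hypothetical (read_id, predicted_accuracy) pair.
    """
    min_accuracy = qv_to_accuracy(min_qv)
    return [read_id for read_id, accuracy in reads if accuracy >= min_accuracy]
```

For example, `filter_by_qv([("r1", 0.999), ("r2", 0.95)])` keeps only `"r1"`, because 95% accuracy is about QV13. The filter compares accuracies rather than converted QVs so that the threshold only has to be converted once.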
More than 700,000 full-length HiFi reads were generated, with an average read length of 8.2 kb and a mean accuracy of 99.9%. Nearly all (>99%) demultiplexed reads were on target for the CYP2D6 locus. Genotyping the CYP2D6 region with HiFi reads identified all expected upstream duplications and downstream CYP2D6 alleles, including single nucleotide variants. The one exception was the *5 allele, which is a complete gene deletion. For 21 of 22 samples, the types called from HiFi reads matched the diplotypes identified by microarray and real-time PCR (qPCR), and the HiFi reads also fully resolved each allele. The remaining sample had been mistyped by microarray as *1/*41; HiFi sequencing typed it correctly as *33/*41. In addition, in 4 of 21 samples HiFi sequencing identified duplications that microarray or qPCR had missed.
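Typing ends by matching each haplotype consensus to a star allele and combining the two haplotypes into a diplotype. Duplications are written in the usual xN notation, for example *1x2. The sketch below illustrates one simple matching rule under clearly hypothetical assumptions:

- The allele table is a toy placeholder, not the PharmVar CYP2D6 definitions.
- Variants are represented as opaque labels.
- Copy number is supplied from separate evidence, such as the duplication-specific amplicon.

The function names are hypothetical. This is not the logic used by pbaa or SMRT Link.

```python
from typing import Dict, FrozenSet, List, Tuple

# HYPOTHETICAL allele table for illustration only; real CYP2D6 star-allele
# definitions come from PharmVar and involve many more variants.
ALLELE_DEFS: Dict[str, FrozenSet[str]] = {
    "*1": frozenset(),                  # no defining variants (reference-like)
    "*hypA": frozenset({"var1"}),
    "*hypB": frozenset({"var1", "var2"}),
    "*hypC": frozenset({"var3"}),
}


def call_allele(observed: FrozenSet[str],
                defs: Dict[str, FrozenSet[str]] = ALLELE_DEFS) -> str:
    """Pick the most specific allele whose defining variants are all observed."""
    # An allele is compatible if its definition is a subset of the observed variants.
    compatible = {name: d for name, d in defs.items() if d <= observed}
    if not compatible:
        return "unassigned"
    best = max(len(d) for d in compatible.values())
    top = sorted(name for name, d in compatible.items() if len(d) == best)
    # Two equally specific matches cannot be resolved by this simple rule.
    return top[0] if len(top) == 1 else "ambiguous(" + "|".join(top) + ")"


def call_diplotype(haplotypes: List[Tuple[FrozenSet[str], int]]) -> str:
    """Format two haplotypes as 'A/B', marking extra gene copies as 'xN'.

    Each entry is (observed variants, copy number); the copy number is assumed
    to come from separate evidence such as a duplication-specific amplicon.
    """
    if len(haplotypes) != 2:
        raise ValueError("expected exactly two haplotypes")
    labels = []
    for variants, copies in haplotypes:
        if copies < 1:
            raise ValueError("copy number must be at least 1")
        name = call_allele(variants)
        labels.append(name if copies == 1 else f"{name}x{copies}")
    return "/".join(sorted(labels))
```

For example, `call_diplotype([(frozenset({"var1", "var2"}), 1), (frozenset(), 2)])` returns `"*1x2/*hypB"`. This rule silently ignores observed variants that fall outside every definition. Because it expects two sequenced haplotypes, it also cannot represent a whole-gene deletion such as *5. A fuller caller would need to report both cases.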
The PCR and sequencing assay presented here for detecting CYP2D6 variants is robust and specific. The assignment of new alleles and duplications in pharmacogenomic samples from HiFi reads suggests that PacBio sequencing can reveal diplotypes that other technologies did not characterize accurately. This study demonstrates that HiFi sequencing provides allele-level resolution, higher than either microarray or real-time PCR achieves, for genotyping polymorphic genes, while maintaining sensitivity and accuracy.