This month’s publications show how PacBio HiFi sequencing is being used to address complex problems in clinical genetics, brain research, and crop development. The March 2025 edition of our Powered by PacBio blog series includes four studies: resolving the CYP21A2 region to support improved testing for congenital adrenal hyperplasia (CAH), mapping microglia isoforms in neurodegenerative disease, analyzing gene regulation in NOTCH2NL duplications, and identifying paralog variation in African eggplant.
Together, these papers reflect how researchers are applying long-read sequencing to questions that short reads haven’t been able to answer—combining HiFi data with tools like the Iso-Seq method and Fiber-seq to generate clearer, more complete views of genetic structure and function.
Jump to topic:
Human health | Neurology | Human genome research | Agrigenomics
Human health
In this preprint, researchers from UC Irvine, France, PacBio, Children’s Natl, U CO, U MI, Children’s Hosp Chicago, UCSC, and Zucker Med NY demonstrate that long-read sequencing is “able to identify and phase the various complex alleles in the CYP21A2 region” and “could be used to identify variants causing multiple forms of CAH [Congenital Adrenal Hyperplasia] in a single test.”
Key highlights:
- CAH, which is associated with fertility issues, obesity, insulin resistance, and dyslipidemia, is among the most common inherited disorders (affecting 1 in 10,000 to 18,000) and can be lethal if left untreated.
- The commonly implicated CYP21A2 gene and surrounding RCCX region present “dizzying complexity”. The most comprehensive clinical test available includes 4 long-range PCRs, bidirectional Sanger sequencing and MLPA. “Even then, alleles are not phased, and testing parents is critical to avoid misinterpretation.” “Predictably, the region has historically been intractable with short-read sequencing”, and “optical genome mapping was not able to reliably resolve module counts or fusions.”
- Whole-genome sequencing (WGS) on the Revio system was performed on six samples (three known samples and one unknown trio), and PacBio’s Paraphase tool was used to “call the SNVs, locate them in individual RCCX modules, ordered the modules in each haplotype, and identify and phase fusion and whole-gene deletion alleles”. Results “were concordant with available clinical reports and, in addition, phased and calculated the number of modules for each haplotype, which … is not currently determined by clinical tests.”
- As a result, clinical reporting practices have evolved: “To record the genotypic complexity of this region newly made accessible with LRS, the DSD-TRN [Disorders of Sex Development Translational Research Network] has created a new standardized form for clinical data collection for CAH.”
Conclusion:
HiFi sequencing on the Revio system enables comprehensive phasing and structural characterization of CYP21A2 variants, outperforming short-read sequencing and optical genome mapping. By consolidating multiple tests, including long-range PCR, Sanger sequencing, and MLPA (Multiplex Ligation-dependent Probe Amplification), into a single, streamlined workflow, PacBio simplifies genetic testing and helps improve diagnostic accuracy. The work has also led to a new clinical data collection framework for CAH.
Neurology
In this preprint, researchers from Mt Sinai, Spain, PNWL, Rush U IL, and VA Med Ctr NY present an “isoform-centric microglia genomic atlas (isoMiGA)”.
Highlights:
- Microglia, the brain’s innate immune cells, are genetically linked to several neurodegenerative diseases, but “identifying genetic effects on splicing is challenging because of the use of short sequencing reads.”
- Using Iso-Seq on 30 postmortem brains (across diagnoses including Alzheimer’s disease (AD) and Lewy body dementia), researchers identified 35,879 novel microglia isoforms, including “new categories of noncoding regulatory isoforms, such as intron retention, antisense and readthrough fusions”. For the well-studied AD risk genes CD33 and TREM2, the authors “observed multiple fusion isoforms.”
- “We show that these isoforms are involved in stimulation response and brain region specificity … and found associations with genetic risk loci in Alzheimer’s and Parkinson’s disease.”
Conclusion:
Iso-Seq enables the discovery of novel, functionally relevant microglia isoforms, including novel splicing events, that short-read sequencing cannot detect. By revealing new regulatory isoforms and genetic risk associations, Iso-Seq advances our understanding of neurodegenerative disease mechanisms.
Human genome research
Genetic diversity and regulatory features of human-specific NOTCH2NL duplications
In this preprint, researchers from UW, UCSC, Italy, UC Riverside, and UNC resolve one of the most challenging regions of the human genome, finding that “this region is among the most frequently rearranged regions of the human genome.”
Highlights:
- NOTCH2-N-terminus-like genes, implicated in human brain expansion relative to apes, “have been difficult to sequence and assemble with short-read technologies”, and “NOTCH2NL gene family annotations are frequently incorrect in previous reference genome builds.”
- Long-read assemblies from 67 human (HPRC) and 12 ape genomes helped resolve genome structures, including paralogs vs. CNVs, SVs, and gene conversions, with findings reported “… for 28% of haplotypes leading to a previously undescribed paralog, NOTCH2tv.”
- The Iso-Seq method and Fiber-seq were used to provide “an initial assessment of potential regulatory regions that are shared and those that distinguish paralogs specifically during human neurodevelopment”, leading to the discovery of “unreported gene fusions”, and “distinct differences within the transcript abundance of each of the NOTCH2 paralogs, suggesting that paralog-specific accessible chromatin elements may be creating unique gene regulatory environments for each of the NOTCH2 paralogs.”
Conclusion:
HiFi data offers resolution of even the most complex genome regions. When paired with the Iso-Seq method and Fiber-seq, it reveals previously inaccessible biology—here, applied to a gene family essential to human brain development and cortex expansion compared to our closest evolutionary relatives.
Agrigenomics
Solanum pan-genetics reveals paralogues as contingencies in crop engineering
In this study, researchers from CSHL, JHU, Boyce Thompson NY, Colombia, Uganda, Canada, UK, Mount Holyoke MA, and Cornell used HiFi-based references for 22 Solanum species to establish a pan-genome, finding “thousands of gene duplications, particularly within key domestication gene families”, and identifying paralog diversification as one of the “underexplored contingencies in trait evolvability”.
Highlights:
- Generated HiFi-based, chromosome-scale references for 22 species (including 13 indigenous crops) with an average QV > 53 and an N50 of 65.8 Mbp.
- Conducted detailed pan-genome analysis, revealing a complex landscape of gene duplications. “Hundreds of global and lineage-specific gene duplications exhibited dynamic evolutionary trajectories in paralog sequence, expression, and function, including among members of key domestication gene families.” Analysis included core and dispensable genes, multi-tissue transcriptomics of retained paralogs, and lineage-specific paralog diversification and compensatory relationships.
- In an additional analysis of 10 African eggplant cultivars, researchers found that the “loss of an ancient redundant paralog of the classical regulator of stem cell proliferation and fruit organ number … was compensated by a lineage-specific tandem duplication.”
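For readers less familiar with the assembly metrics quoted in these highlights, here is a minimal Python sketch of how N50 and a Phred-scaled quality value (QV) are conventionally defined. It assumes the standard textbook definitions and is not the authors’ pipeline; the function names and the toy contig lengths are hypothetical and chosen only for illustration.

```python
import math


def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled QV to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability to a Phred-scaled QV."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs (illustration only).
    toy_lengths = [10, 20, 30, 40]
    print("N50 of toy assembly:", n50(toy_lengths))

    # Under this definition, QV 53 corresponds to roughly one error
    # per 200,000 bases.
    print("Bases per error at QV 53:", round(1 / qv_to_error_rate(53)))
```

Under these definitions, a higher N50 indicates a more contiguous assembly, and each 10-point increase in QV corresponds to a tenfold drop in the estimated per-base error rate.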
Conclusion:
“Paralog diversifications over short evolutionary periods are critical yet underexplored contingencies in trait evolvability”, highlighting the importance of “resolving these underexplored contingencies as we strive to improve indigenous crops for local and climate change–adapted agriculture.”
Ready to make discoveries of your own?
The March 2025 publications demonstrate how researchers are using HiFi sequencing to address specific challenges in genome interpretation—from simplifying clinical workflows for CAH, to improving annotation of brain-expressed isoforms, to refining our understanding of paralog function in both humans and crops. These studies highlight how access to more complete and accurate sequence data is helping teams ask new questions and build more reliable models of biology across different research areas.
Stay tuned for next month’s round-up of recent publications using PacBio technology.
Ready to explore what HiFi can do in your lab? Let’s get started.