Scientific publications

Publications featuring PacBio long-read + short-read sequencing data

European Journal of Human Genetics  |  2026

HiFi long-read RNA sequencing enhances clinical diagnostics in rare disorders

Carolina Jaramillo Oquendo, Federico Ferraro, Htoo A. Wai, Heather Ferrao, Herma van der Linde, Evita Karelioti, Liz Tseng, Harsharan Dhillon, Sam Holt, David J. Bunyan, Laura Donker Kaat, Marieke van Dooren, Jeff Zhou, Sarah Ennis, John W. Holloway, Tjakko J. van Ham & Diana Baralle

We evaluated the potential of PacBio long-read RNA-seq to detect pathogenic splicing events in rare disorders, comparing its performance to short-read RNA-seq ... These results demonstrate that long-read RNA-seq enhances detection and interpretation of clinically relevant splicing events, supporting its integration into diagnostic workflows for rare diseases.
Cell Genomics  |  2026

Long-read genome sequencing improves detection and functional interpretation of structural and repeat variants in autism

Milad Mortazavi, James Guevara, Joshua Diaz, Stephen Tran, Helyaneh Ziaei Jam, Chloe Reeves, Sergey Batalov, Kristen Jepsen, Matthew Bainbridge, Aaron D. Besterman, Melissa Gymrek, Abraham A. Palmer, Jonathan Sebat

Long-read whole-genome sequencing (LR-WGS) technologies enhance the discovery of structural variants (SVs) and tandem repeats (TRs). We performed LR-WGS on 267 individuals from 63 autism spectrum disorder (ASD) families and generated an integrated call set combining long- and short-read data. LR-WGS increased detection of gene-disrupting SVs and TRs by 33% and 38%, respectively, and enabled identification of novel exonic de novo germline and somatic SVs. We observed complex SV patterns, including a class of nested duplication-deletion events. By joint analysis of phased genetic variation and DNA methylation, we identified deletions of imprinted genes and demonstrated the effect of intermediate TR expansions (35–54 CGG) on the methylation of FMR1 promoter. Rare SVs, TRs, and damaging SNVs together accounted for 7.4% (95% confidence interval [CI], 2.7%–17%) of the heritability of ASD. These findings demonstrate how LR-WGS can resolve complex genetic variation and its functional consequences and regulatory effects in a single assay.
bioRxiv  |  2026

Genome-wide classification of tumor-derived reads from bulk long-read sequencing

Toby M. Baker1, Nedas Matulionis1, Cassidy Andrasz1,2, Dan Gerke1, Natalia Garcia-Dutton1,2, David Atkinson1, Kami Chiotti1, Selina Wu1,2, Suchita Lulla1,2, Jieun Oh1,2, Helena Winata1,2, Rong-Rong Huang3, Jenny Lester4, Beth Y. Karlan4, Paul T. Spellman1,2
1) Division of Hematology Oncology, Department of Medicine, David Geffen School of Medicine, University of California Los Angeles; 2) Department of Genetics, David Geffen School of Medicine, University of California Los Angeles; 3) Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles; 4) Department of Obstetrics and Gynecology, David Geffen School of Medicine, University of California Los Angeles

DNA extracted from tissue samples typically derives from a complex mixture of cell types. Without single cell analysis, it has been generally impossible to determine the cell type of origin for most molecules. One clear example of this is in the complex milieu of a human neoplasm. Here, we develop ROCIT (https://github.com/tobybaker/rocit), a transformer-based model to classify the tumor or non-tumor origin of individual reads from bulk tumor samples sequenced with long-read whole genome sequencing. Using somatic mutations to derive training data, ROCIT uses read-level methylation patterns to accurately classify reads from anywhere in the genome without requiring the adjacent normal tissue or the explicit identification of tumor differentially methylated regions. We apply ROCIT to a cohort of prostate and ovarian tumors and demonstrate high classification accuracy across the entire genome. We then demonstrate the potential of ROCIT predictions to improve somatic variant calling. ROCIT represents a major step forward in the analysis of bulk tumors with long reads, enabling the accurate and sensitive identification of reads with specific cell types of origin genome-wide.
NAR Molecular Medicine  |  2026

Systematic evaluation of long- and short-read RNA-seq for human peripheral blood

Sadahiro Iwabuchi, Alessandro Nasti, Hikari Okada, Yumie Takeshita, Taka-Aki Sato, Takeshi Urabe, Toshinari Takamura, Takuro Tamura, Atsushi Tajima, Kenichi Matsubara ... et al.

RNA sequencing (RNA-seq) technologies enable comprehensive transcriptomic profiling, yet systematic comparisons using identical biological samples remain limited. Here, we performed a multi-faceted comparison of long-read (PacBio) and short-read (Illumina) RNA-seq using the same RNA from peripheral blood cells of four healthy donors. Unlike prior studies that aggregate datasets from different sources, this study evaluates platform-dependent performance across gene expression, transcript variants, fusion genes, primary microRNAs (pri-miRNAs), and immune receptor complementarity-determining region 3 (CDR3) regions using widely available software, highlighting both reproducibility and accessibility. Long-read sequencing outperformed short-read sequencing in detecting complex alternative splicing events, novel transcript isoforms, and full-length immune receptor sequences, particularly immunoglobulin heavy chains, enhancing clonotype resolution. Both platforms captured largely overlapping pri-miRNAs and CDR3 sequences, but each also detected unique elements, demonstrating that total RNA can serve as a proxy for these specialized features when dedicated kits are not used. Short-read sequencing retained superior quantification accuracy for highly expressed genes and stronger concordance with microarray data. Collectively, our findings reveal the complementary strengths of long- and short-read RNA-seq and provide a practical framework for systematic, side-by-side comparison of transcriptomic features, emphasizing the benefits of using the same input material and standard analysis pipelines.
medRxiv  |  2026

Clinical long-read genome sequencing for rare disease diagnostics

Tessa J.J. de Bitter, Bart van der Sanden, Lydia Sagath, Wolfram Höps, Peer Arts, Michelle de Groot, Marjan M. Weiss, Ronny Derks, Amber den Ouden, Simone van den Heuvel, Raoul G.J. Timmermans, Timon van Leeuwen, Jordi Corominas Galbany, Jos Smits, Lot Snijders Blok, Tom Hofste, Marloes Steehouwer, Nick Zomer, Quentin Sabbagh, Erik-Jan Kamsteeg, Dorien Lugtenberg, Ermanno A. Bosgoed, Richard J. Rodenburg, Su Ming Sun, Arjen R. Mensenkamp, Marjolijn J.L. Ligtenberg, Nicole de Leeuw, Debby M.E.I. Hellebrekers, Alexander P.A. Stegmann, Aimée D.C. Paulussen, Marinus J. Blok, Wendy A.G. van Zelst-Stams, Arthur van den Wijngaard, Helger G. Yntema, Christian Gilissen, Alexander Hoischen, Lisenka E.L.M. Vissers

This preprint from Radboud in the Netherlands highlights the potential for PacBio WGS “as a feasible and effective first-tier test for rare disease diagnostics.”
npj Genomic Medicine  |  2026

Pharmacokinetic recall study of Estonian Biobank participants with novel genetic variants in CYP2C19 and CYP2D6

Kristi Krebs, Laura Birgit Luitva, Anette Caroline Kõre, Raul Kokasaar, Maarja Jõeloo, Georgi Hudjashov, Kadri Maal, Elisabet Størset, Birgit Malene Wollmann, Liis Karo-Astover, Krista Fischer, Estonian Biobank Research Team, Volker M. Lauschke, Magnus Ingelman-Sundberg, Espen Molden, Alar Irs, Kersti Oselin, Jana Lass & Lili Milani

CYP2C19 and CYP2D6 are involved in the hepatic metabolism of approximately 35–40% of clinically used drugs. We conducted an in vivo phenotyping study encompassing 114 Estonian Biobank participants to evaluate the functional impact of rare or novel single-nucleotide and structural variants in the CYP2C19 and CYP2D6 genes using omeprazole and metoprolol as respective probe drugs. Plasma concentrations of these drugs and their metabolites were measured at 10 time points, and parent drug-to-metabolite ratios were calculated to determine enzymatic activity. Long-read sequencing enabled high-resolution star allele calling. Our results provide the first in vivo confirmation that partial gene and intragenic deletions in CYP2C19 (CYP2C19*37 and CYP2C19*42), enriched in Estonians and Finns, are associated with poor metaboliser phenotypes (P < 1.2 × 10⁻⁷). Additionally, we offer in vivo evidence of reduced metabolic activity for the CYP2D6*124 allele and a novel missense variant (c.940C>A) in exon 6 of CYP2D6. Furthermore, we observed that inhibitor exposure was significantly associated with higher metabolic ratios for both CYP2C19 (P = 3.0 × 10⁻⁶) and CYP2D6 (P = 0.02). Our findings emphasise the importance of identifying genetic variants in CYP2C19 and CYP2D6 beyond commonly assessed star alleles and that profiling for drug interactions can provide more precise assignments of metabolic phenotypes and improve personalised treatment.
bioRxiv  |  2026

TDP-43 dysfunction leads to the accumulation of cryptic transposable element-derived exons, crypTEs, in iPSC derived neurons and ALS/FTD patient tissues

Isobel Bolger, Regina Shaw, Oliver H Tam, Cláudio Gouveia Roque, Christopher A Jackson, Kathryn O’Neill, NYGC ALS Consortium, Colin Smith, Hemali Phatnani, Karthick Natarajan, Molly Gale Hammell

This preprint from NYU, NYGC, and the University of Edinburgh, UK, describes “a novel mechanism by which transposable element dysregulation impacts Amyotrophic Lateral Sclerosis (ALS)”, showing that TDP-43 dysfunction leads to the accumulation of cryptic transposable element-derived exons (crypTEs) in iPSC derived neurons and ALS/FTD patient tissues.
Molecular Neurodegeneration  |  2025

Entering the era of precision medicine to treat amyotrophic lateral sclerosis

Frances Theunissen, Loren Flynn, Alfredo Iacoangeli, Ahmad Al Khleifat, Ammar Al-Chalabi, James J. Giordano, Masha Strømme & P. Anthony Akkari

We address the advances in our understanding of the complex genetic architecture of ALS, including the varying models of genetic contribution to disease, and the importance of understanding population genetics and genetic testing when considering patient selection for clinical studies. Additionally, we discuss the advances in long-read whole-genome sequencing technology and how this method can improve streamlined genetic testing and our understanding of the genetic heterogeneity in ALS.
bioRxiv  |  2025

A telomere-to-telomere map of somatic mutation burden and functional impact in cancer

Min-Hwan Sohn, Danilo Dubocanin, Mitchell R Vollger, Youngjun Kwon, Anna Minkina, Katherine M Munson, Samuel FM Hart, Jane E Ranchalis, Nancy L Parmalee, Adriana E Sedeño-Cortés, Jeffrey Ou, Natalie YT Au, Stephanie Bohaczuk, Brianne Carroll, Christian D Frazar, William T Harvey, Kendra Hoekzema, Meng-Fan Huang, Caitlin N Jacques, Dana M Jensen, J Thomas Kolar, Rosa Lee, Jiadong Lin, Kelsey Loy, Taralynn Mack, Yizi Mao, Meranda M Pham, Erica Ryke, Joshua D Smith, Lila Sutherlin, Elliott G Swanson, Jeffrey M Weiss, SMaHT Assembly WG, Claudia Carvalho, Tim HH Coorens, Kelley Harris, Chia-Lin Wei, Evan E Eichler, Nicolas Altemose, James T Bennett, Andrew B Stergachis

Oncogenesis involves widespread genetic and epigenetic alterations, yet the full spectrum of somatic variation genome-wide remains unresolved. These findings define the full landscape of a cancer’s somatic variation and their functional impact, establishing a blueprint for T2T studies of mosaicism.
bioRxiv  |  2025

Long-read sequencing reveals extensive FMR1 somatic mosaicism in Fragile-X associated tremor/ataxia syndrome in human brain

Anna Dischler, Akshay Avvaru, Susana Lopez-Ignacio, Cristina Lau, Martin W. Breuss, Verónica Martínez Cerdeño, Harriet Dashnow, Caroline M. Dias

This work provides new insight into the extensive molecular variation underlying FXTAS in human brain and establishes a framework for studying repeat expansion disorders more broadly, highlighting the potential of long-read sequencing to advance our fundamental understanding of somatic mosaicism of these intractable regions of our genome.
medRxiv  |  2025

Population-scale Long-read Sequencing in the All of Us Research Program

Kiran V Garimella, Qiuhui Li, Julie Wertz, Samuel K Lee, Fabio Cunial, Yongqing Huang, Yulia Mostovoy, Ryan Lorig-Roach, Adam English, Hang Su, Shawn Levy, Donna M Muzny, Chelsea Berngruber, Matt C Danzi, William T Harvey, Emily L LaPlante, Karynne Patterson, Allison N Rozanski, Sophie Schwartz, Beri Shifaw, Yuanyuan Wang, Isaac Wong, Isaac R. L. Xu, Shadi Zaheri, Stephan Zuchner, Xinchang Zheng, Shannon Dugan-Perez, Michal Izydorczyk, Heer Mehta, Richard A Gibbs, Lee Lichtenstein, Namrata Gupta, Niall Lennon, Stacey Gabriel, All of Us Research Program Long Read Working Group, Winston Timp, Kimberly F Doheny, Tara Dutka, Anjene Musick, Chia-Lin Wei, Fritz J Sedlazeck, Michael C Schatz, Michael E Talkowski, Evan E Eichler

The All of Us Research Program (AoU) is a national biobank seeking to enroll one million individuals in the United States to link genomic and biomedical data, including short- and long-read whole-genome sequencing (srWGS/LRS), with rich electronic health record (EHR) information. Here, we present the first large-scale analyses of long-read sequencing (LRS) in AoU and offer a new framework for deriving genomic insights into complex structural variation (SV) of relevance to human health and disease.
JAMA Pediatrics  |  2025

Clinical long-read sequencing test for genetic disease diagnosis

Isabelle Thiffault, PhD1; Emily Farrow, PhD1; Cassandra Barrett, PhD2; et al.
1) Department of Pathology and Laboratory Medicine, Children’s Mercy Kansas City, Kansas City, Missouri; 2) Division of Clinical Genetics, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, Missouri; 3) Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, Missouri

This landmark study demonstrates how HiFi sequencing can transform pediatric disease discovery by delivering 10% higher success than all prior testing methods, helping to provide families with results in under 1 month rather than 3 months, and reducing the need for multiple stressful and costly rounds of testing.
medRxiv  |  2025

Expanded map of genomic imprinting reveals insight into human disease

Craig Smail, Warren A. Cheung, Boryana Koseva, Adam F. Johnson, Chengpeng Bi, Carl F. Schreck, Michael Lydic, Kristin Holoch, Elena Repnikova, John Herriges, Courtney Marsh, Isabelle Thiffault, Tomi Pastinen, Elin Grundberg

Allele-specific methylation is often underappreciated but plays a crucial role in understanding what drives development and disease. Short reads miss most of this signal (up to 60%), proving methylation isn’t just extra data; it’s essential for discovery. With HiFi, you get it automatically with every genome.
bioRxiv  |  2025

RNA splicing dynamics in CD8 T cells uncovers isoforms that impact T cell-mediated cancer immunotherapy

Shay Tzaban, Priyanga Appasamy, Elad Zisman, Shiri Klein, Reyut Lewis, Houlin Yu, Akanksha Khorgade, Marc A. Schwartz, Moshe Sade-Feldman, Thomas Eisenhaure, Oren Parnas, Aron Popovtzer, Cyrille Cohen, Eric Shifrut, Aziz M. Al’Khafaji, Rotem Karni, Galit Eisenberg, Nir Hacohen, Michal Lotem

This study shows the power of combining HiFi long-read sequencing with single-cell resolution to map isoform usage in human CD8⁺ T cells. By capturing dynamic splicing programs that short-read methods often miss, the team not only redefined T cell states, but also uncovered novel immune checkpoints with therapeutic potential. This work lays the foundation for isoform-selective immunotherapies that are anchored in discovery, validated in vivo, and guided by a generalizable single-cell framework.
bioRxiv  |  2025

Intron retention regulates STAT2 function and predicts immunotherapy response in lung cancer

Ryan P. Englander, Mattia Brugiolo, Te-Chia Wu, Mitch Kostich, Nathan K. Leclair, SungHee Park, Jacques Banchereau, Peter Yu, Andrew Salner, Romain Banchereau, Karolina Palucka, Olga Anczuków

The Iso-Seq method doubled the number of isoforms detected compared to short-read RNA-seq, revealing splicing events with direct relevance to cancer immunotherapy. The authors note that long-read RNA sequencing “may hold the key to breakthroughs that lead to the next wave of therapeutic advances.” With current workflows using Kinnex kits and the Revio system, such insights can now be achieved at scale and reduced cost, requiring only half a SMRT Cell to generate data equivalent to that from the 12 SMRT Cells used in the study.