Scientific posters

AGBT 2024  |  2024

Building the spectrum of ground truth genetic variation in a four-generation 28-member CEPH family

Zev N Kronenberg1, Katherine M Munson2, David Porubsky2, Cillian Nolan1, William J Rowell1, Brent S Pedersen3, Cairbre Fanslow1, Primo Baybayan1, Nidhi Koundinya2, William Harvey2, Kendra Hoekzema2, Jordan Knuth2, Gage Garcia2, Tom Mokveld1, Egor Dolzhenko1, Scott Watkins3, Deborah W Neklason3, Aaron R Quinlan3, Lynn B Jorde3, Evan E Eichler2 and Michael A Eberle1 1) PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025 2) University of Washington, Seattle WA, 98195 3) University of Utah, Salt Lake City, UT, 84114

Comprehensive ground truth data is required for validating sequencing pipelines in clinical settings, assessing the strengths and weaknesses of genome sequencing technologies, and improving variant detection software. Truth sets of genomic variation have lagged behind advances in sequencing accuracy and completeness; furthermore, current truth sets are confined to easy-to-characterize regions that comprise ~80% of the human genome. These benchmarking sets are mostly limited to small variants, missing complex variants like SVs and megabases of tandem repeat variation. Developing more complete truth sets will spur improvement in variant calling algorithms in under-characterized regions of the genome. We have built a new truth set, leveraging the power of a large four-generation, 28-member family (CEPH 1463) using multiple sequencing technologies (PacBio HiFi and SBB, ONT, Illumina, and Strand-seq). Nearly all of these samples are derived from blood, thus eliminating the problem of cell line artifacts. Using ensemble-based variant calls from read mapping and long-read genome assembly, we built a highly sensitive variant call set spanning the spectrum of variant size and complexity. From these calls, we built haplotypes and mapped recombination events among 10 second- and third-generation family members (two parents and eight children). Integrating the high-resolution haplotype map with multiple variant callers across sequencing technologies, we have built a truth set for a ten-member pedigree. In total, our pedigree-validated truth set contains 5,023,261 SNVs, 1,009,108 indels and 20,572 structural variants totaling 16 Mb of genetic variation. By using inheritance patterns to validate the accuracy of the variant calls, this benchmarking database combines the strengths of the different technologies, increasing the number of small variants in NA12878 by 14% and 6% compared to Genome in a Bottle (GIAB) and Platinum Genomes, respectively. Compared to GIAB’s NA12878, we have expanded the high-quality regions from 82.9% of the genome to 91.4%. Using technical replicates, we evaluate the accuracy of different sequencing technologies and variant callers against this comprehensive dataset. The full sequencing data and validated variants identified in this study will be publicly available to serve as a valuable community resource, as the largest multi-generational pedigree sequenced with long-read technologies.
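The idea of using inheritance patterns to validate variant calls can be illustrated with a minimal sketch. This is not the authors' method, which integrates a haplotype map across a ten-member pedigree; it only checks a single autosomal site in one parent-child trio, ignores de novo mutations and sex chromosomes, and the function name and genotype encoding are assumptions made for illustration.

```python
from itertools import product

def is_mendelian_consistent(child, mother, father):
    """Return True if the child's unordered diploid genotype can be formed
    by taking one allele from the mother and one from the father.

    Genotypes are tuples of allele indices (0 = reference, 1 = alternate).
    Hypothetical helper for illustration; autosomal sites only.
    """
    child_sorted = sorted(child)
    return any(
        sorted((m, f)) == child_sorted
        for m, f in product(mother, father)
    )

# A heterozygous child is explainable by a 0/0 mother and a 0/1 father.
print(is_mendelian_consistent((0, 1), (0, 0), (0, 1)))
# A homozygous-alternate child is not explainable by the same parents.
print(is_mendelian_consistent((1, 1), (0, 0), (0, 1)))
```

A call that fails such a check is either an error in one of the three genotypes or a genuine de novo event, which is why larger pedigrees, where each variant is observed across more relatives, give more power to tell the two apart.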
AGBT 2024  |  2024

Resolving variation in polymorphic regions of the human genome

Egor Dolzhenko1, Graham S Erwin2, Katherine Wang2, Zev Kronenberg1, William J Rowell1, Anna C Ferrari3, Garrison Pease3, Daniel Schwartz3, Benjamin Gartrell3, Ahmed Aboumohamed3, Alex Sankin3, Pedro Maria3, Kara Watts3, John M Greally4, Patrick Wilkinson5, Yashoda Rajpurohit5, John Loffredo5, Denis Smirnov5, Manuel A Sepulveda5, Charles G Drake5, Alex Robertson1, Michael P Snyder2, Michael A Eberle1 1. PacBio, Menlo Park, CA, USA; 2. Stanford University, CA, USA; 3. Montefiore-Einstein Cancer Center, NY, USA; 4. Einstein Epigenomics Center, NY, USA; 5. Janssen Research and Development LLC, PA, USA

The human genome contains thousands of repeat-rich polymorphic regions whose structure has not been systematically described. These regions produce large collections of variant calls sometimes called variation clusters. Variation clusters are typically excluded from tertiary analysis because it is difficult to interpret and catalog them. One example is the 3.5 kbp region in an intron of the KCNMB2 gene, which contains over 30 constituent simple repeats that jointly create many insertions, deletions, and mismatches in alignments of reads over this region. The corresponding variant calls are often incorrectly prioritized as potentially pathogenic, requiring significant resources to curate and rule out. To address these issues, we propose a novel computational framework to systematically detect, annotate, and catalog variation clusters. A distinguishing characteristic of our approach relative to traditional variant calling methods is the ability to annotate and resolve entire regions of high sequence polymorphism as single units instead of fragmenting them into variation clusters. These regions can be subsequently genotyped using the recently developed tandem repeat genotyping tool (TRGT). We show that our method can accurately detect reference coordinates and resolve structures of KCNMB2, MUC1, CEL, INS and 20 other medically relevant variable number tandem repeats. Using real and simulated data, we also show that our method can locate and call pathogenic expansions of 50 disease-causing repeats and nine polyalanine repeats composed of highly variable motif sequences. To demonstrate the usefulness of our method for cancer genome studies, we applied it to normal, polyp, and adenocarcinoma PacBio HiFi samples originating from the same individual and identified a tandem repeat that progressively expands in length from normal to polyp to adenocarcinoma samples in the 5′ UTR of LIMD1, a reported tumor suppressor gene. To further highlight our ability to resolve variation, we characterized differences in repeat composition and methylation between three prostate tumors and their normal counterparts and also a panel of 100 unrelated genomes. To make all these analyses accessible to other genome researchers, we are releasing a learning resource with tutorials describing how to catalog variation in the polymorphic regions of the human genome in publicly available PacBio HiFi samples.
PAG 2024  |  2024

A high-throughput, low-cost automated library prep method for PacBio long-read sequencing at scale

Gregory Young1, Aurelie Souppe1, Kaitlyn Scott1, Davy Lee1, Gloria Diaz1, Lin Wang1, Nethmi Gunathilake1, and Greg Concepcion1 1. PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025

Bottlenecks in long-read library prep workflows such as DNA shearing (aka fragmentation) and size selection are barriers to scaling long-read sequencing (LRS) independent of sequencing costs. Here we present a new automated high-throughput library prep method for PacBio native long-read sequencing that removes these bottlenecks, dramatically lowers costs, and operates in a 96-well plate format.
PAG 2024  |  2024

High-throughput HMW DNA animal blood extraction and sequencing on the PacBio Revio system

Deborah Moine1, Adam Bates3, Michelle Kim2, Jacob Brandenburg1, Nadia Sellami1, Jackson Mingle2, Jeffrey Burke2, Julian Rocha2, Aurelie Souppe1, Heather Ferrao1, Gregory Concepcion1, Caroline Howard3, and Kelvin J Liu2 1. PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025, 2. PacBio, 701 E. Pratt Street, Baltimore, MD 21202, 3. Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK

The Darwin Tree of Life project is a large biodiversity initiative aiming to generate high quality genomes for 70,000 species of eukaryotes across Britain and Ireland. For large-scale projects such as these, high throughput (HT) solutions are critical; PacBio Nanobind HT DNA extraction kits combined with the Revio system address these needs by significantly increasing throughput and lowering the cost of long-read sequencing. We present fully automated methods for HT DNA extraction, shearing, library preparation, and PacBio HiFi sequencing of blood from animals with nucleated and non-nucleated red blood cells (nRBCs). These workflows can prepare 96 samples from DNA extraction to libraries that are ready for loading in ~10 hours. High molecular weight (HMW) DNA is extracted using the Nanobind disks on the Thermo Fisher KingFisher APEX or Hamilton NIMBUS Presto automated systems. Using these workflows, we can recover ~5−20 μg of dsDNA per extraction on a 96-well plate in 2.5 hours. The HMW DNA is size selected in a 96-well plate using the PacBio SRE kit and sheared to 15−20 kb by pipetting on an automated liquid handler. Last, libraries are prepared on the fully automated Hamilton NGS Star. The methods presented here utilize standard configurations of Hamilton instruments and can easily be incorporated into existing workflows. A single Revio SMRT Cell typically generates sufficient HiFi coverage for high-quality de novo assembly of a diploid vertebrate genome.
PAG 2024  |  2024

Increasing the throughput of full-length 16S sequencing with Kinnex kits

Jeremy E Wilkinson, Khi Pin Chua, Siyuan Zhang, Jason Underwood, Minning Chin, Wei-Shen Cheng, Sian Loong Au, Primo Baybayan, Holly Ganz, Guillaume Jospin, Ye Tao, Qin Lin, Elizabeth Tseng

In the past several years, the ability to capture the full-length (FL) 16S rRNA gene with PacBio HiFi sequencing has enabled researchers to profile microbiomes in significantly higher resolution. Only full-length and highly accurate 16S sequences can robustly identify the broad range of bacteria seen in complex microbial communities at the species level, without bias. To further increase the cost effectiveness of FL 16S sequencing, we applied the Kinnex 16S rRNA kit, which is based on the multiplexed array sequencing (MAS-Seq) method (Al’Khafaji et al., 2023), to FL 16S amplicons. The MAS-Seq method is a versatile throughput increase method that takes advantage of the longer HiFi read lengths to concatenate amplicons into ordered arrays with programmable array sizes. We demonstrated that Kinnex 16S results in an ~8–12-fold throughput increase compared to standard FL 16S. We tested the method on a diverse range (11 types) of samples including mock communities, human and animal feces/guts, soil, sediment, rhizosphere, sludge, and water. We then analyzed the data using a user-friendly bioinformatics pipeline, HiFi-16S-workflow, that provides a FASTQ-to-report analysis solution for FL 16S HiFi reads. Comparing the Kinnex 16S to standard FL 16S datasets, we found no bias in community compositions and were able to assign up to ~90–99% of denoised reads to species. In addition, on the highly complex ZymoBIOMICS Fecal Reference with TruMatrix Technology (D6323) sample, we found Kinnex 16S to have high correlation to taxonomic abundances estimated from shotgun metagenomics sequencing using the same sample, emphasizing that it’s possible to get shotgun metagenome taxonomic resolution at amplicon sequencing costs with FL 16S HiFi sequencing. Furthermore, with Kinnex 16S, researchers may now multiplex more samples to reduce cost per sample or to profile each sample deeper with more reads per sample.
PAG 2024  |  2024

New long-read metagenome assembly methods increase the number of high-quality MAGs from host-associated microbiomes

Daniel M. Portik, Jeremy E. Wilkinson

There are many challenges associated with metagenome assembly, including the presence of multiple species, uneven and unknown species abundances, conserved genomic regions shared across species, and strain-level variation within species. PacBio HiFi sequencing produces highly accurate long reads (>Q20, >99% accuracy), which provide major advantages for metagenome assembly. New metagenome assembly algorithms have been developed specifically for HiFi reads, including hifiasm-meta1 and metaMDBG.2 These methods make it possible to reconstruct full metagenome-assembled genomes (MAGs) for many high abundance species. However, discontiguous assemblies will occur for lower abundance taxa. Post-assembly tools incorporating binning methods are required to identify and extract additional MAGs. The HiFi-MAG-Pipeline (v2) is a comprehensive workflow for processing long-read assemblies, and includes major steps such as binning, quality filtering, and taxonomic identification. Here, we demonstrate the performance of these methods using a variety of HiFi metagenomic datasets.
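The pairing of Q20 with 99% accuracy follows from the standard Phred definition, Q = -10 log10(P_error). A short sketch of the conversion in both directions is given below; the function names are illustrative and not taken from any PacBio tool.

```python
import math

def phred_to_accuracy(q):
    """Convert a Phred quality score to expected accuracy (1 - error probability)."""
    return 1.0 - 10 ** (-q / 10)

def accuracy_to_phred(accuracy):
    """Convert an accuracy in [0, 1) to the equivalent Phred quality score."""
    return -10 * math.log10(1.0 - accuracy)

for q in (20, 30, 40):
    print(f"Q{q}: {phred_to_accuracy(q):.4%} accuracy")
```

Each additional 10 Phred units corresponds to a tenfold reduction in the expected error rate, so Q20 is one error in 100 bases and Q40 is one error in 10,000.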
PAG 2024  |  2024

Scalable, cost-effective isoform sequencing with Kinnex full-length RNA kit using long-read sequencing

Jacob Brandenburg1, Elizabeth Tseng1, Heather Ferrao1, Jocelyne Bruand1, Armin Toepfer1, Gloria Sheynkman2, Madison Mehlferber2, Vasilli Pavelko2 1. PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025 2. Dept of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville VA 22093

The Kinnex full-length RNA kit takes total RNA as input and outputs a sequencing-ready library that results in an 8-fold throughput increase compared to typical Iso-Seq libraries. Combined with the Iso-Seq analysis in SMRT Link software, PacBio offers cost-effective isoform sequencing that does not require orthogonal sequencing methods.
ASHG 2023  |  2023

Genome-wide characterization of de novo tandem repeat mutations in the human genome

T. Mokveld1, E. Dolzhenko1, H. Dashnow2, B. van der Sanden3, B. Pedersen2, Z. Kronenberg1, T. Nicholas2, C. Fanslow1, C. Lambert1, N. Koundinya4, W. Harvey4, K. Hoekzema4, J. Knuth4, G. Garcia4, K. M. Munson4, B. Jadhav5, A. J. Sharp5, A.Tucci6, S. Watkins2, D. W. Neklason2, A. R. Quinlan2, C. Gilissen3, A. Hoischen3, E. E. Eichler4, M. A. Eberle1; 1) PacBio, Menlo Park, CA, 2) Univ. of Utah, Salt Lake City, UT, 3) Radboudumc, Nijmegen, Netherlands, 4) Univ. of Washington, Seattle, WA, 5) Icahn School of Medicine at Mount Sinai, New York, NY, 6) Genomics England, London, UK

Tandem repeats (TRs) are implicated in Mendelian disease, cancer, and complex traits, and are a major source of structural variants. Standard approaches have limitations in effectively analyzing these regions. Utilizing PacBio HiFi sequencing data, TRGT1 and TRVZ1 were developed to estimate repeat lengths and mosaicism, analyze sequence composition, measure CpG methylation in repeats, support repeats up to 10 kb, and visualize tandem repeats.
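To make the notion of repeat length concrete, here is a minimal sketch that counts the longest run of back-to-back exact copies of a motif in a read. It is a toy illustration only and does not represent how TRGT works: it assumes a known motif, exact matching, and a single sequence, and the function name is hypothetical.

```python
import re

def longest_motif_run(sequence, motif):
    """Return the largest number of consecutive exact copies of `motif`
    found anywhere in `sequence` (0 if the motif never occurs)."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    # Match one or more adjacent copies of the motif, treated literally.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

# Two CAG copies, an interruption, then four CAG copies.
print(longest_motif_run("CAGCAGTCAGCAGCAGCAG", "CAG"))
```

Real repeats are often interrupted or composed of several motifs, which is one reason sequence composition is analyzed alongside length.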
ASHG 2023  |  2023

Scalable, cost-effective isoform sequencing with Kinnex full-length RNA kit using long-read sequencing

Elizabeth Tseng1, Heather Ferrao1, Jocelyne Bruand1, Armin Toepfer1, Gloria Sheynkman2, Madison Mehlferber2, Vasilli Pavelko2 1) PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025 2) Dept. of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville VA 22093

The Kinnex full-length RNA kit takes total RNA as input and outputs a sequencing-ready library that results in an 8-fold throughput increase compared to typical Iso-Seq libraries. Combined with the Iso-Seq analysis in SMRT Link software, PacBio offers cost-effective isoform sequencing that does not require orthogonal sequencing methods.
ASHG 2023  |  2023

Building the spectrum of ground truth genetic variation in a four-generation 28-member CEPH family

Zev N Kronenberg1, Katherine M Munson2, David Porubsky2, Cillian Nolan1, William J Rowell1, Brent S Pedersen3, Cairbre Fanslow1, Primo Baybayan1, Nidhi Koundinya2, William Harvey2, Kendra Hoekzema2, Jordan Knuth2, Gage Garcia2, Tom Mokveld1, Egor Dolzhenko1, Scott Watkins3, Deborah W Neklason3, Aaron R Quinlan3, Lynn B Jorde3, Evan E Eichler2 and Michael A Eberle1

Highly accurate long-read sequencing characterizes the full spectrum of genetic variation across the genome, but variant calling software is still catching up to the sequencing technologies. To develop long-read methods for calling difficult variants and variants in the genomic dark regions, it is important to have a comprehensive ground truth dataset for benchmarking. Until now, most benchmarking datasets were primarily built using short-read technologies that are limited to the easily characterized parts of the genome. We are developing a comprehensive truth set by utilizing the power of genetic inheritance within a four-generation family (CEPH pedigree 1463 plus a newly collected fourth generation [Figure 1]) characterized with multiple sequencing technologies (PacBio, ONT, Illumina and Strand-seq) from blood-derived DNA. Large kinship pedigrees provide greater power to establish inheritance patterns when compared to trios, allowing us to adjudicate variant calls across the entire genome and not just in well-behaved regions.
ASHG 2023  |  2023

High-throughput human sample prep and sequencing on PacBio Revio system

Deborah Moine1, Gregory Young1, Jeffrey Burke2, Julian Rocha2, Michelle Kim2, Primo Baybayan1, Ian McLaughlin1 and Kelvin J Liu2 1. PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025. 2. 701 E. Pratt Street Baltimore, MD 21202 USA

We present a fully automated HT DNA extraction, size-selection, shearing, and library preparation workflow for human whole blood and mammalian cell samples for PacBio HiFi sequencing.
ASHG 2023  |  2023

HiPhase: Jointly phasing small and structural variants from HiFi sequencing

J. Matthew Holt, Christopher T. Saunders, William J. Rowell, Zev Kronenberg, Aaron M. Wenger, and Michael Eberle PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025

We evaluated HiPhase with three HG002 replicates sequenced to ~30-fold coverage on the Revio system. Read-backed phasing allows all detected variants, including de novo variants, to be phased using read-level evidence. PacBio HiFi sequencing provides long, accurate observations that are ideal for phasing when researching both inherited and de novo variation.
Early Detection of Cancer - London  |  2023

Improved liquid biopsy assay performance using sequencing by binding (SBB)

Dan Nasko, Phillip Pham, Stuti Joshi, Kristi Kim, Nairi Pezeshkian, Young Kim, Alex Sockell, and Jonas Korlach

Liquid biopsy is revolutionizing the field of early cancer detection research through non-invasive detection of tumor DNA in the blood. However, existing liquid biopsy assays are limited in their sensitivity for ctDNA detection at low variant allele frequencies (VAFs), with most relying on extreme sequencing depth and computational error correction to separate the true ctDNA signal from background errors. This limitation is particularly problematic in the area of early cancer detection, in which expected ctDNA allele frequencies are extremely low. Novel strategies are therefore needed to help improve liquid biopsy assay sensitivity and reduce per-sample sequencing requirements. Here we describe PacBio’s application of the Onso short-read sequencing system to enable detection of ctDNA at low VAFs using the SeraCare Complete ctDNA Mutation Mix reference standard. The Onso system makes use of a novel sequencing by binding (SBB) method to achieve up to 15x greater quality scores, with ≥90% of reads at Q40 or above. We performed targeted capture and sequencing of libraries prepared from the SeraCare reference mix diluted into WT human DNA at the following VAFs: 0.00% (WT), 0.05%, 0.10%, 0.25%, and 0.50%, and compared the sensitivity of SBB at each VAF against a competitor method using sequencing by synthesis (SBS) at varying sequencing depths. We observed superior sensitivity for ctDNA detection using SBB compared to SBS at low VAFs (0.05%, 0.10%) at comparable sequencing depth. Furthermore, SBB required approximately four-fold less sequencing to achieve sensitivity comparable to SBS. Finally, combining SBB with computational error-correction methods boosted sensitivity even further, suggesting an additive value for these technologies. Taken together, our results demonstrate the potential of SBB to improve upon existing methods of liquid biopsy and better enable research on early cancer detection.
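The dependence of low-VAF sensitivity on sequencing depth can be sketched with a simple binomial sampling model. This is an illustrative assumption, not the analysis used in the abstract: it treats each read at a site as an independent draw that carries the variant with probability equal to the VAF, ignores sequencing errors entirely, and uses a hypothetical caller rule requiring a minimum number of variant-supporting reads.

```python
from math import comb

def detection_probability(depth, vaf, min_alt_reads):
    """Probability of observing at least `min_alt_reads` variant-supporting
    reads among `depth` reads, under a Binomial(depth, vaf) model."""
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer

# The VAFs from the dilution series (as fractions), at two illustrative depths.
for vaf in (0.0005, 0.0010, 0.0025, 0.0050):
    for depth in (2000, 8000):
        p = detection_probability(depth, vaf, min_alt_reads=3)
        print(f"VAF {vaf:.2%}, depth {depth}: P(detect) = {p:.3f}")
```

Under this model, the threshold on supporting reads is what protects against background errors. A platform with a lower error rate could, in principle, tolerate a lower threshold and so reach a given sensitivity with less sequencing; that reasoning is an assumption of the sketch rather than a finding stated in the abstract.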
Early Detection of Cancer - London  |  2023

Improved detection of low frequency mutations in ovarian and endometrial cancers by utilizing a highly accurate sequencing platform

Jiannis Ragoussis1, Timothée Revil1, Nairi Pezeshkian2, Lawrie Shabazian3, Areej Al Hatib3, Lucy Gilbert3,4,5 1) McGill University Genome Centre, Quebec, QC, Canada. 2) Pacific Biosciences, Menlo Park, CA, USA. 3) McGill Research Institute - Center for Innovative Medicine, Women's Health Research Unit, MUHC - Royal Victoria Hospital. 4) Department of Obstetrics & Gynecology, Department of Oncology, McGill University. 5) Gynecologic Cancer Service, Cedars Cancer Centre, McGill University Health Centre, Quebec, Canada

Ovarian and endometrial cancers are, combined, the fourth-highest cancer killer of Canadian women. In 2020, over 3000 women were diagnosed with ovarian cancer, of whom 75% were in the later stages. The goal of the DOvEEgene (Detecting Ovarian and Endometrial cancer Early using Genomics) project is to detect these cancers as early as the first stage through a low-cost, minimally invasive, and widely available test, similar to what the Pap test has done for cervical cancers. Because the results of the algorithm depend heavily on the quality of the variants detected, we were interested in testing the PacBio Onso sequencing by binding (SBB) technology, which promises much higher sequencing quality and should therefore potentially increase specificity and sensitivity.
ASM 2023  |  2023

Increasing throughput of full-length 16S sequencing using concatenation

Jacob Brandenburg1, Khi Pin Chua1, Siyuan Zhang1, Jason Underwood1, Minning Chin1, Wei-Shen Cheng1, Sian Loong Au1, Primo Baybayan1, Holly Ganz2, Guillaume Jospin2, Ye Tao3, Qin Lin3, Elizabeth Tseng1, Jeremy E Wilkinson1 1. PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025 2. AnimalBiome, 400 29th St., Ste 101, Oakland, CA, USA 94609 3. Biozeron Biotechnology Co., Ltd., Shanghai, 201800 China

Comparing the concatenated (16S MAS-Seq) to non-concatenated full-length 16S datasets, we found no bias in community compositions and were able to assign up to ~90–99% of denoised reads to species. In addition, on the highly complex ZymoBIOMICS Fecal Reference with TruMatrix Technology (D6323) sample, we found 16S MAS-Seq to have high correlation to taxonomic abundances estimated from shotgun metagenomics sequencing using the same sample, emphasizing that it’s possible to get shotgun metagenome taxonomic resolution at amplicon sequencing costs with full-length 16S HiFi sequencing. Furthermore, with 16S MAS-Seq, researchers may now multiplex more samples to reduce cost per sample or to profile each sample deeper with more reads per sample.