Scientific posters

IHMC 2022  |  2022

Evaluation of taxonomic profiling methods for long-read shotgun metagenomic sequencing datasets

Portik, Daniel M. and Wilkinson, Jeremy E and Brown, C. Titus and Pierce-Ward, N. Tessa

Long-read shotgun metagenomic sequencing is gaining in popularity and offers many advantages over short-read sequencing. The higher information content in long reads is useful for taxonomic profiling, where the main goal is to identify the species present in a microbiome sample (typically bacteria, archaea, fungi, viruses) and their relative abundances. The development of long-read specific tools for taxonomic profiling is accelerating, yet there is a lack of consensus regarding their relative performance. We performed a critical benchmarking study using five long-read methods and four popular short-read methods. We applied these tools to several mock community datasets generated using PacBio HiFi sequencing or Oxford Nanopore Technology (ONT) sequencing, and Illumina data.
IHMC 2022  |  2022

Maximizing MAGs from long-read metagenomic assemblies: a new post-assembly pipeline with circular-aware binning

Wilkinson, Jeremy E. and Portik, Daniel M.

There are many challenges involved with metagenome assembly, including the presence of multiple species, uneven species abundances, and conserved genomic regions that are shared across species. Highly accurate long reads offer clear advantages over short reads and can overcome many of the obstacles associated with metagenome assembly. PacBio HiFi sequencing of metagenomic samples with the Sequel IIe system regularly produces reads 8–15 kb in size with a median QV ranging from 30–45 (99.9–99.99% accuracy). With the development of new metagenome assembly algorithms specific to HiFi reads (hifiasm-meta, metaFlye), it is now possible to reconstruct full metagenome assembled genomes (MAGs) for many high abundance species. These MAGs are often composed of a single circular contig, representing high-quality complete bacterial genomes. However, discontiguous assemblies still occur for lower abundance taxa, and post-assembly tools are required to identify MAGs in this category. Here, we present the HiFi-MAG-Pipeline, a comprehensive workflow for processing long-read metagenome assemblies.
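
The QV figures quoted above follow the standard Phred scale, where QV = -10 * log10(error probability). The short Python sketch below is an illustration added here, not part of the poster; it shows the conversion in both directions. By this definition QV 30 and QV 40 correspond to 99.9% and 99.99% accuracy, and QV 45 is slightly above 99.99%, so the parenthetical range in the abstract is best read as approximate.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to per-base accuracy (0..1)."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (0..1, exclusive of 1) to a Phred quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # Print the accuracy implied by a few quality values in the quoted range.
    for qv in (30, 40, 45):
        print(f"QV{qv}: {qv_to_accuracy(qv):.5%} accuracy")
```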
CSHL Microbiome 2022  |  2022

Evaluation of taxonomic profiling methods for long-read shotgun metagenomic sequencing datasets

Portik, Daniel M. and Brown, C. Titus and Pierce-Ward, N. Tessa

Long-read shotgun metagenomic sequencing is gaining in popularity and offers many advantages over short-read sequencing. The higher information content in long reads is useful for taxonomic profiling, where the main goal is to identify the species present in a microbiome sample (typically bacteria, archaea, fungi, viruses) and their relative abundances. The development of long-read specific tools for taxonomic profiling is accelerating, yet there is a lack of consensus regarding their relative performance. We performed a critical benchmarking study using five long-read methods and four popular short-read methods. We applied these tools to several mock community datasets generated using PacBio HiFi sequencing or Oxford Nanopore Technology (ONT) sequencing, and Illumina data.
CSHL Microbiome 2022  |  2022

High MAG recovery and precision species profiling of a pooled human fecal reference using PacBio HiFi sequencing

Wilkinson, Jeremy E. and Ashby, Meredith and Zhang, Siyuan and Locken, Kris and Tang, Shuiquan and Farthing, Brett and Weinstein, Michael and Carlin, Martha and Cano, Raul and Langford, Kyle and Auch, Benjamin and Liachko, Ivan and Portik, Daniel M.

Advancements in sequencing technologies have made metagenomic analyses of complex microbial samples routine and accessible. Mock communities of known composition are often run in parallel to allow for accurate data evaluation and to facilitate cross-study and inter-lab comparisons, yet they lack the microbial diversity of real-world samples. The ZymoBIOMICS Fecal Reference with TruMatrix Technology (D6323) is a highly diverse pooled human fecal reference that provides a truly complex alternative to mock communities. However, the microbial content of this standard is only partially characterized, and species level composition remains underexplored. Here, we explore the content of this sample using highly accurate long-read sequencing.
ASM NGS 2022  |  2022

Maximizing MAGs from long-read metagenomic assemblies – a new post-assembly pipeline with circular binning

Portik, Daniel M. and Wilkinson, Jeremy E.

There are many challenges involved with metagenome assembly, including the presence of multiple species, uneven species abundances, and conserved genomic regions that are shared across species. Highly accurate long reads offer clear advantages over short reads and can overcome many of the obstacles associated with metagenome assembly. PacBio HiFi sequencing of metagenomic samples with the Sequel IIe system regularly produces reads 8–15 kb in size with a median QV ranging from 30–45 (99.9–99.99% accuracy). With the development of new metagenome assembly algorithms specific to HiFi reads (hifiasm-meta, metaFlye), it is now possible to reconstruct full metagenome assembled genomes (MAGs) for many high abundance species. These MAGs are often composed of a single circular contig, representing high-quality complete bacterial genomes. However, discontiguous assemblies still occur for lower abundance taxa, and post-assembly tools are required to identify MAGs in this category. Here, we present the HiFi-MAG-Pipeline, a comprehensive workflow for processing long-read metagenome assemblies.
LAMG 2022  |  2022

Evaluation of taxonomic profiling methods for long-read shotgun metagenomic sequencing datasets

Portik, Daniel M. and Brown, C. Titus and Pierce-Ward, N. Tessa

Long-read shotgun metagenomic sequencing is gaining in popularity and offers many advantages over short-read sequencing. The higher information content in long reads is useful for taxonomic profiling, where the main goal is to identify the species present in a microbiome sample (typically bacteria, archaea, fungi, viruses) and their relative abundances. The development of long-read specific tools for taxonomic profiling is accelerating, yet there is a lack of consensus regarding their relative performance. We performed a critical benchmarking study using five long-read methods and four popular short-read methods. We applied these tools to several mock community datasets generated using PacBio HiFi sequencing or Oxford Nanopore Technology (ONT) sequencing, and Illumina data.
ESHG  |  2022

Enablement of long-read targeted sequencing panels using Twist hybrid capture and PacBio HiFi sequencing

Sarah Kingan, John Harting, Ting Hon, Yu-Chih Tsai, Ian McLaughlin, Janet Ziegle, Tina Han, Leonardo Arbiza, Susan Kloet, Loes Busscher, Geoff Henno, Edd Lee, Nina Gonzaludo

Targeted resequencing allows for high-resolution characterization of gene panels at a scale and cost that is more accessible than whole genome sequencing. While long-read PacBio HiFi sequencing has been shown to accurately and comprehensively interrogate complex clinically actionable loci, such as pharmacogenomic targets, studies have been primarily focused on single genes using PCR amplicon-based methods. We describe a method to leverage Twist Bioscience target enrichment probes for the design of custom gene panels sequenced with HiFi reads.
ESHG  |  2022

Genome-wide CpG methylation calling with standard HiFi whole genome sequencing

Justin Blethrow, Daniel Portik, Kristofor Nyquist, Aaron Wenger, Richard Hall

Both the genome and epigenome contribute to inherited disease. While genome sequencing has been applied at large scale, epigenome sequencing remains more difficult and expensive and less frequently used. Here we extend PacBio HiFi sequencing to simultaneously generate accurate genomes and epigenomes with a single library prep and sequencing experiment.
ESHG  |  2022

Integrated heteroduplex correction in PacBio’s circular consensus algorithm

Derek Barnett, John Harting, Walter Lee, Armin Töpfer, Fritz Sedlazeck, Jenny Ekholm, Nina Gonzaludo, Justin Blethrow, James Drake, Zev Kronenberg

A heteroduplex is a double-stranded sequence composed of two non-complementary strands that can form during PCR. These mixed-template artifacts produce misleading results in downstream analysis, e.g., false haplotypes during diplotyping. Unlike short-read technologies, PacBio Single-Molecule Real-Time sequencing produces strand-level base calls. Heteroduplex signatures can be directly observed and corrected using the stranded sub-read data. Our new method is integrated in the circular consensus sequence algorithm, which generates accurate HiFi data from sub-reads.
ESHG  |  2022

Typing CYP2D6 star alleles from fully phased variants using PacBio HiFi reads

John Harting, Zev Kronenberg, Nina Gonzaludo, Jenny Ekholm, Geoff Henno, Edd Lee

The CYP2D6 locus is well known for its importance to pharmacogenetics as well as for its high diversity and complex genomic setting. Resolving individual alleles at this locus using short-read sequencing technologies requires inference-based methods due to ambiguous mapping in the presence of highly homologous pseudogenes. In contrast, long-range sequencing with PacBio HiFi reads directly resolves and phases a wide range of complicated and difficult genetic loci without inference. We present a novel bioinformatics workflow using PacBio HiFi reads which enables rapid and precise diplotyping and star(*)-allele classification of CYP2D6.
ASM  |  2022

A new standard: High MAG recovery and precision species profiling of a pooled human gut microbiome reference using PacBio HiFi sequencing

Portik, Daniel and Ashby, Meredith and Zhang, Siyuan and Locken, Kris and Tang, Shuiquan and Farthing, Brett and Weinstein, Michael and Carlin, Martha and Cano, Raul and Wilkinson, Jeremy

Advancements in sequencing technologies have made metagenomic analyses of complex microbial samples routine and accessible. Mock communities of known composition are often run in parallel to allow for accurate data evaluation and to facilitate cross-study and inter-lab comparisons, yet they lack the microbial diversity of real-world samples. The ZymoBIOMICS Fecal Reference with TruMatrix Technology (D6323) is a highly diverse pooled human gut microbiome standard that provides a truly complex alternative to mock communities. However, the microbial content of this standard is only partially characterized, and species level composition remains underexplored. Here, we explore the content of this sample using highly accurate long-read sequencing. We generated 11.9 million HiFi reads (88.3 Gigabases) for this sample using PacBio HiFi sequencing on the Sequel IIe System. We performed taxonomic and functional profiling, as well as metagenome assembly, using analyses tailored to HiFi reads. With taxonomic profiling settings intended to optimize high precision and recall, we detected 155 species from 80 genera. With less stringent profiling settings and no filtering, we detected as many as ~7,200 total species. We found 92% of HiFi reads were assigned a functional annotation, with an average of 2–4 annotations per read. This resulted in over 66.9 million total functional annotations representing over 17,000 unique classes. We used hifiasm-meta to perform metagenome assembly and a PacBio binning pipeline to identify and characterize high-quality metagenome assembled genomes (MAGs). This workflow produced ~2,600 genome bins and identified 199 high-quality MAGs (>70% complete, <10% contamination, <20 contigs). Of these, 102 MAGs were >95% complete and included 54 MAGs composed of a single, circular contig. Finally, we downsampled our data to simulate several multiplexing schemes and investigated the impact on these analyses.
Species detection and functional profiling results were largely robust across data levels, whereas we observed a predictable decrease in MAG recovery with decreasing data. The species-level relative abundance profiles and highly complete MAGs generated in our study help shed light on the diverse content of this novel metagenomic control and the use of PacBio HiFi sequencing for generating high-quality metagenomic data. Overall, this newly developed reference sample should prove useful in assessing sequencing results and consistency (from cross-platform technologies and wet lab methods), aid in the development of new bioinformatics approaches, and improve methodological benchmarking studies.
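
The MAG quality criteria quoted in this abstract (>70% complete, <10% contamination, <20 contigs, with a further >95% complete tier and a single circular contig tier) can be expressed as a simple filter. The Python sketch below is illustrative only and is not the PacBio binning pipeline: the `GenomeBin` record, its field names, and the example values are hypothetical, and in practice the completeness and contamination estimates would come from whatever quality-assessment tool the pipeline uses.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class GenomeBin:
    """Hypothetical per-bin summary record (field names are assumptions)."""
    name: str
    completeness: float   # percent, 0-100
    contamination: float  # percent, 0-100
    n_contigs: int
    circular: bool = False  # True if the bin is a single circular contig


def is_high_quality(b: GenomeBin) -> bool:
    """Apply the thresholds stated in the abstract (all strict inequalities)."""
    return b.completeness > 70.0 and b.contamination < 10.0 and b.n_contigs < 20


def summarize(bins: Iterable[GenomeBin]) -> Dict[str, int]:
    """Count bins in each nested quality tier described in the abstract."""
    bins = list(bins)
    high_quality = [b for b in bins if is_high_quality(b)]
    near_complete = [b for b in high_quality if b.completeness > 95.0]
    single_circular = [b for b in near_complete if b.n_contigs == 1 and b.circular]
    return {
        "total_bins": len(bins),
        "high_quality": len(high_quality),
        "near_complete": len(near_complete),
        "single_circular": len(single_circular),
    }


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = [
        GenomeBin("bin1", 99.0, 1.0, 1, circular=True),
        GenomeBin("bin2", 80.0, 5.0, 12),
        GenomeBin("bin3", 60.0, 1.0, 3),
    ]
    print(summarize(example))
```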
ASM  |  2022

Evaluation of taxonomic profiling methods for long-read shotgun metagenomic sequencing datasets

Portik, Daniel M. and Brown, C. Titus and Pierce-Ward, N. Tessa

Long-read shotgun metagenomic sequencing is gaining in popularity and offers many advantages over short-read sequencing. The higher information content in long reads is useful for a variety of metagenomics analyses, including taxonomic profiling. The development of long-read specific tools for taxonomic profiling is accelerating, yet there is a lack of consensus regarding their relative performance. Here, we performed a critical benchmarking study using five long-read methods and four popular short-read methods. We applied these tools to several mock community datasets generated using Pacific Biosciences (PacBio) HiFi or Oxford Nanopore Technology (ONT) sequencing, and evaluated their performance based on read utilization, detection metrics, and relative abundance estimates. Our results show that long-read methods generally outperformed short-read methods. Short-read methods (including Kraken2, Bracken, Centrifuge, and MetaPhlAn3) produced many false positives (particularly at lower abundances), required heavy filtering to achieve acceptable precision (at the cost of reduced recall), and produced inaccurate abundance estimates. By contrast, several long-read methods displayed very high precision and acceptable recall without any filtering required, including BugSeq and MEGAN-LR using either translation alignments (DIAMOND to NCBI nr) or nucleotide alignments (minimap2 to NCBI nt). Furthermore, in the PacBio HiFi datasets these long-read methods detected all species down to the 0.1% abundance level with high precision. Other long-read methods, such as MetaMaps and MMseqs2, required moderate filtering to reduce false positives and achieve a suitable balance between precision and recall. We found read quality affected performance for methods relying on protein prediction or exact k-mer matching, and these methods performed better with PacBio HiFi datasets.
Finally, for a given mock community we found that the long-read datasets produced significantly better results than short-read datasets, demonstrating clear advantages for long-read metagenomic sequencing. Our critical assessment of available methods provides recommendations for current research using long reads and establishes a baseline for future benchmarking studies.
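
The detection metrics discussed in this abstract, and the trade-off between filtering and recall, can be illustrated with a small Python sketch. It is a generic illustration rather than the evaluation code used in the study: the function names, the abundance threshold, and the toy profile with placeholder species labels are all assumptions made for this example.

```python
from typing import Dict, Set


def detection_metrics(expected: Set[str], detected: Set[str]) -> Dict[str, float]:
    """Precision, recall and F1 for species detection against a known mock community."""
    tp = len(expected & detected)   # species correctly reported
    fp = len(detected - expected)   # species reported but not in the community
    fn = len(expected - detected)   # community members that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def filter_profile(profile: Dict[str, float], min_abundance: float) -> Set[str]:
    """Keep only taxa whose relative abundance is at least min_abundance."""
    return {taxon for taxon, abundance in profile.items() if abundance >= min_abundance}


if __name__ == "__main__":
    # Toy data with placeholder names; not taken from any real dataset.
    expected = {"sp_A", "sp_B", "sp_C", "sp_D"}
    profile = {"sp_A": 0.5, "sp_B": 0.3, "sp_C": 0.001, "sp_X": 0.002, "sp_Y": 0.0005}

    unfiltered = detection_metrics(expected, filter_profile(profile, 0.0))
    filtered = detection_metrics(expected, filter_profile(profile, 0.01))
    print("unfiltered:", unfiltered)
    print("filtered:  ", filtered)
```

In this toy case, raising the abundance cutoff removes both false positives but also drops a true low-abundance member, which is the precision-for-recall exchange the abstract describes for methods that need heavy filtering.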
AGBT 2022  |  2022

Extracting CpG methylation from PacBio HiFi whole genome sequencing

Hall, Richard and Portik, Daniel and Nyquist, Kristofor and Wenger, Aaron

PacBio HiFi sequencing has been demonstrated to provide the most accurate and complete characterization of human genomes, detecting single-nucleotide variants, indels, and structural variants with high precision and recall using a single sequencing experiment. Here we show that the same experiment also characterizes the epigenome, providing CpG methylation without additional sample prep or sequencing. PacBio sequencing observes a polymerase in real time as it incorporates fluorescently labeled nucleotides to synthesize a DNA strand. The labels identify the nucleotide sequence. The kinetic signatures – pulse width (time of incorporation) and inter-pulse duration (time between adjacent incorporations) – correlate with chemical modifications to the canonical DNA bases. Some modifications, like N6 methyladenine, have strong kinetic signatures that allow accurate detection with single observations. Others, like the 5-methylcytosine (5mC) modification present in human genomes, have a lower signal-to-noise ratio, which has necessitated high sequence coverage, specialized sample prep, or averaging over genomic regions to achieve high accuracy. PacBio HiFi sequencing, which provides multiple serial observations of the same molecule, opens new methodological approaches. We implemented a multilayer convolutional neural network that combines kinetic measurements from the multiple passes and assumes symmetric methylation of CpG sites, which is typical in mammals. We trained the model on fully unmethylated (whole-genome amplification) and fully methylated (M.SssI-treated) HiFi reads. In that context, the network distinguishes methylated from unmethylated CpG in single molecules with an accuracy of around 80% at typical HiFi read lengths (15 kb).
We further applied the approach on 3 samples from the Genome in a Bottle (GIAB) reference materials, for which CpG methylation is available from other approaches, including bisulfite sequencing, nanopore sequencing, and methylation microarrays. The PacBio HiFi CpG methylation calls have high correlation with the orthogonal approaches. By simultaneously providing accurate SNV calling and phasing, the HiFi reads reveal haplotype-specific methylation including parental imprinting. This complete view of the genome and epigenome in a single experiment provides the opportunity to understand new aspects of evolution, disease, and diversity.
AGBT 2022  |  2022

HiFiViral SARS-CoV-2: A long read kitted solution for genome surveillance

Hon, Ting and Wilson, Joan and McLaughlin, Ian and Kronenberg, Zev and Harting, John and Ashby, Meredith and Dahlen, Trang and Ziegle, Janet and Kingan, Sarah

The COVID-19 pandemic continues to be a major global epidemiological challenge with the ongoing emergence of new strain lineages that are more contagious, more virulent, drug resistant and in some cases evade vaccine-induced immunity. In response, the HiFiViral SARS-CoV-2 kit (PacBio; Menlo Park, California) was developed as a scalable solution for the Sequel II and Sequel IIe systems. Unlike amplicon sequencing, the HiFiViral SARS-CoV-2 kit uses tiled probes, resulting in robust genome coverage across a broad range of RNA input quantities and in the presence of new variants. The use of long, accurate HiFi reads enables comprehensive variant detection, including single nucleotide variants, indels, and structural variants, as well as phasing of variants if multiple strains are present in samples. The fully kitted solution contains all reagents needed for viral enrichment and barcoding of up to 384 samples. Flexible scaling allows the user to pool 24–384 samples into a SMRTbell library to be run on a single SMRT Cell 8M. For high throughput labs, up to 8 SMRT Cells may be loaded on an instrument with no subsequent touch points. SMRT Link analysis reports include variant calling, genome coverage, detection of samples with multiple strains, and a plate performance summary for assay prep QC. File outputs include consensus sequences ready for submission to surveillance databases, reads aligned to the Wuhan reference, and consensus sequences aligned to the reference for advanced users. Here we demonstrate performance across a broad range of sample Ct inputs in control samples and nasopharyngeal samples, in sample batches of up to 384. HiFiViral for SARS-CoV-2 is a cost-effective, convenient, and accurate method for viral sequencing, well-suited for scalable surveillance of a rapidly evolving virus to inform public health decision making.
AGBT 2022  |  2022

High-throughput library preparation for HiFi whole genome sequencing

Wenger, Aaron and Ferrao, Heather and Lambert, Christine and Liu, Vicky and Wang, Lin and Wilson, Joan and Chakraborty, Shreya and Lee, Davy and Souppe, Aurelie and Rowell, William and Ziegle, Janet and Miller, David and Baybayan, Primo and Dee, Suzanne

HiFi reads combine long read lengths (>10 kb) and high accuracy (99.9%), which has enabled the complete and accurate characterization of whole genomes, including the first telomere-to-telomere assembly of a human genome. HiFi reads have simplified many bioinformatic analyses – efficient algorithms have been developed for read mapping, variant calling, and de novo genome assembly with HiFi reads. But obstacles have remained in preparing libraries, namely high DNA input requirements (5 µg per human genome) and a multi-step workflow requiring about 12 hours. We have developed a HiFi library preparation workflow that reduces DNA input requirements, workflow steps, and end-to-end time. Steps have been combined, reagents provided as master mixes, and reaction times optimized. Size selection is performed with paramagnetic beads, which reduces workflow time and improves DNA recovery compared to size selection with gel cassettes. In total, the new workflow – based on PacBio SMRTbell Prep Kit 3.0 – reduces 10 steps to 7 and decreases workflow time from 12 hours to 6 hours. We applied the workflow to generate libraries for a variety of samples including individual humans and multiplexed microbe strains. All samples produced high-quality libraries that resulted in high HiFi sequencing yields and excellent analysis results. The human sample HG002 achieved accurate variant calling (>99.9% SNV F1, >99.5% indel F1) and contiguous and accurate genome assembly (contig N50 >90 Mb, quality >Q50) at 32× coverage. All microbes assembled into complete genomes, with more than 375 Mb of total genome assembled from a single SMRT Cell. This improved workflow supports high-throughput generation of libraries for PacBio HiFi sequencing including using liquid handling robots, which is required for rare disease and population genetics projects that are applying HiFi sequencing at new scales.
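
The contig N50 statistic cited in this abstract is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. The Python sketch below shows one common way to compute it; it is an illustration added here, and assembly tools may differ in details such as minimum contig length cutoffs.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of the
    contig at which the running total first reaches half of the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable: running total always reaches the total")


if __name__ == "__main__":
    # Toy contig lengths, for illustration only.
    print(n50([10, 8, 6, 4, 2]))
```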