June 21, 2022  |  Sample + library preparation

High-throughput, high molecular weight DNA extraction, library preparation, and size selection for human whole genome sequencing

Rapid advances in long-read sequencing technology have led to dramatic improvements in cost and throughput. De novo sequencing projects using long-read sequencing to generate assemblies have grown from creating single reference genomes to examining population-wide diversity across progressively larger groups of individuals, paving the way for a future of clinical human whole genome sequencing. Because of its high accuracy, Pacific Biosciences HiFi sequencing is already being used to study rare genetic conditions in pediatric cancer and to generate a human pangenome reference.
However, high-throughput sample and library preparation methods are needed to support the generation of such large numbers of genomes. Such workflows are commonplace in short-read sequencing but do not widely exist for long-read sequencing, which requires manipulation of high molecular weight (HMW) DNA.
We present a high-throughput HMW DNA extraction, library preparation, and size selection workflow capable of processing 96 samples in as little as 8 hours for HiFi sequencing on Pacific Biosciences Sequel IIe systems.
Automated HMW DNA extraction is first performed using Nanobind magnetic disk technology on Thermo Fisher KingFisher instruments. In contrast to magnetic beads, each extraction uses a single Nanobind disk that is covered with micro- and nanostructured silica wrinkles to protect the bound DNA from shearing during automated processing. The method is capable of HMW DNA extraction from 96 cell, blood, and tissue samples in 60 – 180 minutes. Fully walkaway solutions are also available for use with Hamilton NIMBUS instruments.
Automated library preparation is then performed using PacBio Express Template 2.0 Kit on Hamilton NGS Star.
Finally, high-throughput size selection is performed using a 96-well plate version of Short Read Eliminator technology, which uses a proprietary size-selective precipitation chemistry to deplete small DNA from HMW DNA samples. Alternatively, a custom magnetic bead-based size selection can be used to achieve high size selection cutoffs.
A variety of sample types, including cultured cells, bacteria, whole blood, and tissue, are demonstrated. As these methods require only common commercial instrumentation, they can be easily integrated into many existing laboratory environments.


June 20, 2022  |  Epigenetics

Extracting CpG methylation from PacBio HiFi whole genome sequencing

PacBio HiFi sequencing has been demonstrated to provide the most accurate and complete characterization of human genomes, detecting single-nucleotide variants, indels, and structural variants with high precision and recall using a single sequencing experiment. Here we show that the same experiment also characterizes the epigenome, providing CpG methylation without additional sample prep or sequencing.

PacBio sequencing observes a polymerase in real time as it incorporates fluorescently labeled nucleotides to synthesize a DNA strand. The labels identify the nucleotide sequence. The kinetic signatures – pulse width (time of incorporation) and inter-pulse duration (time between adjacent incorporations) – correlate with chemical modifications to the canonical DNA bases. Some modifications, like N6-methyladenine, have strong kinetic signatures that allow accurate detection with single observations. Others, like the 5-methylcytosine (5mC) modification present in human genomes, have a lower signal-to-noise ratio, which has necessitated high sequence coverage, specialized sample prep, or averaging over genomic regions to achieve high accuracy. PacBio HiFi sequencing, which provides multiple serial observations of the same molecule, opens new methodological approaches. We implemented a multilayer convolutional neural network that combines kinetic measurements from the multiple passes and assumes symmetric methylation of CpG sites, which is typical in mammals. We trained the model on fully unmethylated (whole-genome amplification) and fully methylated (M.SssI-treated) HiFi reads. In that context, the network distinguishes methylated from unmethylated CpG in single molecules with an accuracy of around 80% at typical HiFi read lengths (15 kb).

We further applied the approach to 3 samples from the Genome in a Bottle (GIAB) reference materials, for which CpG methylation is available from other approaches, including bisulfite sequencing, nanopore sequencing, and methylation microarrays. The PacBio HiFi CpG methylation calls have high correlation with the orthogonal approaches. By simultaneously providing accurate SNV calling and phasing, the HiFi reads reveal haplotype-specific methylation including parental imprinting. This complete view of the genome and epigenome in a single experiment provides the opportunity to understand new aspects of evolution, disease, and diversity.
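To illustrate how per-molecule CpG calls and read phasing might be combined into haplotype-specific methylation, here is a minimal Python sketch. It is not the authors' pipeline: the input layout (position, haplotype, methylation probability), the 0.5 decision threshold, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict

def haplotype_methylation(calls, threshold=0.5):
    """Summarize per-read CpG calls as a methylated fraction per (site, haplotype).

    calls: iterable of (cpg_position, haplotype, probability) tuples, where
    probability is a per-read methylation score in [0, 1] (hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # (site, haplotype) -> [methylated, total]
    for position, haplotype, probability in calls:
        entry = counts[(position, haplotype)]
        if probability >= threshold:  # assumed cutoff for calling a read methylated
            entry[0] += 1
        entry[1] += 1
    return {key: methylated / total for key, (methylated, total) in counts.items()}

# Toy reads at one hypothetical CpG site: haplotype 1 methylated, haplotype 2 not.
toy_calls = [
    (1000, 1, 0.90), (1000, 1, 0.80), (1000, 1, 0.95),
    (1000, 2, 0.10), (1000, 2, 0.20), (1000, 2, 0.05),
]
fractions = haplotype_methylation(toy_calls)
print(fractions[(1000, 1)], fractions[(1000, 2)])  # 1.0 0.0
```

A site where the two haplotypes show strongly different fractions, as in the toy data, is the kind of pattern expected at an imprinted locus.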


June 15, 2022  |  

High-throughput library preparation for HiFi whole genome sequencing

HiFi reads combine long read lengths (>10 kb) and high accuracy (99.9%), which has enabled the complete and accurate characterization of whole genomes, including the first telomere-to-telomere assembly of a human genome.  HiFi reads have simplified many bioinformatic analyses – efficient algorithms have been developed for read mapping, variant calling, and de novo genome assembly with HiFi reads.  But obstacles have remained in preparing libraries, namely high DNA input requirements (5 µg per human genome) and a multi-step workflow requiring about 12 hours. 

We have developed a HiFi library preparation workflow that reduces DNA input requirements, workflow steps, and end-to-end time.  Steps have been combined, reagents provided as master mixes, and reaction times optimized.  Size selection is performed with paramagnetic beads, which reduces workflow time and improves DNA recovery compared to size selection with gel cassettes.  In total, the new workflow – based on PacBio SMRTbell Prep Kit 3.0 – reduces 10 steps to 7 and decreases workflow time from 12 hours to 6 hours. 

We applied the workflow to generate libraries for a variety of samples including individual humans and multiplexed microbe strains.  All samples produced high-quality libraries that resulted in high HiFi sequencing yields and excellent analysis results.  The human sample HG002 achieved accurate variant calling (>99.9% SNV F1, >99.5% indel F1) and contiguous and accurate genome assembly (contig N50 >90 Mb, quality >Q50) at 32× coverage. All microbes assembled into complete genomes, with more than 375 Mb of total genome assembled from a single SMRT Cell.   
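For readers unfamiliar with the contig N50 metric quoted above, the following short Python sketch shows one common way to compute it. The function name and the toy contig lengths are illustrative assumptions, not values from this study.

```python
def contig_n50(lengths):
    """Return the contig N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Hypothetical toy assembly (lengths in Mb), not data from the study.
toy_contigs_mb = [100, 80, 50, 30, 20, 20]
print(contig_n50(toy_contigs_mb))  # 80
```

In the toy example the two longest contigs (100 + 80 = 180) already cover more than half of the 300 Mb total, so the N50 is 80.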

This improved workflow supports high-throughput generation of libraries for PacBio HiFi sequencing including using liquid handling robots, which is required for rare disease and population genetics projects that are applying HiFi sequencing at new scales. 


June 14, 2022  |  Infectious disease research

HiFiViral SARS-CoV-2: A long read kitted solution for genome surveillance

The COVID-19 pandemic continues to be a major global epidemiological challenge with the ongoing emergence of new strain lineages that are more contagious, more virulent, or drug resistant, and that in some cases evade vaccine-induced immunity. In response, the HiFiViral SARS-CoV-2 kit (PacBio; Menlo Park, California) was developed as a scalable solution for the Sequel II and Sequel IIe systems. Unlike amplicon sequencing, the HiFiViral SARS-CoV-2 kit uses tiled probes, resulting in robust genome coverage across a broad range of RNA input quantities and in the presence of new variants. The use of long, accurate HiFi reads enables comprehensive variant detection, including single nucleotide variants, indels, and structural variants, as well as phasing of variants if multiple strains are present in samples. The fully kitted solution contains all reagents needed for viral enrichment and barcoding of up to 384 samples. Flexible scaling allows the user to pool 24 – 384 samples into a SMRTbell library to be run on a single SMRT Cell 8M. For high-throughput labs, up to 8 SMRT Cells may be loaded on an instrument with no subsequent touch points. SMRT Link analysis reports include variant calling, genome coverage, detection of samples with multiple strains, and a plate performance summary for assay prep QC. File outputs include consensus sequences ready for submission to surveillance databases, reads aligned to the Wuhan reference, and consensus sequences aligned to the reference for advanced users. Here we demonstrate performance across a broad range of sample Ct inputs in control samples and nasopharyngeal samples in sample batches of up to 384. HiFiViral for SARS-CoV-2 is a cost-effective, convenient, and accurate method for viral sequencing, well-suited for scalable surveillance of a rapidly evolving virus to inform public health decision making.


June 13, 2022  |  Microbiome

A new standard: High MAG recovery and precision species profiling of a pooled human gut microbiome reference using PacBio HiFi sequencing

Advancements in sequencing technologies have made metagenomic analyses of complex microbial samples routine and accessible. Mock communities of known composition are often run in parallel to allow for accurate data evaluation and to facilitate cross-study and inter-lab comparisons, yet they lack the microbial diversity of real-world samples. The ZymoBIOMICS Fecal Reference with TruMatrix Technology (D6323) is a highly diverse pooled human gut microbiome standard that provides a truly complex alternative to mock communities. However, the microbial content of this standard is only partially characterized, and species level composition remains underexplored. Here, we explore the content of this sample using highly accurate long-read sequencing. We generated 11.9 million HiFi reads (88.3 Gigabases) for this sample using PacBio HiFi sequencing on the Sequel IIe System. We performed taxonomic and functional profiling, as well as metagenome assembly, using analyses tailored to HiFi reads. With taxonomic profiling settings intended to optimize high precision and recall, we detected 155 species from 80 genera. With less stringent profiling settings with no filtering, we detected as many as ~7,200 total species. We found 92% of HiFi reads were assigned a functional annotation, with an average of 2–4 annotations per read. This resulted in over 66.9 million total functional annotations representing over 17,000 unique classes. We used hifiasm-meta to perform metagenome assembly and a PacBio binning pipeline to identify and characterize high-quality metagenome assembled genomes (MAGs). This workflow produced ~2600 genome bins and identified 199 high-quality MAGs (>70% complete, <10% contamination, <20 contigs). Of these, 102 MAGs were >95% complete and included 54 MAGs composed of a single, circular contig. Finally, we downsampled our data to simulate several multiplexing schemes and investigated the impact on these analyses. 
Species detection and functional profiling results were largely robust across data levels, whereas we observed a predictable decrease in MAG recovery with decreasing data. The species-level relative abundance profiles and highly complete MAGs generated in our study help shed light on the diverse content of this novel metagenomic control and the use of PacBio HiFi sequencing for generating high-quality metagenomic data. Overall, this newly developed reference sample should prove useful in assessing sequencing results and consistency (from cross-platform technologies and wet lab methods), aid in the development of new bioinformatics approaches, and improve methodological benchmarking studies.
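The high-quality MAG criteria stated above (>70% complete, <10% contamination, <20 contigs) can be expressed as a simple filter. The Python sketch below is only an illustration of those thresholds: the `GenomeBin` record, its field names, and the example bins are hypothetical and do not reflect the actual PacBio binning pipeline or its output format.

```python
from dataclasses import dataclass

@dataclass
class GenomeBin:
    # Hypothetical record for one genome bin; field names are assumptions.
    name: str
    completeness: float   # percent
    contamination: float  # percent
    contigs: int

def is_high_quality(genome_bin, min_completeness=70.0,
                    max_contamination=10.0, max_contigs=20):
    """Apply the thresholds quoted in the abstract as strict inequalities."""
    return (genome_bin.completeness > min_completeness
            and genome_bin.contamination < max_contamination
            and genome_bin.contigs < max_contigs)

# Made-up bins for illustration only.
bins = [
    GenomeBin("bin_a", completeness=98.5, contamination=0.4, contigs=1),
    GenomeBin("bin_b", completeness=82.0, contamination=12.0, contigs=6),
    GenomeBin("bin_c", completeness=65.0, contamination=1.0, contigs=3),
    GenomeBin("bin_d", completeness=91.0, contamination=2.5, contigs=35),
]
high_quality = [b.name for b in bins if is_high_quality(b)]
print(high_quality)  # ['bin_a']
```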


June 13, 2022  |  Microbial sequencing methods

Evaluation of taxonomic profiling methods for long-read shotgun metagenomic sequencing datasets

Long-read shotgun metagenomic sequencing is gaining in popularity and offers many advantages over short-read sequencing. The higher information content in long reads is useful for a variety of metagenomics analyses, including taxonomic profiling. The development of long-read specific tools for taxonomic profiling is accelerating, yet there is a lack of consensus regarding their relative performance. Here, we perform a critical benchmarking study using five long-read methods and four popular short-read methods. We applied these tools to several mock community datasets generated using Pacific Biosciences (PacBio) HiFi or Oxford Nanopore Technologies (ONT) sequencing, and evaluated their performance based on read utilization, detection metrics, and relative abundance estimates. Our results show that long-read methods generally outperformed short-read methods. Short-read methods (including Kraken2, Bracken, Centrifuge, and MetaPhlAn3) produced many false positives (particularly at lower abundances), required heavy filtering to achieve acceptable precision (at the cost of reduced recall), and produced inaccurate abundance estimates. By contrast, several long-read methods displayed very high precision and acceptable recall without any filtering required, including BugSeq, MEGAN-LR using translation alignments (DIAMOND to NCBI nr), and MEGAN-LR using nucleotide alignments (minimap2 to NCBI nt). Furthermore, in the PacBio HiFi datasets these long-read methods detected all species down to the 0.1% abundance level with high precision. Other long-read methods, such as MetaMaps and MMseqs2, required moderate filtering to reduce false positives to achieve a suitable balance between precision and recall. We found read quality affected performance for methods relying on protein prediction or exact k-mer matching, and these methods performed better with PacBio HiFi datasets.
Finally, for a given mock community we found that the long-read datasets produced significantly better results than short-read datasets, demonstrating clear advantages for long-read metagenomic sequencing. Our critical assessment of available methods provides recommendations for current research using long reads and establishes a baseline for future benchmarking studies.
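The trade-off described above, where filtering low-abundance calls raises precision at the cost of recall, can be sketched in a few lines of Python. This is a generic illustration of species-level detection metrics against a mock community, not the benchmarking code used in the study; the function name, species labels, and abundances are made up.

```python
def detection_metrics(expected, observed, min_abundance=0.0):
    """Return (precision, recall, F1) for species detection.

    expected: set of species known to be in the mock community.
    observed: dict mapping species -> estimated relative abundance (percent).
    min_abundance: calls below this abundance are filtered out (assumed filter).
    """
    called = {sp for sp, abundance in observed.items() if abundance >= min_abundance}
    true_pos = len(called & expected)
    false_pos = len(called - expected)
    false_neg = len(expected - called)
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical profile: four true species plus two low-abundance false positives.
expected = {"sp_A", "sp_B", "sp_C", "sp_D"}
observed = {"sp_A": 40.0, "sp_B": 35.0, "sp_C": 24.9,
            "sp_D": 0.05, "sp_X": 0.03, "sp_Y": 0.02}

print(detection_metrics(expected, observed))                     # no filtering
print(detection_metrics(expected, observed, min_abundance=0.1))  # filtered
```

In this toy case the unfiltered profile has full recall but only 4 of 6 calls are correct, while the 0.1% filter removes both false positives and also drops the genuinely present low-abundance species.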


June 1, 2022  |  Infectious disease research

A Streamlined Workflow for High-Throughput, Multiplexed HiFi Sequencing of Microbial Genomes

The SARS-CoV-2 global pandemic has highlighted the utility of pathogen surveillance pipelines that provide comprehensive genomic information, giving public health scientists a more complete view of the spread and characteristics of circulating pathogens. Beyond COVID-19, there is great interest in public health to expand high resolution surveillance to other infectious diseases. Highly accurate, long HiFi reads produced by the PacBio Sequel IIe System have brought new levels of contiguity, completeness, and accuracy to large genome assembly. HiFi reads are similarly beneficial for microbial genome assembly, as higher quality assemblies enhance our ability to investigate foodborne illnesses and monitor antimicrobial resistance. However, obstacles in library preparation workflow, cost, and recovery of small plasmids have limited use in public health. Here, we introduce a new library prep workflow and assembly algorithm based on HiFi reads that enables a high throughput, end-to-end solution for microbial genome assembly. The new workflow combines steps, eliminates the need for strict size selection, shortens the total time to 6 hours, and enables library prep automation. The assembly algorithm uses strict read-to-read overlaps enabled by HiFi read accuracy to resolve repeats. It uses a two-stage approach to first assemble chromosomes and then recover short, high-coverage plasmids. To evaluate the method, a pool of HiFi libraries with 96 microbial samples and total genome size of 375 Mb was generated. The protocol was evaluated with microbes relevant to pathogen surveillance including common foodborne pathogens (Listeria, Salmonella) and species often seen in hospital settings (Klebsiella, Staphylococcus). The microbes represent a range of genome sizes, assembly complexity, GC content, chromosome counts, and plasmid content. DNA samples were sheared to 7 kb – 10 kb, prepared as barcoded libraries, pooled, and sequenced on one SMRT Cell 8M on the Sequel IIe System. 
Reference quality de novo microbial assemblies with 5 contigs or fewer were achieved for all samples. Typical chromosome assembly quality was Q50, measured as concordance to reference assemblies. Nearly all plasmids were recovered, including those shorter than 5 kb, which are often lost in workflows with strict size selection. Taken together, the new method provides a high-throughput, cost-effective approach suitable to routinely generate reference quality microbial genomes in a public health environment as part of a pathogen surveillance program.
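As a reminder of what a Q50 concordance figure means, the Python sketch below converts between Phred-scaled quality and error rate, and computes a concordance QV from mismatch counts. The function names and the cap applied to perfect alignments are assumptions for illustration; the abstract does not describe how its concordance values were computed in detail.

```python
import math

def phred_to_error_rate(q):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-q / 10)

def concordance_qv(mismatched_bases, aligned_bases, cap=60.0):
    """Phred-scaled concordance of an assembly against a reference.

    cap is an assumed ceiling used when no mismatches are observed,
    since the logarithm is undefined at zero errors.
    """
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    if mismatched_bases == 0:
        return cap
    return min(cap, -10 * math.log10(mismatched_bases / aligned_bases))

# Q50 corresponds to roughly one discordant base per 100,000 aligned bases.
print(phred_to_error_rate(50))
# Hypothetical example: 50 mismatches over a 5 Mb chromosome.
print(round(concordance_qv(50, 5_000_000), 1))
```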


May 18, 2022  |  Infectious disease research

A Streamlined Workflow For High-Throughput, Multiplexed HiFi Sequencing Of Microbial Genomes

The SARS-CoV-2 global pandemic has highlighted the utility of pathogen surveillance pipelines that provide comprehensive genomic information, giving public health scientists a more complete view of the spread and characteristics of circulating pathogens. Beyond COVID-19, there is great interest in public health to expand high resolution surveillance to other infectious diseases.

Highly accurate, long HiFi reads produced by the PacBio Sequel IIe System have brought new levels of contiguity, completeness, and accuracy to large genome assembly. HiFi reads are similarly beneficial for microbial genome assembly, as higher quality assemblies enhance our ability to investigate foodborne illnesses and monitor antimicrobial resistance. However, obstacles in library preparation workflow, cost, and recovery of small plasmids have limited use in public health. Here, we introduce a new library prep workflow and assembly algorithm based on HiFi reads that enables a high throughput, end-to-end solution for microbial genome assembly. The new workflow combines steps, eliminates the need for strict size selection, shortens the total time to 6 hours, and enables library prep automation. The assembly algorithm uses strict read-to-read overlaps enabled by HiFi read accuracy to resolve repeats.  It uses a two-stage approach to first assemble chromosomes and then recover short, high-coverage plasmids.

To evaluate the method, a pool of HiFi libraries with 96 microbial samples and total genome size of 375 Mb was generated.  The protocol was evaluated with microbes relevant to pathogen surveillance including common foodborne pathogens (Listeria, Salmonella) and species often seen in hospital settings (Klebsiella, Staphylococcus).  The microbes represent a range of genome sizes, assembly complexity, GC content, chromosome counts, and plasmid content.  DNA samples were sheared to 7 kb – 10 kb, prepared as barcoded libraries, pooled, and sequenced on one SMRT Cell 8M on the Sequel IIe System. 

Reference quality de novo microbial assemblies with 5 contigs or fewer were achieved for all samples. Typical chromosome assembly quality was Q50, measured as concordance to reference assemblies. Nearly all plasmids were recovered, including those shorter than 5 kb, which are often lost in workflows with strict size selection. Taken together, the new method provides a high-throughput, cost-effective approach suitable to routinely generate reference quality microbial genomes in a public health environment as part of a pathogen surveillance program.


May 18, 2022  |  Infectious disease research

HiFiViral SARS-CoV-2: A Mutation Tolerant, Fully-kitted Solution for COVID-19 Surveillance

The COVID-19 pandemic is an ongoing global challenge, with the repeated emergence of new variants that are more contagious, more virulent, drug resistant, or able to evade vaccine-induced immunity. In response, the HiFiViral SARS-CoV-2 kit was developed as a scalable solution with increased resilience against virus mutations, designed for use on the Sequel IIe system. Unlike PCR-based amplicon methods, the HiFiViral SARS-CoV-2 kit relies on ~1,000 densely tiled Molecular Inversion Probes (MIPs) such that every genomic position is covered by ~22 probes, resulting in robust genome coverage of all circulating variants, including the mutation-dense Omicron lineage, across a broad range of Ct values.
The HiFiViral kit offers many benefits for high-throughput surveillance. Sequencing 675 bp fragments with highly accurate HiFi reads enables comprehensive variant detection, including single nucleotide variants, indels, structural variants, and identification of multi-strain samples. The kit is scalable, containing all reagents needed to enrich and barcode 384 samples, in batches ranging from 24 – 384, for sequencing in one SMRTbell library. For high-throughput labs, up to 8 SMRT Cells may be loaded on an instrument with no subsequent touch points, for up to 3,072 samples per run. The enrichment assay is also simpler to execute than PCR-based assays, consisting of just 4 add-only, color-change indicated steps. Barcoding primers come pre-mixed in a 384-well, resealable plate. The simple assay design uses fewer plastics, limiting the impact of supply chain issues. In addition, methods for running the HiFiViral kit were established on a mosquito® HV Genomics pipetting robot and assessed for their performance. Finally, SMRT Link data analysis is one touch, and reports include variant calling, genome coverage, multi-strain detection, and a plate performance summary. File outputs include consensus sequences ready for database submission and reads and consensus sequences aligned to the Wuhan reference.
In this study, we demonstrate consistent recovery of >95% complete SARS-CoV-2 genomes using the commercially available HiFiViral kit at 4 different sites performing routine genomic surveillance. The evaluation runs included a broad range of sample Ct inputs across control and nasopharyngeal samples in batches up to 384. The runs also demonstrated consistent performance against Alpha, Delta, Omicron and other variant lineages, without the need for probe updates. In summary, the HiFiViral for SARS-CoV-2 kit is a cost-effective, convenient, accurate method for viral sequencing and well-suited for scalable surveillance of a rapidly evolving virus to inform public health decision making.
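The probe redundancy and run capacity figures quoted in this abstract can be checked with simple arithmetic. The Python sketch below assumes that each of the ~1,000 probes captures a fragment of about 675 bp and uses the 29,903-base length of the Wuhan-Hu-1 reference (NC_045512.2), which is not stated in the abstract; the function names are illustrative.

```python
# Length of the Wuhan-Hu-1 reference genome; an outside fact, not from the abstract.
GENOME_LENGTH_BP = 29_903

def mean_probe_depth(n_probes, fragment_bp, genome_bp=GENOME_LENGTH_BP):
    """Average number of probe-captured fragments covering each genomic position,
    assuming fragments are spread evenly across the genome."""
    return n_probes * fragment_bp / genome_bp

def samples_per_run(samples_per_cell, cells_per_run):
    """Maximum number of barcoded samples in one instrument run."""
    return samples_per_cell * cells_per_run

print(round(mean_probe_depth(1_000, 675), 1))  # 22.6, consistent with ~22 probes
print(samples_per_run(384, 8))                 # 3072
```

Under these assumptions the even-tiling estimate lands close to the ~22 probes per position quoted above, and 384 samples on each of 8 SMRT Cells gives the 3,072 samples per run.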

1 PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025
2 SPT Labtech, Ltd., Melbourn Science Park, Melbourn, Hertfordshire SG8 6HB


October 29, 2021  |  

Targeting Clinically Significant Dark Regions of the Human Genome with High-Accuracy, Long-Read Sequencing

There are many clinically important genes in “dark” regions of the human genome. These regions are characterized as dark due to a paucity of NGS coverage as a result of short-read sequencing or mapping difficulties. Low NGS sequencing yield can arise in these regions due to the presence of various repeat elements or biased base composition while inaccurate mapping can result from segmental duplications. Long-read sequencing coupled with an optimized, robust enrichment method has the potential to illuminate these dark regions. 


October 29, 2021  |  

Resolving Complex Pathogenic Alleles using HiFi Long-Range Amplicon Data and a New Clustering Algorithm

Many genetic diseases are mapped to structurally complex loci. These regions contain highly similar paralogous alleles (>99% identity) that span kilobases within the human genome. Comprehensive screening for pathogenic variants is incomplete and labor intensive using short reads or optical mapping. In contrast, long-range amplification and PacBio HiFi sequencing fully and directly resolve and phase a wide range of pathogenic variants without inference. To capitalize on the accuracy of HiFi data, we designed a new amplicon analysis tool, pbAA. pbAA can rapidly deconvolve a mixture of haplotypes, enabling precise diplotyping and disease allele classification.

