As more researchers embrace the benefits of PacBio long-read sequencing technology, an expanding community of analysis tool developers has taken shape. Thanks to this growing excitement around PacBio capabilities, there are now many excellent workflows available for creating accurate genome assemblies using HiFi reads. Methods that combine HiFi and Hi-C to produce quality diploid assemblies with chromosome-length phasing are becoming increasingly popular. At present, powerful tools like hifiasm and Verkko are among the most widely used of these assembly methods. Both approaches are well tested and are favored by innovative organizations such as the Human Pangenome Reference Consortium (HPRC).
Beyond HPRC, community support for and development of PacBio-compatible analysis tools, with various applications and capabilities, is accelerating. Shilpa Garg, PhD, is one such community contributor. She recently presented her related methods, DipAsm and pstools, in a webinar.
With DipAsm and pstools, Dr. Garg and collaborators hope to supply a streamlined chromosome-level phasing technique based on PacBio HiFi sequencing. Watch the talk and explore this article for more on these bioinformatics approaches and how they can help drive discovery.
An era of chromosome-scale genomics enabled by HiFi long-read sequencing
The excellent quality of PacBio long-read data combined with the topographical information provided by Hi-C techniques has enabled researchers to create haplotype-resolved genomes at the chromosome level. Such approaches are so effective that they have become standard practice for some of the world’s foremost genomics consortia and reference repositories. To date, examples of such PacBio users include the T2T Consortium, the Human Pangenome Reference Consortium (HPRC), and the Vertebrate Genomes Project, among others.
As part of its vetting process, the HPRC compared methods for creating the most cost-effective, complete, correct, haplotype-resolved diploid genome assemblies and found that PacBio HiFi reads gave the best results. Furthermore, the consortium used this PacBio HiFi approach to generate the first high-quality diploid human pangenome reference with only four gaps on average per chromosome (for comparison, reference genome GRCh38, which was built with BAC clones, Sanger sequencing, and short-read sequencing, had gaps spanning 120 Mb). The HPRC wants to build on this milestone by sequencing 350 human genomes, all based on haplotype-resolved assemblies made with HiFi reads. With the increasing adoption of HiFi sequencing in population genomics and reference genome assembly, there is a significant opportunity to create tools that interpret such high-quality data faster and more cost-effectively.
Generate precise assemblies with chromosome-level phasing using DipAsm, a HiFi + Hi-C bioinformatics workflow
In the past, to construct reference genomes, researchers had to use a combination of BAC (Bacterial Artificial Chromosome) libraries, short reads, and Sanger sequencing. Even when combined, these technologies were not enough to span the vast repetitive regions of the human genome, leading to significant gaps in the reference. Thankfully, the length and accuracy of PacBio HiFi long reads have made it vastly easier for scientists to produce assemblies with far fewer gaps for a variety of applications. With HiFi reads now available at scale following the launch of the Revio system, moving through the genome assembly and phasing process with speed and precision has become an exciting area for further innovation.
Many new software tools have been created to enable more accurate assemblies, but many are still limited to specific applications. Dr. Shilpa Garg and colleagues looked to address these limitations by creating DipAsm, a workflow that combines HiFi reads and Hi-C data to produce more exact chromosome-level, haplotype-resolved assemblies very quickly.
The workflow takes HiFi data as input and assembles it into continuous sequences, which are then scaffolded and phased with Hi-C to produce even longer sequences. Hi-C data is used to link heterozygous single nucleotide polymorphisms (SNPs) over long distances. The HiFi reads are then partitioned by haplotype, and each partition is assembled separately to yield phased sequences.
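To make the read-partitioning idea concrete, here is a minimal toy sketch in Python. It is not DipAsm's actual code: the function name `partition_reads`, the data layout, and the simple majority-vote rule are all assumptions made for illustration. It assumes the heterozygous SNPs have already been phased (for example, with the help of Hi-C links) and shows how each read could then be assigned to a haplotype according to the alleles it carries.

```python
def partition_reads(reads, phased_snps):
    """Toy haplotype partitioning (illustrative only, not DipAsm's implementation).

    reads:       dict of read_id -> {position: observed_base}
    phased_snps: dict of position -> (haplotype1_base, haplotype2_base)
    Returns a dict with the read IDs assigned to "hap1", "hap2", or "unassigned".
    """
    partitions = {"hap1": [], "hap2": [], "unassigned": []}
    for read_id, alleles in reads.items():
        hap1_votes = 0
        hap2_votes = 0
        for position, base in alleles.items():
            if position not in phased_snps:
                continue  # position is not a phased heterozygous SNP
            hap1_base, hap2_base = phased_snps[position]
            if base == hap1_base:
                hap1_votes += 1
            elif base == hap2_base:
                hap2_votes += 1
        # Simple majority vote; ties and uninformative reads stay unassigned.
        if hap1_votes > hap2_votes:
            partitions["hap1"].append(read_id)
        elif hap2_votes > hap1_votes:
            partitions["hap2"].append(read_id)
        else:
            partitions["unassigned"].append(read_id)
    return partitions


# Hypothetical example data: three phased SNPs and four reads.
phased_snps = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
reads = {
    "read1": {100: "A", 250: "C"},  # matches haplotype 1 at both SNPs
    "read2": {250: "T", 400: "A"},  # matches haplotype 2 at both SNPs
    "read3": {100: "A", 250: "T"},  # conflicting evidence, one vote each
    "read4": {900: "C"},            # covers no phased SNP
}
result = partition_reads(reads, phased_snps)
```

In a real workflow, each resulting partition would be handed to an assembler separately, which is what yields one phased sequence per haplotype. Production tools also have to handle sequencing errors, base qualities, and reads that span no informative SNPs far more carefully than this sketch does.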
The standout benefit of this workflow is that it produces chromosome-level, haplotype-resolved assemblies within a day, a task that previously took weeks. Dr. Garg and her team have tested DipAsm on widely used genomes (HG002, NA12878, and PGP1) and found that the approach produces results comparable to alternatives at excellent speed.
Explore the structural variant landscape in haplotype-resolved genomics using pstools
Building on the success and capabilities of DipAsm, Dr. Garg’s pstools enables researchers to study the landscape of structural variation in all types of samples, including cancer genomes, in more detail. Using this tool, researchers can now create a precise snapshot of the structural variation landscape between cell lines or species, even in repeat elements, which were traditionally missed by many approaches.
The pstools workflow has also been used to look at clinically relevant regions such as HLA (human leukocyte antigen) and KIR (killer cell immunoglobulin-like receptor), which have an elevated level of divergence from the reference genome. Results from Dr. Garg’s work have suggested that there was a 10% divergence from the reference genome in these regions, underscoring this method’s ability to capture information that had been previously missed in reference-based analysis.
Looking toward the future
Workflows such as DipAsm and pstools combine the length and accuracy of PacBio HiFi data with Hi-C techniques to supply a comprehensive representation of chromosome-scale haplotypes in human genomics as well as in pangenome studies. Together, these methods offer a promising approach to uncovering meaningful genomic insights in fields as varied as medicine, biodiversity, and agrigenomics. In this new era of highly accurate long reads and rich analysis options, the most important question is…
What will you discover?