PacBio HiFi Reads: ‘Most Effective Stand-Alone Technology for de Novo Assembly’
Monday, August 12, 2019
UPDATE: The article is now published in the Annals of Human Genetics.
A new preprint evaluates the utility of PacBio HiFi reads for assembly of a human genome. The study is a follow-up to a recent publication in Nature Biotechnology that introduced a technique to generate sequencing reads with both long read length and high accuracy.
“Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads” comes from lead authors Mitchell Vollger and Glennis Logsdon, senior author Evan Eichler, and collaborators at the University of Washington, PacBio, and other research institutes. For this project, they focused on sequencing a hydatidiform mole human cell line (CHM13), a useful model system because it is haploid, unlike typical diploid human cells. “We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets,” the scientists write.
The team generated 24-fold coverage of CHM13, the same sample used to produce a previous assembly with CLR data. They employed the Sequel II System, producing an average of 19.1 Gb of HiFi reads with each SMRT Cell 8M. The HiFi and CLR assemblies had similar contiguity: contig N50 of 29.5 Mb for HiFi and 29.3 Mb for CLR. The HiFi assembly was much more accurate, with an estimated Phred quality value of Q45, compared to Q40 for the CLR assembly. Further, the authors note that, due to divergence in BAC clones used to measure accuracy, the quality value for the HiFi assembly is “a lower bound of the true QV.”
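For readers less familiar with the two metrics quoted above, here is a minimal sketch of how they are conventionally defined; it is not the authors' pipeline. A Phred quality value Q corresponds to a per-base error probability of 10^(-Q/10), so Q40 is roughly one error per 10,000 bases and Q45 roughly one per 31,600. Contig N50 is the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of the assembly size. The function names and the toy contig lengths are illustrative assumptions, not values from the study.

```python
def phred_to_error_rate(qv: float) -> float:
    """Convert a Phred quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted longest first; N50 is the length of the contig
    at which the cumulative sum first reaches half of the total length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_n50 requires at least one contig")


if __name__ == "__main__":
    # Quality values quoted in the post: Q45 (HiFi) and Q40 (CLR).
    for label, qv in (("HiFi", 45), ("CLR", 40)):
        rate = phred_to_error_rate(qv)
        print(f"{label}: Q{qv} -> about 1 error per {1 / rate:,.0f} bases")

    # Hypothetical contig lengths (arbitrary units), for illustration only.
    toy_contigs = [50, 40, 30, 20, 10]
    print("toy N50:", contig_n50(toy_contigs))
```

By this definition, the five-point difference between Q45 and Q40 amounts to about a threefold lower per-base error rate for the HiFi assembly.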
Next, the scientists performed an analysis of segmental duplications (SDs), which are notoriously challenging elements to assemble correctly. The HiFi assembly resolved more of these duplications than the CLR assembly. “HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of large tandem repeats, as validated with orthogonal analyses… This is the highest fraction of resolved SDs for any of the published assemblies analyzed thus far,” they report.
“We conclude that there are three essential strengths of the HiFi technology over CLR technology,” the authors write, citing reduced compute time to generate a de novo assembly, superior assembly accuracy, and improved ability to assemble the most difficult regions of the genome. “Our results suggest that HiFi may currently be the most effective stand-alone technology for de novo assembly of human genomes.”