A workflow for the comprehensive detection and prioritization of variants in human genomes with PacBio HiFi reads
PacBio HiFi reads (minimum 99% accuracy, 15-25 kb read length) have emerged as a powerful data type for comprehensive variant detection in human genomes. The length of HiFi reads extends confident mapping and variant calling to repetitive regions of the genome that are not accessible with short reads. Read length also improves detection of structural variants (SVs), with recall exceeding that of short reads by over 30%. High read quality allows for accurate single-nucleotide variant and small indel detection, with precision and recall matching those of short reads. While many tools have been developed to take advantage of these qualities of HiFi reads, there is no end-to-end workflow for filtering and prioritizing variants uniquely detected with long reads for rare and undiagnosed disease research. We have developed a flexible, modular workflow and web portal for variant analysis from HiFi reads and applied it to a set of rare disease cases unsolved by short-read whole-genome sequencing. We expect that broad application of long-read variant detection workflows will solve many more rare disease cases. We have made these tools available at https://github.com/williamrowell/pbRUGD-workflow, and we hope they serve as a starting point for developing a robust analysis framework for long-read variant detection in rare diseases.