May 24, 2023  |  Human genetics research

Beyond short reads: Harnessing long-read sequencing for population-scale genomic insights

Large-scale population sequencing programs have a number of important research goals and outcomes. These include:

  • Generating a comprehensive catalogue of genetic variation to reflect the genetic diversity in a target population.
  • Maximizing discovery power through accurate variant calling across all variant classes and throughout the entire human genome, including challenging genomic regions (highly repetitive or homologous regions), while also leveraging additional genetic context like phasing and methylation status.
  • Creating broad and long-term utility of generated data to serve as a resource for population genetics, translational research and pharma R&D.
  • Returning value to a population by delivering results directly to study participants to help enable precision health and preventive programs.

In recent years, population genetics and precision health research have relied on large genomic datasets, often combining microarray, exome and genome data to get the results they need. Gaps in these data have persisted, mainly because the technological limitations of mainstream next-generation sequencing require additional methods to complement the data and fill the gaps.

Current short-read next-generation sequencing (NGS) methods do not detect variants in dark regions and routinely struggle with large or complex variants. These limitations can disproportionately affect the genetic insights obtained for some ancestral populations and can leave an incomplete understanding of the genetic causes and mechanisms of disease.

Here are some examples of disease-related genes that illustrate this issue:

  • SMN1: Silent carriers of spinal muscular atrophy (SMA) of African descent remain poorly detected with current short-read NGS and related callers.
  • LPA: High levels of lipoprotein(a), or Lp(a), increase the risk of heart attack, stroke and aortic stenosis. Long variable repeats in the LPA gene remain difficult for short-read methods, and the highest levels of Lp(a) are found in people of African and South Asian descent.
  • HBA, HBB, HBM: Genes involved in thalassemia and sickle cell disease, where homologous sequences, copy number variants (CNVs) and gene fusions complicate the analysis. Hemoglobinopathies are most common in Mediterranean, Middle Eastern, Southeast Asian, African and African American populations.


The long-read sequencing difference

HiFi sequencing makes a significant difference for population genomics programs. Compared with traditional short-read methods, a HiFi genome can detect 2.5 times more structural variants. Single nucleotide variants (SNVs) and indels make up the majority of variants by count, but structural variants (SVs) affect more base pairs across the genome. In fact, SVs alter more bases than SNVs and indels combined. HiFi sequencing also makes it possible to analyze the “dark regions” of the genome that remain beyond the reach of short-read whole genome sequencing (WGS). These regions contain numerous medically relevant genes such as SMN1, HBM, and LPA. In addition, HiFi genomes provide a genome-wide methylation signal alongside base calling, as well as the ability to phase variants into distinct haplotypes, enabling more disease-relevant insights.

To ensure the broad and long-term utility of population sequencing data, programs need comprehensive, genome-wide insights across all ancestries while working within their budgets. PacBio’s HiFi data has been recognized as indispensable in this pursuit. As the authors of a recent paper from one of the world’s largest and most advanced population sequencing projects noted, “We should continue developing population-scale cohorts sequenced with long reads only. The question rises if we have entered the age of using long reads exclusively.”

Interested in learning how HiFi sequencing can make a difference in your next population or cohort sequencing project? Visit our Population Genetics + Carrier Screening page or contact us to explore your options.
