Learn why it is critically important to understand accuracy in DNA sequencing to distinguish important biological information from sequencing errors.
Introduction: Around 5% (1,168) of protein-coding genes in the human genome contain an exon that is difficult to map with typical next-generation sequencing (NGS) read lengths due to homologous pseudogenes or segmental duplications. Among the difficult-to-map genes are 193 with known medical relevance, including CYP2D6, GBA, SMN1/2, and VWF. Long-read DNA sequencing provides increased mappability, accessing many of the difficult-to-map regions by connecting the homologous exon to neighboring unique sequence. Until recently, the read-level accuracy of long-read sequencing had made it challenging to accurately call small variants. The recently developed HiFi reads from the PacBio Sequel II System provide both…
In this PacBio Virtual Global Summit 2020 presentation, Shawn Levy of Discovery Life Sciences explains why reliably detecting and characterizing the spectrum of sequence variants observed in the human genome is critical for understanding the role of mutation and variation in genetic risk and phenotype. Levy describes how scaled CCS analysis on the Sequel II System supports efficient, reliable, and highly accurate detection of both simple and complex variants in diverse populations.
In this PacBio Virtual Global Summit 2020 presentation, Pi-Chuan Chang of Google shares how DeepVariant identifies SNPs and indels in PacBio HiFi data, starting from the v0.8 release (April 2019). Chang details recent accuracy improvements that won the PrecisionFDA Truth Challenge v2 and are now available in the latest release (v1.0, September 2020).