In this AGBT presentation, Mike Hunkapiller shares insights on using highly accurate long (HiFi) reads generated in circular consensus sequencing (CCS) mode for comprehensive genomic analysis and provides examples such as the sequencing of a Genome in a Bottle reference sample, which concluded with Q48 accuracy, 18 Mb contigs, and clearly phased haplotypes.
In this presentation, Emily Hatas of PacBio offers a look at how SMRT Sequencing has changed over the years, as well as the most common applications in human genome analysis: high-throughput structural variant detection, comprehensive variant detection, and de novo assembly of reference genomes.
In this webinar, Kristin Mars, Sequencing Specialist, PacBio, presents an introduction to PacBio’s technology and its applications, followed by a panel discussion among sequencing experts. The panel discussion addresses what long reads are and how they are useful, what differentiates PacBio long-read sequencing from other technologies, and the applications PacBio offers and how they can benefit scientific research.
In this short video, Aaron Wenger, a Principal Scientist at PacBio, explains what highly accurate long reads, or HiFi reads, are and how they help to detect all variant types, including single nucleotide variants, indels, and structural variants. He goes on to recap the precisionFDA Truth Challenge V2, which used Genome in a Bottle (GIAB) benchmarks to evaluate various sequencing technologies. In the 2020 challenge, when ranked for accuracy, PacBio HiFi reads delivered the highest precision and recall in all categories.
Introduction: Around 5% (1,168) of protein-coding genes in the human genome contain an exon that is difficult to map with typical next-generation sequencing (NGS) read lengths due to homologous pseudogenes or segmental duplications. Among the difficult-to-map genes are 193 with known medical relevance, including CYP2D6, GBA, SMN1/2, and VWF. Long-read DNA sequencing provides increased mappability, accessing many of the difficult-to-map regions by connecting the homologous exon to neighboring unique sequence. Until recently, the read-level accuracy of long-read sequencing had made it challenging to accurately call small variants. The recently developed HiFi reads from the PacBio Sequel II System provide both…
In this ASHG 2020 CoLab presentation, hear Principal Scientists Aaron Wenger and Elizabeth Tseng share how highly accurate long reads (HiFi reads) provide comprehensive variant detection for both genomes and transcriptomes. Aaron Wenger describes how new improvements in protocols and analysis methods have increased scalability and accuracy of variant calling. As demonstrated in the precisionFDA Truth Challenge V2, HiFi reads (>99% accurate, 15 kb – 20 kb) now outperform short reads for single nucleotide and structural variant calling and match for small indels. This includes calling >30,000 small variants and >10,000 structural variants missed by short reads, many in medically…
Dr. Wenger gives attendees an update on PacBio’s long-read sequencing and variant detection capabilities on the Sequel II System and shares recommendations on how to design your own study using HiFi reads. Then, Dr. Sund from Cincinnati Children’s Hospital Medical Center describes how she has used long-read sequencing to solve rare neurological diseases involving complex structural rearrangements that were previously unsolved with standard methods.
The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in…
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods to routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identification of both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based,…
Satellite repeats are a structural component of centromeres and telomeres, and in some instances their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently…
Tandem repeat (TR) expansions have been implicated in dozens of genetic diseases, including Huntington’s Disease, Fragile X Syndrome, and hereditary ataxias. Furthermore, TRs have recently been implicated in a range of complex traits, including gene expression and cancer risk. While the human genome harbors hundreds of thousands of TRs, analysis of TR expansions has been mainly limited to known pathogenic loci. A major challenge is that expanded repeats are beyond the read length of most next-generation sequencing (NGS) datasets and are not profiled by existing genome-wide tools. We present GangSTR, a novel algorithm for genome-wide genotyping of both short and…
Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a ‘first of its kind’ resource that is available to the community for multiple downstream applications. We produce 17% more benchmark…
The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated…
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of…
Whole genome sequencing (WGS) has increased in popularity and decreased in cost over the past decade, rendering this approach a viable and sensitive method for variant detection. In addition to its utility for single nucleotide variant detection, WGS data has the potential to detect Copy Number Variants (CNV) to fine resolution. Many CNV detection software packages have been developed exploiting four main types of data: read pair, split read, read depth, and assembly based methods. The aim of this study was to evaluate the efficiency of each of these main approaches in detecting germline deletions. WGS data and high confidence…