Genome in a Bottle Archives

June 1, 2021

Copy-number variant detection with PacBio long reads

Long-read sequencing of diverse humans has revealed more than 20,000 insertion, deletion, and inversion structural variants spanning more than 12 Mb in a healthy human genome. Most of these variants are too large to detect with short reads and too small for array comparative genomic hybridization (aCGH). While standard approaches to calling structural variants with long reads perform well in the 50 bp to 10 kb size range, they tend to miss exactly the large (>50 kb) copy-number variants that aCGH calls more readily. Standard algorithms rely either on reference-based mapping of reads that fully span a variant or on de novo assembly, yet copy-number variants are often too large to be spanned by a single read and frequently involve segmentally duplicated sequence that most de novo assemblies do not yet include. To comprehensively detect large variants in human genomes, we extended pbsv, a structural variant caller for long reads, to call copy-number variants (CNVs) from read-clipping and read-depth signatures. In human germline benchmark samples, we detect more than 300 CNVs spanning around 10 Mb, and we call hundreds of additional events in rearranged cancer samples. Together with insertion, deletion, inversion, duplication, and translocation calling from spanning reads, this allows pbsv to comprehensively detect large variants from a single data type.
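
To make the read-depth signature concrete, here is a minimal sketch of depth-based copy-number segmentation. It is a toy illustration under stated assumptions, not pbsv's actual algorithm: it assumes per-bin mean depths are already computed, treats the median bin as diploid, and ignores the read-clipping signal as well as the GC and mappability corrections a real caller would need. The function name `call_cnv_segments` and its parameters are hypothetical.

```python
from statistics import median


def call_cnv_segments(depths, bin_size, min_bins=3):
    """Toy read-depth CNV segmentation (hypothetical helper, not pbsv).

    depths: mean read depth for consecutive fixed-size bins on one chromosome.
    Returns (start, end, copy_number) tuples for runs of at least min_bins
    bins whose estimated copy number differs from the diploid value 2.
    """
    baseline = median(depths)  # assumption: the typical bin is diploid
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalize")
    # Scale depth so that the baseline maps to copy number 2, then round.
    copy_numbers = [round(2 * d / baseline) for d in depths]

    segments = []
    run_start = 0
    for i in range(1, len(copy_numbers) + 1):
        # A run ends at the last bin or where the copy number changes.
        if i == len(copy_numbers) or copy_numbers[i] != copy_numbers[run_start]:
            cn = copy_numbers[run_start]
            if cn != 2 and i - run_start >= min_bins:
                segments.append((run_start * bin_size, i * bin_size, cn))
            run_start = i
    return segments


depths = [18] * 10 + [9] * 5 + [18] * 10 + [27] * 4
print(call_cnv_segments(depths, bin_size=1000))
# [(10000, 15000, 1), (25000, 29000, 3)]
```

With an 18-fold baseline, the five bins at 9-fold depth come out as a one-copy segment and the four bins at 27-fold depth as a three-copy segment. Events larger than a single read can still be seen this way, which is why depth complements spanning-read evidence.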

June 1, 2021

Comprehensive variant detection in a human genome with highly accurate long reads

Introduction: Long-read sequencing has been applied successfully to assemble genomes and detect structural variants. However, due to high raw-read error rates (10-15%), it has remained difficult to call small variants from long reads. Recent improvements in library preparation and sequencing chemistry have increased the length, accuracy, and throughput of PacBio circular consensus sequencing (CCS) reads, yielding 15-20 kb reads with average read quality above 99%. Materials and Methods: We sequenced a library from the human reference sample HG002 to 18-fold coverage on the PacBio Sequel II System using two SMRT Cells 8M. The CCS algorithm was used to generate highly accurate (average 99.9%) 12.9 kb reads, which were mapped to the hg19 reference with pbmm2. We detected small variants using Google DeepVariant with a model trained for CCS reads and phased the variants using WhatsHap. Structural variants were detected with pbsv. Variant calls were evaluated against Genome in a Bottle (GIAB) benchmarks. Results: With these reads, DeepVariant achieves SNP and indel F1 scores of 99.70% and 96.59%, respectively, against the GIAB truth set, and pbsv achieves 97.72% recall for structural variants longer than 50 bp. WhatsHap phased the small variants into haplotype blocks with an N50 of 145 kb. The improved mappability of long reads allows us to align reads to, and detect variants in, medically relevant genes such as CYP2D6 and PMS2 that have proven “difficult to map” with short reads. Conclusions: These highly accurate long reads combine the mappability and structural variant sensitivity of long reads with the small-variant accuracy of short reads.
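
The F1 scores reported against the GIAB truth set combine precision and recall into a single number. A minimal sketch of that arithmetic follows, assuming true-positive, false-positive, and false-negative counts have already been produced by a benchmarking comparison. The helper name `benchmark_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def benchmark_metrics(tp, fp, fn):
    """Return (precision, recall, F1) from comparison counts against a truth set.

    Simplification: a single true-positive count is used for both precision
    and recall, whereas some benchmarking tools count TPs separately on the
    truth and query sides.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only (not from the study): 90 TP, 10 FP, 30 FN.
p, r, f1 = benchmark_metrics(90, 10, 30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# precision=0.900 recall=0.750 F1=0.818
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two metrics, so a caller cannot reach a high F1 by maximizing recall at the expense of precision or vice versa.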

June 1, 2021

Comprehensive variant detection in a human genome with highly accurate long reads

Introduction: Long-read sequencing has revealed more than 20,000 structural variants spanning over 12 Mb in a healthy human genome. Short-read sequencing misses most structural variants, yet it has remained the more effective approach both for small variants, because long reads have had 10-15% error rates, and for copy-number variants (CNVs), because effective long-read CNV callers have been lacking. PacBio highly accurate long reads (HiFi reads), with read lengths of 10-25 kb and quality above 99%, present the opportunity to capture all classes of variation with one approach. Methods: We sequence the Genome in a Bottle benchmark sample HG002 and an individual with a presumed Mendelian disease using HiFi reads. We call SNVs and indels with DeepVariant and extend the structural variant caller pbsv to call CNVs using read-depth and clipping signatures. Results: With 18-fold coverage of 13 kb HiFi reads, variant calling in HG002 achieves F1 scores of 99.7% for SNVs, 96.6% for indels, and 96.4% for structural variants. Additionally, we detect more than 300 CNVs spanning around 10 Mb. For the Mendelian disease case, HiFi reads reveal thousands of variants that short-read sequencing overlooked, including a candidate causative structural variant. Conclusions: These results illustrate the ability of HiFi reads to comprehensively detect variants, including those associated with human disease.
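
The clipping signature can be illustrated in a similar spirit: when many reads are soft- or hard-clipped at the same reference position, that position is a candidate breakpoint. The sketch below is a generic illustration, not pbsv's implementation. It assumes alignments are given as hypothetical `(chrom, ref_start, cigar)` tuples (in practice they would come from aligned BAM records), and the 500 bp `min_clip` threshold is an arbitrary choice.

```python
import re
from collections import Counter

# SAM CIGAR operations: a length followed by an operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def clip_breakpoints(alignments, min_clip=500):
    """Count putative breakpoints from long soft/hard clips.

    alignments: iterable of (chrom, ref_start, cigar), ref_start 0-based.
    Returns a Counter keyed by (chrom, position, side), where side is
    "left" for a clip before the aligned block and "right" for one after it.
    """
    counts = Counter()
    for chrom, ref_start, cigar in alignments:
        ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        lead = 0
        for n, op in ops:  # sum clipping at the start of the alignment
            if op not in "SH":
                break
            lead += n
        trail = 0
        for n, op in reversed(ops):  # sum clipping at the end
            if op not in "SH":
                break
            trail += n
        if lead >= min_clip:
            counts[(chrom, ref_start, "left")] += 1
        if trail >= min_clip:
            counts[(chrom, ref_start + ref_len, "right")] += 1
    return counts


alns = [
    ("chr1", 1000, "600S4000M"),
    ("chr1", 1000, "700S3000M"),
    ("chr1", 2000, "3000M800S"),
    ("chr1", 5000, "100S2000M"),
]
print(dict(clip_breakpoints(alns)))
```

Two reads clipped at chr1:1000 reinforce each other, while the 100 bp clip falls below the threshold. In a depth-plus-clipping design, such clusters can sharpen the boundaries of a CNV that read depth alone only locates to the resolution of a bin.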

February 5, 2021

Podcast: Marc Salit discusses creating the foundation of genomics

Marc Salit is the leader of the Genome Scale Measurement Group at the National Institute of Standards and Technology (NIST). In this Mendelspod podcast, he explains how NIST played…

February 5, 2021

ASHG PacBio Workshop: A future of high-quality genomes, transcriptomes, and epigenomes

Jonas Korlach spoke about recent SMRT Sequencing updates, such as the latest Sequel System chemistry release (1.2.1) and updates to the Integrative Genomics Viewer, which is now optimized for PacBio data…

February 5, 2021

Webinar: Variant calling and de novo genome assembly with PacBio HiFi reads

In this webinar, Sarah Kingan, Staff Scientist, PacBio, presents recent work on de novo genome assembly using PacBio HiFi reads. She highlights the benefits of HiFi data for base-level…

February 5, 2021

AGBT Presentation: HiFi long reads for comprehensive genomic analysis

In this AGBT presentation, Mike Hunkapiller shares insights on using highly accurate long (HiFi) reads generated in circular consensus sequencing (CCS) mode for comprehensive genomic analysis and provides examples such…

February 5, 2021

ASHG PacBio Workshop: Sequence with confidence – A new era of highly accurate long-read sequencing

In this presentation, Emily Hatas of PacBio offers a look at how SMRT Sequencing has changed over the years, as well as the most common applications in human genome analysis:…

February 5, 2021

Webinar: Sequencing 101 – How long-read sequencing improves access to genetic information

In this webinar, Kristin Mars, Sequencing Specialist, PacBio, presents an introduction to PacBio’s technology and its applications followed by a panel discussion among sequencing experts. The panel discussion addresses such…

February 5, 2021

ASHG CoLab: PacBio HiFi reads for comprehensive characterization of genomes and single-cell isoform expression

In this ASHG 2020 CoLab presentation, Principal Scientists Aaron Wenger and Elizabeth Tseng share how highly accurate long reads (HiFi reads) provide comprehensive variant detection for both genomes and…

February 5, 2021

Video Poster: Accurate, comprehensive variant calling in difficult-to-map genes using HiFi reads

Introduction: Around 5% (1,168) of protein-coding genes in the human genome contain an exon that is difficult to map with typical next-generation sequencing (NGS) read lengths due to homologous pseudogenes…

February 5, 2021

Webinar: Increasing solve rates for rare and Mendelian diseases with long-read sequencing

Dr. Wenger gives attendees an update on PacBio’s long-read sequencing and variant detection capabilities on the Sequel II System and shares recommendations on how to design your own study using…

April 21, 2020

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads.

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes. © 2019 John Wiley & Sons Ltd/University College London.
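
NG50 differs from N50 only in its denominator: it measures contiguity against an expected genome (or region) size rather than the assembly's own total length, which makes assemblies of the same genome directly comparable. A minimal sketch of the standard definition follows. The contig lengths are invented toy values, and the study's NG50 within 1 Mbp of the centromere would additionally require restricting both the contigs and the target size to those regions. Passing the summed lengths instead of a genome size gives the ordinary N50, as used for phased haplotype blocks.

```python
def ng50(contig_lengths, genome_size):
    """Length L such that contigs of length >= L together span at least half
    of genome_size. Passing sum(contig_lengths) as genome_size gives N50.
    Returns None if the assembly covers less than half of genome_size."""
    target = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


contigs = [100, 80, 50, 30, 20]        # toy contig lengths (kbp)
print(ng50(contigs, sum(contigs)))     # N50 -> 80
print(ng50(contigs, 400))              # NG50 against a 400 kbp genome -> 50
```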

April 21, 2020

A robust benchmark for germline structural variant detection

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods to routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identification of both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based, from short-, linked-, and long-read sequencing, as well as optical and electronic mapping. The final benchmark set contains 12,745 isolated, sequence-resolved insertion and deletion calls ≥50 base pairs (bp) discovered by at least 2 technologies or 5 callsets, genotyped as heterozygous or homozygous variants by long reads. The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.66 Gbp and 9,641 SVs supported by at least one diploid assembly. Support for SVs was assessed using svviz with short-, linked-, and long-read sequence data. In general, there was strong support from multiple technologies for the benchmark SVs, with 90% of the Tier 1 SVs having support in reads from more than one technology. The Mendelian genotype error rate was 0.3%, and genotype concordance with manual curation was >98.7%. We demonstrate the utility of the benchmark set by showing it reliably identifies both false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.
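
Inside the Tier 1 regions, a benchmark SV with no matching call is a false negative and a call with no matching benchmark SV is a putative false positive. A minimal sketch of that bookkeeping follows. It assumes a simplified matching rule (same chromosome and type, positions within `max_dist` bp, size ratio at least `min_size_ratio`) with greedy one-to-one matching. The thresholds, names, and data layout are hypothetical choices for illustration and are not GIAB's comparison method or any particular tool's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int
    svtype: str  # "DEL" or "INS"
    size: int    # length in bp, assumed > 0


def in_regions(sv, regions):
    """True if the SV position lies in any (chrom, start, end) half-open region."""
    return any(sv.chrom == c and s <= sv.pos < e for c, s, e in regions)


def matches(call, truth, max_dist=500, min_size_ratio=0.7):
    if call.chrom != truth.chrom or call.svtype != truth.svtype:
        return False
    if abs(call.pos - truth.pos) > max_dist:
        return False
    return min(call.size, truth.size) / max(call.size, truth.size) >= min_size_ratio


def compare(truth, calls, regions):
    """Return (tp, fp, fn), counting only variants inside benchmark regions."""
    truth = [t for t in truth if in_regions(t, regions)]
    calls = [c for c in calls if in_regions(c, regions)]
    used = set()  # indices of truth SVs already matched (one-to-one)
    for call in calls:
        for i, t in enumerate(truth):
            if i not in used and matches(call, t):
                used.add(i)
                break
    tp = len(used)
    return tp, len(calls) - tp, len(truth) - tp


truth = [SV("chr1", 1000, "DEL", 100), SV("chr1", 5000, "INS", 300),
         SV("chr2", 200, "DEL", 60)]
calls = [SV("chr1", 1100, "DEL", 90), SV("chr1", 9000, "DEL", 500),
         SV("chr3", 50, "DEL", 80)]
regions = [("chr1", 0, 10000), ("chr2", 0, 1000)]
print(compare(truth, calls, regions))  # (1, 1, 2)
```

The call on chr3 lies outside the benchmark regions and is ignored rather than counted as a false positive. This is the reason the regions matter: they mark where the benchmark is confident enough that unmatched calls can be treated as errors.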

April 21, 2020

High satellite repeat turnover in great apes studied with short- and long-read technologies.

Satellite repeats are a structural component of centromeres and telomeres, and in some instances their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: (1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and (2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However, clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males vs. females; using Y chromosome assemblies or fluorescence in situ hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59 kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
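
The authors' tool for finding satellites embedded in long reads is not described here, so the following is only a naive illustration of the underlying idea: scanning a read for runs of a perfect repeat unit of at most 50 bp, matching the unit-size limit stated above. The function names, the `min_copies` threshold, and the restriction to perfect copies are assumptions. The repeats the study reports also contain interspersed, similar-but-not-identical copies, which this sketch would split into separate runs or miss.

```python
def is_primitive(unit):
    """True if unit is not itself a repeat of a shorter string."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_tandem_repeats(seq, max_unit=50, min_copies=3):
    """Naive scan for perfect tandem repeats with unit size <= max_unit.

    Returns (start, unit, copies) for each run with at least min_copies
    complete copies of a primitive unit.
    """
    hits = []
    n = len(seq)
    for k in range(1, max_unit + 1):
        i = 0
        while i + k < n:
            # Extend while each base equals the base one unit earlier.
            j = i + k
            while j < n and seq[j] == seq[j - k]:
                j += 1
            copies = (j - i) // k
            unit = seq[i:i + k]
            if copies >= min_copies and is_primitive(unit):
                hits.append((i, unit, copies))
                i = j  # continue scanning after this run
            else:
                i += 1
    return hits


read = "CT" + "AATGG" * 4 + "CC"
print(find_tandem_repeats(read))  # [(2, 'AATGG', 4)]
```

The primitivity check keeps an (AATGG)n run from also being reported as a repeat of "AATGGAATGG" or other multiples of the true unit.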
