Inversion Archives - Page 2 of 16

June 1, 2021

Comprehensive variant detection in a human genome with PacBio high-fidelity reads

Human genomic variations range in size from single nucleotide substitutions to large chromosomal rearrangements. Sequencing technologies tend to be optimized for detecting particular variant types and sizes. Short reads excel at detecting SNVs and small indels, while long or linked reads are typically used to detect larger structural variants or phase distant loci. Long reads are more easily mapped to repetitive regions, but tend to have lower per-base accuracy, making it difficult to call short variants. The PacBio Sequel System produces two main data types: long continuous reads (up to 100 kbp), generated by single passes over a long template, and Circular Consensus Sequence (CCS) reads, generated by calculating the consensus of many sequencing passes over a single shorter template (500 bp to 20 kbp). The long-range information in continuous reads is useful for genome assembly and structural variant detection. The higher base accuracy of CCS reads makes it possible to detect and phase short variants in single molecules. Recent improvements in library preparation protocols and sequencing chemistry have increased the length, accuracy, and throughput of CCS reads. For the human sample HG002, we collected 28-fold coverage of 15 kbp high-fidelity CCS reads with an average read quality above Q20 (99% accuracy). The length and accuracy of these reads allow us to detect SNVs, indels, and structural variants not only in the Genome in a Bottle (GIAB) high-confidence regions, but also in segmental duplications, HLA loci, and clinically relevant “difficult-to-map” genes. As with continuous long reads, we call structural variants at 90.0% recall compared to the GIAB structural variant benchmark “truth” set, with the added advantages of base-pair resolution for variant calls and improved recall at compound heterozygous loci.
With minimap2 alignments, GATK4 HaplotypeCaller variant calls, and simple variant filtration, we have achieved a SNP F-Score of 99.51% and an INDEL F-Score of 80.10% against the GIAB short variant benchmark “truth” set, in addition to calling variants outside of the high confidence region established by GIAB using previous technologies. With the long-range information available in 15 kbp reads, we applied the read-backed phasing tool WhatsHap to generate phase blocks with a mean length of 65 kbp across the entire genome. Using an alignment-based approach, we typed all major MHC class I and class II genes to at least 3-field precision. This new data type has the potential to expand the GIAB high confidence regions and “truth” benchmark sets to many previously difficult-to-map genes and allow a single sequencing protocol to address both short variants and large structural variants.
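The read quality quoted in the abstract above (Q20, 99% accuracy) is on the Phred scale, where Q = -10 · log10(error rate). A minimal sketch of the conversion follows; the function names are illustrative and not taken from any PacBio tool.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred-scaled quality value to a per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy in [0, 1) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # Quality values that appear in the abstracts on this page.
    for q in (20, 40, 50):
        print(f"Q{q}: {phred_to_accuracy(q):.5%}")
```

Each step of 10 on the Phred scale corresponds to a tenfold drop in error rate, which is why a gap between QV40 and QV50 is described as an order of magnitude.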

June 1, 2021

High-quality human genomes achieved through HiFi sequence data and FALCON-Unzip assembly

De novo assemblies of human genomes from accurate (85-90%), continuous long reads (CLR) now approach the human reference genome in contiguity, but the assembly base pair accuracy is typically below QV40 (99.99%), an order of magnitude lower than the standard for finished references. The base pair errors complicate downstream interpretation, particularly false positive indels that lead to false gene loss through frameshifts. PacBio HiFi sequence data, which are both long (>10 kb) and very accurate (>99.9%) at the individual sequence read level, enable a new paradigm in human genome assembly. Haploid human assemblies using HiFi data achieve similar contiguity to those using CLR data and are highly accurate at the base level [1]. Furthermore, HiFi assemblies resolve more high-identity sequences such as segmental duplications [2]. To enable HiFi assembly in diploid human samples, we have extended the FALCON-Unzip assembler to work directly with HiFi reads. Here we present phased human diploid genome assemblies from HiFi sequencing of HG002, HG005, and the Vertebrate Genome Project (VGP) mHomSap1 trio on the PacBio Sequel II System. The HiFi assemblies all exceed the VGP’s quality guidelines, approaching QV50 (99.999%) accuracy. For HG002, 60% of the genome was haplotype-resolved, with a phase-block N50 of 143 kbp and a phasing accuracy of 99.6%. The overall mean base accuracy of the assembly was QV49.7. In conclusion, HiFi data show great promise towards complete, contiguous, and accurate diploid human assemblies.
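The phase-block N50 quoted above is a length-weighted summary statistic: the largest length L such that blocks of length L or longer together cover at least half of the total phased length. A minimal sketch of that calculation is shown here; the function name and the example lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of lengths.

    N50 is the largest length L such that items of length >= L
    together account for at least half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Made-up phase-block lengths in bp, for illustration only.
    example_blocks = [250_000, 143_000, 90_000, 60_000, 20_000]
    print(n50(example_blocks))
```

The same function applies to contig lengths when summarizing assembly contiguity.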

June 1, 2021

Comprehensive structural and copy-number variant detection with long reads

To comprehensively detect large variants in human genomes, we have extended pbsv – a structural variant caller for long reads – to call copy-number variants (CNVs) from read-clipping and read-depth signatures. In human germline benchmark samples, we detect more than 300 CNVs spanning around 10 Mb, and we call hundreds of additional events in re-arranged cancer samples. Long-read sequencing of diverse humans has revealed more than 20,000 insertion, deletion, and inversion structural variants spanning more than 12 Mb in a typical human genome. Most of these variants are too large to detect with short reads and too small for array comparative genomic hybridization (aCGH). While the standard approaches to calling structural variants with long reads thrive in the 50 bp to 10 kb size range, they tend to miss exactly the large (>50 kb) copy-number variants that are called more readily with aCGH and short reads. Standard algorithms rely on reference-based mapping of reads that fully span a variant or on de novo assembly. Copy-number variants, however, are often too large to be spanned by a single read and frequently involve segmentally duplicated sequence that is not yet included in most de novo assemblies.

June 1, 2021

Copy-number variant detection with PacBio long reads

Long-read sequencing of diverse humans has revealed more than 20,000 insertion, deletion, and inversion structural variants spanning more than 12 Mb in a healthy human genome. Most of these variants are too large to detect with short reads and too small for array comparative genomic hybridization (aCGH). While the standard approaches to calling structural variants with long reads thrive in the 50 bp to 10 kb size range, they tend to miss exactly the large (>50 kb) copy-number variants that are called more readily with aCGH. Standard algorithms rely on reference-based mapping of reads that fully span a variant or on de novo assembly. Copy-number variants, however, are often too large to be spanned by a single read and frequently involve segmentally duplicated sequence that is not yet included in most de novo assemblies. To comprehensively detect large variants in human genomes, we extended pbsv – a structural variant caller for long reads – to call copy-number variants (CNVs) from read-clipping and read-depth signatures. In human germline benchmark samples, we detect more than 300 CNVs spanning around 10 Mb, and we call hundreds of additional events in re-arranged cancer samples. Together with insertion, deletion, inversion, duplication, and translocation calling from spanning reads, this allows pbsv to comprehensively detect large variants from a single data type.
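To illustrate what a read-depth signature looks like in general terms, the sketch below flags runs of genomic bins whose depth departs from the sample-wide median. It is not pbsv's algorithm, which the abstract does not describe in detail; the function names, the bin-based input, and the 1.4 and 0.6 ratio thresholds are all assumptions chosen for illustration.

```python
from statistics import median


def classify_bin(depth, baseline, gain_ratio, loss_ratio):
    """Label one bin as 'gain', 'loss', or None from its depth ratio."""
    ratio = depth / baseline
    if ratio >= gain_ratio:
        return "gain"
    if ratio <= loss_ratio:
        return "loss"
    return None


def call_depth_cnvs(bin_depths, gain_ratio=1.4, loss_ratio=0.6):
    """Merge consecutive gain or loss bins into candidate CNV intervals.

    Returns a list of (start_bin, end_bin_exclusive, state) tuples.
    The thresholds are illustrative assumptions, not tuned values.
    """
    depths = list(bin_depths)
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    calls = []
    run_start, run_state = 0, None
    for i, depth in enumerate(depths):
        state = classify_bin(depth, baseline, gain_ratio, loss_ratio)
        if state != run_state:
            # Close the previous run if it was a gain or a loss.
            if run_state is not None:
                calls.append((run_start, i, run_state))
            run_start, run_state = i, state
    if run_state is not None:
        calls.append((run_start, len(depths), run_state))
    return calls


if __name__ == "__main__":
    # Made-up per-bin mean depths for a sample with roughly 30-fold coverage.
    depths = [30, 31, 29, 30, 45, 46, 44, 30, 30, 14, 15, 30, 29, 31, 30]
    for start, end, state in call_depth_cnvs(depths):
        print(f"bins {start}-{end - 1}: {state}")
```

In a diploid genome, a heterozygous duplication is expected near a depth ratio of 1.5 and a heterozygous deletion near 0.5, which motivates thresholds on either side of 1.0. A practical caller would also need to handle coverage noise, GC bias, and mappability, and, as the abstract notes, would combine depth with read-clipping evidence at the breakpoints.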

February 5, 2021

ASHG PacBio Workshop: PacBio applications updates & future roadmap

In this ASHG 2017 presentation, Jonas Korlach, the CSO of PacBio, shared updates on three applications featuring SMRT Sequencing on the Sequel System, highlighting structural variant detection, targeted sequencing and…

February 5, 2021

ASHG PacBio Workshop: Long-read sequencing for detecting clinically relevant structural variation

In this ASHG 2017 presentation, Han Brunner of Radboud University Medical Center presented research using SMRT Sequencing to detect structural variants to uncover the genetic causes of intellectual disability. He…

February 5, 2021

Webinar: SMRT Sequencing applications in plant and animal sciences: an overview

In this webinar, Emily Hatas of PacBio shares information about the applications and benefits of SMRT Sequencing in plant and animal biology, agriculture, and industrial research fields. This session contains…

February 5, 2021

Video: Structural variant detection with SMRT Sequencing

In this video, Aaron Wenger, a research scientist at PacBio, describes the use of long-read SMRT Sequencing to detect structural variants in the human genome. He shares that structural variations…

February 5, 2021

Webinar: Sequencing structural variants for disease gene discovery and population genetics

Structural variants (SVs, differences >50 base pairs) account for most of the base pairs that differ between two human genomes, and are known to cause over 1,000 genetic disorders including…

February 5, 2021

Webinar: Size Matters: Accurate detection and phasing of structural variations

In this presentation, Fritz Sedlazeck describes his latest work to obtain comprehensive genomes leveraging long-read sequencing and linked reads.

February 5, 2021

PAG Conference: Dawn of the crop pangenome era

To make improvements to crops like corn, soybeans, and canola, scientists at Corteva are building a compendium of crop genomics resources to provide actionable sequence info for genetic discovery, gene-editing,…

February 5, 2021

Webinar: Sequence with Confidence – Introducing the Sequel II System

In this webinar, Jonas Korlach, Chief Scientific Officer, PacBio, provides an overview of the features and the advantages of the new Sequel II System. Kiran Garimella, Senior Computational Scientist, Broad…

February 5, 2021

PAG Conference: The impact of highly accurate PacBio sequence data on the assembly of a tetraploid rose

In this presentation at PAG 2020, Bart Nijland of Genetwister Technologies explains how his team set out to make a haplotype-aware assembly of the highly complex tetraploid Rosa x hybrida…

February 5, 2021

Webinar: Bioinformatics lunch & learn – Better assemblies of bacterial genomes and plasmids with the new microbial assembly pipeline in SMRT Link v8.0

Microbial Assembly is our latest pipeline, specifically designed to assemble bacterial genomes (between 2 and 10 Mb) and plasmids. This pipeline includes the implementation of a new, circular-aware read alignment…

February 5, 2021

PacBio Workshop: Understanding the biology of genomes with HiFi sequencing

The utility of new highly accurate long reads, or HiFi reads, was first demonstrated for calling all variant types in human genomes. It has since been shown that HiFi reads…
