Structural variation accounts for much of the variation among human genomes. Structural variants of all types are known to cause Mendelian disease and contribute to complex disease. Learn how long-read sequencing is enabling detection of the full spectrum of structural variants to advance the study of human disease, evolution and genetic diversity.
Mark Gerstein is the co-director of the Yale Computational Biology and Bioinformatics program, where he focuses on better annotation of the human genome and better ways to mine big genomics data. He has played a big role in some of the large genomics initiatives since the first human genome project, including ENCODE and the 1000 Genomes Project. “I’m very enthusiastic, of course, about the thousand dollar genome, but I don’t think that a true human genome has arrived for a thousand dollars,” Mark says at the outset of this Mendelspod interview. “The great excitement of next generation sequencing, which is deserved, has…
Euan Ashley from Stanford University started with the premise that while current efforts in the field of genomic medicine address 30% of patient cases, there’s a need for new approaches to make sense of the remaining 70%. Toward that end, he said that accurately calling structural variants is a major need. In one translational research example, Ashley said that SMRT Sequencing with the Sequel System allowed his team to identify six potentially causative genes in an individual with complex and varied symptoms; one gene was associated with Carney syndrome, which was a match for the person’s physiology and was later…
Melissa Laird Smith discussed how the Icahn School of Medicine at Mount Sinai uses long-read sequencing for translational research. She gave several examples of targeted sequencing projects run on the Sequel System including CYP2D6, phased mutations of GLA in Fabry’s disease, structural variation breakpoint validation in glioblastoma, and full-length immune profiling of TCR sequences.
SMRT Sequencing is a DNA sequencing technology characterized by long read lengths and high consensus accuracy, regardless of the sequence complexity or GC content of the DNA sample. These characteristics can be harnessed to address medically relevant genes, mRNA transcripts, and other genomic features that are otherwise difficult or impossible to resolve. I will describe examples for such new clinical research in diverse areas, including full-length gene sequencing with allelic haplotype phasing, gene/pseudogene discrimination, sequencing extreme DNA contexts, high-resolution pharmacogenomics, biomarker discovery, structural variant resolution, full-length mRNA isoform cataloging, and direct methylation detection.
Targeted sequencing experiments commonly rely on either PCR or hybrid capture to enrich for targets of interest. When using short-read sequencing platforms, these amplicons or fragments are frequently targeted to a few hundred base pairs to accommodate the read lengths of the platform. Given PacBio’s long read lengths, it is straightforward to sequence amplicons or captured fragments that are multiple kilobases in length. These long sequences are useful for easily visualizing variants that include SNPs, CNVs and other structural variants, often without assembly. We will review methods for the sequencing of long amplicons and provide examples using amplicons that range…
In this ASHG workshop presentation, Stuart Scott of the Icahn School of Medicine at Mount Sinai presented on using the PacBio system for amplicon sequencing in pharmacogenomics and clinical genomics workflows. Accurate, phased amplicon sequence for the CYP2D6 gene, for example, has allowed his team to reclassify up to 20% of samples, providing data that’s critical for drug metabolism and dosing. In clinical genomics, Scott presented several case studies illustrating the utility of highly accurate, long-read sequencing for assessing copy number variants and for confirming a suspected medical diagnosis in rare disease patients. He noted that the latest Sequel System…
In this PacBio User Group Meeting presentation, Mount Sinai’s Ethan Ellis presents results from the HLS-CATCH method, which involves the use of the SageHLS instrument with CRISPR design methods to target and extract large genomic fragments for sequencing while avoiding pseudogenes and other confounding regions.
Introduction: Around 5% (1,168) of protein-coding genes in the human genome contain an exon that is difficult to map with typical next-generation sequencing (NGS) read lengths due to homologous pseudogenes or segmental duplications. Among the difficult-to-map genes are 193 with known medical relevance, including CYP2D6, GBA, SMN1/2, and VWF. Long-read DNA sequencing provides increased mappability, accessing many of the difficult-to-map regions by connecting the homologous exon to neighboring unique sequence. Until recently, the read-level accuracy of long-read sequencing had made it challenging to accurately call small variants. The recently developed HiFi reads from the PacBio Sequel II System provide both…
In this ASHG 2020 PacBio Workshop, Emily Farrow of Children’s Mercy Kansas City shares how the incorporation of long-read sequencing into the Genomic Answers for Kids research study is increasing diagnostic yields through the identification of novel genetic variation. Emily highlights several cases in which PacBio HiFi sequencing was able to provide insights where short-read sequencing alone was inconclusive, due to limitations stemming from repetitive regions and large structural variants.
NGS is commonly used for amplicon sequencing in clinical applications to study genetic disorders and detect disease-causing mutations. This approach can be plagued by limited ability to phase sequence variants and makes interpretation of sequence data difficult when pseudogenes are present. Long-read, highly accurate amplicon sequencing can provide very accurate, efficient, high-throughput (through multiplexing) sequences from single molecules, with read lengths largely limited by PCR. Data is easy to interpret; phased variants and breakpoints are present within high-fidelity individual reads. Here we show SMRT Sequencing of the PMS2 and OPN1 (MW and LW) genes using the Sequel System.…
While the identification of individual SNPs has been readily available for some time, the ability to accurately phase SNPs and structural variation across a haplotype has been a challenge. With individual reads of an average length of 9 kb (P5-C3), and individual reads beyond 30 kb in length, SMRT Sequencing technology allows the identification of mutation combinations such as microdeletions, insertions, and substitutions without any predetermined reference sequence. Long-amplicon analysis is a novel protocol that identifies and reports the abundance of differing clusters of sequencing reads within a single library. Graphs generated via hierarchical clustering of individual sequencing reads…
The complex immune regions of the genome, including MHC and KIR, contain large copy number variants (CNVs), a high density of genes, hyper-polymorphic gene alleles, and conserved extended haplotypes (CEH) with enormous linkage disequilibrium (LD). This level of complexity and the inherent biases of short-read sequencing make it challenging to extract immune region haplotype information from reference-reliant, shotgun sequencing and GWAS methods. As NGS-based genome and exome sequencing and SNP arrays have become routine for population studies, numerous efforts are being made to develop software to extract and/or impute the immune gene information from these datasets. Despite these…
We have produced an updated annotation of the Norway spruce genome on the basis of an in silico normalised set of RNA-Seq data obtained from 1,529 samples and comprising 15.5 billion paired-end Illumina HiSeq reads complemented by 18 Mbp of PacBio cDNA data (3.2M sequences). In addition to augmenting and refining the previous protein coding gene annotation, here we focus on the addition of long intergenic non-coding RNA (lincRNA) and micro RNA (miRNA) genes. In addition to non-coding loci, our analyses also identified protein coding genes that had been missed by the initial genome annotation and enabled us to update the annotation…
Target enrichment capture methods allow scientists to rapidly interrogate important genomic regions of interest for variant discovery, including SNPs, gene isoforms, and structural variation. Custom targeted sequencing panels are important for characterizing heterogeneous, complex diseases and uncovering the genetic basis of inherited traits with more uniform coverage when compared to PCR-based strategies. With the increasing availability of high-quality reference genomes, customized gene panels are readily designed with high specificity to capture genomic regions of interest, thus enabling scientists to expand their research scope from a single individual to larger cohort studies or population-wide investigations. Coupled with PacBio® long-read sequencing, these…