April 21, 2020

A robust benchmark for germline structural variant detection

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods to routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identification of both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based, from short-, linked-, and long-read sequencing, as well as optical and electronic mapping. The final benchmark set contains 12745 isolated, sequence-resolved insertion and deletion calls ≥50 base pairs (bp) discovered by at least 2 technologies or 5 callsets, genotyped as heterozygous or homozygous variants by long reads. The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.66 Gbp and 9641 SVs supported by at least one diploid assembly. Support for SVs was assessed using svviz with short-, linked-, and long-read sequence data. In general, there was strong support from multiple technologies for the benchmark SVs, with 90% of the Tier 1 SVs having support in reads from more than one technology. The Mendelian genotype error rate was 0.3%, and genotype concordance with manual curation was >98.7%. We demonstrate the utility of the benchmark set by showing it reliably identifies both false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.
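The discovery thresholds quoted in this abstract can be sketched as a small filter. This is only an illustration of the stated size and support cutoffs, reading the size cutoff as "at least 50 bp"; it is not the GIAB pipeline. The `CandidateSV` record and its field names are hypothetical, and the "isolated" and long-read genotyping requirements are not modeled.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract; everything else here is an assumption.
MIN_SV_LENGTH_BP = 50
MIN_TECHNOLOGIES = 2
MIN_CALLSETS = 5


@dataclass(frozen=True)
class CandidateSV:
    """Hypothetical record for a merged candidate call."""
    sv_type: str         # "INS" or "DEL"
    length_bp: int       # size of the inserted or deleted sequence
    n_technologies: int  # distinct technologies that discovered the call
    n_callsets: int      # distinct callsets that discovered the call


def meets_discovery_criteria(sv: CandidateSV) -> bool:
    """Return True if the call passes the size and support cutoffs."""
    if sv.sv_type not in ("INS", "DEL"):
        return False
    if sv.length_bp < MIN_SV_LENGTH_BP:
        return False
    # Either condition is sufficient: 2+ technologies OR 5+ callsets.
    return sv.n_technologies >= MIN_TECHNOLOGIES or sv.n_callsets >= MIN_CALLSETS
```

For example, a 300 bp insertion seen by a single technology in five callsets would pass, while a 49 bp deletion would not, however well supported.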


April 21, 2020

Large Fragment Deletions Induced by Cas9 Cleavage While Not in BEs System in Rabbit

CRISPR-Cas9 and base editor (BE) systems are poised to become the gene editing tools of choice in clinical contexts; however, large fragment deletions have been found in Cas9-mutated cells without validation at the animal level. By analyzing 16 gene-edited rabbit lines (112 rabbits in total) generated using SpCas9, BEs, xCas9 and xCas9-BEs with long-range PCR genotyping and long-read sequencing on the PacBio platform, we show that fragment deletions extending over thousands of bases occur in rabbits mutated with the single-guide RNA/Cas9 and xCas9 systems, but few large deletions were found in rabbits with BE-induced mutations. We provide the first animal-level validation that the BE system does not induce large fragment deletions, suggesting that BE systems can be beneficial tools for the further development of highly accurate and secure gene therapy for the clinical treatment of human genetic disorders.


April 21, 2020

Comparison of mitochondrial DNA variants detection using short- and long-read sequencing.

The recent advent of long-read sequencing technologies is expected to provide reasonable answers to genetic challenges unresolvable by short-read sequencing, primarily the inability to accurately study structural variations, copy number variations, and homologous repeats in complex parts of the genome. However, long-read sequencing comes with higher rates of random short deletions and insertions, and single nucleotide errors. The relatively higher accuracy of short-read sequencing has kept it the first choice for screening single nucleotide variants and short deletions and insertions. Nevertheless, short-read sequencing still suffers from systematic errors that tend to occur at specific positions, where even a high read depth cannot always correct them. In this study, we compared the genotyping of mitochondrial DNA variants in three samples using PacBio’s Sequel (Pacific Biosciences Inc., Menlo Park, CA, USA) long-read sequencing and Illumina’s HiSeqX10 (Illumina Inc., San Diego, CA, USA) short-read sequencing data. We concluded that, despite the differences in the type and frequency of errors in long-read sequencing, its accuracy is still comparable to that of short reads for genotyping short nuclear variants; due to the randomness of errors in long reads, a lower coverage, around 37 reads, can be sufficient to correct for these random errors.


April 21, 2020

Paragraph: A graph-based structural variant genotyper for short-read sequence data

Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, a fast and accurate genotyper that models SVs using sequence graphs and SV annotations produced by a range of methods and technologies. We demonstrate the accuracy of Paragraph on whole genome sequence data from a control sample with both short and long read sequencing data available, and then apply it at scale to a cohort of 100 samples of diverse ancestry sequenced with short reads. Comparative analyses indicate that Paragraph has better accuracy than other existing genotypers. The Paragraph software is open-source and available at https://github.com/Illumina/paragraph


April 21, 2020

Characterization of LINE-1 transposons in a human genome at allelic resolution

The activity of the retrotransposon LINE-1 has created a substantial portion of the human genome. Most of this sequence comprises fractured and debilitated LINE-1s. An accurate approximation of the number, location, and sequence of the LINE-1 elements present in any single genome has proven elusive due to the difficulty of assembling and phasing the repetitive and polymorphic regions of the human genome. Through an in-depth analysis of publicly-available, deep, long-read assemblies of nearly homozygous human genomes, we defined the location and sequence of all intact LINE-1s in these assemblies. We found 148 and 142 intact LINE-1s in two nearly homozygous assemblies. A combination of these assemblies suggests a diploid human genome contains at least 50% more intact LINE-1s than previous estimates; in this case, 290 intact LINE-1s at 194 loci. We think this is the best approximation, to date, of the number of intact LINE-1s in a single diploid human genome. In addition to counting intact LINE-1 elements, we resolved the sequence of each element, including some LINE-1 elements in unassembled, presumably centromeric regions of the genome. A comparison of the intact LINE-1s in each assembly shows the specific pattern of variation between these genomes, including LINE-1s that remain intact in only one genome, allelic variation in shared intact LINE-1s, and LINE-1s that are unique (presumably young) insertions in only one genome. We found that many old elements (> 6 million years old) remain intact, and comparison of the young and intact LINE-1s across assemblies reinforces the notion that only a small portion of all LINE-1 sequences that may be intact in the genomes of the human population has been uncovered.
This dataset provides the first nearly comprehensive estimate of LINE-1 diversity within an individual, an important dataset in the quest to understand the functional consequences of sequence variation in LINE-1 and the complete set of LINE-1s in the human population.


April 21, 2020

Loss-of-function tolerance of enhancers in the human genome

Previous studies have surveyed the potential impact of loss-of-function (LoF) variants and identified LoF-tolerant protein-coding genes. However, the tolerance of human genomes to losing enhancers has not yet been evaluated. Here we present the catalog of LoF-tolerant enhancers using structural variants from whole-genome sequences. Using a conservative approach, we estimate that each individual human genome possesses at least 28 LoF-tolerant enhancers on average. We assessed the properties of LoF-tolerant enhancers in a unified regulatory network constructed by integrating tissue-specific enhancers and gene-gene interactions. We find that LoF-tolerant enhancers are more tissue-specific and regulate fewer and more dispensable genes. They are enriched in immune-related cells while LoF-intolerant enhancers are enriched in kidney and brain/neuronal stem cells. We developed a supervised learning approach to predict the LoF-tolerance of enhancers, which achieved an AUROC of 96%. We predict that 5,677 more enhancers are likely to be LoF-tolerant and that 75 enhancers are highly LoF-intolerant. Our predictions are supported by a known set of disease enhancers and novel deletions from PacBio sequencing. The LoF-tolerance scores provided here will serve as an important reference for disease studies.


April 21, 2020

Next-Generation Sequencing and Emerging Technologies.

Genetic sequencing technologies are evolving at a rapid pace with major implications for research and clinical practice. In this review, the authors provide an updated overview of next-generation sequencing (NGS) and emerging methodologies. NGS has tremendously improved sequencing output while being more time- and cost-efficient in comparison to Sanger sequencing. The authors describe short-read sequencing approaches, such as sequencing by synthesis, ion semiconductor sequencing, and nanoball sequencing. Third-generation long-read sequencing now promises to overcome many of the limitations of short-read sequencing, such as the inability to reliably resolve repeat sequences and large genomic rearrangements. By combining complementary methods with massively parallel DNA sequencing, a greater insight into the biological context of disease mechanisms is now possible. Emerging methodologies, such as advances in nanopore technology, in situ nucleic acid sequencing, and microscopy-based sequencing, will continue the rapid evolution of this area. These new technologies hold many potential applications for hematological disorders, with the promise of precision and personalized medical care in the future. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.


April 21, 2020

HLA*LA – HLA typing from linearly projected graph alignments.

HLA*LA implements a new graph alignment model for HLA type inference, based on the projection of linear alignments onto a variation graph. It enables accurate HLA type inference from whole-genome (99% accuracy) and whole-exome (93% accuracy) Illumina data; from long-read Oxford Nanopore and Pacific Biosciences data (98% accuracy for whole-genome and targeted data); and from genome assemblies. Computational requirements for a typical sample vary between 0.7 and 14 CPU hours per sample. HLA*LA is implemented in C++ and Perl and freely available as a bioconda package or from https://github.com/DiltheyLab/HLA-LA (GPL v3). Supplementary data are available online. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020

SyRI: identification of syntenic and rearranged regions from whole-genome assemblies

We present SyRI, an efficient tool for genome-wide identification of structural rearrangements (SR) from genome graphs, which are built up from pair-wise whole-genome alignments. Instead of searching for differences, SyRI starts by finding all co-linear regions between the genomes. As all remaining regions are SRs by definition, they can be classified as inversions, translocations, or duplications based on their positions in convoluted networks of repetitive alignments. Finally, SyRI reports local variations like SNPs and indels within syntenic and rearranged regions. We show SyRI’s broad applicability to multiple species and genetically validate the presence of ~100 translocations identified in Arabidopsis.


April 21, 2020

Fam83F induces p53 stabilisation and promotes its activity.

p53 is one of the most important tumour suppressor proteins currently known. It is activated in response to DNA damage and this activation leads to proliferation arrest and cell death. The abundance and activity of p53 are tightly controlled and reductions in p53’s activity can contribute to the development of cancer. Here, we show that Fam83F increases p53 protein levels by protein stabilisation. Fam83F interacts with p53 and decreases its ubiquitination and degradation. Fam83F is induced in response to DNA damage and its overexpression also increases p53 activity in cell culture experiments and in zebrafish embryos. Downregulation of Fam83F decreases transcription of p53 target genes in response to DNA damage and increases cell proliferation, identifying Fam83F as an important regulator of the DNA damage response. Overexpression of Fam83F also enhances migration of cells harbouring mutant p53 demonstrating that it can also activate mutant forms of p53.


April 21, 2020

Mitochondrial DNA Variants in Patients with Liver Injury Due to Anti-Tuberculosis Drugs.

Hepatotoxicity is the most severe adverse effect of anti-tuberculosis therapy. Isoniazid’s metabolite hydrazine is a mitochondrial complex II inhibitor. We hypothesized that mitochondrial DNA variants are risk factors for drug-induced liver injury (DILI) due to isoniazid, rifampicin or pyrazinamide. We obtained peripheral blood from tuberculosis (TB) patients before anti-TB therapy. A total of 38 patients developed DILI due to anti-TB drugs. We selected 38 patients with TB but without DILI as controls. Next-generation sequencing detected point mutations in the mitochondrial DNA genome. DILI was defined as ALT ≥5 times the upper limit of normal (ULN), or ALT ≥3 times the ULN with total bilirubin ≥2 times the ULN. In 38 patients with DILI, the causative drug was isoniazid in eight, rifampicin in 14 and pyrazinamide in 16. Patients with isoniazid-induced liver injury had more variants in complex I’s NADH subunit 5 and 1 genes, more nonsynonymous mutations in NADH subunit 5, and a higher ratio of nonsynonymous to total substitutions. Patients with rifampicin- or pyrazinamide-induced liver injury had no association with mitochondrial DNA variants. Variants in complex I’s subunit 1 and 5 genes might affect respiratory chain function and predispose to isoniazid-induced liver injury when exposed to hydrazine, a metabolite of isoniazid and a complex II inhibitor.
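The DILI definition used in this abstract can be written as a short predicate. The sketch below reads each threshold as "at least" the stated multiple of the upper limit of normal; the function name, parameter names, and choice of units are illustrative assumptions, not taken from the study.

```python
def meets_dili_definition(alt: float, alt_uln: float,
                          bilirubin: float, bilirubin_uln: float) -> bool:
    """Apply the abstract's DILI definition to one set of lab values.

    Each measurement must be in the same units as its upper limit of
    normal (ULN). All names here are hypothetical, for illustration only.
    """
    if alt_uln <= 0 or bilirubin_uln <= 0:
        raise ValueError("upper limits of normal must be positive")

    alt_ratio = alt / alt_uln
    bilirubin_ratio = bilirubin / bilirubin_uln

    # ALT >= 5x ULN alone, or ALT >= 3x ULN together with bilirubin >= 2x ULN.
    return alt_ratio >= 5 or (alt_ratio >= 3 and bilirubin_ratio >= 2)
```

Under this reading, an ALT of four times the ULN qualifies only when total bilirubin is also at least twice its ULN.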


April 21, 2020

Centromeric Satellite DNAs: Hidden Sequence Variation in the Human Population.

The central goal of medical genomics is to understand the inherited basis of sequence variation that underlies human physiology, evolution, and disease. Functional association studies currently ignore millions of bases that span each centromeric region and acrocentric short arm. These regions are enriched in long arrays of tandem repeats, or satellite DNAs, that are known to vary extensively in copy number and repeat structure in the human population. Satellite sequence variation in the human genome is often so large that it is detected cytogenetically, yet due to the lack of a reference assembly and informatics tools to measure this variability, contemporary high-resolution disease association studies are unable to detect causal variants in these regions. Nevertheless, recently uncovered associations between satellite DNA variation and human disease support that these regions present a substantial and biologically important fraction of human sequence variation. Therefore, there is a pressing and unmet need to detect and incorporate this uncharacterized sequence variation into broad studies of human evolution and medical genomics. Here I discuss the current knowledge of satellite DNA variation in the human genome, focusing on centromeric satellites and their potential implications for disease.


April 21, 2020

Uncovering Missing Heritability in Rare Diseases.

The problem of ‘missing heritability’ affects both common and rare diseases, hindering discovery, diagnosis, and patient care. The ‘missing heritability’ concept has been mainly associated with common and complex diseases where promising modern technological advances, like genome-wide association studies (GWAS), were unable to uncover the complete genetic mechanism of the disease/trait. Although rare diseases (RDs) have low prevalence individually, collectively they are common. Furthermore, multi-level genetic and phenotypic complexity, when combined with the individual rarity of these conditions, poses an important challenge in the quest to identify causative genetic changes in RD patients. In recent years, high throughput sequencing has accelerated discovery and diagnosis in RDs. However, despite the several-fold increase (from ~10% using traditional to ~40% using genome-wide genetic testing) in finding genetic causes of these diseases in RD patients, the majority of RDs, as is the case in common diseases, also face the ‘missing heritability’ problem. This review outlines the key role of high throughput sequencing in uncovering genetics behind RDs, with a particular focus on genome sequencing. We review current advances and challenges of sequencing technologies, bioinformatics approaches, and resources.


April 21, 2020

Critical length in long-read resequencing

Long-read sequencing has substantial advantages for structural variant discovery and phasing of variants compared to short-read technologies, but the required and optimal read length has not been assessed. In this work, we used long reads simulated from human genomes and evaluated structural variant discovery and variant phasing using current best practice bioinformatics methods. We determined that optimal discovery of structural variants from human genomes can be obtained with reads of minimally 20 kb. Haplotyping variants across genes only reaches its optimum from reads of 100 kb. These findings are important for the design of future long-read sequencing projects.


April 21, 2020

A personalized platform identifies trametinib plus zoledronate for a patient with KRAS-mutant metastatic colorectal cancer.

Colorectal cancer remains a leading source of cancer mortality worldwide. Initial response is often followed by emergent resistance that is poorly responsive to targeted therapies, reflecting currently undruggable cancer drivers such as KRAS and overall genomic complexity. Here, we report a novel approach to developing a personalized therapy for a patient with treatment-resistant metastatic KRAS-mutant colorectal cancer. An extensive genomic analysis of the tumor’s genomic landscape identified nine key drivers. A transgenic model that altered orthologs of these nine genes in the Drosophila hindgut was developed; a robotics-based screen using this platform identified trametinib plus zoledronate as a candidate treatment combination. Treating the patient led to a significant response: Target and nontarget lesions displayed a strong partial response and remained stable for 11 months. By addressing a disease’s genomic complexity, this personalized approach may provide an alternative treatment option for recalcitrant disease such as KRAS-mutant colorectal cancer.

