July 19, 2019  |  

SplitThreader: Exploration and analysis of rearrangements in cancer genomes

Genomic rearrangements and associated copy number changes are important drivers in cancer as they can alter the expression of oncogenes and tumor suppressors, create gene fusions, and misregulate gene expression. Here we present SplitThreader (http://splitthreader.com), an open-source interactive web application for analysis and visualization of genomic rearrangements and copy number variation in cancer genomes. SplitThreader constructs a sequence graph of genomic rearrangements in the sample and uses a priority queue breadth-first search algorithm on the graph to search for novel interactions. This is applied to detect gene fusions and other novel sequences, as well as to evaluate distances in the rearranged genome between any genomic regions of interest, especially the repositioning of regulatory elements and their target genes. SplitThreader also analyzes each variant to categorize it by its relation to other variants and by its copy number concordance. This identifies balanced translocations, distinguishes simple from complex variants, and suggests likely false positives when copy number is not concordant across a candidate breakpoint. It also provides explanations when multiple variants affect the copy number state and obscure the contribution of a single variant, such as a deletion within a region that is overall amplified. Together, these categories triage the variants into groups and provide a starting point for further systematic analysis and manual curation. To demonstrate its utility, we apply SplitThreader to three cancer cell lines: MCF-7 and A549, with Illumina paired-end sequencing, and SK-BR-3, with long-read PacBio sequencing. Using SplitThreader, we examine the genomic rearrangements responsible for previously observed gene fusions in SK-BR-3 and MCF-7, and discover that many of the fusions involved a complex series of multiple genomic rearrangements. We also find notable differences in the types of variants between the three cell lines, in particular a much higher proportion of reciprocal variants in SK-BR-3 and a distinct clustering of interchromosomal variants in SK-BR-3 and MCF-7 that is absent in A549.
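The priority-queue graph search mentioned in the abstract above can be illustrated with a minimal sketch. This is not SplitThreader's actual code: the function name `shortest_distance`, the graph layout (a dictionary of weighted edges), and the toy node names and distances are all assumptions made for illustration. The idea is that a rearrangement junction can act as a zero-length edge, which may bring two regions far closer together in the rearranged genome than they are in the reference.

```python
import heapq


def shortest_distance(graph, start, goal):
    """Priority-queue search (Dijkstra-style) over a weighted graph.

    graph: dict mapping node -> list of (neighbor, distance_bp) pairs.
    Returns the smallest total distance from start to goal, or None
    if goal cannot be reached.
    """
    best = {start: 0}
    queue = [(0, start)]  # (distance so far, node)
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, step in graph.get(node, []):
            candidate = dist + step
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None


# Hypothetical toy graph: segment names and distances are invented.
# The 0-length edge stands in for a rearrangement junction.
toy_graph = {
    "chr1:segA": [("chr1:segB", 50_000), ("chr8:segC", 0)],
    "chr1:segB": [("gene_X", 2_000_000)],
    "chr8:segC": [("gene_X", 30_000)],
}

print(shortest_distance(toy_graph, "chr1:segA", "gene_X"))
```

In this toy graph the route through the junction is shorter than the route along the reference, which is the kind of repositioning of regulatory elements and target genes that the abstract describes.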


Exonization of an intronic LINE-1 element causing Becker muscular dystrophy as a novel mutational mechanism in dystrophin gene.

A broad mutational spectrum in the dystrophin (DMD) gene, from large deletions/duplications to point mutations, causes Duchenne/Becker muscular dystrophy (D/BMD). Comprehensive genotyping is particularly relevant considering the mutation-centered therapies for dystrophinopathies. We report the genetic characterization of a patient with disease onset at age 13 years, elevated creatine kinase levels and reduced dystrophin labeling, where multiplex-ligation probe amplification (MLPA) and genomic sequencing failed to detect pathogenic variants. Bioinformatic, transcriptomic (real time PCR, RT-PCR), and genomic approaches (Southern blot, long-range PCR, and single molecule real-time sequencing) were used to characterize the mutation. An aberrant transcript was identified, containing a 103-nucleotide insertion between exons 51 and 52, with no similarity with the DMD gene. This corresponded to the partial exonization of a long interspersed nuclear element (LINE-1), disrupting the open reading frame. Further characterization identified a complete LINE-1 (~6 kb with typical hallmarks) deeply inserted in intron 51. Haplotyping and segregation analysis demonstrated that the mutation had a de novo origin. Besides underscoring the importance of mRNA studies in genetically unsolved cases, this is the first report of a disease-causing fully intronic LINE-1 element in DMD, adding to the diversity of mutational events that give rise to D/BMD.


Centromere evolution and CpG methylation during vertebrate speciation.

Centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation. Here, we perform de novo long-read genome assembly of three inbred medaka strains that are derived from geographically isolated subpopulations and are undergoing speciation. Using single-molecule real-time (SMRT) sequencing, we obtain three chromosome-mapped genomes of length ~734, ~678, and ~744 Mbp with a resource of twenty-two centromeric regions of length 20-345 kbp. Centromeres are positionally conserved among the three strains and even between four pairs of chromosomes that were duplicated by the teleost-specific whole-genome duplication 320-350 million years ago. The centromeres do not all evolve at a similar pace; rather, centromeric monomers in non-acrocentric chromosomes evolve significantly faster than those in acrocentric chromosomes. Using methylation-sensitive SMRT reads, we find that centromeres are mostly hypermethylated but have hypomethylated sub-regions that acquire unique sequence compositions independently. These findings reveal the potential of non-acrocentric centromere evolution to contribute to speciation.


Methylation in Mycobacterium tuberculosis is lineage specific with associated mutations present globally.

DNA methylation is an epigenetic modification of the genome involved in regulating crucial cellular processes, including transcription and chromosome stability. Advances in PacBio sequencing technologies can be used to robustly reveal methylation sites. The methylome of the Mycobacterium tuberculosis complex is poorly understood but may be involved in virulence, hypoxic survival and the emergence of drug resistance. In the most extensive study to date, we characterise the methylome across the 4 major lineages of M. tuberculosis and 2 lineages of M. africanum, the leading causes of tuberculosis disease in humans. We reveal lineage-specific methylated motifs and strain-specific mutations that are abundant globally and likely to explain loss of function in the respective methyltransferases. Our work provides a set of sixteen new complete reference genomes for the Mycobacterium tuberculosis complex, including complete lineage 5 genomes. Insights into lineage-specific methylomes will further elucidate underlying biological mechanisms and other important phenotypes of the epi-genome.


Cytogenomic identification and long-read single molecule real-time (SMRT) sequencing of a Bardet-Biedl Syndrome 9 (BBS9) deletion.

Bardet-Biedl syndrome (BBS) is a recessive disorder characterized by heterogeneous clinical manifestations, including truncal obesity, rod-cone dystrophy, renal anomalies, postaxial polydactyly, and variable developmental delays. At least 20 genes have been implicated in BBS, and all are involved in primary cilia function. We report a 1-year-old male child from Guyana with obesity, postaxial polydactyly on his right foot, hypotonia, ophthalmologic abnormalities, and developmental delay, which together indicated a clinical diagnosis of BBS. Clinical chromosomal microarray (CMA) testing and high-throughput BBS gene panel sequencing detected a homozygous 7p14.3 deletion of exons 1-4 of BBS9 that was encompassed by a 17.5 Mb region of homozygosity at chromosome 7p14.2-p21.1. The precise breakpoints of the deletion were delineated to a 72.8 kb region in the proband and carrier parents by third-generation long-read single molecule real-time (SMRT) sequencing (Pacific Biosciences), which suggested non-homologous end joining as a likely mechanism of formation. Long-read SMRT sequencing of the deletion breakpoints also determined that the aberration included the neighboring RP9 gene implicated in retinitis pigmentosa; however, the clinical significance of this was considered uncertain given the paucity of reported cases with unambiguous RP9 mutations. Taken together, our study characterized a BBS9 deletion, and the identification of this shared haplotype in the parents suggests that this pathogenic aberration may be a BBS founder mutation in the Guyanese population. Importantly, this informative case also highlights the utility of long-read SMRT sequencing to map nucleotide breakpoints of clinically relevant structural variants.


Single molecule real time sequencing in ADTKD-MUC1 allows complete assembly of the VNTR and exact positioning of causative mutations.

Recently, the Mucin-1 (MUC1) gene has been identified as a causal gene of autosomal dominant tubulointerstitial kidney disease (ADTKD). Most causative mutations are buried within a GC-rich 60 basepair variable number of tandem repeat (VNTR), which escapes identification by massive parallel sequencing methods due to the complexity of the VNTR. We established long read single molecule real time sequencing (SMRT) targeted to the MUC1-VNTR as an alternative strategy to the snapshot assay. Our approach allows complete VNTR assembly, thereby enabling the detection of all variants residing within the VNTR and simultaneous determination of VNTR length. We present high resolution data on the VNTR architecture for a cohort of snapshot positive (n = 9) and negative (n = 7) ADTKD families. By SMRT sequencing we could confirm the diagnosis in all previously tested cases, reconstruct both VNTR alleles and determine the exact position of the causative variant in eight of nine families. This study demonstrates that precise positioning of the causative mutation(s) and identification of other coding and noncoding sequence variants in ADTKD-MUC1 is feasible. SMRT sequencing could provide a powerful tool to uncover potential factors encoded within the VNTR that associate with intra- and interfamilial phenotype variability of MUC1 related kidney disease.


Piercing the dark matter: bioinformatics of long-range sequencing and mapping.

Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.


Accurate detection of complex structural variations using single-molecule sequencing.

Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr ) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles ) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.


Utility of DNA, RNA, protein, and functional approaches to solve cryptic immunodeficiencies.

We report a female infant identified by newborn screening for severe combined immunodeficiencies (NBS SCID) with T cell lymphopenia (TCL). The patient had persistently elevated alpha-fetoprotein (AFP) with IgA deficiency, and elevated IgM. Gene sequencing for a SCID panel was uninformative. We sought to determine the cause of the immunodeficiency in this infant. We performed whole-exome sequencing (WES) on the patient and parents to identify a genetic diagnosis. Based on the WES result, we developed a novel flow cytometric panel for rapid assessment of DNA repair defects using blood samples. We also performed whole transcriptome sequencing (WTS) on fibroblast RNA from the patient and father for abnormal transcript analysis. WES revealed a pathogenic paternally inherited indel in ATM. We used the flow panel to assess several proteins in the DNA repair pathway in lymphocyte subsets. The patient had absent phosphorylation of ATM, resulting in absent or aberrant phosphorylation of downstream proteins, including γH2AX. However, ataxia-telangiectasia (AT) is an autosomal recessive condition, and the abnormal functional data did not correspond with a single ATM variant. WTS revealed in-frame reciprocal fusion transcripts involving ATM and SLC35F2 indicating a chromosome 11 inversion within 11q22.3, of maternal origin. Inversion breakpoints were identified within ATM intron 16 and SLC35F2 intron 7. We identified a novel ATM-breaking chromosome 11 inversion in trans with a pathogenic indel (compound heterozygote) resulting in non-functional ATM protein, consistent with a diagnosis of AT. Utilization of several molecular and functional assays allowed successful resolution of this case.


Discordant inheritance of chromosomal and extrachromosomal DNA elements contributes to dynamic disease evolution in glioblastoma.

To understand how genomic heterogeneity of glioblastoma (GBM) contributes to poor therapy response, we performed DNA and RNA sequencing on GBM samples and the neurospheres and orthotopic xenograft models derived from them. We used the resulting dataset to show that somatic driver alterations including single-nucleotide variants, focal DNA alterations and oncogene amplification on extrachromosomal DNA (ecDNA) elements were in majority propagated from tumor to model systems. In several instances, ecDNAs and chromosomal alterations demonstrated divergent inheritance patterns and clonal selection dynamics during cell culture and xenografting. We infer that ecDNA was unevenly inherited by offspring cells, a characteristic that affects the oncogenic potential of cells with more or fewer ecDNAs. Longitudinal patient tumor profiling found that oncogenic ecDNAs are frequently retained throughout the course of disease. Our analysis shows that extrachromosomal elements allow rapid increase of genomic heterogeneity during GBM evolution, independently of chromosomal DNA alterations.


Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes.

Maize is an important crop with a high level of genome diversity and heterosis. The genome sequence of a typical female line, B73, was previously released. Here, we report a de novo genome assembly of a corresponding male representative line, Mo17. More than 96.4% of the 2,183 Mb assembled genome can be accounted for by 362 scaffolds in ten pseudochromosomes with 38,620 annotated protein-coding genes. Comparative analysis revealed large gene-order and gene structural variations: approximately 10% of the annotated genes were mutually nonsyntenic, and more than 20% of the predicted genes had either large-effect mutations or large structural variations, which might cause considerable protein divergence between the two inbred lines. Our study provides a high-quality reference-genome sequence of an important maize germplasm, and the intraspecific gene order and gene structural variations identified should have implications for heterosis and genome evolution.


Characterization of a human-specific tandem repeat associated with bipolar disorder and schizophrenia.

Bipolar disorder (BD) and schizophrenia (SCZ) are highly heritable diseases that affect more than 3% of individuals worldwide. Genome-wide association studies have strongly and repeatedly linked risk for both of these neuropsychiatric diseases to a 100 kb interval in the third intron of the human calcium channel gene CACNA1C. However, the causative mutation is not yet known. We have identified a human-specific tandem repeat in this region that is composed of 30 bp units, often repeated hundreds of times. This large tandem repeat is unstable using standard polymerase chain reaction and bacterial cloning techniques, which may have resulted in its incorrect size in the human reference genome. The large 30-mer repeat region is polymorphic in both size and sequence in human populations. Particular sequence variants of the 30-mer are associated with risk status at several flanking single-nucleotide polymorphisms in the third intron of CACNA1C that have previously been linked to BD and SCZ. The tandem repeat arrays function as enhancers that increase reporter gene expression in a human neural progenitor cell line. Different human arrays vary in the magnitude of enhancer activity, and the 30-mer arrays associated with increased psychiatric disease risk status have decreased enhancer activity. Changes in the structure and sequence of these arrays likely contribute to changes in CACNA1C function during human evolution and may modulate neuropsychiatric disease risk in modern human populations. Copyright © 2018. Published by Elsevier Inc.


De novo assembly of two Swedish genomes reveals missing segments from the human GRCh38 reference and improves variant calling of population-scale sequencing data.

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.


Mapping the landscape of tandem repeat variability by targeted long read single molecule sequencing in familial X-linked intellectual disability.

The etiology of more than half of all patients with X-linked intellectual disability remains elusive, despite array-based comparative genomic hybridization, whole exome or genome sequencing. Since short read massive parallel sequencing approaches do not allow the detection of larger tandem repeat expansions, we hypothesized that such expansions could be a hidden cause of X-linked intellectual disability. We selectively captured over 1800 tandem repeats on the X chromosome and characterized them by long read single molecule sequencing in 3 families with idiopathic X-linked intellectual disability. In male DNA samples, full tandem repeat length sequences were obtained for 88-93% of the targets and up to 99.6% of the repeats with a moderate guanine-cytosine content. The read length and analysis pipeline allow detection of tandem repeat expansions of > 900 bp. In one family, one repeat expansion co-occurs with down-regulation of the neighboring MIR222 gene. This gene has previously been implicated in intellectual disability and is apparently linked to FMR1 and NEFH overexpression associated with neurological disorders. This study demonstrates the power of single molecule sequencing to measure tandem repeat lengths and detect expansions, and suggests that tandem repeat mutations may be a hidden cause of X-linked intellectual disability.


A forward genetic screen reveals a primary role for Plasmodium falciparum Reticulocyte Binding Protein Homologue 2a and 2b in determining alternative erythrocyte invasion pathways.

Invasion of human erythrocytes is essential for Plasmodium falciparum parasite survival and pathogenesis, and is also a complex phenotype. While some later steps in invasion appear to be invariant and essential, the earlier steps of recognition are controlled by a series of redundant, and only partially understood, receptor-ligand interactions. Reverse genetic analysis of laboratory adapted strains has identified multiple genes that when deleted can alter invasion, but how the relative contributions of each gene translate to the phenotypes of clinical isolates is far from clear. We used a forward genetic approach to identify genes responsible for variable erythrocyte invasion by phenotyping the parents and progeny of previously generated experimental genetic crosses. Linkage analysis using whole genome sequencing data revealed a single major locus was responsible for the majority of phenotypic variation in two invasion pathways. This locus contained the PfRh2a and PfRh2b genes, members of one of the major invasion ligand gene families, but not widely thought to play such a prominent role in specifying invasion phenotypes. Variation in invasion pathways was linked to significant differences in PfRh2a and PfRh2b expression between parasite lines, and their role in specifying alternative invasion was confirmed by CRISPR-Cas9-mediated genome editing. Expansion of the analysis to a large set of clinical P. falciparum isolates revealed common deletions, suggesting that variation at this locus is a major cause of invasion phenotypic variation in the endemic setting. This work has implications for blood-stage vaccine development and will help inform the design and location of future large-scale studies of invasion in clinical isolates.

