July 19, 2019

SplitThreader: Exploration and analysis of rearrangements in cancer genomes

Genomic rearrangements and associated copy number changes are important drivers in cancer as they can alter the expression of oncogenes and tumor suppressors, create gene fusions, and misregulate gene expression. Here we present SplitThreader (http://splitthreader.com), an open-source interactive web application for analysis and visualization of genomic rearrangements and copy number variation in cancer genomes. SplitThreader constructs a sequence graph of genomic rearrangements in the sample and uses a priority queue breadth-first search algorithm on the graph to search for novel interactions. This is applied to detect gene fusions and other novel sequences, as well as to evaluate distances in the rearranged genome between any genomic regions of interest, especially the repositioning of regulatory elements and their target genes. SplitThreader also analyzes each variant to categorize it by its relation to other variants and by its copy number concordance. This identifies balanced translocations, distinguishes simple from complex variants, and suggests likely false positives when copy number is not concordant across a candidate breakpoint. It also provides explanations when multiple variants affect the copy number state and obscure the contribution of a single variant, such as a deletion within a region that is overall amplified. Together, these categories triage the variants into groups and provide a starting point for further systematic analysis and manual curation. To demonstrate its utility, we apply SplitThreader to three cancer cell lines: MCF-7 and A549, sequenced with Illumina paired-end reads, and SK-BR-3, sequenced with long-read PacBio technology. Using SplitThreader, we examine the genomic rearrangements responsible for previously observed gene fusions in SK-BR-3 and MCF-7, and discover that many of the fusions involved a complex series of multiple genomic rearrangements.
We also find notable differences in the types of variants between the three cell lines, in particular a much higher proportion of reciprocal variants in SK-BR-3 and a distinct clustering of interchromosomal variants in SK-BR-3 and MCF-7 that is absent in A549.
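The abstract describes a priority-queue breadth-first search over a sequence graph to measure distances between regions in the rearranged genome. The following is a minimal Python sketch of that general idea under stated assumptions, not SplitThreader's own code: it uses a Dijkstra-style search, assumes a hypothetical edge-list format in which nodes are breakpoint positions, reference segments are weighted by their length in bp, and variant (rearrangement) edges have length 0, and it ignores the orientation constraints a real sequence graph must enforce when threading through a breakpoint.

```python
import heapq
from collections import defaultdict


def shortest_rearranged_distance(edges, source, target):
    """Priority-queue (Dijkstra-style) search over a simplified sequence graph.

    edges: iterable of (node_a, node_b, length_bp) tuples. This format is a
    hypothetical illustration: reference segments carry their length in bp,
    and rearrangement edges carry length 0. Orientation/strand constraints
    of a real sequence graph are deliberately ignored.
    Returns the minimum distance in bp, or None if target is unreachable.
    """
    # Build an undirected adjacency list.
    graph = defaultdict(list)
    for a, b, length in edges:
        graph[a].append((b, length))
        graph[b].append((a, length))

    best = {source: 0}
    queue = [(0, source)]  # (distance so far, node), smallest distance first
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, length in graph[node]:
            new_dist = dist + length
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor))
    return None
```

For example, with reference segments chr1:1000 to chr1:5000 (4,000 bp) and chr8:2000 to chr8:9000 (7,000 bp) joined by a hypothetical translocation edge between chr1:5000 and chr8:2000, the search reports 11,000 bp between chr1:1000 and chr8:9000, two loci that are on different chromosomes in the reference.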


July 19, 2019

Gapless genome assembly of Colletotrichum higginsianum reveals chromosome structure and association of transposable elements with secondary metabolite gene clusters.

The ascomycete fungus Colletotrichum higginsianum causes anthracnose disease of brassica crops and the model plant Arabidopsis thaliana. Previous versions of the genome sequence were highly fragmented, causing errors in the prediction of protein-coding genes and preventing the analysis of repetitive sequences and genome architecture. Here, we re-sequenced the genome using single-molecule real-time (SMRT) sequencing technology and, in combination with optical map data, this provided a gapless assembly of all twelve chromosomes except for the ribosomal DNA repeat cluster on chromosome 7. The more accurate gene annotation made possible by this new assembly revealed a large repertoire of secondary metabolism (SM) key genes (89) and putative biosynthetic pathways (77 SM gene clusters). The two mini-chromosomes differed from the ten core chromosomes in being repeat- and AT-rich and gene-poor but were significantly enriched in genes encoding putative secreted effector proteins. Transposable elements (TEs) were found to occupy 7% of the genome by length. Certain TE families showed a statistically significant association with effector genes and SM cluster genes and were transcriptionally active at particular stages of fungal development. All 24 subtelomeres were found to contain one of three highly conserved repeat elements which, by providing sites for homologous recombination, were probably instrumental in four segmental duplications. The gapless genome of C. higginsianum provides access to repeat-rich regions that were previously poorly assembled, notably the mini-chromosomes and subtelomeres, and allowed prediction of the complete SM gene repertoire. It also provides insights into the potential role of TEs in gene and genome evolution and host adaptation in this asexual pathogen.


July 19, 2019

Parkinson’s disease associated with pure ATXN10 repeat expansion

Large, non-coding pentanucleotide repeat expansions of ATTCT in intron 9 of the ATXN10 gene typically cause progressive spinocerebellar ataxia with or without seizures and present neuropathologically with Purkinje cell loss resulting in symmetrical cerebellar atrophy. These ATXN10 repeat expansions can be interrupted by sequence motifs, which have been associated with seizures and are likely to act as genetic modifiers. We identified a Mexican kindred in which multiple affected family members carried ATXN10 expansions. Four affected family members showed clinical features of spinocerebellar ataxia type 10 (SCA10). However, one affected individual presented with early-onset levodopa-responsive parkinsonism, and one family member carried a large ATXN10 repeat expansion but was clinically unaffected. To characterize the ATXN10 repeat, we used a novel approach combining single-molecule real-time (SMRT) sequencing with CRISPR/Cas9-based capture. We sequenced the entire span of the ~5.3–7.0 kb repeat expansions. The Parkinson’s patient, as well as an unaffected sister, carried an ATXN10 expansion with no repeat interruption motifs. In the siblings with typical SCA10, we found a repeat pattern of ATTCC repeat motifs that have not previously been associated with seizures. Our data suggest that the absence of repeat interruptions is likely a genetic modifier for the clinical presentation of L-Dopa-responsive parkinsonism, whereas repeat interruption motifs contribute clinically to epilepsy. Repeat interruptions are important genetic modifiers of the clinical phenotype in SCA10. Advanced sequencing techniques now allow better characterization of the underlying genetic architecture, which is needed to determine accurate phenotype–genotype correlations.


July 19, 2019

Omics approaches to study gene regulatory networks for development in echinoderms.

Gene regulatory networks (GRNs) describe the interactions for a developmental process at a given time and space. Historically, perturbation experiments represent one of the key methods for analyzing and reconstructing a GRN, and the GRN governing early development in the sea urchin embryo stands as one of the most deeply dissected so far. As technology progresses, so do the methods used to address different biological questions. Next-generation sequencing (NGS) has become a standard experimental technique for genome and transcriptome sequencing and for studies of protein-DNA interactions and DNA accessibility. While several efforts have been made toward integrating different omics approaches to study the regulatory genome in many animals, only in a few cases have these been applied to reconstruct and experimentally test developmental GRNs. Here, we review emerging approaches integrating multiple NGS technologies for the prediction and validation of gene interactions within echinoderm GRNs. These approaches can be applied to both ‘model’ and ‘non-model’ organisms. Although a number of issues still need to be addressed, advances in NGS applications, such as assay for transposase-accessible chromatin sequencing, combined with the availability of embryos belonging to different species, all separated by various evolutionary distances and accessible to experimental regulatory biology, place echinoderms in an unprecedented position for the reconstruction and evolutionary comparison of developmental GRNs. We conclude that sequencing technologies and integrated omics approaches allow the examination of GRNs on a genome-wide scale only if biological perturbation and cis-regulatory analyses are experimentally accessible, as in the case of echinoderm embryos. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.


July 19, 2019

A novel approach using long-read sequencing and ddPCR to investigate gonadal mosaicism and estimate recurrence risk in two families with developmental disorders.

De novo mutations contribute significantly to severe early-onset genetic disorders. Even if a mutation is apparently de novo, there is a recurrence risk due to parental germ line mosaicism, depending on the gonadal generation in which the mutation occurred. We demonstrate the power of using SMRT sequencing and ddPCR to determine the parental origin and germ cell allele frequencies of de novo mutations in two families who had undergone assisted reproduction. In the first family, a TCOF1 variant, c.3156C>T, was identified in the proband with Treacher Collins syndrome. The variant affects splicing and was determined to be of paternal origin. It was present in <1% of the paternal germ cells, suggesting a very low recurrence risk. In the second family, the couple had experienced several unsuccessful pregnancies in which a de novo mutation, PTPN11 c.923A>C, causing Noonan syndrome was identified. The variant was present in 40% of the paternal germ cells, suggesting a high recurrence risk. Our findings highlight a successful strategy to identify the parental origin of mutations and to investigate the recurrence risk in couples who have undergone assisted reproduction with an unknown donor or in couples with gonadal mosaicism who will undergo preimplantation genetic diagnosis. © 2017 The Authors Prenatal Diagnosis published by John Wiley & Sons Ltd.


July 19, 2019

Exonization of an intronic LINE-1 element causing Becker muscular dystrophy as a novel mutational mechanism in dystrophin gene.

A broad mutational spectrum in the dystrophin (DMD) gene, from large deletions/duplications to point mutations, causes Duchenne/Becker muscular dystrophy (D/BMD). Comprehensive genotyping is particularly relevant considering the mutation-centered therapies for dystrophinopathies. We report the genetic characterization of a patient with disease onset at age 13 years, elevated creatine kinase levels and reduced dystrophin labeling, in whom multiplex ligation-dependent probe amplification (MLPA) and genomic sequencing failed to detect pathogenic variants. Bioinformatic, transcriptomic (real-time PCR, RT-PCR), and genomic approaches (Southern blot, long-range PCR, and single molecule real-time sequencing) were used to characterize the mutation. An aberrant transcript was identified, containing a 103-nucleotide insertion between exons 51 and 52, with no similarity to the DMD gene. This corresponded to the partial exonization of a long interspersed nuclear element (LINE-1), disrupting the open reading frame. Further characterization identified a complete LINE-1 (~6 kb with typical hallmarks) inserted deep within intron 51. Haplotyping and segregation analysis demonstrated that the mutation had a de novo origin. Besides underscoring the importance of mRNA studies in genetically unsolved cases, this is the first report of a disease-causing fully intronic LINE-1 element in DMD, adding to the diversity of mutational events that give rise to D/BMD.


July 19, 2019

Functional Analysis of the Glucan Degradation Locus in Caldicellulosiruptor bescii Reveals Essential Roles of Component Glycoside Hydrolases in Plant Biomass Deconstruction.

The ability to hydrolyze microcrystalline cellulose is an uncommon feature in the microbial world, but it can be exploited for conversion of lignocellulosic feedstocks into biobased fuels and chemicals. Understanding the physiological and biochemical mechanisms by which microorganisms deconstruct cellulosic material is key to achieving this objective. The glucan degradation locus (GDL) in the genomes of extremely thermophilic Caldicellulosiruptor species encodes polysaccharide lyases (PLs), unique cellulose binding proteins (tapirins), and putative posttranslational modifying enzymes, in addition to multidomain, multifunctional glycoside hydrolases (GHs), thereby representing an alternative paradigm for plant biomass degradation compared to fungal or cellulosomal systems. To examine the individual and collective in vivo roles of the glycolytic enzymes, the six GH genes in the GDL of Caldicellulosiruptor bescii were systematically deleted, and the extents to which the resulting mutant strains could solubilize microcrystalline cellulose (Avicel) and plant biomass (switchgrass or poplar) were examined. Three of the GDL enzymes, Athe_1867 (CelA) (GH9-CBM3-CBM3-CBM3-GH48), Athe_1859 (GH5-CBM3-CBM3-GH44), and Athe_1857 (GH10-CBM3-CBM3-GH48), acted synergistically in vivo and accounted for 92% of naked microcrystalline cellulose (Avicel) degradation. However, the relative importance of the GDL GHs varied for the plant biomass substrates tested. Furthermore, mixed cultures of mutant strains showed that switchgrass solubilization depended on the secretome-bound enzymes collectively produced by the culture, not on the specific strain from which they came. These results demonstrate that certain GDL GHs are primarily responsible for the degradation of microcrystalline cellulose-containing substrates by C. bescii and provide new insights into the workings of a novel microbial mechanism for lignocellulose utilization.
IMPORTANCE: The efficient and extensive degradation of complex polysaccharides in lignocellulosic biomass, particularly microcrystalline cellulose, remains a major barrier to its use as a renewable feedstock for the production of fuels and chemicals. Extremely thermophilic bacteria from the genus Caldicellulosiruptor rapidly degrade plant biomass to fermentable sugars at temperatures of 70 to 78°C, although the specific mechanism by which this occurs is not clear. Previous comparative genomic studies identified a genomic locus found only in certain Caldicellulosiruptor species that was hypothesized to be mainly responsible for microcrystalline cellulose degradation. By systematically deleting genes in this locus in Caldicellulosiruptor bescii, the nuanced, substrate-specific in vivo roles of glycolytic enzymes in deconstructing crystalline cellulose and plant biomasses could be discerned. The results here point to synergism of three multidomain cellulases in C. bescii, working in conjunction with the aggregate secreted enzyme inventory, as the key to the plant biomass degradation ability of this extreme thermophile. Copyright © 2017 American Society for Microbiology.


July 19, 2019

Amplification-free, CRISPR-Cas9 targeted enrichment and SMRT Sequencing of repeat-expansion disease causative genomic regions

Targeted sequencing has proven to be an economical means of obtaining sequence information for one or more defined regions of a larger genome. However, most target enrichment methods require amplification. Some genomic regions, such as those with extreme GC content and repetitive sequences, are recalcitrant to faithful amplification. Yet, many human genetic disorders are caused by repeat expansions, including difficult-to-sequence tandem repeats. We have developed a novel, amplification-free enrichment technique that employs the CRISPR-Cas9 system for specific targeting of multiple genomic loci. This method, in conjunction with long reads generated through Single Molecule, Real-Time (SMRT) sequencing and unbiased coverage, enables enrichment and sequencing of complex genomic regions that cannot be investigated with other technologies. Using human genomic DNA samples, we demonstrate successful targeting of causative loci for Huntington’s disease (HTT; CAG repeat), Fragile X syndrome (FMR1; CGG repeat), amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (C9orf72; GGGGCC repeat), and spinocerebellar ataxia type 10 (SCA10) (ATXN10; variable ATTCT repeat). The method, amenable to multiplexing across multiple genomic loci, uses an amplification-free approach that facilitates the isolation of hundreds of individual on-target molecules in a single SMRT Cell and accurate sequencing through long repeat stretches, regardless of extreme GC content or sequence complexity. Our novel targeted sequencing method opens new doors to genomic analyses independent of PCR amplification that will facilitate the study of repeat expansion disorders.


July 19, 2019

ALF: a strategy for identification of unauthorized GMOs in complex mixtures by a GW-NGS method and dedicated bioinformatics analysis.

The majority of feed products in industrialised countries contain materials derived from genetically modified organisms (GMOs). In parallel, the number of reports of unauthorised GMOs (UGMOs) is gradually increasing. Specific detection methods for UGMOs are lacking, owing to the absence of detailed sequence information and reference materials. In this research, an adapted genome walking approach, called ALF (Amplification of Linearly-enriched Fragments), was developed. Coupling ALF to NGS aims at the simultaneous detection and identification of all GMOs, including UGMOs, in one sample, in a single analysis. The ALF approach was assessed on a mixture of DNA extracts from four reference materials, in an uneven distribution, mimicking a real-life situation. The complete insert and genomic flanking regions were known for three of the included GMO events, while for MON15985 only partial sequence information was available. Combined with a known organisation of elements, this GMO served as a model for a UGMO. We successfully identified sequences matching this organisation of elements, serving as proof of principle for ALF as a new UGMO detection strategy. Additionally, this study provides a first outline of an automated, web-based analysis pipeline for identification of UGMOs containing known GM elements.


July 19, 2019

De novo assembly of genomes from long sequence reads reveals uncharted territories of Propionibacterium freudenreichii.

Propionibacterium freudenreichii is an industrially important bacterium granted Generally Recognized as Safe (GRAS) status owing to its long history of safe use in food bioprocesses. Despite its recognized role in the food industry and in the production of vitamin B12, as well as its documented health-promoting potential, P. freudenreichii has remained poorly characterised at the genomic level. At present, only three complete genome sequences are available for the species. We used the PacBio RS II sequencing platform to generate complete genomes of 20 P. freudenreichii strains and compared them in detail. Comparative analyses revealed both sequence conservation and genome organisational diversity among the strains. Assembly from long reads resulted in the discovery of additional circular elements: two putative conjugative plasmids and three active, lysogenic bacteriophages. It also permitted characterisation of the CRISPR-Cas systems. The use of the PacBio sequencing platform allowed identification of DNA modifications, which in turn allowed characterisation of the restriction-modification systems together with their recognition motifs. The observed genomic differences suggested strain variation in surface piliation and specific mucus binding, which were validated by experimental studies. The phenotypic characterisation displayed large diversity between the strains in their ability to utilise a range of carbohydrates, to grow under unfavourable conditions and to form a biofilm. The complete genome sequencing allowed detailed characterisation of the industrially important species P. freudenreichii by facilitating the discovery of previously unknown features. The results presented here lay a solid foundation for future genetic and functional genomic investigations of this actinobacterial species.


July 19, 2019

Genome sequence of the progenitor of the wheat D genome Aegilops tauschii.

Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome have until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and that its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.


July 19, 2019

The diversity, structure, and function of heritable adaptive immunity sequences in the Aedes aegypti genome.

The Aedes aegypti mosquito transmits arboviruses, including dengue, chikungunya, and Zika virus. Understanding the mechanisms underlying mosquito immunity could provide new tools to control arbovirus spread. Insects exploit two different RNAi pathways to combat viral and transposon infection: short interfering RNAs (siRNAs) and PIWI-interacting RNAs (piRNAs) [1, 2]. Endogenous viral elements (EVEs) are sequences from non-retroviral viruses that are inserted into the mosquito genome and can act as templates for the production of piRNAs [3, 4]. EVEs therefore represent a record of past infections and a reservoir of potential immune memory [5]. The large-scale organization of EVEs has been difficult to resolve with short-read sequencing because they tend to integrate into repetitive regions of the genome. To define the diversity, organization, and function of EVEs, we took advantage of the contiguity associated with long-read sequencing to generate a high-quality assembly of the Ae. aegypti-derived Aag2 cell line genome, an important and widely used model system. We show that EVEs are acquired through recombination with specific classes of long terminal repeat (LTR) retrotransposons and organize into large loci (>50 kbp) characterized by high LTR density. These EVE-containing loci have an increased density of piRNAs compared to similar regions without EVEs. Furthermore, we detected EVE-derived piRNAs consistent with targeted processing of persistently infecting virus genomes. We propose that comparisons of EVEs across mosquito populations may explain differences in vector competence, and further study of the structure and function of these elements in the genome of mosquitoes may lead to epidemiological interventions. Copyright © 2017 Elsevier Ltd. All rights reserved.


July 19, 2019

Analysis of recombinational switching at the antigenic variation locus of the Lyme spirochete using a novel PacBio sequencing pipeline.

The Lyme disease spirochete evades the host immune system by combinatorial variation of VlsE, a surface antigen. Antigenic variation occurs via segmental gene conversion from contiguous silent cassettes into the vlsE locus. Because of the high degree of similarity between switch variants and the size of vlsE, short-read NGS technologies have been unsuitable for sequencing vlsE populations. Here we use PacBio sequencing technology coupled with the first fully automated software pipeline (VAST), which accurately processes the NGS data by minimizing error frequency, eliminating heteroduplex errors and accurately aligning switch variants. We extend earlier studies by showing use of almost all of the vlsE SNP repertoire. In different tissues of the same mouse, 99.6% of the variants were unique, suggesting that dissemination of Borrelia burgdorferi is predominantly unidirectional with little tissue-to-tissue hematogenous dissemination. We also observed a similar number of variants in SCID and wild-type mice, mapped the location and frequency of amino acid changes onto the 3D structure as a heatmap, and noted differences between SCID and wild-type mice that hint at possible amino acid function. Our observed selection against diversification of residues at the dimer interface in wild-type mice strongly suggests that dimerization is required for in vivo functionality of vlsE. © 2017 John Wiley & Sons Ltd.


July 19, 2019

Genome and methylome variation in Helicobacter pylori with a cag pathogenicity island during early stages of human infection.

Helicobacter pylori is remarkable for its genetic variation. Yet little is known about its genetic changes during early stages of human infection, as the bacteria adapt to their new environment. We analyzed genome and methylome variations in a fully virulent H pylori strain during experimental infection. We performed a randomized phase 1 and 2, observer-blind, placebo-controlled study of 12 healthy, H pylori-negative adults in Germany from October 2008 through March 2010. The volunteers were given a prophylactic vaccine candidate (n=7) or placebo (n=5) and then challenged with H pylori strain BCM-300. Biopsy samples were collected and H pylori were isolated. Genomes of the challenge strain and 12 re-isolates, obtained 12 weeks (or, in 1 case, 62 weeks) after infection, were sequenced by single-molecule, real-time technology, which, in parallel, permitted determination of genome-wide methylation patterns for all strains. Functional effects of genetic changes observed in H pylori strains during human infection were assessed by measuring release of interleukin 8 from AGS cells (to detect cag PAI function), neutral red uptake (to detect vacuolating cytotoxin activity), and adhesion assays. The observed mutation rate was in agreement with rates previously determined from patients with chronic H pylori infections, without evidence of a mutation burst. A loss of cag PAI function was observed in 3 re-isolates. In addition, 3 re-isolates from the vaccine group acquired mutations in the vacuolating cytotoxin gene vacA, resulting in loss of vacuolization activity in gastric epithelial cells. We observed inter-strain variation in methylomes due to phase variation in genes encoding methyltransferases. We analyzed adaptation of a fully virulent strain of H pylori to 12 different volunteers to obtain a robust estimate of the frequency of genetic and epigenetic changes in the absence of inter-strain recombination.
Our findings indicate that the large amount of genetic variation in H pylori poses a challenge to vaccine development. ClinicalTrials.gov no: NCT00736476. Copyright © 2017 AGA Institute. Published by Elsevier Inc. All rights reserved.


July 19, 2019

Single molecule real-time (SMRT®) DNA sequencing of HLA genes at ultra-high resolution from 126 International HLA and Immunogenetics Workshop cell lines.

The hyperpolymorphic HLA genes play important roles in disease and transplantation and act as genetic markers of migration and evolution. A panel of 107 B-lymphoblastoid cell lines (B-LCLs) was established in 1987 at the 10th International Histocompatibility Workshop as a resource for the immunogenetics community. These B-LCLs are well characterised and represent diverse ethnicities and HLA haplotypes. Here we have applied Pacific Biosciences’ Single Molecule Real-Time (SMRT) DNA sequencing to HLA type 126 B-LCLs, including the 107 IHIW cell lines, to ultra-high resolution. Amplicon sequencing of full-length HLA class I genes (HLA-A, -B and -C) and partial-length HLA class II genes (HLA-DRB1, -DQB1 and -DPB1) was performed. We typed a total of 931 HLA alleles, 895 (96%) of which were consistent with the typing in the IPD-IMGT/HLA Database (Release 3.27.0, 2017-01-20), with 595 (64%) typed at a higher resolution. Discrepant types, including novel alleles (n=10) and changes in zygosity (n=13), as well as previously unreported types (n=34), were observed. In addition, patterns of linkage disequilibrium were distinguished by four-field resolution typing of HLA-B and HLA-C. By improving and standardising the HLA typing of these B-LCLs, we have ensured their continued usefulness as a resource for the immunogenetics community in the age of next-generation DNA sequencing. This article is protected by copyright. All rights reserved.

