Structural variation Archives - Page 18 of 31

July 7, 2019

First complete genome sequence of Clostridium sporogenes DSM 795T, a nontoxigenic surrogate for Clostridium botulinum, determined using PacBio Single-Molecule Real-Time Technology.

The first complete genome sequence of Clostridium sporogenes DSM 795(T), a nontoxigenic surrogate for Clostridium botulinum, was determined in a single contig using the PacBio single-molecule real-time technology. The genome (4,142,990 bp; G+C content, 27.98%) included 86 sets of >1,000-bp identical sequence pairs and 380 tandem repeats. Copyright © 2015 Nakano et al.
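As an illustration of how a G+C figure like the one quoted above is derived, the following minimal sketch computes G+C content as the share of G and C among all unambiguous bases. The function name `gc_content` and the toy sequence are assumptions made for this example; they are not taken from the paper.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a DNA sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N or gap symbols do not skew the ratio.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence (hypothetical, for illustration only).
toy = "ATGCATTTAA"
print(f"G+C content: {gc_content(toy):.2f}%")
```

Applied to a full single-contig assembly, the same ratio would be computed over the whole chromosome sequence.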

July 7, 2019

Scalable multi whole-genome alignment using recursive exact matching

The emergence of third generation sequencing technologies has brought near perfect de-novo genome assembly within reach. This clears the way towards reference-free detection of genomic variations. In this paper, we introduce a novel concept for aligning whole-genomes which allows the alignment of multiple genomes. Alignments are constructed in a recursive manner, in which alignment decisions are statistically supported. Computational performance is achieved by splitting an initial indexing data structure into a multitude of smaller indices. We show that our method can be used to detect high resolution structural variations between two human genomes, and that it can be used to obtain a high quality multiple genome alignment of at least nineteen Mycobacterium tuberculosis genomes. An implementation of the outlined algorithm called REVEAL is available on: https://github.com/jasperlinthorst/REVEAL
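The recursive idea described above can be sketched for two short sequences: find the longest exact match, fix it as an anchor, then recurse on the regions to its left and right. This is a simplified illustration and not REVEAL's implementation. It uses a quadratic dynamic-programming search instead of an index structure, it applies no statistical test to anchors, and the names `longest_common_substring`, `recursive_align` and `min_len` are assumptions made for this example.

```python
def longest_common_substring(a: str, b: str) -> tuple[int, int, int]:
    """Return (start_a, start_b, length) of the longest exact match between a and b."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous pair of positions.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[2]:
                    best = (i - curr[j], j - curr[j], curr[j])
        prev = curr
    return best


def recursive_align(a: str, b: str, min_len: int = 3,
                    off_a: int = 0, off_b: int = 0) -> list[tuple[int, int, int]]:
    """Return colinear anchors (pos_a, pos_b, length), ordered left to right."""
    start_a, start_b, length = longest_common_substring(a, b)
    if length < min_len:
        return []  # No sufficiently long exact match: leave this region unaligned.
    left = recursive_align(a[:start_a], b[:start_b], min_len, off_a, off_b)
    right = recursive_align(a[start_a + length:], b[start_b + length:], min_len,
                            off_a + start_a + length, off_b + start_b + length)
    return left + [(off_a + start_a, off_b + start_b, length)] + right


# Toy genomes: the second carries a one-base insertion relative to the first.
anchors = recursive_align("AAACCCGGGTTT", "AAACCCTGGGTTT")
print(anchors)
```

Gaps between consecutive anchors mark candidate variants; here the gap between the two anchors corresponds to the inserted base in the second sequence.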

July 7, 2019

Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from the subclass Commelinidae.

Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple, which was sequenced using PacBio sequencing technology. In this study, the high error rate of PacBio long reads of A. comosus total genomic DNA was reduced by using highly accurate but short Illumina reads for error correction via the latest error-correction module from Novocraft. The error-corrected long PacBio reads were assembled using a single tool to produce a contig representing the pineapple chloroplast genome. The genome, 159,636 bp in length, has the conserved quadripartite structure of chloroplasts, containing a large single-copy region (LSC) of 87,482 bp, a small single-copy region (SSC) of 18,622 bp and two inverted repeat regions (IRA and IRB) of 26,766 bp each. Overall, the genome contained 117 unique coding regions and 30 were repeated in the IR region, with gene content, structure and arrangement similar to those of its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both the coding and non-coding regions, the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparison of chloroplast genomes from the subclass Commelinidae revealed a conserved protein-coding gene, albeit located in a highly divergent region. Analysis of selection pressure on protein-coding genes using the Ka/Ks ratio showed significant positive selection on the rps7 gene of the pineapple chloroplast (P < 0.05). Phylogenetic analysis confirmed the recent taxonomic relationships among the members of the commelinids, supporting a monophyletic relationship between Arecales and Dasypogonaceae and between Zingiberales and Poales, which includes A. comosus. The complete sequence of the pineapple chloroplast provides insights into the divergence of genic chloroplast sequences among members of the subclass Commelinidae. The complete pineapple chloroplast genome will serve as a reference for in-depth taxonomic studies in the Bromeliaceae family when more species in the family are sequenced in the future. The sequence information will also make other molecular applications of the pineapple chloroplast feasible for plant genetic improvement.
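The region sizes reported above can be cross-checked: in a quadripartite chloroplast genome, the large single-copy region, the small single-copy region and the two inverted repeats should sum to the total length. The short sketch below performs that check with the figures from the abstract; the variable names are chosen for this example only.

```python
# Region sizes in bp, as reported in the abstract.
regions = {"LSC": 87_482, "SSC": 18_622, "IRA": 26_766, "IRB": 26_766}
reported_total = 159_636

total = sum(regions.values())
# The four regions should tile the circular genome exactly.
assert total == reported_total, f"regions sum to {total}, expected {reported_total}"

for name, size in regions.items():
    print(f"{name}: {size:,} bp ({100 * size / total:.1f}% of genome)")
```

The assertion passes, so the four reported regions account for the full 159,636 bp.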

July 7, 2019

Complete genome sequence of Achromobacter xylosoxidans MN001, a cystic fibrosis airway isolate.

The genome of Achromobacter xylosoxidans MN001, a strain isolated from sputum derived from an adult cystic fibrosis patient, was sequenced using combined single-molecule real-time and Illumina sequencing. Assembly of the complete genome resulted in a 5,876,039-bp chromosome, representing the smallest A. xylosoxidans genome sequenced to date. Copyright © 2015 Badalamenti and Hunter.

July 7, 2019

An integrated map of structural variation in 2,504 human genomes.

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.

July 7, 2019

Hybrid de novo tandem repeat detection using short and long reads.

As one of the most studied genome rearrangements, tandem repeats have a considerable impact on the genetic background of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high-quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads obtained with second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation was tackled by the long reads obtained with third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length came with a significant increase in the error rate. The main objective of current studies on long reads is to handle this high error rate, which can reach 16%. In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads and the large length of long reads. Our hybrid algorithm uses the set of short reads for tandem repeat pattern detection based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies. MixTaR is tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of the number of tandem repeats detected and the length of their patterns. Our method shows high precision and sensitivity. With low false positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths varying within a significant interval.

July 7, 2019

Scarless genome editing and stable inducible expression vectors for Geobacter sulfurreducens.

Metal reduction by members of the Geobacteraceae is encoded by multiple gene clusters, and the study of extracellular electron transfer often requires biofilm development on surfaces. Genetic tools that utilize polar antibiotic cassette insertions limit mutant construction and complementation. In addition, unstable plasmids create metabolic burdens that slow growth, and the presence of antibiotics such as kanamycin can interfere with the rate and extent of Geobacter biofilm growth. We report here genetic system improvements for the model anaerobic metal-reducing bacterium Geobacter sulfurreducens. A motile strain of G. sulfurreducens was constructed by precise removal of a transposon interrupting the fgrM flagellar regulator gene using SacB/sucrose counterselection, and Fe(III) citrate reduction was eliminated by deletion of the gene encoding the inner membrane cytochrome imcH. We also show that RK2-based plasmids were maintained in G. sulfurreducens for over 15 generations in the absence of antibiotic selection in contrast to unstable pBBR1 plasmids. Therefore, we engineered a series of new RK2 vectors containing native constitutive Geobacter promoters, and modified one of these promoters for VanR-dependent induction by the small aromatic carboxylic acid vanillate. Inducible plasmids fully complemented ΔimcH mutants for Fe(III) reduction, Mn(IV) oxide reduction, and growth on poised electrodes. A real-time, high-throughput Fe(III) citrate reduction assay is described that can screen numerous G. sulfurreducens strain constructs simultaneously and shows the sensitivity of imcH expression by the vanillate system. These tools will enable more sophisticated genetic studies in G. sulfurreducens without polar insertion effects or need for multiple antibiotics. Copyright © 2015, Chan et al.

July 7, 2019

Jitterbug: somatic and germline transposon insertion detection at single-nucleotide resolution.

Transposable elements are major players in genome evolution. Transposon insertion polymorphisms can translate into phenotypic differences in plants and animals and are linked to different diseases including human cancer, making their characterization highly relevant to the study of genome evolution and genetic diseases. Here we present Jitterbug, a novel tool that identifies transposable element insertion sites at single-nucleotide resolution based on the paired-end mapping and clipped-read signatures produced by NGS alignments. Jitterbug can be easily integrated into existing NGS analysis pipelines, using the standard BAM format produced by frequently applied alignment tools (e.g. bwa, bowtie2), with no need to realign reads to a set of consensus transposon sequences. Jitterbug is highly sensitive and able to recall transposon insertions with a very high specificity, as demonstrated by benchmarks in the human and Arabidopsis genomes, and validation using long PacBio reads. In addition, Jitterbug estimates the zygosity of transposon insertions with high accuracy and can also identify somatic insertions. We demonstrate that Jitterbug can identify mosaic somatic transposon movement using sequenced tumor-normal sample pairs and allows for estimating the cancer cell fraction of clones containing a somatic TE insertion. We suggest that the independent methods we use to evaluate performance are a step towards creating a gold standard dataset for benchmarking structural variant prediction tools.
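To make the clipped-read signature mentioned above concrete, the sketch below reads the position and CIGAR string of a single alignment and reports the reference coordinate at which a soft clip begins or ends, which is where an insertion breakpoint would be expected. This is not Jitterbug's code. The function name `clipped_breakpoints` and the `min_clip` threshold are assumptions made for this example, and only the standard SAM conventions (1-based leftmost position, `S` for soft clip) are relied on.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_breakpoints(pos: int, cigar: str, min_clip: int = 5) -> list[tuple[str, int]]:
    """Return (side, reference_position) for each soft clip of at least min_clip bases.

    pos is the 1-based leftmost aligned reference position (the SAM POS field).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips carry no read bases and only occur at the ends, so drop them.
    core = [(n, op) for n, op in ops if op != "H"]
    breakpoints = []
    if core and core[0][1] == "S" and core[0][0] >= min_clip:
        # Clip on the left: the breakpoint is at the first aligned base.
        breakpoints.append(("left", pos))
    # Operations that consume reference bases determine where the alignment ends.
    ref_len = sum(n for n, op in core if op in "MDN=X")
    if len(core) > 1 and core[-1][1] == "S" and core[-1][0] >= min_clip:
        # Clip on the right: the breakpoint is at the last aligned base.
        breakpoints.append(("right", pos + ref_len - 1))
    return breakpoints


print(clipped_breakpoints(100, "20S80M"))
print(clipped_breakpoints(100, "60M40S"))
```

In a real pipeline, many reads clipped at the same coordinate, together with discordant read pairs nearby, would be needed before calling an insertion.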

July 7, 2019

The Brachypodium distachyon reference genome

Grasses provide the bulk of human calories, but improvement in grass yields is hindered by the characteristically large and complex genomes of these species; the genomes of wheat, maize, and sugar cane are 17,000, 2,300, and 10,000 Mb, respectively. Brachypodium distachyon has one of the smallest genomes of all grasses at 272 Mb, and a number of key traits that make it a good model grass. Brachypodium was the fourth sequenced grass genome, after rice, Sorghum, and maize, and was the first sequenced in the Pooideae subfamily, a diverse group that includes wheat, barley, oat, and rye. The Brachypodium genome was sequenced using a whole genome shotgun approach with Sanger sequencing and is nearly complete, with 99.6% of the sequences anchored to five chromosomes. Sequencing of Brachypodium enabled comparative genomic analysis of grass genomes and shed light on processes involved in chromosome fusions and maintenance of a small genome. The high-quality Brachypodium genome sequence provides a framework for gene expression atlases, resequencing, quantitative trait loci (QTL) mapping, GWAS, and ENCODE datasets. The wealth of Brachypodium genomic resources has cemented its utility as a model organism and will facilitate translational work for improving the grasses that feed the world.

July 7, 2019

Chromosomal rearrangements as barriers to genetic homogenization between archaic and modern humans.

Chromosomal rearrangements, which shuffle DNA throughout the genome, are an important source of divergence across taxa. Using a paired-end read approach with Illumina sequence data for archaic humans, I identify changes in genome structure that occurred recently in human evolution. Hundreds of rearrangements indicate genomic trafficking between the sex chromosomes and autosomes, raising the possibility of sex-specific changes. Additionally, genes adjacent to genome structure changes in Neanderthals are associated with testis-specific expression, consistent with evolutionary theory that new genes commonly form with expression in the testes. I identify one case of new-gene creation through transposition from the Y chromosome to chromosome 10 that combines the 5′-end of the testis-specific gene Fank1 with previously untranscribed sequence. This new transcript experienced copy number expansion in archaic genomes, indicating rapid genomic change. Among rearrangements identified in Neanderthals, 13% are transposition of selfish genetic elements, whereas 32% appear to be ectopic exchange between repeats. In Denisovan, the pattern is similar but numbers are significantly higher with 18% of rearrangements reflecting transposition and 40% ectopic exchange between distantly related repeats. There is an excess of divergent rearrangements relative to polymorphism in Denisovan, which might result from nonuniform rates of mutation, possibly reflecting a burst of transposable element activity in the lineage that led to Denisovan. Finally, loci containing genome structure changes show diminished rates of introgression from Neanderthals into modern humans, consistent with the hypothesis that rearrangements serve as barriers to gene flow during hybridization. Together, these results suggest that this previously unidentified source of genomic variation has important biological consequences in human evolution. © The Author 2015. 
Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019

svviz: a read viewer for validating structural variants.

Visualizing read alignments is the most effective way to validate candidate structural variants (SVs) with existing data. We present svviz, a sequencing read visualizer for SVs that sorts and displays only reads relevant to a candidate SV. svviz works by searching the input BAM file(s) for potentially relevant reads, realigning them against the inferred sequence of the putative variant allele as well as the reference allele, and identifying reads that match one allele better than the other. Separate views of the two alleles are then displayed in a scrollable web browser view, enabling a more intuitive visualization of each allele, compared with the single reference genome-based view common to most current read browsers. The browser view facilitates examining the evidence for or against a putative variant, estimating zygosity, visualizing affected genomic annotations and manual refinement of breakpoints. svviz supports data from most modern sequencing platforms. svviz is implemented in Python and freely available from http://svviz.github.io/. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.
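The allele-assignment step described above can be illustrated with a toy comparison: score each read against the reference allele and the variant allele, and assign it to whichever it matches clearly better. The sketch below is not svviz's algorithm. It replaces true realignment with an ungapped mismatch count, and the names `best_ungapped_mismatches`, `assign_read` and `min_margin` are assumptions made for this example.

```python
def best_ungapped_mismatches(read: str, allele: str) -> int:
    """Smallest number of mismatches over all ungapped placements of read on allele."""
    if len(read) > len(allele):
        raise ValueError("read is longer than the allele sequence")
    return min(
        sum(r != a for r, a in zip(read, allele[i:i + len(read)]))
        for i in range(len(allele) - len(read) + 1)
    )


def assign_read(read: str, ref_allele: str, alt_allele: str, min_margin: int = 2) -> str:
    """Label a read 'ref', 'alt' or 'ambiguous' by which allele it fits better."""
    ref_mm = best_ungapped_mismatches(read, ref_allele)
    alt_mm = best_ungapped_mismatches(read, alt_allele)
    # Require a margin so that near-ties are not over-interpreted.
    if ref_mm + min_margin <= alt_mm:
        return "ref"
    if alt_mm + min_margin <= ref_mm:
        return "alt"
    return "ambiguous"


# Toy alleles: the variant allele deletes the CCCC block of the reference.
ref_allele = "AAAACCCCGGGGTTTT"
alt_allele = "AAAAGGGGTTTT"
for read in ("AAAGGGG", "ACCCCGG", "GGGTTTT"):
    print(read, assign_read(read, ref_allele, alt_allele))
```

Reads that span the deletion junction support the variant allele, reads covering the deleted block support the reference, and reads that fall entirely in shared flanking sequence are uninformative.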

July 7, 2019

Contiguity: Contig adjacency graph construction and visualisation

Contiguity is interactive software for the visualization and manipulation of de novo genome assemblies. Contiguity creates and displays information on contig adjacency which is contextualized by the simultaneous display of a comparison between assembled contigs and reference sequence. Where scaffolders allow unambiguous connections between contigs to be resolved into a single scaffold, Contiguity allows the user to create all potential scaffolds in ambiguous regions of the genome. This enables the resolution of novel sequence or structural variants from the assembly. In addition, Contiguity provides a sequencing and assembly agnostic approach for the creation of contig adjacency graphs. To maximize the number of contig adjacencies determined, Contiguity combines information from read pair mappings, sequence overlap and De Bruijn graph exploration. We demonstrate how highly sensitive graphs can be achieved using this method. Contig adjacency graphs allow the user to visualize potential arrangements of contigs in unresolvable areas of the genome. By combining adjacency information with comparative genomics, Contiguity provides an intuitive approach for exploring and improving sequence assemblies. It is also useful in guiding manual closure of long read sequence assemblies. Contiguity is an open source application, implemented using Python and the Tkinter GUI package that can run on any Unix, OSX and Windows operating system. It has been designed and optimized for bacterial assemblies. Contiguity is available at http://mjsull.github.io/Contiguity.
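One of the three evidence sources named above, sequence overlap, is enough to show what a contig adjacency graph looks like. The sketch below links contigs whose ends share an exact suffix-prefix overlap. It is a simplified illustration rather than Contiguity's method: it ignores read pairs, De Bruijn graph exploration and reverse-complement orientations, and the names `overlap_length`, `adjacency_graph` and `min_overlap` are assumptions made for this example.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of left that equals a prefix of right, or 0."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Try the longest possible overlap first and stop at the first exact match.
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def adjacency_graph(contigs: dict[str, str],
                    min_overlap: int = 4) -> dict[str, list[tuple[str, int]]]:
    """Map each contig to the contigs that may follow it, with overlap lengths."""
    graph: dict[str, list[tuple[str, int]]] = {name: [] for name in contigs}
    for name_a, seq_a in contigs.items():
        for name_b, seq_b in contigs.items():
            if name_a == name_b:
                continue
            size = overlap_length(seq_a, seq_b, min_overlap)
            if size:
                graph[name_a].append((name_b, size))
    return graph


# Toy contigs (hypothetical): c1 ends in TGCA, which both c2 and c3 start with.
contigs = {"c1": "ACGTTGCA", "c2": "TGCAGGTT", "c3": "TGCACCAA"}
print(adjacency_graph(contigs))
```

The contig `c1` has two possible successors, which is the kind of ambiguous region where a scaffolder would stop and where listing every candidate arrangement becomes useful.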

July 7, 2019

Bordetella pertussis evolution in the (functional) genomics era.

The incidence of whooping cough caused by Bordetella pertussis in many developed countries has risen dramatically in recent years. This has been linked to the use of an acellular pertussis vaccine. In addition, it is thought that B. pertussis is adapting under acellular vaccine mediated immune selection pressure, towards vaccine escape. Genomics-based approaches have revolutionized the ability to resolve the fine structure of the global B. pertussis population and its evolution during the era of vaccination. Here, we discuss the current picture of B. pertussis evolution and diversity in the light of the current resurgence, highlight important questions raised by recent studies in this area and discuss the role that functional genomics can play in addressing current knowledge gaps. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019

Twenty years of bacterial genome sequencing.

Twenty years ago, the publication of the first bacterial genome sequence, from Haemophilus influenzae, shook the world of bacteriology. In this Timeline, we review the first two decades of bacterial genome sequencing, which have been marked by three revolutions: whole-genome shotgun sequencing, high-throughput sequencing and single-molecule long-read sequencing. We summarize the social history of sequencing and its impact on our understanding of the biology, diversity and evolution of bacteria, while also highlighting spin-offs and translational impact in the clinic. We look forward to a ‘sequencing singularity’, where sequencing becomes the method of choice for as-yet unthinkable applications in bacteriology and beyond.

July 7, 2019

Unique transposon landscapes are pervasive across Drosophila melanogaster genomes.

To understand how transposon landscapes (TLs) vary across animal genomes, we describe a new method called the Transposon Insertion and Depletion AnaLyzer (TIDAL) and a database of >300 TLs in Drosophila melanogaster (TIDAL-Fly). Our analysis reveals pervasive TL diversity across cell lines and fly strains, even for identically named sub-strains from different laboratories such as the ISO1 strain used for the reference genome sequence. On average, >500 novel insertions exist in every lab strain, inbred strains of the Drosophila Genetic Reference Panel (DGRP), and fly isolates in the Drosophila Genome Nexus (DGN). A minority (<25%) of transposon families comprise the majority (>70%) of TL diversity across fly strains. A sharp contrast between insertion and depletion patterns indicates that many transposons are unique to the ISO1 reference genome sequence. Although TL diversity from fly strains reaches asymptotic limits with increasing sequencing depth, rampant TL diversity causes unsaturated detection of TLs in pools of flies. Finally, we show novel transposon insertions negatively correlate with Piwi-interacting RNA (piRNA) levels for most transposon families, except for the highly abundant roo retrotransposon. Our study provides a useful resource for Drosophila geneticists to understand how transposons create extensive genomic diversity in fly cell lines and strains. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
