Menu
July 7, 2019

Automated ensemble assembly and validation of microbial genomes.

The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible.To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers.Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.


July 7, 2019

Complete sequences of organelle genomes from the medicinal plant Rhazya stricta (Apocynaceae) and contrasting patterns of mitochondrial genome evolution across asterids.

Rhazya stricta is native to arid regions in South Asia and the Middle East and is used extensively in folk medicine to treat a wide range of diseases. In addition to generating genomic resources for this medicinally important plant, analyses of the complete plastid and mitochondrial genomes and a nuclear transcriptome from Rhazya provide insights into inter-compartmental transfers between genomes and the patterns of evolution among eight asterid mitochondrial genomes.The 154,841 bp plastid genome is highly conserved with gene content and order identical to the ancestral organization of angiosperms. The 548,608 bp mitochondrial genome exhibits a number of phenomena including the presence of recombinogenic repeats that generate a multipartite organization, transferred DNA from the plastid and nuclear genomes, and bidirectional DNA transfers between the mitochondrion and the nucleus. The mitochondrial genes sdh3 and rps14 have been transferred to the nucleus and have acquired targeting presequences. In the case of rps14, two copies are present in the nucleus; only one has a mitochondrial targeting presequence and may be functional. Phylogenetic analyses of both nuclear and mitochondrial copies of rps14 across angiosperms suggests Rhazya has experienced a single transfer of this gene to the nucleus, followed by a duplication event. Furthermore, the phylogenetic distribution of gene losses and the high level of sequence divergence in targeting presequences suggest multiple, independent transfers of both sdh3 and rps14 across asterids. Comparative analyses of mitochondrial genomes of eight sequenced asterids indicates a complicated evolutionary history in this large angiosperm clade with considerable diversity in genome organization and size, repeat, gene and intron content, and amount of foreign DNA from the plastid and nuclear genomes.Organelle genomes of Rhazya stricta provide valuable information for improving the understanding of mitochondrial genome evolution among angiosperms. The genomic data have enabled a rigorous examination of the gene transfer events. Rhazya is unique among the eight sequenced asterids in the types of events that have shaped the evolution of its mitochondrial genome. Furthermore, the organelle genomes of R. stricta provide valuable genomic resources for utilizing this important medicinal plant in biotechnology applications.


July 7, 2019

Evaluation and validation of de novo and hybrid assembly techniques to derive high-quality genome sequences.

To assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences.Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies.All assembly tools except CLC Genomics Workbench are freely available under GNU General Public License.brownsd@ornl.govSupplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.


July 7, 2019

Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate.

DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives.For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads as well as suggest possible improvements.In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples.Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on the false discovery rate statistics that allows a proper trade-off between sensitivity and precision to be chosen.


July 7, 2019

Genome sequence of Candidatus Nitrososphaera evergladensis from group I.1b enriched from Everglades soil reveals novel genomic features of the ammonia-oxidizing archaea.

The activity of ammonia-oxidizing archaea (AOA) leads to the loss of nitrogen from soil, pollution of water sources and elevated emissions of greenhouse gas. To date, eight AOA genomes are available in the public databases, seven are from the group I.1a of the Thaumarchaeota and only one is from the group I.1b, isolated from hot springs. Many soils are dominated by AOA from the group I.1b, but the genomes of soil representatives of this group have not been sequenced and functionally characterized. The lack of knowledge of metabolic pathways of soil AOA presents a critical gap in understanding their role in biogeochemical cycles. Here, we describe the first complete genome of soil archaeon Candidatus Nitrososphaera evergladensis, which has been reconstructed from metagenomic sequencing of a highly enriched culture obtained from an agricultural soil. The AOA enrichment was sequenced with the high throughput next generation sequencing platforms from Pacific Biosciences and Ion Torrent. The de novo assembly of sequences resulted in one 2.95 Mb contig. Annotation of the reconstructed genome revealed many similarities of the basic metabolism with the rest of sequenced AOA. Ca. N. evergladensis belongs to the group I.1b and shares only 40% of whole-genome homology with the closest sequenced relative Ca. N. gargensis. Detailed analysis of the genome revealed coding sequences that were completely absent from the group I.1a. These unique sequences code for proteins involved in control of DNA integrity, transporters, two-component systems and versatile CRISPR defense system. Notably, genomes from the group I.1b have more gene duplications compared to the genomes from the group I.1a. We suggest that the presence of these unique genes and gene duplications may be associated with the environmental versatility of this group.


July 7, 2019

First genome sequences of Achromobacter phages reveal new members of the N4 family.

Multi-resistant Achromobacter xylosoxidans has been recognized as an emerging pathogen causing nosocomially acquired infections during the last years. Phages as natural opponents could be an alternative to fight such infections. Bacteriophages against this opportunistic pathogen were isolated in a recent study. This study shows a molecular analysis of two podoviruses and reveals first insights into the genomic structure of Achromobacter phages so far.Growth curve experiments and adsorption kinetics were performed for both phages. Adsorption and propagation in cells were visualized by electron microscopy. Both phage genomes were sequenced with the PacBio RS II system based on single molecule, real-time (SMRT) technology and annotated with several bioinformatic tools. To further elucidate the evolutionary relationships between the phage genomes, a phylogenomic analysis was conducted using the genome Blast Distance Phylogeny approach (GBDP).In this study, we present the first detailed analysis of genome sequences of two Achromobacter phages so far. Phages JWAlpha and JWDelta were isolated from two different waste water treatment plants in Germany. Both phages belong to the Podoviridae and contain linear, double-stranded DNA with a length of 72329 bp and 73659 bp, respectively. 92 and 89 putative open reading frames were identified for JWAlpha and JWDelta, respectively, by bioinformatic analysis with several tools. The genomes have nearly the same organization and could be divided into different clusters for transcription, replication, host interaction, head and tail structure and lysis. Detailed annotation via protein comparisons with BLASTP revealed strong similarities to N4-like phages.Analysis of the genomes of Achromobacter phages JWAlpha and JWDelta and comparisons of different gene clusters with other phages revealed that they might be strongly related to other N4-like phages, especially of the Escherichia group. Although all these phages show a highly conserved genomic structure and partially strong similarities at the amino acid level, some differences could be identified. Those differences, e.g. the existence of specific genes for replication or host interaction in some N4-like phages, seem to be interesting targets for further examination of function and specific mechanisms, which might enlighten the mechanism of phage establishment in the host cell after infection.


July 7, 2019

Complete genome sequence of Enterococcus mundtii QU 25, an efficient L-(+)-lactic acid-producing bacterium.

Enterococcus mundtii QU 25, a non-dairy bacterial strain of ovine faecal origin, can ferment both cellobiose and xylose to produce l-lactic acid. The use of this strain is highly desirable for economical l-lactate production from renewable biomass substrates. Genome sequence determination is necessary for the genetic improvement of this strain. We report the complete genome sequence of strain QU 25, primarily determined using Pacific Biosciences sequencing technology. The E. mundtii QU 25 genome comprises a 3 022 186-bp single circular chromosome (GC content, 38.6%) and five circular plasmids: pQY182, pQY082, pQY039, pQY024, and pQY003. In all, 2900 protein-coding sequences, 63 tRNA genes, and 6 rRNA operons were predicted in the QU 25 chromosome. Plasmid pQY024 harbours genes for mundticin production. We found that strain QU 25 produces a bacteriocin, suggesting that mundticin-encoded genes on plasmid pQY024 were functional. For lactic acid fermentation, two gene clusters were identified-one involved in the initial metabolism of xylose and uptake of pentose and the second containing genes for the pentose phosphate pathway and uptake of related sugars. This is the first complete genome sequence of an E. mundtii strain. The data provide insights into lactate production in this bacterium and its evolution among enterococci. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.


July 7, 2019

proovread: large-scale high-accuracy PacBio correction through iterative short read consensus.

Today, the base code of DNA is mostly determined through sequencing by synthesis as provided by the Illumina sequencers. Although highly accurate, resulting reads are short, making their analyses challenging. Recently, a new technology, single molecule real-time (SMRT) sequencing, was developed that could address these challenges, as it generates reads of several thousand bases. But, their broad application has been hampered by a high error rate. Therefore, hybrid approaches that use high-quality short reads to correct erroneous SMRT long reads have been developed. Still, current implementations have great demands on hardware, work only in well-defined computing infrastructures and reject a substantial amount of reads. This limits their usability considerably, especially in the case of large sequencing projects.Here we present proovread, a hybrid correction pipeline for SMRT reads, which can be flexibly adapted on existing hardware and infrastructure from a laptop to a high-performance computing cluster. On genomic and transcriptomic test cases covering Escherichia coli, Arabidopsis thaliana and human, proovread achieved accuracies up to 99.9% and outperformed the existing hybrid correction programs. Furthermore, proovread-corrected sequences were longer and the throughput was higher. Thus, proovread combines the most accurate correction results with an excellent adaptability to the available hardware. It will therefore increase the applicability and value of SMRT sequencing.proovread is available at the following URL: http://proovread.bioapps.biozentrum.uni-wuerzburg.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

Insights into the preservation of the homomorphic sex-determining chromosome of Aedes aegypti from the discovery of a male-biased gene tightly linked to the M-locus.

The preservation of a homomorphic sex-determining chromosome in some organisms without transformation into a heteromorphic sex chromosome is a long-standing enigma in evolutionary biology. A dominant sex-determining locus (or M-locus) in an undifferentiated homomorphic chromosome confers the male phenotype in the yellow fever mosquito Aedes aegypti. Genetic evidence suggests that the M-locus is in a nonrecombining region. However, the molecular nature of the M-locus has not been characterized. Using a recently developed approach based on Illumina sequencing of male and female genomic DNA, we identified a novel gene, myo-sex, that is present almost exclusively in the male genome but can sporadically be found in the female genome due to recombination. For simplicity, we define sequences that are primarily found in the male genome as male-biased. Fluorescence in situ hybridization (FISH) on A. aegypti chromosomes demonstrated that the myo-sex probe localized to region 1q21, the established location of the M-locus. Myo-sex is a duplicated myosin heavy chain gene that is highly expressed in the pupa and adult male. Myo-sex shares 83% nucleotide identity and 97% amino acid identity with its closest autosomal paralog, consistent with ancient duplication followed by strong purifying selection. Compared with males, myo-sex is expressed at very low levels in the females that acquired it, indicating that myo-sex may be sexually antagonistic. This study establishes a framework to discover male-biased sequences within a homomorphic sex-determining chromosome and offers new insights into the evolutionary forces that have impeded the expansion of the nonrecombining M-locus in A. aegypti.


July 7, 2019

Characterization of biological pathways associated with a 1.37 Mbp genomic region protective of hypertension in Dahl S rats.

The goal of the present study was to narrow a region of chromosome 13 to only several genes and then apply unbiased statistical approaches to identify molecular networks and biological pathways relevant to blood-pressure salt sensitivity in Dahl salt-sensitive (SS) rats. The analysis of 13 overlapping subcongenic strains identified a 1.37 Mbp region on chromosome 13 that influenced the mean arterial blood pressure by at least 25 mmHg in SS rats fed a high-salt diet. DNA sequencing and analysis filled genomic gaps and provided identification of five genes in this region, Rfwd2, Fam5b, Astn1, Pappa2, and Tnr. A cross-platform normalization of transcriptome data sets obtained from our previously published Affymetrix GeneChip dataset and newly acquired RNA-seq data from renal outer medullary tissue provided 90 observations for each gene. Two Bayesian methods were used to analyze the data: 1) a linear model analysis to assess 243 biological pathways for their likelihood to discriminate blood pressure levels across experimental groups and 2) a Bayesian graphical modeling of pathways to discover genes with potential relationships to the candidate genes in this region. As none of these five genes are known to be involved in hypertension, this unbiased approach has provided useful clues to be experimentally explored. Of these five genes, Rfwd2, the gene most strongly expressed in the renal outer medulla, was notably associated with pathways that can affect blood pressure via renal transcellular Na(+) and K(+) electrochemical gradients and tubular Na(+) transport, mitochondrial TCA cycle and cell energetics, and circadian rhythms. Copyright © 2014 the American Physiological Society.


July 7, 2019

Genome sequence of the chromate-resistant bacterium Leucobacter salsicius type strain M1-8(T.).

Leucobacter salsicius M1-8(T) is a member of the Microbacteriaceae family within the class Actinomycetales. This strain is a Gram-positive, rod-shaped bacterium and was previously isolated from a Korean fermented food. Most members of the genus Leucobacter are chromate-resistant and this feature could be exploited in biotechnological applications. However, the genus Leucobacter is poorly characterized at the genome level, despite its potential importance. Thus, the present study determined the features of Leucobacter salsicius M1-8(T), as well as its genome sequence and annotation. The genome comprised 3,185,418 bp with a G+C content of 64.5%, which included 2,865 protein-coding genes and 68 RNA genes. This strain possessed two predicted genes associated with chromate resistance, which might facilitate its growth in heavy metal-rich environments.


July 7, 2019

The complete genome sequence of Clostridium indolis DSM 755(T.).

Clostridium indolis DSM 755(T) is a bacterium commonly found in soils and the feces of birds and mammals. Despite its prevalence, little is known about the ecology or physiology of this species. However, close relatives, C. saccharolyticum and C. hathewayi, have demonstrated interesting metabolic potentials related to plant degradation and human health. The genome of C. indolis DSM 755(T) reveals an abundance of genes in functional groups associated with the transport and utilization of carbohydrates, as well as citrate, lactate, and aromatics. Ecologically relevant gene clusters related to nitrogen fixation and a unique type of bacterial microcompartment, the CoAT BMC, are also detected. Our genome analysis suggests hypotheses to be tested in future culture based work to better understand the physiology of this poorly described species.


July 7, 2019

LUMPY: a probabilistic framework for structural variant discovery.

Comprehensive discovery of structural variation (SV) from whole genome sequencing data requires multiple detection signals including read-pair, split-read, read-depth and prior knowledge. Owing to technical challenges, extant SV discovery algorithms either use one signal in isolation, or at best use two sequentially. We present LUMPY, a novel SV discovery framework that naturally integrates multiple SV signals jointly across multiple samples. We show that LUMPY yields improved sensitivity, especially when SV signal is reduced owing to either low coverage data or low intra-sample variant allele frequency. We also report a set of 4,564 validated breakpoints from the NA12878 human genome. https://github.com/arq5x/lumpy-sv.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.