July 7, 2019

Evaluation and validation of de novo and hybrid assembly techniques to derive high-quality genome sequences.

This study assessed the potential of different types of sequence data, combined with de novo and hybrid assembly approaches, to improve existing draft genome sequences. Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies. All assembly tools except CLC Genomics Workbench are freely available under the GNU General Public License. Contact: brownsd@ornl.gov. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
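The summary statistics named in this abstract (number of contigs, N50) can be computed from contig lengths alone. The sketch below is illustrative only and is not the evaluation pipeline used in the study; the function names and the example lengths are hypothetical. It uses the common definition of N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (common definition)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def summarize(lengths):
    """Collect the basic summary statistics for one assembly."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "n50": n50(lengths),
    }


# Hypothetical contig lengths, not data from the study.
example_lengths = [100, 80, 50, 40, 30]
print(summarize(example_lengths))
```

Fewer contigs and a larger N50 generally indicate a more contiguous assembly, but neither measures correctness, which is why the study also validated rDNA operon predictions experimentally.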


July 7, 2019

Genome sequence of Pseudomonas chlororaphis strain PA23.

Pseudomonas chlororaphis strain PA23 is a plant-beneficial bacterium that is able to suppress disease caused by the fungal pathogen Sclerotinia sclerotiorum through a process known as biological control. Here we present a 7.1-Mb assembly of the PA23 genome. Copyright © 2014 Loewen et al.


July 7, 2019

Genome Sequence of Bacillus pumilus MTCC B6033.

Bacillus pumilus is a Gram-positive, rod-shaped, aerobic bacterium isolated from the soil. B. pumilus strain B6033 was originally selected as a biocatalyst for the stereospecific oxidation of β-lactams. Here, we present a 3.8-Mb assembly of its genome, which is the second fully assembled genome of a B. pumilus strain.


July 7, 2019

Complete genome sequences of eight Helicobacter pylori strains with different virulence factor genotypes and methylation profiles, isolated from patients with diverse gastrointestinal diseases on Okinawa Island, Japan, determined using PacBio Single-Molecule Real-Time Technology.

We report the complete genome sequences of eight Helicobacter pylori strains isolated from patients with gastrointestinal diseases in Okinawa, Japan. Whole-genome sequencing and DNA methylation detection were performed using the PacBio platform. De novo assembly determined a single, complete contig for each strain. Furthermore, methylation analysis identified virulence factor genotype-dependent motifs.


July 7, 2019

Insights into the preservation of the homomorphic sex-determining chromosome of Aedes aegypti from the discovery of a male-biased gene tightly linked to the M-locus.

The preservation of a homomorphic sex-determining chromosome in some organisms without transformation into a heteromorphic sex chromosome is a long-standing enigma in evolutionary biology. A dominant sex-determining locus (or M-locus) in an undifferentiated homomorphic chromosome confers the male phenotype in the yellow fever mosquito Aedes aegypti. Genetic evidence suggests that the M-locus is in a nonrecombining region. However, the molecular nature of the M-locus has not been characterized. Using a recently developed approach based on Illumina sequencing of male and female genomic DNA, we identified a novel gene, myo-sex, that is present almost exclusively in the male genome but can sporadically be found in the female genome due to recombination. For simplicity, we define sequences that are primarily found in the male genome as male-biased. Fluorescence in situ hybridization (FISH) on A. aegypti chromosomes demonstrated that the myo-sex probe localized to region 1q21, the established location of the M-locus. Myo-sex is a duplicated myosin heavy chain gene that is highly expressed in the pupa and adult male. Myo-sex shares 83% nucleotide identity and 97% amino acid identity with its closest autosomal paralog, consistent with ancient duplication followed by strong purifying selection. Compared with males, myo-sex is expressed at very low levels in the females that acquired it, indicating that myo-sex may be sexually antagonistic. This study establishes a framework to discover male-biased sequences within a homomorphic sex-determining chromosome and offers new insights into the evolutionary forces that have impeded the expansion of the nonrecombining M-locus in A. aegypti.


July 7, 2019

Safety of the surrogate microorganism Enterococcus faecium NRRL B-2354 for use in thermal process validation.

Enterococcus faecium NRRL B-2354 is a surrogate microorganism used in place of pathogens for validation of thermal processing technologies and systems. We evaluated the safety of strain NRRL B-2354 based on its genomic and functional characteristics. The genome of E. faecium NRRL B-2354 was sequenced and found to comprise a 2,635,572-bp chromosome and a 214,319-bp megaplasmid. A total of 2,639 coding sequences were identified, including 45 genes unique to this strain. Hierarchical clustering of the NRRL B-2354 genome with 126 other E. faecium genomes as well as pbp5 locus comparisons and multilocus sequence typing (MLST) showed that the genotype of this strain is most similar to commensal, or community-associated, strains of this species. E. faecium NRRL B-2354 lacks antibiotic resistance genes, and both NRRL B-2354 and its clonal relative ATCC 8459 are sensitive to clinically relevant antibiotics. This organism also lacks, or contains nonfunctional copies of, enterococcal virulence genes including acm, cyl, the ebp operon, esp, gelE, hyl, IS16, and associated phenotypes. It does contain scm, sagA, efaA, and pilA, although either these genes were not expressed or their roles in enterococcal virulence are not well understood. Compared with the clinical strains TX0082 and 1,231,502, E. faecium NRRL B-2354 was more resistant to acidic conditions (pH 2.4) and high temperatures (60°C) and was able to grow in 8% ethanol. These findings support the continued use of E. faecium NRRL B-2354 in thermal process validation of food products.


July 7, 2019

Genome sequence of Ensifer adhaerens OV14 provides insights into its ability as a novel vector for the genetic transformation of plant genomes.

Recently it has been shown that Ensifer adhaerens can be used as a plant transformation technology, transferring genes into several plant genomes when equipped with a Ti plasmid. For this study, we have sequenced the genome of Ensifer adhaerens OV14 (OV14) and compared it with those of Agrobacterium tumefaciens C58 (C58) and Sinorhizobium meliloti 1021 (1021), the latter of which has also demonstrated a capacity to genetically transform crop genomes, albeit at significantly reduced frequencies. The 7.7 Mb OV14 genome comprises two chromosomes and two plasmids. All protein coding regions in the OV14 genome were functionally grouped based on an eggNOG database. No genes homologous to the A. tumefaciens Ti plasmid vir genes appeared to be present in the OV14 genome. Unexpectedly, OV14 and 1021 were found to possess homologs to chromosomally based genes cited as essential to A. tumefaciens T-DNA transfer. Of significance, genes that are non-essential but exert a positive influence on virulence and the ability to genetically transform host genomes were identified in OV14 but were absent from the 1021 genome. This study reveals the presence of homologs to chromosomally based Agrobacterium genes that support T-DNA transfer within the genome of OV14 and other alphaproteobacteria. The sequencing and analysis of the OV14 genome increases our understanding of T-DNA transfer by non-Agrobacterium species and creates a platform for the continued improvement of Ensifer-mediated transformation (EMT).


July 7, 2019

Complete genome of the switchgrass endophyte Enterobacter cloacae P101.

The Enterobacter cloacae complex is genetically very diverse. The increasing number of complete genomic sequences of E. cloacae is helping to determine the exact relationship among members of the complex. E. cloacae P101 is an endophyte of switchgrass (Panicum virgatum) and is closely related to other E. cloacae strains isolated from plants. The P101 genome consists of a 5,369,929 bp chromosome. The chromosome has 5,164 protein-coding regions, 100 tRNA sequences, and 8 rRNA operons.


July 7, 2019

Complete genome sequence of the sugar cane endophyte Pseudomonas aurantiaca PB-St2, a disease-suppressive bacterium with antifungal activity toward the plant pathogen Colletotrichum falcatum.

The endophytic bacterium Pseudomonas aurantiaca PB-St2 exhibits antifungal activity and represents a biocontrol agent to suppress red rot disease of sugar cane. Here, we report the completely sequenced 6.6-Mb genome of P. aurantiaca PB-St2. The sequence contains a repertoire of biosynthetic genes for secondary metabolites that putatively contribute to its antagonistic activity and its plant-microbe interactions.


July 7, 2019

Sequence alignment tools: one parallel pattern to rule them all?

In this paper, we advocate a high-level programming methodology for next-generation sequencing (NGS) alignment tools, for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools to their ports onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are liberated from the complex aspects of parallel programming, such as synchronisation protocols and task scheduling, and gain more opportunity for seamless performance tuning. In this work, we show some use cases in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance for all used datasets.
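The abstract does not name the single parallel paradigm it refers to. One plausible reading, assumed here, is a task farm: reads are independent, so an emitter hands them to a pool of workers and a collector gathers the results. The sketch below shows that structure with Python's standard `concurrent.futures`. It is not FastFlow (a C++ framework) and not any real aligner: the toy reference, the exact-match `align` function and the `farm` helper are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy reference sequence, used only for illustration.
REFERENCE = "ACGTACGTTAGCCGATAGGCTA"


def align(read):
    """Toy worker: report the first exact-match position of a read, or -1."""
    return read, REFERENCE.find(read)


def farm(reads, workers=4):
    """Farm pattern: distribute independent reads to workers, collect in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map handles scheduling and synchronisation; results keep input order.
        return list(pool.map(align, reads))


print(farm(["ACGT", "TAGC", "GGGG"]))
```

The point of the sketch is structural: the worker function contains no locking or scheduling code, which is the separation of concerns a pattern-based approach aims for. Python threads will not speed up this CPU-bound toy.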


July 7, 2019

Integrative analysis of Salmonellosis in Israel reveals association of Salmonella enterica serovar 9,12:l,v:- with extraintestinal infections, dissemination of endemic S. enterica serovar Typhimurium DT104 biotypes, and severe underreporting of outbreaks.

Salmonella enterica is the leading etiologic agent of bacterial food-borne outbreaks worldwide. This ubiquitous species contains more than 2,600 serovars that may differ in their host specificity, clinical manifestations, and epidemiology. To characterize salmonellosis epidemiology in Israel and to study the association of nontyphoidal Salmonella (NTS) serovars with invasive infections, 48,345 Salmonella cases reported and serotyped at the National Salmonella Reference Center between 1995 and 2012 were analyzed. A quasi-Poisson regression was used to identify irregular clusters of illness, and pulsed-field gel electrophoresis in conjunction with whole-genome sequencing was applied to molecularly characterize strains of interest. Three hundred twenty-nine human salmonellosis clusters were identified, representing an annual average of 23 (95% confidence interval [CI], 20 to 26) potential outbreaks. We show that the previously unsequenced S. enterica serovar 9,12:l,v:- belongs to the B clade of Salmonella enterica subspecies enterica, and we show its frequent association with extraintestinal infections, compared to other NTS serovars. Furthermore, we identified the dissemination of two prevalent Salmonella enterica serovar Typhimurium DT104 clones in Israel, which are genetically distinct from other global DT104 isolates. Accumulatively, these findings indicate a severe underreporting of Salmonella outbreaks in Israel and provide insights into the epidemiology and genomics of prevalent serovars, responsible for recurring illness. Copyright © 2014, American Society for Microbiology. All Rights Reserved.


July 7, 2019

Whole-genome analysis of Exserohilum rostratum from an outbreak of fungal meningitis and other infections.

Exserohilum rostratum was the cause of most cases of fungal meningitis and other infections associated with the injection of contaminated methylprednisolone acetate produced by the New England Compounding Center (NECC). Until this outbreak, very few human cases of Exserohilum infection had been reported, and very little was known about this dematiaceous fungus, which usually infects plants. Here, we report using whole-genome sequencing (WGS) for the detection of single nucleotide polymorphisms (SNPs) and phylogenetic analysis to investigate the molecular origin of the outbreak using 22 isolates of E. rostratum retrieved from 19 case patients with meningitis or epidural/spinal abscesses, 6 isolates from contaminated NECC vials, and 7 isolates unrelated to the outbreak. Our analysis indicates that all 28 isolates associated with the outbreak had nearly identical genomes of 33.8 Mb. A total of 8 SNPs were detected among the outbreak genomes, with no more than 2 SNPs separating any 2 of the 28 genomes. The outbreak genomes were separated from the next most closely related control strain by ~136,000 SNPs. We also observed significant genomic variability among strains unrelated to the outbreak, which may suggest the possibility of cryptic speciation in E. rostratum. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
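The statement that no more than 2 SNPs separate any 2 outbreak genomes is a bound on the maximum pairwise SNP distance. As a minimal sketch, assuming the genomes have already been reduced to aligned sequences of equal length, that quantity can be computed as below. The isolate names and sequences are hypothetical toy data, and this is not the study's variant-calling method.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def max_pairwise_distance(sequences):
    """Largest SNP distance over all pairs in a name -> sequence mapping."""
    return max(
        snp_distance(sequences[p], sequences[q])
        for p, q in combinations(sequences, 2)
    )


# Hypothetical aligned sequences, not data from the outbreak.
toy = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "ACCTACGT"}
print(max_pairwise_distance(toy))
```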


July 7, 2019

De novo genome assembly of the economically important weed horseweed using integrated data from multiple sequencing platforms.

Horseweed (Conyza canadensis), a member of the Compositae (Asteraceae) family, was the first broadleaf weed to evolve resistance to glyphosate. Horseweed, one of the most problematic weeds in the world, is a true diploid (2n = 2x = 18), with the smallest genome of any known agricultural weed (335 Mb). Thus, it is an appropriate candidate to help us understand the genetic and genomic bases of weediness. We undertook a draft de novo genome assembly of horseweed by combining data from multiple sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS) using various libraries with different insertion sizes (approximately 350 bp, 600 bp, 3 kb, and 10 kb) of a Tennessee-accessed, glyphosate-resistant horseweed biotype. From 116.3 Gb (approximately 350× coverage) of data, the genome was assembled into 13,966 scaffolds with an N50 of 33,561 bp. The assembly covered 92.3% of the genome, including the complete chloroplast genome (approximately 153 kb) and a nearly complete mitochondrial genome (approximately 450 kb in 120 scaffolds). The nuclear genome is composed of 44,592 protein-coding genes. Genome resequencing of seven additional horseweed biotypes was performed. These sequence data were assembled and used to analyze genome variation. Simple sequence repeat and single-nucleotide polymorphisms were surveyed. Genomic patterns were detected that associated with glyphosate-resistant or -susceptible biotypes. The draft genome will be useful to better understand weediness and the evolution of herbicide resistance and to devise new management strategies. The genome will also be useful as another reference genome in the Compositae. To our knowledge, this article represents the first published draft genome of an agricultural weed. © 2014 American Society of Plant Biologists. All Rights Reserved.
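The "approximately 350× coverage" figure follows from dividing the total sequence data by the genome size. A minimal check, assuming "Gb" and "Mb" here denote gigabases and megabases (the variable names are illustrative):

```python
# Figures taken from the abstract; units assumed to be bases.
total_bases = 116.3e9   # 116.3 Gb of sequence data
genome_size = 335e6     # 335 Mb horseweed genome

# Average depth of coverage = total sequenced bases / genome size.
coverage = total_bases / genome_size
print(f"{coverage:.0f}x coverage")
```

This gives roughly 347×, consistent with the rounded value quoted in the abstract.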

