July 7, 2019

Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.

Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads. We present a new mapper, minimap, and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between ours and current tools. Availability: https://github.com/lh3/minimap and https://github.com/lh3/miniasm. Contact: hengli@broadinstitute.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
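The pairwise read mapping format mentioned above can be illustrated with a minimal parsing sketch. The abstract does not list the columns, so the layout used here (twelve tab-separated mandatory fields, as the format is commonly documented) is an assumption, and the `PafRecord` type, the `parse_paf_line` helper, and the example line are all hypothetical.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """One mapping record; field layout is an assumed 12-column convention."""
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    residue_matches: int
    block_length: int
    mapping_quality: int


def parse_paf_line(line: str) -> PafRecord:
    """Parse the first 12 tab-separated fields of a mapping line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated fields")
    return PafRecord(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
        fields[4],
        fields[5], int(fields[6]), int(fields[7]), int(fields[8]),
        int(fields[9]), int(fields[10]), int(fields[11]),
    )


# Hypothetical record, not taken from any real data set.
example = "read1\t10000\t100\t9900\t+\tcontig1\t50000\t2000\t11800\t9000\t9800\t60"
record = parse_paf_line(example)
# Fraction of the alignment block made up of matching residues.
identity = record.residue_matches / record.block_length
```

Any optional fields after the twelfth column are ignored by this sketch.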


July 7, 2019

Whole genome DNA sequence analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely-related foodborne pathogens. We applied whole genome sequencing (WGS) to Salmonella subspecies enterica serotype Tennessee (S. Tennessee) to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources and 2 other closely-related Salmonella serovars as outgroups (S. Kentucky and S. Cubana), which were also shotgun sequenced. A whole genome single nucleotide polymorphism (SNP) analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs), suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that isolates from several food, plant, and clinical sources were very closely related, as they had only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts during future outbreaks. Using WGS can delimit contamination sources for foodborne illnesses across multiple outbreaks and reveal otherwise undetected DNA sequence differences essential to the tracing of bacterial pathogens as they emerge.
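The idea of isolates differing by "only a few SNP differences" can be sketched as a pairwise count over aligned sequences. This is not the maximum likelihood analysis used in the study; it is a simple distance count, and the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions in two aligned sequences, skipping gaps and N."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snp_matrix(alignment: dict) -> dict:
    """Return SNP distances for every pair of isolate names (sorted order)."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment; real analyses use genome-wide SNP matrices.
toy_alignment = {
    "clinical_1": "ACGTACGTAC",
    "food_1": "ACGTACGTAT",
    "outgroup": "TCGAACCTAT",
}
distances = pairwise_snp_matrix(toy_alignment)
```

In this toy case the clinical and food isolates differ at one position, while each differs from the outgroup at several, which mirrors the pattern the abstract describes at much larger scale.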


July 7, 2019

Accelerated dysbiosis of gut microbiota during aggravation of DSS-induced colitis by a butyrate-producing bacterium.

Butyrate-producing bacteria (BPB) are potential probiotic candidates for inflammatory bowel diseases as they are often depleted in the diseased gut microbiota. However, here we found that augmentation of a human-derived butyrate-producing strain, Anaerostipes hadrus BPB5, significantly aggravated colitis in dextran sulphate sodium (DSS)-treated mice while exerting no detrimental effect in healthy mice. We explored how the interaction between BPB5 and gut microbiota may contribute to this differential impact on the hosts. Butyrate production and severity of colitis were assessed in both healthy and DSS-treated mice, and gut microbiota structural changes were analysed using high-throughput sequencing. BPB5-inoculated healthy mice showed no signs of colitis, but increased butyrate content in the gut. In DSS-treated mice, BPB5 augmentation did not increase butyrate content, but induced a significantly more severe disease activity index and much higher mortality. BPB5 did not induce significant changes of gut microbiota in healthy hosts, but in DSS-treated animals it brought the structural shift toward the disease phase forward by 3 days compared with DSS treatment alone. The differential response of gut microbiota in healthy and DSS-treated mice to the same potentially beneficial bacterium, with drastically different health consequences, suggests that animals with dysbiotic gut microbiota should also be employed for the safety assessment of probiotic candidates.


July 7, 2019

Complete genomes of Bacillus coagulans S-lac and Bacillus subtilis TO-A JPC, two phylogenetically distinct probiotics.

Several spore-forming strains of Bacillus are marketed as probiotics due to their ability to survive harsh gastrointestinal conditions and confer health benefits to the host. We report the complete genomes of two commercially available probiotics, Bacillus coagulans S-lac and Bacillus subtilis TO-A JPC, and compare them with the genomes of other Bacillus and Lactobacillus. The taxonomic position of both organisms was established with a maximum-likelihood tree based on twenty-six housekeeping proteins. Analysis of all probiotic strains of Bacillus and Lactobacillus reveals that the essential sporulation proteins are conserved in all Bacillus probiotic strains while they are absent in Lactobacillus spp. We identified various antibiotic resistance, stress-related, and adhesion-related domains in these organisms, which likely provide support in exerting probiotic action by enabling adhesion to host epithelial cells and survival during antibiotic treatment and harsh conditions.


July 7, 2019

Multiplex enhancer-reporter assays uncover unsophisticated TP53 enhancer logic.

Transcription factors regulate their target genes by binding to regulatory regions in the genome. Although the binding preferences of TP53 are known, it remains unclear what distinguishes functional enhancers from nonfunctional binding. In addition, the genome is scattered with recognition sequences that remain unoccupied. Using two complementary techniques of multiplex enhancer-reporter assays, we discovered that functional enhancers could be discriminated from nonfunctional binding events by the occurrence of a single TP53 canonical motif. By combining machine learning with a meta-analysis of TP53 ChIP-seq data sets, we identified a core set of more than 1000 responsive enhancers in the human genome. This TP53 cistrome is invariably used between cell types and experimental conditions, whereas differences among experiments can be attributed to indirect nonfunctional binding events. Our data suggest that TP53 enhancers represent a class of unsophisticated cell-autonomous enhancers containing a single TP53 binding site, distinct from complex developmental enhancers that integrate signals from multiple transcription factors. © 2016 Verfaillie et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Atypical Salmonella enterica serovars in murine and human infection models: Is it time to reassess our approach to the study of salmonellosis?

Nontyphoidal Salmonella species are globally disseminated pathogens and the predominant cause of gastroenteritis. The pathogenesis of salmonellosis has been extensively studied using in vivo murine models and cell lines typically challenged with Salmonella Typhimurium. Although serovars Enteritidis and Typhimurium are responsible for most human infections reported to the CDC, several other serovars also contribute to clinical cases of salmonellosis. Despite their epidemiological importance, little is known about their infection phenotypes. Here, we report the virulence characteristics and genomes of 10 atypical S. enterica serovars linked to multistate foodborne outbreaks in the United States. We show that the murine RAW 264.7 macrophage model of infection is unsuitable for inferring human-relevant differences in nontyphoidal Salmonella infections, whereas differentiated human THP-1 macrophages allowed these isolates to be further characterised in a more relevant, human context.


July 7, 2019

Characterization of the first cultured representative of Verrucomicrobia subdivision 5 indicates the proposal of a novel phylum.

The recently isolated strain L21-Fru-AB(T) represents moderately halophilic, obligately anaerobic and saccharolytic bacteria that thrive in the suboxic transition zones of hypersaline microbial mats. Phylogenetic analyses based on 16S rRNA genes, RpoB proteins and gene content indicated that strain L21-Fru-AB(T) represents a novel species and genus affiliated with a distinct phylum-level lineage originally designated Verrucomicrobia subdivision 5. A survey of environmental 16S rRNA gene sequences revealed that members of this newly recognized phylum are wide-spread and ecologically important in various anoxic environments ranging from hypersaline sediments to wastewater and the intestine of animals. Characteristic phenotypic traits of the novel strain included the formation of extracellular polymeric substances, a Gram-negative cell wall containing peptidoglycan and the absence of odd-numbered cellular fatty acids. Unusual metabolic features deduced from analysis of the genome sequence were the production of sucrose as osmoprotectant, an atypical glycolytic pathway lacking pyruvate kinase and the synthesis of isoprenoids via mevalonate. On the basis of the analyses of phenotypic, genomic and environmental data, it is proposed that strain L21-Fru-AB(T) and related bacteria are specifically adapted to the utilization of sulfated glycopolymers produced in microbial mats or biofilms.


July 7, 2019

Structural and functional analysis of the finished genome of the recently isolated toxic Anabaena sp. WA102.

Very few closed genomes of the cyanobacteria that commonly produce toxic blooms in lakes and reservoirs are available, limiting our understanding of the properties of these organisms. A new anatoxin-a-producing member of the Nostocaceae, Anabaena sp. WA102, was isolated from a freshwater lake in Washington State, USA, in 2013 and maintained in non-axenic culture. The Anabaena sp. WA102 5.7 Mbp genome assembly has been closed with long-read, single-molecule sequencing and separately a draft genome assembly has been produced with short-read sequencing technology. The closed and draft genome assemblies are compared, showing a correlation between long repeats in the genome and the many gaps in the short-read assembly. Anabaena sp. WA102 encodes anatoxin-a biosynthetic genes, as does its close relative Anabaena sp. AL93 (also introduced in this study). These strains are distinguished by differences in the genes for light-harvesting phycobilins, with Anabaena sp. AL93 possessing a phycoerythrocyanin operon. Biologically relevant structural variants in the Anabaena sp. WA102 genome were detected only by long-read sequencing: a tandem triplication of the anaBCD promoter region in the anatoxin-a synthase gene cluster (not triplicated in Anabaena sp. AL93) and a 5-kbp deletion variant present in two-thirds of the population. The genome has a large number of mobile elements (160). Strikingly, there was no synteny with the genome of its nearest fully assembled relative, Anabaena sp. 90. Structural and functional genome analyses indicate that Anabaena sp. WA102 has a flexible genome. Genome closure, which can be readily achieved with long-read sequencing, reveals large scale (e.g., gene order) and local structural features that should be considered in understanding genome evolution and function.


July 7, 2019

A roadmap for gene system development in Clostridium.

Clostridium species are both heroes and villains. Some cause serious human and animal diseases, those present in the gut microbiota generally contribute to health and wellbeing, while others represent useful industrial chassis for the production of chemicals and fuels. To understand, counter or exploit them, there is a fundamental requirement for effective systems that may be used for directed or random genome modifications. We have formulated a simple roadmap whereby the necessary gene systems may be developed and deployed. At its heart is the use of ‘pseudo-suicide’ vectors and the creation of a pyrE mutant (a uracil auxotroph), initially aided by ClosTron technology, but ultimately made using a special form of allelic exchange termed ACE (Allele-Coupled Exchange). All mutants, regardless of the mutagen employed, are made in this host. This is because through the use of ACE vectors, mutants can be rapidly complemented concomitant with correction of the pyrE allele and restoration of uracil prototrophy. This avoids the phenotypic effects frequently observed with high copy number plasmids and dispenses with the need to add antibiotic to ensure plasmid retention. Once available, the pyrE host may be used to stably insert all manner of application-specific modules. Examples include a sigma factor to allow deployment of a mariner transposon, hydrolases involved in biomass deconstruction and therapeutic genes in cancer delivery vehicles. To date, provided DNA transfer is obtained, we have not encountered any clostridial species where this technology cannot be applied. These include Clostridium difficile, Clostridium acetobutylicum, Clostridium beijerinckii, Clostridium botulinum, Clostridium perfringens, Clostridium sporogenes, Clostridium pasteurianum, Clostridium ljungdahlii, Clostridium autoethanogenum and even Geobacillus thermoglucosidasius. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.


July 7, 2019

Reply to Bemm et al. and Arakawa: Identifying foreign genes in independent Hypsibius dujardini genome assemblies.

Our report (1) describing the discovery of extensive horizontal gene transfer in a tardigrade genome has raised questions from other groups who were sequencing the Hypsibius dujardini genome in parallel or who have done new experiments and analyses since our report (2–5). Bemm et al. (2) now report filtering our data for likely contaminants, resulting in a new, prefiltered genome assembly. Arakawa (3) has sequenced genomes of starved, washed, individual animals that had been treated with antibiotics for 48 h, and used this genomic sequence and RNA-Seq data to identify likely bona fide tardigrade contigs. Two other reports have contributed data and analysis: Delmont and Eren (4) used a newly published analysis and visualization platform, Anvi’o (6), to identify likely contaminants in our genome assembly, and Koutsovoulos et al. (5) applied useful taxon-annotated GC coverage plots (Blobplots) (7) to our data and reported an independent genome assembly.


July 7, 2019

Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel.
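The comparison of observed and expected triplet frequencies can be sketched as follows. The abstract does not say how expected frequencies were derived, so this sketch assumes a simple independence model (the product of single-residue frequencies); the function name and that model are assumptions, not the authors' method.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def triplet_ratios(proteins):
    """Return observed/expected counts for each amino acid triplet.

    Expected counts assume residues occur independently (an assumption).
    Triplets whose expected count is zero are left out of the result.
    """
    single = Counter()
    triple = Counter()
    for seq in proteins:
        seq = seq.upper()
        single.update(c for c in seq if c in AMINO_ACIDS)
        for i in range(len(seq) - 2):
            tri = seq[i:i + 3]
            if all(c in AMINO_ACIDS for c in tri):
                triple[tri] += 1
    total_single = sum(single.values())
    total_triple = sum(triple.values())
    ratios = {}
    if total_single == 0 or total_triple == 0:
        return ratios
    for tri in map("".join, product(AMINO_ACIDS, repeat=3)):
        prob = 1.0
        for c in tri:
            prob *= single[c] / total_single
        expected = prob * total_triple
        if expected > 0:
            ratios[tri] = triple[tri] / expected
    return ratios


# Hypothetical toy input; a real analysis would read a whole proteome.
ratios = triplet_ratios(["ACAC"])
underrepresented = sorted(t for t, r in ratios.items() if r < 0.5)
```

With a real proteome, triplets with a ratio far below 1 would be candidates for the underrepresented sequences described above; the 0.5 cut-off here is arbitrary.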


July 7, 2019

Co-utilization of glucose and xylose by evolved Thermus thermophilus LC113 strain elucidated by (13)C metabolic flux analysis and whole genome sequencing.

We evolved Thermus thermophilus to efficiently co-utilize glucose and xylose, the two most abundant sugars in lignocellulosic biomass, at high temperatures without carbon catabolite repression. To generate the strain, T. thermophilus HB8 was first evolved on glucose to improve its growth characteristics, followed by evolution on xylose. The resulting strain, T. thermophilus LC113, was characterized in growth studies, by whole genome sequencing, and (13)C-metabolic flux analysis ((13)C-MFA) with [1,6-(13)C]glucose, [5-(13)C]xylose, and [1,6-(13)C]glucose+[5-(13)C]xylose as isotopic tracers. Compared to the starting strain, the evolved strain had an increased growth rate (~2-fold), increased biomass yield, increased tolerance to high temperatures up to 90°C, and gained the ability to grow on xylose in minimal medium. At the optimal growth temperature of 81°C, the maximum growth rates on glucose and xylose were 0.44 and 0.46 h(-1), respectively. In medium containing glucose and xylose the strain efficiently co-utilized the two sugars. (13)C-MFA results provided insights into the metabolism of T. thermophilus LC113 that allows efficient co-utilization of glucose and xylose. Specifically, (13)C-MFA revealed that metabolic fluxes in the upper part of metabolism adjust flexibly to sugar availability, while fluxes in the lower part of metabolism remain relatively constant. Whole genome sequence analysis revealed two large structural changes that can help explain the physiology of the evolved strain: a duplication of a chromosome region that contains many sugar transporters, and a 5x multiplication of a region on the pVV8 plasmid that contains xylose isomerase and xylulokinase genes, the first two enzymes of xylose catabolism. Taken together, (13)C-MFA and genome sequence analysis provided complementary insights into the physiology of the evolved strain. Copyright © 2016 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.
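To put the reported maximum growth rates in more familiar terms, they can be converted to doubling times with the standard relation t_d = ln(2) / mu, assuming exponential growth. The helper name below is hypothetical; the two rates are the values quoted in the abstract.

```python
import math


def doubling_time_hours(growth_rate_per_hour: float) -> float:
    """Doubling time for exponential growth: t_d = ln(2) / mu."""
    if growth_rate_per_hour <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / growth_rate_per_hour


# Specific growth rates (per hour) reported for strain LC113 at 81 °C.
for sugar, mu in (("glucose", 0.44), ("xylose", 0.46)):
    print(f"{sugar}: {doubling_time_hours(mu):.2f} h")
```

Under this assumption both rates correspond to doubling times of roughly an hour and a half.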


July 7, 2019

Complete genome sequence of Vibrio alginolyticus ATCC 33787(T) isolated from seawater with three native megaplasmids.

Vibrio alginolyticus, an opportunistic pathogen, is commonly associated with vibriosis in fish and shellfish and can also cause superficial and ear infections in humans. V. alginolyticus ATCC 33787(T) was originally isolated from seawater and has been used as one of the type strains for exploring the virulence factors of marine bacteria and for developing vaccines against vibriosis. Here we sequenced and assembled the whole genome of this strain, and identified three megaplasmids and three Type VI secretion systems, thus providing useful information for the study of virulence factors and for the development of vaccines against Vibrio. Copyright © 2016. Published by Elsevier B.V.


July 7, 2019

Complete genome sequence of the crude oil-degrading thermophilic bacterium Geobacillus sp. JS12.

Here, we report the complete genome sequence of Geobacillus sp. JS12, isolated from compost in Namhae, Korea, which shows extracellular lipolytic activity at high temperatures. An array of genes related to the utilization of lipids was identified by whole genome analysis. The genome sequence of strain JS12 provides basic information for wider exploitation of thermostable industrial lipases. Copyright © 2016 Elsevier B.V. All rights reserved.


July 7, 2019

Complete genome sequence of the Streptomyces sp. strain CdTB01, a bacterium tolerant to cadmium.

Streptomyces sp. strain CdTB01, which is tolerant to high concentrations of heavy metals, particularly cadmium, was isolated from soil contaminated with heavy metals. Whole genome sequencing and assembly yielded two contigs with a total genome size of 10.19 Mb, and numerous homologs of genes known to be involved in heavy metal resistance were found in the genome. Copyright © 2016 Elsevier B.V. All rights reserved.

