Long-read genome assembly tools are expected to reconstruct bacterial genomes nearly perfectly; however, they still produce fragmented assemblies in some cases. It would be beneficial to understand whether these cases are intrinsically impossible to resolve, or if assemblers are at fault, implying that genomes could be refined or even finished with little to no additional experimental cost. We propose a set of computational techniques to assist inspection of fragmented bacterial genome assemblies, through careful analysis of assembly graphs. By finding paths of overlapping raw reads between pairs of contigs, we recover potential short-range connections between contigs that were lost during the assembly process. We show that our procedure recovers 45% of missing contig adjacencies in fragmented Canu assemblies, on samples from the NCTC bacterial sequencing project. We also observe that a simple procedure based on enumerating weighted Hamiltonian cycles can suggest likely contig orderings. In our tests, the correct contig order is ranked first in half of the cases and within the top-3 predictions in nearly all evaluated cases, providing a direction for finishing fragmented long-read assemblies. https://gitlab.inria.fr/pmarijon/knot. Supplementary data are available at Bioinformatics online. © The Author(s) (2019). Published by Oxford University Press. All rights reserved. For Permissions, please email: email@example.com.
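The idea of ranking contig orderings by enumerating weighted Hamiltonian cycles can be illustrated with a minimal sketch. This is not the implementation from the repository above: the function name `rank_contig_orders`, the `weights` dictionary, and the example distances are all hypothetical, contig orientation is ignored, and brute-force enumeration is only practical for a handful of contigs.

```python
from itertools import permutations


def rank_contig_orders(contigs, weights, top=3):
    """Rank circular contig orderings by total edge weight (lower is better).

    contigs: list of contig names (orientation is ignored in this sketch).
    weights: dict mapping frozenset({a, b}) to a hypothetical distance
             between contigs a and b; a missing pair means "no connection".
    """
    first, rest = contigs[0], tuple(contigs[1:])
    scored = {}
    for perm in permutations(rest):
        cycle = (first,) + perm
        # A cycle and its mirror image describe the same circular order.
        key = min(cycle, (first,) + perm[::-1])
        if key in scored:
            continue
        total = 0.0
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            w = weights.get(frozenset((a, b)))
            if w is None:  # no path of reads links these two contigs
                break
            total += w
        else:
            # Reached only when every consecutive pair had a weight.
            scored[key] = total
    return sorted(scored.items(), key=lambda kv: kv[1])[:top]


# Hypothetical distances between four contigs (smaller = closer).
example_weights = {
    frozenset(("A", "B")): 1, frozenset(("B", "C")): 1,
    frozenset(("C", "D")): 1, frozenset(("D", "A")): 1,
    frozenset(("A", "C")): 5, frozenset(("B", "D")): 5,
}
ranked = rank_contig_orders(["A", "B", "C", "D"], example_weights)
```

Fixing the first contig and skipping mirror images avoids listing the same circular ordering several times; with the example distances, the ordering A, B, C, D has the lowest total weight.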
Complete Whole-Genome Sequences of Two Raoultella terrigena Strains, NCTC 13097 and NCTC 13098, Isolated from Human Cases.
Raoultella terrigena is a bacterial species associated with soil and aquatic environments; however, sporadic cases of opportunistic disease in humans have been reported. Here, we report the first two complete genome sequences from clinical strains isolated from human sources that have been deposited in the National Collection of Type Cultures (NCTC). © Crown copyright 2019.
Haemophilus haemolyticus is a Gram-negative bacterium that is a commensal of the respiratory tract in humans. Here, we report the complete genome sequence available for Haemophilus haemolyticus strain NCTC 10839, which was originally isolated from the nasopharynx of a child. © Crown copyright 2019.
Genomic Islands in the Full-Genome Sequence of an NAD-Hemin-Independent Avibacterium paragallinarum Strain Isolated from Peru.
Here, we report the full-genome sequence of an NAD-hemin-independent Avibacterium paragallinarum serovar C-2 strain, FARPER-174, isolated from layer hens in Peru. This genome contained 12 potential genomic islands that include ribosomal protein-coding genes, a nadR gene, hemocin-coding genes, phage sequences, an rtx operon, and drug resistance genes. Copyright © 2019 Tataje-Lavanda et al.
Genome mining identifies cepacin as a plant-protective metabolite of the biopesticidal bacterium Burkholderia ambifaria.
Beneficial microorganisms are widely used in agriculture for control of plant pathogens, but a lack of efficacy and safety information has limited the exploitation of multiple promising biopesticides. We applied phylogeny-led genome mining, metabolite analyses and biological control assays to define the efficacy of Burkholderia ambifaria, a naturally beneficial bacterium with proven biocontrol properties but potential pathogenic risk. A panel of 64 B. ambifaria strains demonstrated significant antimicrobial activity against priority plant pathogens. Genome sequencing, specialized metabolite biosynthetic gene cluster mining and metabolite analysis revealed an armoury of known and unknown pathways within B. ambifaria. The biosynthetic gene cluster responsible for the production of the metabolite cepacin was identified and directly shown to mediate protection of germinating crops against Pythium damping-off disease. B. ambifaria maintained biopesticidal protection and overall fitness in the soil after deletion of its third replicon, a non-essential plasmid associated with virulence in Burkholderia cepacia complex bacteria. Removal of the third replicon reduced B. ambifaria persistence in a murine respiratory infection model. Here, we show that by using interdisciplinary phylogenomic, metabolomic and functional approaches, the mode of action of natural biological control agents related to pathogens can be systematically established to facilitate their future exploitation.
Biphasic cellular adaptations and ecological implications of Alteromonas macleodii degrading a mixture of algal polysaccharides.
Algal polysaccharides are an important bacterial nutrient source and central component of marine food webs. However, cellular and ecological aspects concerning the bacterial degradation of polysaccharide mixtures, as presumably abundant in natural habitats, are poorly understood. Here, we contextualize marine polysaccharide mixtures and their bacterial utilization in several ways using the model bacterium Alteromonas macleodii 83-1, which can degrade multiple algal polysaccharides and contributes to polysaccharide degradation in the oceans. Transcriptomic, proteomic and exometabolomic profiling revealed cellular adaptations of A. macleodii 83-1 when degrading a mix of laminarin, alginate and pectin. Strain 83-1 exhibited substrate prioritization driven by catabolite repression, with initial laminarin utilization followed by simultaneous alginate/pectin utilization. This biphasic phenotype coincided with pronounced shifts in gene expression, protein abundance and metabolite secretion, mainly involving CAZymes/polysaccharide utilization loci but also other functional traits. Distinct temporal changes in exometabolome composition, including the alginate/pectin-specific secretion of pyrroloquinoline quinone, suggest that substrate-dependent adaptations influence chemical interactions within the community. The ecological relevance of cellular adaptations was underlined by molecular evidence that common marine macroalgae, in particular Saccharina and Fucus, release mixtures of alginate and pectin-like rhamnogalacturonan. Moreover, CAZyme microdiversity and the genomic predisposition towards polysaccharide mixtures among Alteromonas spp. suggest polysaccharide-related traits as an ecophysiological factor, potentially relating to distinct ‘carbohydrate utilization types’ with different ecological strategies. 
Considering the substantial primary productivity of algae on global scales, these insights contribute to the understanding of bacteria-algae interactions and the remineralization of chemically diverse polysaccharide pools, a key step in marine carbon cycling.
Streptococcus periodonticum sp. nov., Isolated from Human Subgingival Dental Plaque of Periodontitis Lesion.
A novel facultative anaerobic and Gram-stain-positive coccus, designated strain ChDC F135T, was isolated from human subgingival dental plaque of a periodontitis lesion and was characterized by polyphasic taxonomic analysis. The 16S rRNA gene (16S rDNA) sequence of strain ChDC F135T was closest to that of Streptococcus sinensis HKU4T (98.2%), followed by Streptococcus intermedius SK54T (97.0%), Streptococcus constellatus NCTC 11325T (96.0%), and Streptococcus anginosus NCTC 10713T (95.7%). In contrast, phylogenetic analysis based on the superoxide dismutase gene (sodA) and the RNA polymerase beta-subunit gene (rpoB) showed that the sequences of strain ChDC F135T were most similar to the corresponding genes of S. anginosus NCTC 10713T (99.2% and 97.6%, respectively), followed by S. constellatus NCTC 11325T (87.8% and 91.4%, respectively) and S. intermedius SK54T (85.8% and 91.2%, respectively), rather than those of S. sinensis HKU4T (80.5% and 82.6%, respectively). The complete genome of strain ChDC F135T consisted of 1,901,251 bp and the G+C content was 38.9 mol%. Average nucleotide identity values between strain ChDC F135T and S. sinensis HKU4T or S. anginosus NCTC 10713T were 75.7% and 95.6%, respectively. The C14:0 composition of the cellular fatty acids of strain ChDC F135T (32.8%) was different from that of S. intermedius (6-8%), S. constellatus (6-13%), and S. anginosus (13-20%). Based on the results of phylogenetic and phenotypic analysis, strain ChDC F135T (= KCOM 2412T = JCM 33300T) was classified as the type strain of a novel species of the genus Streptococcus, for which we propose the name Streptococcus periodonticum sp. nov.
A novel facultative anaerobic, Gram-stain-negative coccus, designated strain ChDC B345T, was isolated from human pericoronitis lesion and was characterized by polyphasic taxonomic analysis. The 16S ribosomal RNA gene (16S rDNA) sequence revealed that the strain belonged to the genus Streptococcus. The 16S rDNA sequence of strain ChDC B345T was most closely related to those of Streptococcus mitis NCTC 12261T (99.5%) and Streptococcus pseudopneumoniae ATCC BAA-960T (99.5%). The complete genome of strain ChDC B345T was 1,972,471 bp in length and the G+C content was 40.2 mol%. Average nucleotide identity values between strain ChDC B345T and S. pseudopneumoniae ATCC BAA-960T or S. mitis NCTC 12261T were 92.17% and 93.63%, respectively. Genome-to-genome distance values between strain ChDC B345T and S. pseudopneumoniae ATCC BAA-960T or S. mitis NCTC 12261T were 47.8% (45.2-50.4%) and 53.0% (51.0-56.4%), respectively. Based on these results, strain ChDC B345T (= KCOM 1679T = JCM 33299T) should be classified as a novel species of genus Streptococcus, for which we propose the name Streptococcus gwangjuense sp. nov.
In the past several years, single-molecule sequencing platforms, such as those by Pacific Biosciences and Oxford Nanopore Technologies, have become available to researchers and are currently being tested for clinical applications. They offer exceptionally long reads that permit direct sequencing through regions of the genome inaccessible or difficult to analyze by short-read platforms. This includes disease-causing long repetitive elements, extreme GC content regions, and complex gene loci. Similarly, these platforms enable structural variation characterization at previously unparalleled resolution and direct detection of epigenetic marks in native DNA. Here, we review how these technologies are opening up new clinical avenues that are being applied to pathogenic microorganisms and viruses, constitutional disorders, pharmacogenomics, cancer, and more. Copyright © 2018 Elsevier Ltd. All rights reserved.
Whole genome sequencing used in an industrial context reveals a Salmonella laboratory cross-contamination.
In 2013, during a routine laboratory analysis performed on food samples, one finished product from a European factory tested positive for Salmonella Hadar. During the same period, one environmental isolate in the same laboratory was serotyped as Salmonella Hadar. Prior to this event, the laboratory had performed a proficiency test involving a sample spiked with NCTC 9877 Salmonella Hadar. The concomitance of Salmonella Hadar detections led to the suspicion of a laboratory cross-contamination between the Salmonella Hadar isolate used in the laboratory proficiency testing and the Salmonella Hadar isolate found on the finished product by the same laboratory. Since the classical phenotypic serotyping method is able to attribute a serotype to Salmonella isolates with a common antigenic formula, but cannot differentiate strains of the same serotype within the subspecies, whole genome sequencing was used to test the laboratory cross-contamination hypothesis. Additionally, 12 Salmonella Hadar genomes from public databases, available at the time of the event, were included in the whole genome sequencing analysis to better understand the genomic diversity of this serotype in Europe. The analysis showed a maximum of ten single nucleotide polymorphisms (SNPs) between the isolates coming from the laboratory and the finished product, and thus confirmed the laboratory cross-contamination. These results, combined with all additional investigations done at the factory, allowed the finished product batches to be released and thus avoided unnecessary food waste and economic losses for the factory. Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.
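The comparison of isolates by SNP counts can be illustrated with a small sketch. It assumes, purely for illustration, that the genomes have already been reduced to equal-length aligned sequences; the function name `snp_distance` and the toy sequences are hypothetical, and this is not the pipeline used in the study.

```python
def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases.

    Positions with gaps or ambiguous calls (anything other than A, C, G, T)
    in either sequence are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in valid and y in valid and x != y
    )


# Toy aligned sequences standing in for two isolates.
lab_isolate = "ACGTACGTAC"
product_isolate = "ACGTTCGNAC"
distance = snp_distance(lab_isolate, product_isolate)
```

A small pairwise distance between two isolates, relative to the distances seen among unrelated isolates of the same serotype, is the kind of evidence that supports a common origin.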
Complete genome sequence and phylogenetic analysis of nosocomial pathogen Acinetobacter nosocomialis strain NCTC 8102.
Acinetobacter has emerged recently as one of the most challenging nosocomial pathogens because of its increased rate of antimicrobial resistance. The genetic complexity and genome diversity, as well as the lack of adequate knowledge on the pathogenic determinants of Acinetobacter strains, often hinder pathogenesis studies for the development of better therapeutics to tackle this nosocomial pathogen. In this study, we comparatively analyzed the whole genome sequence of a virulent Acinetobacter nosocomialis strain, NCTC 8102. The genomic DNA of A. nosocomialis NCTC 8102 was isolated and sequenced using the PacBio RS II platform. The sequenced genome was functionally annotated and gene prediction was carried out using the program Glimmer 3. The phylogenetic analysis of the genome was performed using the Mega 6 program and the comparative genome analysis was carried out by BLAST (Basic Local Alignment Search Tool). The complete genome analysis showed that the genome consists of a circular chromosome with an average G+C content of 38.7%. The genome comprises 3700 protein-coding genes, 96 RNA genes (18 rRNA, 74 tRNA and 4 ncRNA genes), and 91 pseudogenes. In addition, 6 prophage regions (2 intact, 1 incomplete and 3 questionable) and 18 genomic islands were identified in the genome, suggesting the possible occurrence of horizontal gene transfer in this strain. Comparative genome analysis of the A. nosocomialis NCTC 8102 genome with the already sequenced A. nosocomialis strain SSA3 showed an average nucleotide identity of 99.0%. In addition, the numbers of prophages and genomic islands were higher in the A. nosocomialis NCTC 8102 genome than in that of strain SSA3. Fourteen of the genomic islands were unique to A. nosocomialis NCTC 8102 compared to strain SSA3, and they harbored genes involved in virulence, multidrug resistance, biofilm formation and bacterial pathogenesis. We sequenced the whole genome of A. nosocomialis strain NCTC 8102, followed by comparative genome analysis. The study provides valuable information on the genetic features of this A. nosocomialis strain, and the data from this study would assist in further studies for the development of control measures for this nosocomial pathogen.
The history, genome and biology of NCTC 30: a non-pandemic Vibrio cholerae isolate from World War One.
The sixth global cholera pandemic lasted from 1899 to 1923. However, despite widespread fear of the disease and of its negative effects on troop morale, very few soldiers in the British Expeditionary Forces contracted cholera between 1914 and 1918. Here, we have revived and sequenced the genome of NCTC 30, a 102-year-old Vibrio cholerae isolate, which we believe is the oldest publicly available live V. cholerae strain in existence. NCTC 30 was isolated in 1916 from a British soldier convalescent in Egypt. We found that this strain does not encode cholera toxin, thought to be necessary to cause cholera, and is not part of V. cholerae lineages responsible for the pandemic disease. We also show that NCTC 30, which predates the introduction of penicillin-based antibiotics, harbours a functional β-lactamase antibiotic resistance gene. Our data corroborate and provide molecular explanations for previous phenotypic studies of NCTC 30 and provide a new high-quality genome sequence for historical, non-pandemic V. cholerae.
It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third generation sequencing data. Monitoring the speed at which a single polymerase inserts a base in the read strand enables one to infer whether a modification is present at that specific site on the template strand. These sites can be challenging to detect in the absence of high coverage and reliable reference genomes. Here we provide a new method for detecting epigenetic motifs in bacteria on datasets with low coverage, with incomplete references, and with mixed samples (i.e. metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, native genome-wide distributions of interpulse durations (IPDs) for kmers are compared with corresponding whole genome amplified (WGA, modification free) IPD distributions using log likelihood ratios. Finally, kmers are ranked and greedily selected by iteratively correcting for sequences within a particular kmer’s neighborhood. Our method can detect multiple types of modifications, even at very low coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with “neighbor” modified motifs exist within the sample. Lastly, we show that these motifs can provide an alternative source of information by which to cluster metagenomics contigs and that iterative refinement on these clustered contigs can further improve both sensitivity and specificity of motif detection. https://github.com/alibashir/EMMCKmer.
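The kmer comparison step can be sketched as follows. This is an illustration under stated assumptions, not the method implemented in the repository above: it assumes that log-transformed IPDs for each kmer are roughly Gaussian, fits one Gaussian to the WGA values and one to the native values, and scores each kmer by the log likelihood ratio of the native data under the two fits. The names `kmer_llr` and `rank_kmers` and the example IPD values are hypothetical, and the greedy neighborhood correction is omitted.

```python
import math
from statistics import fmean, pstdev


def gaussian_loglik(xs, mu, sigma):
    """Log likelihood of xs under a Gaussian with mean mu and std sigma."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in xs
    )


def kmer_llr(native, wga, min_sigma=1e-3):
    """Log likelihood ratio of native IPDs: native-fitted vs WGA-fitted model."""
    ln_native = [math.log(x) for x in native]
    ln_wga = [math.log(x) for x in wga]
    # Clamp the standard deviations so tiny samples cannot yield sigma = 0.
    mu0, s0 = fmean(ln_wga), max(pstdev(ln_wga), min_sigma)
    mu1, s1 = fmean(ln_native), max(pstdev(ln_native), min_sigma)
    return gaussian_loglik(ln_native, mu1, s1) - gaussian_loglik(ln_native, mu0, s0)


def rank_kmers(native_ipds, wga_ipds):
    """Rank kmers present in both datasets by decreasing log likelihood ratio."""
    scores = {
        k: kmer_llr(native_ipds[k], wga_ipds[k])
        for k in native_ipds
        if k in wga_ipds
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical IPD values: GATC is slowed in native DNA, AAAA is not.
native_ipds = {"GATC": [2.0, 2.2, 1.9, 2.1], "AAAA": [0.50, 0.52, 0.48, 0.50]}
wga_ipds = {"GATC": [0.50, 0.55, 0.45, 0.50], "AAAA": [0.50, 0.51, 0.49, 0.50]}
kmer_ranking = rank_kmers(native_ipds, wga_ipds)
```

In this toy data the kmer whose native IPDs are shifted away from the WGA baseline receives the larger score, which is the signal a ranking step would use to nominate candidate modified motifs.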
Downregulation of a predominantly hepatocyte-specific miR-122 is associated with human liver cancer metastasis, whereas miR-122-deficient mice display normal liver function. Here we show a functional conservation of miR-122 in the TGFβ pathway: miR-122 target site is present in the mouse but not human TGFβR1, whereas a noncanonical target site is present in the TGFβ1 5’UTR in humans and other primates. Experimental switch of the miR-122 target between the receptor TGFβR1 and the ligand TGFβ1 changes the metastatic properties of mouse and human liver cancer cells. High expression of TGFβ1 in human primary liver tumours is associated with poor survival. We identify over 50 other miRNAs orthogonally targeting ligand/receptor pairs in humans and mice, suggesting that these are evolutionarily common events. These results reveal an evolutionary mechanism for miRNA-mediated gene regulation underlying species-specific physiological or pathological phenotype and provide a potentially valuable strategy for treating liver-associated diseases.
In vitro characterization of phenylacetate decarboxylase, a novel enzyme catalyzing toluene biosynthesis in an anaerobic microbial community.
Anaerobic bacterial biosynthesis of toluene from phenylacetate was reported more than two decades ago, but the biochemistry underlying this novel metabolism has never been elucidated. Here we report results of in vitro characterization studies of a novel phenylacetate decarboxylase from an anaerobic, sewage-derived enrichment culture that quantitatively produces toluene from phenylacetate; complementary metagenomic and metaproteomic analyses are also presented. Among the noteworthy findings is that this enzyme is not the well-characterized clostridial p-hydroxyphenylacetate decarboxylase (CsdBC). However, the toluene synthase under study appears to be able to catalyze both phenylacetate and p-hydroxyphenylacetate decarboxylation. Observations suggesting that phenylacetate and p-hydroxyphenylacetate decarboxylation in complex cell-free extracts were catalyzed by the same enzyme include the following: (i) the specific activity for both substrates was comparable in cell-free extracts, (ii) the two activities displayed identical behavior during chromatographic separation of cell-free extracts, (iii) both activities were irreversibly inactivated upon exposure to O2, and (iv) both activities were similarly inhibited by an amide analog of p-hydroxyphenylacetate. Based upon these and other data, we hypothesize that the toluene synthase reaction involves a glycyl radical decarboxylase. This first-time study of the phenylacetate decarboxylase reaction constitutes an important step in understanding and ultimately harnessing it for making bio-based toluene.