Bioinformatics Archives - Page 258 of 267

July 7, 2019

Complete genome sequence of Celeribacter baekdonensis strain LH4, a thiosulfate-oxidizing alphaproteobacterial isolate from Gulf of Mexico continental slope sediments.

We report here the closed genome sequences of Celeribacter baekdonensis strain LH4 and five unnamed plasmids obtained through PacBio sequencing with 99.99% consensus concordance. The genomes contained several distinctive features not found in other published Celeribacter genomes, including the potential to aerobically degrade styrene and other phenolic compounds. Copyright © 2018 Flood et al.

July 7, 2019

RIFRAF: a frame-resolving consensus algorithm.

Protein coding genes can be studied using long-read next generation sequencing. However, high rates of indel sequencing errors are problematic, corrupting the reading frame. Even the consensus of multiple independent sequence reads retains indel errors. To solve this problem, we introduce Reference-Informed Frame-Resolving multiple-Alignment Free template inference algorithm (RIFRAF), a sequence consensus algorithm that takes a set of error-prone reads and a reference sequence and infers an accurate in-frame consensus. RIFRAF uses a novel structure, analogous to a two-layer hidden Markov model: the consensus is optimized to maximize alignment scores with both the set of noisy reads and with a reference. The template-to-reads component of the model encodes the preponderance of indels, and is sensitive to the per-base quality scores, giving greater weight to more accurate bases. The reference-to-template component of the model penalizes frame-destroying indels. A local search algorithm proceeds in stages to find the best consensus sequence for both objectives. Using Pacific Biosciences SMRT sequences from an HIV-1 env clone, NL4-3, we compare our approach to other consensus and frame correction methods. RIFRAF consistently finds a consensus sequence that is more accurate and in-frame, especially with small numbers of reads. It was able to perfectly reconstruct over 80% of consensus sequences from as few as three reads, whereas the best alternative required twice as many. RIFRAF is able to achieve these results and keep the consensus in-frame even with a distantly related reference sequence. Moreover, unlike other frame correction methods, RIFRAF can detect and keep true indels while removing erroneous ones. RIFRAF is implemented in Julia, and source code is publicly available at https://github.com/MurrellGroup/Rifraf.jl. Supplementary data are available at Bioinformatics online.
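To make the notion of a "frame-destroying" indel concrete, the following is a minimal Python sketch. It is not RIFRAF's algorithm or code (RIFRAF itself is written in Julia); the function names and the toy sequence are assumptions made purely for illustration. It only shows why an indel whose length is not a multiple of three changes every downstream codon, while a three-base indel does not.

```python
from typing import List


def is_frame_preserving(indel_length: int) -> bool:
    """An insertion or deletion keeps the reading frame only if its
    length is a multiple of 3 (one or more whole codons)."""
    return indel_length % 3 == 0


def codons(seq: str) -> List[str]:
    """Split a coding sequence into complete codons, dropping any
    trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical toy template, in frame: ATG GCT GAA ACC
template = "ATGGCTGAAACC"

# A 1-base deletion (the C at index 4) shifts every downstream codon.
one_base_deletion = template[:4] + template[5:]

# A 3-base deletion (the whole GCT codon) leaves downstream codons intact.
three_base_deletion = template[:3] + template[6:]

print(codons(template))             # original codons
print(codons(one_base_deletion))    # frame-shifted codons
print(codons(three_base_deletion))  # in-frame, one codon shorter
```

In this toy case the one-base deletion turns `GCT GAA ACC` into `GTG AAA` plus a dangling partial codon, which is the kind of corruption the abstract attributes to indel sequencing errors.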

July 7, 2019

Isolation and identification of an anthracimycin analogue from Nocardiopsis kunsanensis, a halophile from a saltern, by genomic mining strategy.

Modern medicine is unthinkable without antibiotics; yet, growing issues with microbial drug resistance require an intensified search for new active compounds. Natural products generated by Actinobacteria have been a rich source of candidate antibiotics, for example anthracimycin, which, so far, is only known to be produced by Streptomyces species. Based on sequence similarity with the respective biosynthetic cluster, we sifted through available microbial genome data with the goal of finding alternative anthracimycin-producing organisms. In this work, we report the prediction and experimental verification of the production of anthracimycin derivatives by Nocardiopsis kunsanensis, a non-Streptomyces actinobacterial microorganism. We discovered that N. kunsanensis predominantly produces a new anthracimycin derivative with a methyl group at C-8 and none at C-2, labeled anthracimycin BII-2619, besides a minor amount of anthracimycin. It displays activity against Gram-positive bacteria with a similarly low level of mammalian cytotoxicity to that of anthracimycin.

July 7, 2019

Complete genome sequencing of the mouse intestinal isolate Escherichia coli Mt1B1.

Escherichia coli Mt1B1, a mouse isolate, is a facultative anaerobic bacterium which was shown to counteract Salmonella enterica serovar Typhimurium infection in a mouse model. In the present study, we describe the complete genome sequence of E. coli Mt1B1, composed of a 5.1-Mb chromosome and a 62.6-kb plasmid. Copyright © 2018 Garzetti et al.

July 7, 2019

The draft genome of the lichen-forming fungus Lasallia hispanica (Frey) Sancho & A. Crespo

Lasallia hispanica (Frey) Sancho & A. Crespo is one of three Lasallia species occurring in central-western Europe. It is an orophytic, photophilous Mediterranean endemic which is sympatric with the closely related, widely distributed, highly clonal sister taxon L. pustulata in the supra- and oro-Mediterranean belts. We sequenced the genome of L. hispanica from a multispore isolate. The total genome length is 41·2 Mb, including 8488 gene models. We present the annotation of a variety of genes that are involved in protein secretion, mating processes and secondary metabolism, and we report transposable elements. Additionally, we compared the genome of L. hispanica to the closely related, yet ecologically distant, L. pustulata and found high synteny in gene content and order. The newly assembled and annotated L. hispanica genome represents a useful resource for future investigations into niche differentiation, speciation and microevolution in L. hispanica and other members of the genus.

July 7, 2019

Complete genome sequence of Mycobacterium shigaense.

Mycobacterium shigaense is a slowly growing scotochromogenic species and a member of the Mycobacterium simiae complex group. Here, we report the complete sequence of its genome, comprising a 5.2-Mb chromosome. The sequence will represent the essential data for future phylogenetic and comparative genome studies of the Mycobacterium simiae complex group. Copyright © 2018 Yoshida et al.

July 7, 2019

Genome sequence of Bacillus megaterium strain YC4-R4, a plant growth-promoting rhizobacterium isolated from a high-salinity environment.

Here, we report the complete genome sequence for Bacillus megaterium strain YC4-R4, a highly salt-tolerant rhizobacterium that promotes growth in plants. The sequencing process was performed by combining pyrosequencing and single-molecule sequencing techniques. The complete genome is estimated to be approximately 5.44 Mb, containing a total of 5,673 predicted protein-coding DNA sequences (CDSs). Copyright © 2018 Vílchez et al.

July 7, 2019

Single circular chromosome identified from the genome sequence of the Vibrio cholerae O1 bv. El Tor Ogawa strain V060002.

We report here the complete genome sequence of the Vibrio cholerae O1 bv. El Tor Ogawa strain V060002, isolated in 1997. The data demonstrate that this clinical strain has a single chromosome resulting from recombination of two prototypical chromosomes. Copyright © 2018 Yamamoto et al.

July 7, 2019

ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers.

The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time. Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13). ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run time performances on experimental human genome datasets. Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.
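The two building blocks named in this abstract, kmers and Jaccard indices, can be illustrated with a short Python sketch. This is not ARKS code and does not reproduce its distance estimator; the function names, the barcode labels, and the toy sequence are all assumptions chosen for illustration. It shows only how a sequence is decomposed into kmers and how a Jaccard index measures the overlap between two sets, for instance the sets of barcodes observed at two contig ends.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical barcodes seen in reads whose kmers map to two contig ends.
contig1_tail_barcodes = {"BX1", "BX2", "BX3", "BX4"}
contig2_head_barcodes = {"BX3", "BX4", "BX5"}

# Two shared barcodes out of five distinct ones.
print(jaccard(contig1_tail_barcodes, contig2_head_barcodes))

# kmer decomposition of a toy sequence with k = 3.
print(sorted(kmers("ACGTAC", 3)))
```

Under the intuition sketched here, regions that lie closer together on the same molecule would be expected to share more barcodes and so score a higher Jaccard index; how ARKS turns that relationship into a gap-size estimate is described in the paper itself, not in this sketch.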

July 7, 2019

The challenge of analyzing the sugarcane genome.

Reference genome sequences have become key platforms for genetics and breeding of the major crop species. Sugarcane is probably the largest crop produced in the world (in weight of crop harvested) but lacks a reference genome sequence. Sugarcane has one of the most complex genomes in crop plants due to the extreme level of polyploidy. The genome of modern sugarcane hybrids includes sub-genomes from two progenitors Saccharum officinarum and S. spontaneum with some chromosomes resulting from recombination between these sub-genomes. Advancing DNA sequencing technologies and strategies for genome assembly are making the sugarcane genome more tractable. Advances in long read sequencing have allowed the generation of a more complete set of sugarcane gene transcripts. This is supporting transcript profiling in genetic research. The progenitor genomes are being sequenced. A monoploid coverage of the hybrid genome has been obtained by sequencing BAC clones that cover the gene space of the closely related sorghum genome. The complete polyploid genome is now being sequenced and assembled. The emerging genome will allow comparison of related genomes and increase understanding of the functioning of this polyploidy system. Sugarcane breeding for traditional sugar and new energy and biomaterial uses will be enhanced by the availability of these genomic resources.

July 7, 2019

Analysis of resistance genes of clinical Pannonibacter phragmitetus strain 31801 by complete genome sequencing.

To clarify the resistance mechanisms of Pannonibacter phragmitetus 31801, isolated from the blood of a liver abscess patient, at the genomic level, we performed whole genomic sequencing using a PacBio RS II single-molecule real-time long-read sequencer. Bioinformatic analysis of the resulting sequence was then carried out to identify any possible resistance genes. Analyses included Basic Local Alignment Search Tool searches against the Antibiotic Resistance Genes Database, ResFinder analysis of the genome sequence, and Resistance Gene Identifier analysis within the Comprehensive Antibiotic Resistance Database. Prophages, clustered regularly interspaced short palindromic repeats (CRISPR), and other putative virulence factors were also identified using PHAST, CRISPRfinder, and the Virulence Factors Database, respectively. The circular chromosome and single plasmid of P. phragmitetus 31801 contained multiple antibiotic resistance genes, including those coding for three different types of β-lactamase [NPS β-lactamase (EC 3.5.2.6), β-lactamase class C, and a metal-dependent hydrolase of β-lactamase superfamily I]. In addition, genes coding for subunits of several multidrug-resistance efflux pumps were identified, including those targeting macrolides (adeJ, cmeB), tetracycline (acrB, adeAB), fluoroquinolones (acrF, ceoB), and aminoglycosides (acrD, amrB, ceoB, mexY, smeB). However, apart from the tripartite macrolide efflux pump macAB-tolC, the genome did not appear to contain the complete complement of subunit genes required for production of most of the major multidrug-resistance efflux pumps.

July 7, 2019

Assembly of a complete genome sequence for Gemmata obscuriglobus reveals a novel prokaryotic rRNA operon gene architecture.

Gemmata obscuriglobus is a Gram-negative bacterium with several intriguing biological features. Here, we present a complete, de novo whole genome assembly for G. obscuriglobus which consists of a single, circular 9 Mb chromosome, with no plasmids detected. The genome was annotated using the NCBI Prokaryotic Genome Annotation pipeline to generate common gene annotations. Analysis of the rRNA genes revealed three interesting features for a bacterium. First, linked G. obscuriglobus rrn operons have a unique gene order, 23S-5S-16S, compared to typical prokaryotic rrn operons (16S-23S-5S). Second, G. obscuriglobus rrn operons can either be linked or unlinked (a 16S gene is in a separate genomic location from a 23S and 5S gene pair). Third, all of the 23S genes (5 in total) have unique polymorphisms. Genome analysis of a different Gemmata species (SH-PL17), revealed a similar 23S-5S-16S gene order in all of its linked rrn operons and the presence of an unlinked operon. Together, our findings show that unique and rare features in Gemmata rrn operons among prokaryotes provide a means to better define the evolutionary relatedness of Gemmata species and the divergence time for different Gemmata species. Additionally, these rrn operon differences provide important insights into the rrn operon architecture of common ancestors of the planctomycetes.

July 7, 2019

Activation of the mismatch-specific endonuclease EndoMS/NucS by the replication clamp is required for high fidelity DNA replication.

The mismatch repair (MMR) system, exemplified by the MutS/MutL proteins, is widespread in Bacteria and Eukarya. However, the molecular mechanisms by which numerous archaea and bacteria lacking the mutS/mutL genes maintain high replication fidelity and genome stability have remained elusive. EndoMS is a recently discovered hyperthermophilic mismatch-specific endonuclease encoded by nucS in Thermococcales. We deleted nucS from the actinobacterium Corynebacterium glutamicum and demonstrated a drastic increase of spontaneous transition mutations in the nucS deletion strain. The observed spectra of these mutations were consistent with the enzymatic properties of EndoMS in vitro. Robust mismatch-specific endonuclease activity was detected with the purified C. glutamicum EndoMS protein, but only in the presence of the β-clamp (DnaN). Our biochemical and genetic data suggest that the frequently occurring G/T mismatch is efficiently repaired by the bacterial EndoMS-β-clamp complex formed via a carboxy-terminal sequence motif of EndoMS proteins. Our study thus has great implications for understanding how the activity of the novel MMR system is coordinated with the replisome and provides new mechanistic insight into genetic diversity and mutational patterns in industrially and clinically (e.g. Mycobacteria) important archaeal and bacterial phyla previously thought to be devoid of the MMR system.

July 7, 2019

First complete genome sequence of Yersinia massiliensis.

Using a combination of Illumina paired-end sequencing, Pacific Biosciences RS II sequencing, and OpGen Argus whole-genome optical mapping, we report here the first complete genome sequence of Yersinia massiliensis. The completed genome consists of a 4.99-Mb chromosome, a 121-kb megaplasmid, and a 57-kb plasmid. © Crown copyright 2018.

July 7, 2019

Draft genome sequence of the fish pathogen Flavobacterium columnare strain MS-FC-4.

Flavobacterium columnare MS-FC-4 is a highly virulent genetic group 1 (formerly genomovar I) strain isolated from rainbow trout (Oncorhynchus mykiss). The draft genome consists of three contigs totaling 3,449,277 bp with 2,811 predicted open reading frames. F. columnare MS-FC-4 is a model strain for functional genomic analyses.

