Bioinformatics Archives - Page 261 of 267

July 7, 2019

Genomic sequencing of Bordetella pertussis for epidemiology and global surveillance of whooping cough.

Bordetella pertussis causes whooping cough, a highly contagious respiratory disease that is reemerging in many world regions. The spread of antigen-deficient strains may threaten acellular vaccine efficacy. Dynamics of strain transmission are poorly defined because of shortcomings in current strain genotyping methods. Our objective was to develop a whole-genome genotyping strategy with sufficient resolution for local epidemiologic questions and sufficient reproducibility to enable international comparisons of clinical isolates. We defined a core genome multilocus sequence typing scheme comprising 2,038 loci and demonstrated its congruence with whole-genome single-nucleotide polymorphism variation. Most cases of intrafamilial groups of isolates or of multiple isolates recovered from the same patient were distinguished from temporally and geographically cocirculating isolates. However, epidemiologically unrelated isolates were sometimes nearly undistinguishable. We set up a publicly accessible core genome multilocus sequence typing database to enable global comparisons of B. pertussis isolates, opening the way for internationally coordinated surveillance.
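A core genome MLST comparison reduces each isolate to a profile of allele numbers, one per locus, and counts the loci at which two profiles differ. The following is a minimal sketch of that count, not the authors' implementation; the locus names, the example profiles, and the choice to ignore loci with a missing call are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Locus name -> allele number; None marks a missing call (assumed convention).
Profile = Dict[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci at which two profiles carry different allele numbers.

    Loci that are absent or uncalled (None) in either profile are skipped.
    """
    shared = [
        locus for locus in a
        if a.get(locus) is not None and b.get(locus) is not None
    ]
    return sum(1 for locus in shared if a[locus] != b[locus])


# Hypothetical three-locus profiles; a real scheme here would have 2,038 loci.
isolate_1 = {"locus_0001": 1, "locus_0002": 4, "locus_0003": 2}
isolate_2 = {"locus_0001": 1, "locus_0002": 5, "locus_0003": None}

distance = allelic_distance(isolate_1, isolate_2)
print(distance)
```

Skipping uncalled loci keeps incomplete assemblies comparable, at the cost of slightly underestimating distances between isolates with many missing calls.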

July 7, 2019

sppIDer: a species identification tool to investigate hybrid genomes with high-throughput sequencing.

The genomics era has expanded our knowledge about the diversity of the living world, yet harnessing high-throughput sequencing data to investigate alternative evolutionary trajectories, such as hybridization, is still challenging. Here we present sppIDer, a pipeline for the characterization of interspecies hybrids and pure species, that illuminates the complete composition of genomes. sppIDer maps short-read sequencing data to a combination genome built from reference genomes of several species of interest and assesses the genomic contribution and relative ploidy of each parental species, producing a series of colorful graphical outputs ready for publication. As a proof-of-concept, we use the genus Saccharomyces to detect and visualize both interspecies hybrids and pure strains, even with missing parental reference genomes. Through simulation, we show that sppIDer is robust to variable reference genome qualities and performs well with low-coverage data. We further demonstrate the power of this approach in plants, animals, and other fungi. sppIDer is robust to many different inputs and provides visually intuitive insight into genome composition that enables the rapid identification of species and their interspecies hybrids. sppIDer exists as a Docker image, which is a reusable, reproducible, transparent, and simple-to-run package that automates the pipeline and installation of the required dependencies (https://github.com/GLBRC/sppIDer; last accessed September 6, 2018).
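The central idea, mapping reads to a combined reference and asking how many land on each species, can be sketched in a few lines. This is an illustration under stated assumptions rather than the sppIDer code: it assumes each contig name in the combined reference starts with a species label followed by a hyphen, that alignments are already parsed into (contig, mapping quality) pairs, and that a mapping-quality cutoff of 3 is a reasonable filter.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def species_fractions(
    alignments: Iterable[Tuple[str, int]], min_mapq: int = 3
) -> Dict[str, float]:
    """Fraction of confidently mapped reads assigned to each species.

    Each alignment is (contig_name, mapping_quality). The species label is
    taken as the text before the first hyphen (hypothetical naming scheme).
    """
    counts: Counter = Counter()
    for contig, mapq in alignments:
        if mapq < min_mapq:
            continue  # drop reads that map ambiguously between species
        species = contig.split("-", 1)[0]
        counts[species] += 1
    total = sum(counts.values())
    return {sp: n / total for sp, n in counts.items()} if total else {}


# Hypothetical alignments from a two-species hybrid.
reads = [("Scer-chr1", 60), ("Scer-chr2", 60), ("Seub-chr1", 60), ("Seub-chr4", 0)]
fractions = species_fractions(reads)
print(fractions)
```

Relative ploidy estimation would additionally need per-window depth along each chromosome, which this sketch does not attempt.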

July 7, 2019

Clustering of circular consensus sequences: accurate error correction and assembly of single molecule real-time reads from multiplexed amplicon libraries.

Targeted resequencing with high-throughput sequencing (HTS) platforms can be used to efficiently interrogate the genomes of large numbers of individuals. A critical issue for research and applications using HTS data, especially from long-read platforms, is error in base calling arising from technological limits and bioinformatic algorithms. We found that the community standard long amplicon analysis (LAA) module from Pacific Biosciences is prone to substantial bioinformatic errors that raise concerns about findings based on this pipeline, prompting the need for a new method. A single molecule real-time (SMRT) sequencing-error correction and assembly pipeline, C3S-LAA, was developed for libraries of pooled amplicons. By uniquely leveraging the structure of SMRT sequence data (comprised of multiple low quality subreads from which higher quality circular consensus sequences are formed) to cluster raw reads, C3S-LAA produced accurate consensus sequences and assemblies of overlapping amplicons from single sample and multiplexed libraries. In contrast, despite read depths in excess of 100X per amplicon, the standard long amplicon analysis module from Pacific Biosciences generated unexpected numbers of amplicon sequences with substantial inaccuracies in the consensus sequences. A bootstrap analysis showed that the C3S-LAA pipeline per se was effective at removing bioinformatic sources of error, but in rare cases a read depth of nearly 400X was not sufficient to overcome minor but systematic errors inherent to amplification or sequencing. C3S-LAA uses a divide and conquer processing algorithm for SMRT amplicon-sequence data that generates accurate consensus sequences and local sequence assemblies. Solving the confounding bioinformatic source of error in LAA allowed for the identification of limited instances of errors due to DNA amplification or sequencing of homopolymeric nucleotide tracts. For research and development in genomics, C3S-LAA allows meaningful conclusions and biological inferences to be made from accurately polished sequence output.

July 7, 2019

Complete genome sequence of the dissimilatory azo reducing thermophilic bacterium Novibacillus thermophiles SG-1.

With the isolation and identification of efficient azo-dye degradation bacteria, bioaugmentation with specific microbial strains has now become an effective strategy to promote the bioremediation of azo dye. However, azo dye wastewater discharged at high temperature restricted the extensive application of the known mesophilic azoreducing microorganisms. Here we present the complete genome sequence of a bacterium capable of reducing azo dye under thermophilic conditions, Novibacillus thermophiles SG-1 (=KCTC 33118T =CGMCC 1.12363T). The complete genome of strain SG-1 contains a circular chromosome of 3,629,225 bp with a G + C content of 50.44%. Genome analysis revealed that strain SG-1 possessed genes encoding riboflavin biosynthesis protein that would secrete riboflavin, which could act as electron shuttles to transport the electrons to extracellular azo dye in the decolorization process. HPLC analysis showed that the concentration of riboflavin increased from 0.01 µM to 0.255 µM with the growth of strain SG-1 under azo dye reduction. Quantitative real-time PCR analysis further demonstrated that the gene encoding riboflavin biosynthesis protein would be involved in the azo dye decolorization. The results from this study would be beneficial to research the mechanism of anaerobic reduction of azo dye under thermophilic conditions. Copyright © 2018 Elsevier B.V. All rights reserved.

July 7, 2019

Speeding up DNA sequence alignment by optical correlator

In electronic computers, the extensive computation required to search biological sequences in large databases leads to a vast amount of energy consumption for electrical processing and cooling. Optical processing, by contrast, is much faster than its electrical counterpart because of its parallel processing capability, at a fraction of the energy consumption and cost. In this regard, this paper proposes a correlation-based optical algorithm using metamaterials, taking advantage of optical parallel processing, to efficiently locate edits as a means of DNA sequence comparison. Specifically, the proposed algorithm partitions the read DNA sequence into multiple overlapping intervals, referred to as windows, and then extracts, in parallel, the peaks resulting from their cross-correlation with the reference sequence. Finally, to locate the edits, a simple algorithm that uses the number and location of the peaks is introduced to analyze the correlation outputs obtained from the window-based DNA sequence comparison. As a novel implementation approach, we adopt multiple metamaterial-based optical correlators to optically implement the proposed parallel architecture, named the Window-based Optical Correlator (WOC). This wave-based computing architecture fully controls wave transmission and phase using dielectric and plasmonic materials. Design limitations and challenges of the proposed architecture are also discussed in detail. The simulation results, comparing WOC with the well-known BLAST algorithm, demonstrate a speed-up of up to 60% as well as high accuracy even in the presence of a large number of edits. The WOC method also considerably reduces power consumption as a result of implementing a metamaterial-based optical computing structure.
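The window-and-correlate step can be illustrated with a purely digital analogue. The sketch below is not the optical implementation and does not reproduce the paper's peak-analysis algorithm. It assumes a simple match count as the correlation score, substitutions as the only edit type, and a hypothetical reference, read, window size, and step.

```python
from typing import List, Tuple


def windows(read: str, size: int, step: int) -> List[Tuple[int, str]]:
    """Overlapping windows of the read as (start offset, subsequence)."""
    return [(i, read[i:i + size]) for i in range(0, len(read) - size + 1, step)]


def correlation_peak(window: str, reference: str) -> Tuple[int, int]:
    """Slide the window along the reference and return (position, score)
    of the first highest match count."""
    best_pos, best_score = -1, -1
    for pos in range(len(reference) - len(window) + 1):
        segment = reference[pos:pos + len(window)]
        score = sum(1 for a, b in zip(window, segment) if a == b)
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score


# Hypothetical data: the read is reference[4:16] with one substitution.
reference = "ACGTTGCAAGCTTAGGCTAACG"
read = "TGCAAGGTTAGG"

# A window whose peak is below its own length overlaps at least one edit.
flagged = [
    start
    for start, win in windows(read, size=6, step=3)
    if correlation_peak(win, reference)[1] < len(win)
]
print(flagged)
```

Each window is scored independently, so the loop over windows is the part that a parallel architecture would evaluate at once.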

July 7, 2019

Genomic characterization of methylotrophy of Oharaeibacter diazotrophicus strain SM30T.

Oharaeibacter diazotrophicus strain SM30T, isolated from rice rhizosphere, is an aerobic, facultative lanthanide (Ln3+)-utilizing methylotroph and diazotroph that belongs to the Methylocystaceae family. In this research, the complete genome sequence of strain SM30T was determined, and its methylotrophy modules were characterized. The genome consists of one chromosome and two plasmids, comprising a total of 5,004,097 bp, and the GC content was 71.6 mol%. A total of 4497 CDSs, 67 tRNA, and 9 rRNA were encoded. Typical alpha-proteobacterial methylotrophy genes were found: pyrroloquinoline quinone (PQQ)-dependent methanol dehydrogenase (MDH) (mxaF and xoxF1-4), methylotrophy regulatory proteins (mxbDM and mxcQE), PQQ synthesis, H4F pathway, H4MPT pathway, formate oxidation, serine cycle, and ethylmalonyl-CoA pathway. SDS-PAGE and subsequent LC-MS analysis, and qPCR analysis revealed that MxaF and XoxF1 were the dominant MDH in the absence or presence of lanthanum (La3+), respectively. The growth of MDH gene-deletion mutants on alcohols and qPCR results indicated that mxaF and xoxF1 are also involved in ethanol and propanol oxidation, xoxF2 participates in methanol oxidation in the presence of La3+, while xoxF3 was associated with methanol and ethanol oxidation in the absence of La3+, implying that XoxF3 is a calcium (Ca2+)-binding XoxF. Four Ln3+ such as La3+, cerium (Ce3+), praseodymium (Pr3+), and neodymium (Nd3+) served as cofactors for XoxF1 by supporting ΔmxaF growth on methanol. Some heavier lanthanides inhibited growth of SM30 on methanol. This study contributes to the understanding of the function of various XoxF-type MDHs and their roles in methylotrophs. Copyright © 2018 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

July 7, 2019

The complete genome sequence of Bacillus halotolerans ZB201702 isolated from a drought- and salt-stressed rhizosphere soil.

Bacillus halotolerans is a rhizobacterium with the potential to promote plant growth and tolerance to drought and salinity stress. Here, we present the complete genome sequence of B. halotolerans ZB201702, which consists of 4,150,000 bp in a linear chromosome, including 3074 protein-coding sequences, 30 rRNAs, and 85 tRNAs. Genome analysis revealed many putative gene clusters involved in defense mechanisms. Activity analysis of the strain under salt and simulated drought stress suggests tolerance to abiotic stresses. The complete genome information of B. halotolerans ZB201702 could provide valuable insights into rhizobacteria-mediated plant salt and drought tolerance and rhizobacteria-based solutions for abiotic stress agriculture. Copyright © 2018 Elsevier Ltd. All rights reserved.

July 7, 2019

The regenerative flatworm Macrostomum lignano, a model organism with high experimental potential.

Understanding the process of regeneration has been one of the longstanding scientific aims, from a fundamental biological perspective, as well as within the applied context of regenerative medicine. Because regeneration competence varies greatly between organisms, it is essential to investigate different experimental animals. The free-living marine flatworm Macrostomum lignano is a rising model organism for this type of research, and its power stems from a unique set of biological properties combined with amenability to experimental manipulation. The biological properties of interest include production of single-cell fertilized eggs, a transparent body, small size, short generation time, ease of culture, the presence of a pluripotent stem cell population, and a large regeneration competence. These features sparked the development of molecular tools and resources for this animal, including high-quality genome and transcriptome assemblies, gene knockdown, in situ hybridization, and transgenesis. Importantly, M. lignano is currently the only flatworm species for which transgenesis methods are established. This review summarizes biological features of M. lignano and recent technological advances towards experimentation with this animal. In addition, we discuss the experimental potential of this model organism for different research questions related to regeneration and stem cell biology.

July 7, 2019

Complete genome sequence of Bacillus licheniformis 14ADL4 exhibiting resistance to clindamycin

Clindamycin-resistant Bacillus licheniformis 14ADL4 was isolated from doenjang, a Korean high-salt-fermented soybean food. Strain 14ADL4 contains a single circular 4,332,232 bp chromosome with a G + C content of 45.86%. The complete genome of strain 14ADL4 includes lmrA and lmrB homologs that may confer resistance to clindamycin.

July 7, 2019

Draft genome sequence of Tuber borchii Vittad., a whitish edible truffle.

The ascomycete Tuber borchii (Pezizomycetes) is a whitish edible truffle that establishes ectomycorrhizal symbiosis with trees and shrubs. This fungus is ubiquitous in Europe and is also cultivated outside Europe. Here, we present the draft genome sequence of T. borchii strain Tbo3840 (97.18 Mb in 969 scaffolds, with 12,346 predicted protein-coding genes).

July 7, 2019

The complete genome sequence of Rhodobaca barguzinensis alga05 (DSM 19920) documents its adaptation for life in soda lakes.

Soda lakes, with their high salinity and high pH, pose a very challenging environment for life. Microorganisms living in these harsh conditions have had to adapt their physiology and gene inventory. Therefore, we analyzed the complete genome of the haloalkaliphilic photoheterotrophic bacterium Rhodobaca barguzinensis strain alga05. It consists of a 3,899,419 bp circular chromosome with 3624 predicted coding sequences. In contrast to most of Rhodobacterales, this strain lacks any extrachromosomal elements. To identify the genes responsible for adaptation to high pH, we compared the gene inventory in the alga05 genome with genomes of 17 reference strains belonging to order Rhodobacterales. We found that all haloalkaliphilic strains contain the mrpB gene coding for the B subunit of the MRP Na+/H+ antiporter, while this gene is absent in all non-alkaliphilic strains, which indicates its importance for adaptation to high pH. Further analysis showed that alga05 requires organic carbon sources for growth, but it also contains genes encoding the ethylmalonyl-CoA pathway for CO2 fixation. Remarkable is the genetic potential to utilize organophosphorus compounds as a source of phosphorus. In summary, its genetic inventory indicates a large flexibility of the alga05 metabolism, which is advantageous in rapidly changing environmental conditions in soda lakes.

July 7, 2019

Complete genome sequence of Clostridium kluyveri JZZ applied in Chinese strong-flavor liquor production.

Chinese strong-flavor liquor (CSFL), which accounts for more than 70% of both Chinese liquor production and sales, is produced by complex fermentation with pit mud. Clostridium kluyveri, an important species that coexists with other microorganisms in fermentation pit mud (FPM), can produce caproic acid, which is subsequently converted to the key CSFL flavor substance ethyl caproate. In this study, we present the first complete genome sequence of C. kluyveri isolated from FPM. Clostridium kluyveri JZZ contains one circular chromosome and one circular plasmid with lengths of 4,454,353 and 58,581 bp, respectively. A total of 4158 protein-coding genes were predicted, and 2792 genes could be assigned to COG categories. It possesses the pathway predicted for biosynthesis of caproic acid with ethanol. Compared with two other C. kluyveri genomes, JZZ has a longer chromosome with multiple gene rearrangements and contains more genes involved in defense mechanisms, as well as DNA replication, recombination, and repair. Meanwhile, JZZ contains fewer genes involved in secondary metabolite biosynthesis, transport, and catabolism, including genes encoding polyketide synthases/non-ribosomal peptide synthetases. Additionally, JZZ possesses 960 unique genes, which are relatively concentrated in defense mechanisms and transcription. Our study provides a resource for further research on C. kluyveri isolated from FPM and will also facilitate genetic engineering to increase biofuel production and improve the fragrance and flavor of CSFL.

July 7, 2019

A universal SNP and small-indel variant caller using deep neural networks.

Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data. We further show that DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes, highlighting the benefits of using more automated and generalizable techniques for variant calling.

July 7, 2019

Spalter: A meta machine learning approach to distinguish true DNA variants from sequencing artefacts

Being able to distinguish between true DNA variants and technical sequencing artefacts is a fundamental task in whole genome, exome or targeted gene analysis. Variant calling tools provide diagnostic parameters, such as strand bias or an aggregated overall quality for each called variant, to help users make an informed choice about which variants to accept or discard. Having several such quality indicators poses a problem for the users of variant callers because they need to set or adjust thresholds for each such indicator. Alternatively, machine learning methods can be used to train a classifier based on these indicators. This approach needs large sets of labeled training data, which is not easily available. The new approach presented here relies on the idea that a true DNA variant exists independently of technical features of the read in which it appears (e.g. base quality, strand, position in the read). Therefore the nucleotide separability classification problem – predicting the nucleotide state of each read in a given pileup based on technical features only – should be near impossible to solve for true variants. Nucleotide separability, i.e. achievable classification accuracy, can either be used to distinguish between true variants and technical artefacts directly, using a thresholding approach, or it can be used as a meta-feature to train a separability-based classifier. This article explores both possibilities with promising results, showing accuracies around 90%.
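The separability idea can be made concrete with a toy pileup. In the sketch below, each read is described by technical features only, and a very simple classifier tries to predict whether the read carries the reference or the alternative nucleotide. The decision stump, the three features (base quality, strand, position in read), and the example values are all assumptions chosen for brevity. The abstract does not say which classifier Spalter uses, and the score here is training accuracy, which is optimistic on small pileups.

```python
from typing import List, Sequence


def stump_accuracy(features: List[Sequence[float]], labels: List[int]) -> float:
    """Best training accuracy of any one-feature threshold rule.

    labels: 0 = read shows the reference base, 1 = read shows the alternative.
    A high value means technical features alone separate the two groups.
    """
    n = len(labels)
    best = max(sum(labels), n - sum(labels)) / n  # majority-class baseline
    for j in range(len(features[0])):
        for threshold in sorted({row[j] for row in features}):
            predicted = [1 if row[j] >= threshold else 0 for row in features]
            correct = sum(p == y for p, y in zip(predicted, labels))
            # The rule may also be applied with its output inverted.
            best = max(best, correct / n, (n - correct) / n)
    return best


labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical artefact: alternative bases only occur at low quality near read ends.
artefact = [(35, 0, 10), (36, 1, 40), (34, 0, 70), (37, 1, 55),
            (12, 0, 98), (10, 0, 99), (11, 0, 97), (13, 0, 96)]

# Hypothetical true variant: both groups share similar technical features.
true_variant = [(35, 0, 10), (36, 1, 40), (34, 0, 70), (37, 1, 55),
                (36, 0, 12), (35, 1, 42), (34, 1, 68), (37, 0, 57)]

artefact_score = stump_accuracy(artefact, labels)
variant_score = stump_accuracy(true_variant, labels)
print(artefact_score, variant_score)
```

A threshold on the resulting score corresponds to the direct approach described above; feeding the score into a second classifier corresponds to the meta-feature approach.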

July 7, 2019

Complete genome sequence of the halophile bacterium Kushneria konosiri X49T, isolated from salt-fermented Konosirus punctatus

Kushneria konosiri X49T is a member of the Halomonadaceae family within the order Oceanospirillales and can be isolated from salt-fermented larval gizzard shad. The genome of K. konosiri X49T reported here provides a genetic basis for its halophilic character. Diverse genes were involved in salt-in and -out strategies enabling adaptation of X49T to hypersaline environments. Due to resistance to high salt concentrations, genome research of K. konosiri X49T will contribute to the improvement of environmental and biotechnological usage by enhancing understanding of the osmotic equilibrium in the cytoplasm. Its genome consists of 3,584,631 bp, with an average G + C content of 59.1%, and 3261 coding sequences, 12 rRNAs, 66 tRNAs, and 8 miscRNAs.
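The G + C content quoted in several of these genome reports is the percentage of guanine and cytosine among the called bases. A minimal sketch of that arithmetic follows; the decision to exclude ambiguous bases such as N from the denominator is an assumption, and the example sequence is made up.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / called


# Hypothetical fragment: 4 of the 8 called bases are G or C.
example_gc = gc_content("ATGCGCNNAT")
print(example_gc)
```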
