Bioinformatics Archives

July 7, 2019

Complete genome sequence of Spirosoma montaniterrae DY10T isolated from gamma-ray irradiated soil

A Gram-negative, yellow-pigmented, long-rod-shaped bacterium, Spirosoma montaniterrae DY10T, was isolated from a soil sample collected at Mt. Deogyusan, Jeonbuk Province, Republic of Korea. Cells showed extreme gamma radiation resistance, with a D10 value of 12 kGy. The complete genome sequence of strain DY10T consists of a circular chromosome (5,797,678 bp) encoding 5,116 genes, 9 rRNA genes and 39 tRNA genes. The genome contains the key enzymes involved in resistance to gamma and UVC radiation.
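A D10 value is the radiation dose that reduces the viable population tenfold (a 90% kill). The Python sketch below shows what a D10 of 12 kGy implies, assuming idealized log-linear inactivation kinetics, S(D) = 10^(-D/D10). This model is an assumption for illustration: measured survival curves often have a shoulder, and it is not the procedure used to characterize DY10T.

```python
def surviving_fraction(dose_kgy: float, d10_kgy: float) -> float:
    """Fraction of cells surviving a dose, assuming log-linear kinetics.

    Every D10 of dose removes one log10 of viable cells, so
    S(D) = 10 ** (-D / D10). Real curves may deviate (e.g. a shoulder).
    """
    if d10_kgy <= 0:
        raise ValueError("D10 must be positive")
    if dose_kgy < 0:
        raise ValueError("dose cannot be negative")
    return 10 ** (-dose_kgy / d10_kgy)


def dose_for_log_reduction(log10_kills: float, d10_kgy: float) -> float:
    """Dose needed for a given number of log10 reductions under the same model."""
    return log10_kills * d10_kgy


if __name__ == "__main__":
    d10 = 12.0  # kGy, the D10 reported for strain DY10T
    for dose in (12.0, 24.0, 36.0):
        print(f"{dose:4.0f} kGy -> surviving fraction {surviving_fraction(dose, d10):.0e}")
    print(f"6-log reduction needs {dose_for_log_reduction(6, d10):.0f} kGy")
```

Under this model each additional 12 kGy cuts survival by another factor of ten, which is why a large D10 signals extreme resistance.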

July 7, 2019

Glaucophyta

The Glaucophyta is by far the least species-rich phylum of the Archaeplastida, comprising only four described genera (Glaucocystis, Cyanophora, Gloeochaete, and Cyanoptyche) and 15 species. However, recent molecular and morphological analyses reveal that glaucophytes are not as species-poor as previously assumed, with many novel lineages existing in natural environments. Glaucophytes are freshwater phototrophs of moderate to low abundance and retain many ancestral plastid traits derived from the cyanobacterial donor of this organelle, including the remnant peptidoglycan wall in their envelope. These plastids were originally named “cyanelles,” a name later changed to “muroplasts” when their shared ancestry with other Archaeplastida was recognized. The model glaucophyte, Cyanophora paradoxa, is well studied with respect to biochemistry, proteomics, and the gene content of its nuclear and organelle genomes. Investigation of the biosynthesis of cytosolic starch led to a model for the transition from glycogen to starch storage during plastid endosymbiosis. The photosynthetic apparatus, including the phycobilisome antennae, resembles that of cyanobacteria. However, the carbon-concentrating mechanism is algal in nature and based on pyrenoids. Studies on protein import into muroplasts revealed a primordial Toc/Tic translocon. The composition and biosynthesis of the peptidoglycan wall, and the involvement of nuclear genes in it, have been elucidated. The muroplast genome is distinctive, not because of the number of genes it encodes but because it carries unique genes not present in other plastid genomes. The mosaic nature of the gene-rich nuclear genome (27,000 genes) came as a surprise, considering the relatively small genomes of unicellular red algae.

July 7, 2019

Tracing origins of the Salmonella Bareilly strain causing a food-borne outbreak in the United States.

Using a novel combination of whole-genome sequencing (WGS) analysis and geographic metadata, we traced the origins of Salmonella Bareilly isolates collected in 2012 during a widespread food-borne outbreak in the United States associated with scraped tuna imported from India. Using next-generation sequencing, we sequenced the complete genomes of 100 Salmonella Bareilly isolates obtained from patients who consumed contaminated product, from natural sources, and from unrelated, historically and geographically disparate foods. Pathogen genomes were linked to geography by projecting the phylogeny onto a virtual globe, and a transmission network was produced. Phylogenetic analysis of WGS data revealed a common origin for the outbreak strains, indicating that patients in Maryland and New York were infected from sources originating at a facility in India. These data represent the first report fully integrating WGS analysis with geographic mapping and a novel use of transmission networks. Results showed that WGS vastly improves our ability to delimit the scope and source of bacterial food-borne contamination events. Furthermore, these findings reinforce the extraordinary utility that WGS brings to global outbreak investigation as a greatly enhanced approach to protecting the human food supply chain as well as public health in general. Published by Oxford University Press for the Infectious Diseases Society of America 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
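The abstract does not spell out how the transmission network was built. A common, simple way to draw such a network from WGS data, used here only as a stand-in and not as the study's method, is a minimum spanning tree over pairwise SNP distances. The Python sketch below applies Prim's algorithm to hypothetical toy SNP profiles; the isolate names and sequences are invented for illustration.

```python
def snp_distance(a: str, b: str) -> int:
    """Number of positions at which two aligned SNP profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles: dict[str, str]) -> list[tuple[str, str, int]]:
    """Prim's algorithm on the complete graph of pairwise SNP distances.

    Returns (parent, child, distance) edges. Ties are broken by the
    insertion order of `profiles`, so the output is deterministic.
    """
    names = list(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        for u in names:
            if u not in in_tree:
                continue
            for v in names:
                if v in in_tree:
                    continue
                d = snp_distance(profiles[u], profiles[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        in_tree.add(best[1])
        edges.append(best)
    return edges


# Hypothetical toy SNP profiles (not data from the study).
profiles = {
    "patient_MD": "ACGTTA",
    "patient_NY": "ACGTTG",
    "facility_IN": "ACGTTA",
    "unrelated_food": "TCAATG",
}
for parent, child, dist in minimum_spanning_tree(profiles):
    print(f"{parent} -- {child}: {dist} SNPs")
```

Closely related outbreak isolates end up joined by short edges, while unrelated food isolates attach only through long edges, which is the intuition behind delimiting an outbreak's scope from genomic distances.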

July 7, 2019

Complete chloroplast genome sequence of Fritillaria unibracteata var. wabuensis based on SMRT Sequencing Technology.

Fritillaria unibracteata var. wabuensis is an important medicinal plant used to treat cough symptoms related to the respiratory system. The chloroplast genome of F. unibracteata var. wabuensis (GenBank accession no. KF769142) was assembled using the PacBio RS platform (Pacific Biosciences, Beverly, MA) as a circular sequence of 151,009 bp. The assembled genome contains 133 genes, including 88 protein-coding, 37 tRNA, and eight rRNA genes. This genome sequence will provide an important resource for further studies on the evolution of the genus Fritillaria and for the molecular identification of Fritillaria herbs and their adulterants. This work suggests that PacBio RS is a powerful tool for sequencing and assembling chloroplast genomes.

July 7, 2019

Microbial bioinformatics for food safety and production.

In the production of fermented foods, microbes play an important role. Optimization of fermentation processes or starter culture production has traditionally been a trial-and-error approach guided by expert knowledge of the fermentation process. Current developments in high-throughput ‘omics’ technologies allow the development of more rational approaches to improving fermentation processes, from the perspective of both food functionality and food safety. Here, the authors thematically review typical bioinformatics techniques and approaches to improve various aspects of the microbial production of fermented food products and food safety. © The Author 2015. Published by Oxford University Press.

July 7, 2019

Effects of genome structure variation, homeologous genes and repetitive DNA on polyploid crop research in the age of genomics.

Compared to diploid species, allopolyploid crop species possess more complex genomes, higher productivity, and greater adaptability to changing environments. Next-generation sequencing techniques have produced high-density genetic maps, whole-genome sequences, transcriptomes and epigenomes for important polyploid crops. However, several problems interfere with the full application of next-generation sequencing techniques to these crops. First, different types of genomic variation affect sequence assembly and QTL mapping. Second, duplicated or homoeologous genes can diverge in function, leading to the emergence of many minor QTL, which complicates fine mapping, cloning and marker-assisted selection. Third, repetitive DNA sequences arising in polyploid crop genomes also affect sequence assembly, and are increasingly being shown to produce small RNAs that regulate gene expression and hence phenotypic traits. We propose that these three key features should be considered together when analyzing polyploid crop genomes. It is apparent that dissection of genomic structural variation, elucidation of the function and interaction mechanisms of homoeologous genes, and investigation of the de novo roles of repeat sequences in agronomic traits are necessary for genomics-based crop breeding in polyploids. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

July 7, 2019

Estimating fitness of viral quasispecies from next-generation sequencing data.

The quasispecies model is ubiquitous in the study of viruses. Although it has led to a number of insights that have stood the test of time, the quasispecies model has mostly been discussed in a theoretical fashion with little support from data. With next-generation sequencing (NGS), this situation is changing, and a wealth of data can now be produced in a time- and cost-efficient manner. After removal of technical errors, NGS can yield an exceedingly detailed picture of viral population structure. The widespread availability of cross-sectional data can be used to study the fitness landscapes of viral populations in the quasispecies model. This chapter highlights methods that estimate the strength of selection in selective sweeps, assess the marginal fitness effects of quasispecies, and finally infer the fitness landscape of a viral quasispecies, all on the basis of NGS data.
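One simple way to estimate the strength of selection during a sweep, assumed here for illustration rather than taken from the chapter, uses the fact that under deterministic haploid selection the log-odds of a variant's frequency grows linearly in time: logit(p(t)) = logit(p0) + s·t. Fitting a straight line to logit-transformed NGS frequencies then estimates s. The Python sketch below does this by ordinary least squares on a synthetic trajectory; it ignores drift and sequencing noise.

```python
import math


def logit(p: float) -> float:
    if not 0.0 < p < 1.0:
        raise ValueError("frequency must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def estimate_selection_coefficient(times: list[float], freqs: list[float]) -> float:
    """Least-squares slope of logit(frequency) versus time.

    Under a deterministic haploid sweep, logit(p(t)) = logit(p0) + s * t,
    so the slope estimates s per unit time. Drift and noise are ignored.
    """
    if len(times) != len(freqs) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    y = [logit(p) for p in freqs]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("observations must span more than one time point")
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    return sxy / sxx


# Synthetic trajectory generated with s = 0.3 and p0 = 0.01 (illustration only).
s_true, p0 = 0.3, 0.01
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
freqs = [1.0 / (1.0 + (1.0 - p0) / p0 * math.exp(-s_true * t)) for t in times]
print(f"estimated s = {estimate_selection_coefficient(times, freqs):.3f}")
```

Because the synthetic frequencies follow the model exactly, the fitted slope recovers s; with real data the residuals would reflect drift, sampling and sequencing error.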

July 7, 2019

HapCol: accurate and memory-efficient haplotype assembly from long reads.

Haplotype assembly is the computational problem of reconstructing haplotypes in diploid organisms and is of fundamental importance for characterizing the effects of single-nucleotide polymorphisms on the expression of phenotypic traits. Haplotype assembly benefits greatly from the advent of ‘future-generation’ sequencing technologies and their capability to produce long reads at increasing coverage. Existing methods are not able to deal with such data in a fully satisfactory way, either because accuracy or performance degrades as read length and sequencing coverage increase, or because they are based on restrictive assumptions. By exploiting a feature of future-generation technologies, namely the uniform distribution of sequencing errors, we designed an exact algorithm, called HapCol, that is exponential in the maximum number of corrections for each single-nucleotide polymorphism position and that minimizes the overall error-correction score. We performed an experimental analysis comparing HapCol with current state-of-the-art combinatorial methods on both real and simulated data. On a standard benchmark of real data, we show that HapCol is competitive with state-of-the-art methods, improving the accuracy and the number of phased positions. Furthermore, experiments on realistically simulated datasets revealed that HapCol requires significantly fewer computing resources, especially memory. Thanks to its computational efficiency, HapCol can overcome the limits of previous approaches, allowing datasets with higher coverage to be phased without the traditional all-heterozygous assumption. Our source code is available under the terms of the GNU General Public License at http://hapcol.algolab.eu/. Contact: bonizzoni@disco.unimib.it. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
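To make the "error-correction score" concrete, the Python sketch below computes the standard minimum error correction (MEC) objective for a tiny fragment matrix: each read is assigned to whichever of the two haplotypes it matches better, and its mismatches count as corrections. It assumes the all-heterozygous model (the second haplotype is the complement of the first) and finds the optimum by brute force over all 2^n haplotypes. This is not HapCol's algorithm, which the abstract describes as exponential only in the maximum number of corrections per SNP position; the fragment matrix is hypothetical.

```python
from itertools import product


def mec_cost(fragments: list[str], h1: tuple[int, ...]) -> int:
    """MEC cost of assigning each fragment to h1 or its complement h2.

    Fragments use '0'/'1' for observed alleles and '-' for uncovered SNPs.
    Each fragment goes to the closer haplotype; its mismatches are the
    corrections it needs. h2 = complement of h1 (all-heterozygous assumption).
    """
    h2 = tuple(1 - a for a in h1)
    total = 0
    for frag in fragments:
        d1 = d2 = 0
        for allele, a1, a2 in zip(frag, h1, h2):
            if allele == "-":
                continue
            b = int(allele)
            d1 += b != a1
            d2 += b != a2
        total += min(d1, d2)
    return total


def brute_force_mec(fragments: list[str]) -> tuple[int, tuple[int, ...]]:
    """Exhaustive search over all 2^n haplotypes; only feasible for tiny n."""
    n = len(fragments[0])
    best = None
    for h1 in product((0, 1), repeat=n):
        cost = mec_cost(fragments, h1)
        if best is None or cost < best[0]:
            best = (cost, h1)
    return best


# Hypothetical toy fragment matrix over 3 SNPs; read "011" carries one error.
fragments = ["010", "101", "01-", "-01", "011"]
cost, haplotype = brute_force_mec(fragments)
print(f"minimum MEC = {cost}, h1 = {haplotype}")
```

The exhaustive search makes the exponential blow-up in the number of SNPs obvious, which is the cost that bounding corrections per column is designed to avoid.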

July 7, 2019

High incidence of invasive group A Streptococcus disease caused by strains of uncommon emm types in Thunder Bay, Ontario, Canada.

An outbreak of type emm59 invasive group A Streptococcus (iGAS) disease was declared in 2008 in Thunder Bay District, Northwestern Ontario, two years after a country-wide emm59 epidemic was recognized in Canada. Despite a declining number of emm59 infections since 2010, numerous cases of iGAS disease continue to be reported in the area. We collected clinical information on all iGAS cases recorded in Thunder Bay District from 2008 to 2013. We also emm-typed and sequenced the genomes of all available strains isolated from 2011 to 2013 from iGAS infections and from severe soft tissue infections. We used whole-genome data to investigate the population structure of GAS strains of the most frequently isolated emm types. We report increased incidence of iGAS in Thunder Bay compared with the metropolitan area of Toronto/Peel and the province of Ontario. Illicit drug use, alcohol abuse, homelessness and hepatitis C infection were underlying diseases or conditions that might have predisposed patients to iGAS disease. Most cases were caused by clonal strains of “skin” or “generalist” emm types (i.e. emm82, emm87, emm101, emm4, emm83, and emm114), uncommonly seen in other areas of the province. We observed rapid waxing and waning of the emm types causing disease and their replacement by other emm types associated with the same tissue tropisms. Thus, iGAS disease in Thunder Bay District predominantly affects a select population of disadvantaged persons and is caused by clonally related strains of a few “skin” and “generalist” emm types less commonly associated with iGAS in other areas of Ontario. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

July 7, 2019

Identification and resolution of microdiversity through metagenomic sequencing of parallel consortia.

To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled the de novo reconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 of the 20 detected member species. Two Halomonas spp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of the Halomonas populations, one of the Rhodobacteraceae populations, and the Rhizobiales population. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
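The key idea behind parallel consortia is that the same species occurs at different abundances in the two consortia, so contigs from one genome tend to share a similar coverage ratio between the two metagenomes. The Python sketch below illustrates only that principle with a greedy one-dimensional grouping on log2 coverage ratios; it is not the authors' reconstruction pipeline (real binning also uses sequence composition and more samples), and the contig names, coverages and `tolerance` value are hypothetical.

```python
import math


def bin_by_coverage_ratio(
    coverages: dict[str, tuple[float, float]], tolerance: float = 0.5
) -> list[list[str]]:
    """Greedily group contigs by log2(coverage in consortium A / consortium B).

    Contigs are sorted by log ratio; a contig joins the current bin if its
    ratio is within `tolerance` of that bin's first (seed) contig.
    """
    ratios = {}
    for name, (cov_a, cov_b) in coverages.items():
        if cov_a <= 0 or cov_b <= 0:
            raise ValueError(f"{name}: coverage must be positive in both consortia")
        ratios[name] = math.log2(cov_a / cov_b)

    bins: list[tuple[float, list[str]]] = []
    for name in sorted(ratios, key=ratios.get):
        r = ratios[name]
        if bins and abs(r - bins[-1][0]) <= tolerance:
            bins[-1][1].append(name)
        else:
            bins.append((r, [name]))
    return [members for _, members in bins]


# Hypothetical mean read coverage of each contig in consortium A and B.
coverages = {
    "contig_1": (120.0, 30.0),
    "contig_2": (100.0, 26.0),
    "contig_3": (15.0, 60.0),
    "contig_4": (12.0, 50.0),
    "contig_5": (40.0, 40.0),
}
print(bin_by_coverage_ratio(coverages))
```

Contigs enriched in consortium A, enriched in B, and equally abundant in both fall into separate groups, even though a single metagenome would give no such signal.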

July 7, 2019

ALP & FALP: C++ libraries for pairwise local alignment E-values.

Pairwise local alignment is an indispensable tool for molecular biologists. In real time (i.e. in about 1 s), ALP (Ascending Ladder Program) calculates the E-values for protein-protein or DNA-DNA local alignments of random sequences, for an arbitrary substitution score matrix, gap costs and letter abundances; FALP (Frameshift Ascending Ladder Program) performs a similar task, although more slowly, for frameshifting DNA-protein alignments. To permit other C++ programmers to implement the computational efficiencies of ALP and FALP directly within their own programs, C++ source code is available in the public domain at http://go.usa.gov/3GTSW under ‘ALP’ and ‘FALP’, along with the standalone programs ALP and FALP. Contact: spouge@nih.gov. Supplementary information: Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.
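For context, local-alignment E-values follow the Karlin-Altschul form E = K·m·n·exp(-λS), where S is the alignment score and m, n are the sequence lengths. ALP estimates λ and K for gapped alignments; that computation is not reproduced here. The Python sketch below covers only the simpler ungapped case, where λ is the unique positive root of Σ p_i q_j exp(λ s_ij) = 1, found by bisection. K is not computed and is passed in as an assumed, hypothetical value.

```python
import math


def ungapped_lambda(scores, freqs_a, freqs_b) -> float:
    """Positive root of sum_ij p_i q_j exp(lambda * s_ij) = 1 (Karlin-Altschul).

    Valid for ungapped alignments when the expected score is negative and
    at least one score is positive; solved here by bisection.
    """
    def f(lam: float) -> float:
        return sum(
            pa * pb * math.exp(lam * scores[a][b])
            for a, pa in freqs_a.items()
            for b, pb in freqs_b.items()
        ) - 1.0

    expected = sum(
        pa * pb * scores[a][b]
        for a, pa in freqs_a.items()
        for b, pb in freqs_b.items()
    )
    if expected >= 0:
        raise ValueError("expected score must be negative")
    if max(scores[a][b] for a in freqs_a for b in freqs_b) <= 0:
        raise ValueError("at least one score must be positive")

    lo, hi = 1e-9, 1.0
    while f(hi) <= 0:  # expand the bracket until f changes sign
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def evalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul E-value E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


# DNA with +1 match / -1 mismatch and uniform letter frequencies.
dna = "ACGT"
scores = {a: {b: (1 if a == b else -1) for b in dna} for a in dna}
freqs = {a: 0.25 for a in dna}
lam = ungapped_lambda(scores, freqs, freqs)
print(f"lambda = {lam:.6f} (ln 3 = {math.log(3):.6f})")
print(f"E-value for S=20, m=n=1000, assumed K=0.1: {evalue(20, 1000, 1000, lam, 0.1):.3g}")
```

For +1/-1 scoring with uniform letters the equation reduces to 0.25e^λ + 0.75e^(-λ) = 1, whose positive root is λ = ln 3, so the bisection result can be checked by hand.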

July 7, 2019

Protein O-linked glycosylation in the plant pathogen Ralstonia solanacearum.

Ralstonia solanacearum is one of the most lethal phytopathogens in the world. Because of its broad host range, it can cause wilting disease in many plant species of economic interest. In this work, we identified the O-oligosaccharyltransferase (O-OTase) responsible for protein O-glycosylation in R. solanacearum. An analysis of the glycoproteome revealed that 20 proteins, including type IV pilins, are substrates of this general glycosylation system. Although multiple glycan forms were identified, the majority of the glycopeptides were modified with a pentasaccharide composed of HexNAc-(Pen)-dHex3, similar to the O antigen subunit present in the lipopolysaccharide of multiple R. solanacearum strains. Disruption of the O-OTase led to the total loss of protein glycosylation, together with a defect in biofilm formation and reduced pathogenicity towards tomato plants. Comparative proteomic analysis revealed that the loss of glycosylation is not associated with widespread proteome changes. Only the levels of a single glycoprotein, the type IV pilin, were diminished in the absence of glycosylation. In parallel, disruption of glycosylation triggered an increase in the levels of a surface lectin homologous to Pseudomonas PA-IIL. These results reveal the important role of glycosylation in the pathogenesis of R. solanacearum. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019

Genomic resources and their influence on the detection of the signal of positive selection in genome scans.

Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the fields of conservation biology and evolutionary biology. Thanks to next-generation sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using reduced-representation sequencing are increasingly used in genome scans. In addition, genome sequences are becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increased data density and information on physical linkage expand genome scans by (i) improving the overall quality of the data, (ii) helping reconstruct the demographic history of the population studied, which decreases false-positive rates, and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing potential candidate loci to be matched to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and its function and (iii) ensuring that the highly variable regions of the genome that include functional genes are also investigated. For all these reasons, using reference genomes in genome scan analyses is highly recommended. © 2015 John Wiley & Sons Ltd.
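A basic building block of many genome scans is a per-SNP differentiation statistic whose extreme values are flagged as candidate signals of selection. The Python sketch below computes Nei's G_ST for biallelic SNPs in two populations and flags SNPs above an empirical quantile. It is a simplified illustration chosen for this example, not a method from the review: it ignores demographic history and linkage, which the review identifies as central to controlling false positives, and the SNP frequencies are hypothetical.

```python
def gst(p1: float, p2: float) -> float:
    """Nei's G_ST for one biallelic SNP in two equally weighted populations.

    p1, p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # pooled expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def outlier_snps(snp_freqs: dict[str, tuple[float, float]], quantile: float = 0.95) -> list[str]:
    """SNPs whose G_ST is at or above the empirical `quantile` of all SNPs."""
    values = {snp: gst(p1, p2) for snp, (p1, p2) in snp_freqs.items()}
    if not values:
        return []
    ranked = sorted(values.values())
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return sorted(snp for snp, v in values.items() if v >= cutoff)


# Hypothetical allele frequencies in two populations.
snp_freqs = {
    "snp1": (0.50, 0.50),
    "snp2": (0.40, 0.60),
    "snp3": (0.10, 0.90),
    "snp4": (0.45, 0.55),
}
print(outlier_snps(snp_freqs))
```

With a reference genome, flagged SNPs could then be mapped to nearby coding regions, which is the step the review argues reference genomes make possible.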

July 7, 2019

Complete genome sequence of Pandoraea thiooxydans DSM 25325(T), a thiosulfate-oxidizing bacterium.

Pandoraea thiooxydans DSM 25325(T) is a thiosulfate-oxidizing bacterium isolated from rhizosphere soils of a sesame plant. Here, we present the first complete genome of P. thiooxydans DSM 25325(T). Several genes involved in thiosulfate oxidation and biodegradation of aromatic compounds were identified. Copyright © 2015 Elsevier B.V. All rights reserved.

July 7, 2019

Genome mining of astaxanthin biosynthetic genes from Sphingomonas sp. ATCC 55669 for heterologous overproduction in Escherichia coli.

As a highly valued keto-carotenoid, astaxanthin is widely used in nutritional supplements and pharmaceuticals. Therefore, the demand for biosynthetic astaxanthin and for more efficient astaxanthin biosynthesis has driven the investigation of metabolic engineering of native astaxanthin producers and heterologous hosts. However, microbial resources for astaxanthin are limited. In this study, we found that the α-proteobacterium Sphingomonas sp. ATCC 55669 could produce astaxanthin naturally. We used whole-genome sequencing, with a combined PacBio-Illumina approach, to identify the astaxanthin biosynthetic pathway. The putative astaxanthin biosynthetic pathway in Sphingomonas sp. ATCC 55669 was predicted. For further confirmation, a high-efficiency targeted engineering carotenoid synthesis platform was constructed in E. coli to identify the functional roles of candidate genes. All genes involved in astaxanthin biosynthesis were scattered across the chromosome rather than clustered. Moreover, overexpression of the exogenous E. coli idi gene in Sphingomonas sp. ATCC 55669 increased astaxanthin production 5.4-fold. This study describes a new astaxanthin producer and provides additional biosynthetic components for the future bioengineering of astaxanthin. © 2015 The Authors. Biotechnology Journal published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
