Bioinformatics Archives

July 7, 2019

Integrating mass spectrometry and genomics for cyanobacterial metabolite discovery.

Filamentous marine cyanobacteria produce bioactive natural products with both potential therapeutic value and capacity to be harmful to human health. Genome sequencing has revealed that cyanobacteria have the capacity to produce many more secondary metabolites than have been characterized. The biosynthetic pathways that encode cyanobacterial natural products are mostly uncharacterized, and the lack of cyanobacterial genetic tools has largely prevented their heterologous expression. Hence, a combination of cutting-edge and traditional techniques has been required to elucidate their secondary metabolite biosynthetic pathways. Here, we review the discovery and refined biochemical understanding of the olefin synthase and fatty acid ACP reductase/aldehyde deformylating oxygenase pathways to hydrocarbons, and the curacin A, jamaicamide A, lyngbyabellin, columbamide, and a trans-acyltransferase macrolactone pathway encoding phormidolide. We integrate into this discussion the use of genomics, mass spectrometric networking, biochemical characterization, and isolation and structure elucidation techniques.

July 7, 2019

rHAT: fast alignment of noisy long reads with regional hashing.

Single Molecule Real-Time (SMRT) sequencing has been widely applied in cutting-edge genomic studies. However, aligning noisy long SMRT reads to a reference genome with state-of-the-art aligners is still expensive, which is becoming a bottleneck in applications of SMRT sequencing. A novel approach is needed to improve the efficiency and effectiveness of SMRT read alignment. We propose Regional Hashing-based Alignment Tool (rHAT), a seed-and-extension-based read alignment approach specifically designed for noisy long reads. rHAT indexes the reference genome with a regional hash table (RHT), a hash table-based index that describes the short tokens within local windows of the reference genome. In the seeding phase, rHAT uses the RHT to efficiently count short token matches between part of the read and local genomic windows, in order to find highly likely candidate sites. In the extension phase, a sparse dynamic programming-based heuristic is used to reduce the cost of aligning the read to the candidate sites. By benchmarking on real and simulated datasets from various prokaryote and eukaryote genomes, we demonstrated that rHAT can effectively align SMRT reads with outstanding throughput. rHAT is implemented in C++; the source code is available at https://github.com/derekguan/rHAT. Contact: ydwang@hit.edu.cn. © The Author (2015). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
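The seeding idea in the rHAT abstract (count short-token matches between a read and local windows of the reference, then keep the best-scoring windows as candidate sites) can be illustrated with a small sketch. This is not rHAT's implementation: the function names, the non-overlapping 1,000 bp windows, the token length k=11, and the synthetic reference are all assumptions chosen for illustration, and the sketch uses the whole read rather than a partial read.

```python
import random
from collections import Counter, defaultdict


def build_regional_index(reference, window=1000, k=11):
    """Map every k-mer to the set of window indices in which it starts.

    `window` and `k` are illustrative values, not rHAT's defaults.
    """
    index = defaultdict(set)
    for start in range(len(reference) - k + 1):
        index[reference[start:start + k]].add(start // window)
    return index


def candidate_windows(read, index, k=11, top=3):
    """Count k-mer hits per window and return the `top` best windows."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        for w in index.get(read[i:i + k], ()):
            hits[w] += 1
    return hits.most_common(top)


# Synthetic reference and a read drawn from positions 5000-5199 (window 5).
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(10_000))
read = reference[5000:5200]

# Simulate noise by substituting every 20th base of the read.
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
noisy = "".join(swap[b] if i % 20 == 0 else b for i, b in enumerate(read))

index = build_regional_index(reference)
best_window, best_hits = candidate_windows(noisy, index)[0]
print(best_window, best_hits)
```

Even with one substitution every 20 bases, enough error-free tokens remain for the true window to collect far more hits than any other, which is why a cheap counting pass can narrow the search before the more expensive extension step.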

July 7, 2019

Complete genome sequence of deoxynivalenol-degrading bacterium Devosia sp. strain A16.

Strain A16, which is capable of degrading deoxynivalenol, was isolated from a wheat field and preliminarily identified as Devosia sp. Here, we present the genome sequence of Devosia sp. A16, which has a size of 5,032,994 bp, with 4913 coding sequences (CDSs). The annotated full genome sequence of Devosia sp. A16 might shed light on how the strain degrades deoxynivalenol. Copyright © 2015 Elsevier B.V. All rights reserved.

July 7, 2019

Complete genome sequence of the Variibacter gotjawalensis GJW-30(T) from soil of lava forest, Gotjawal.

Variibacter gotjawalensis GJW-30(T) is a Gram-negative, strictly aerobic, pleomorphic bacterium. Here we present the 4.5-Mb genome sequence of the type strain V. gotjawalensis GJW-30(T), which consists of one chromosome of 4,586,237 bp in total with a G+C content of 62.2 mol%. This is the first report of the full genome sequence of a species of the novel genus Variibacter, isolated from Gotjawal, a unique area in Jeju, Republic of Korea. Copyright © 2015 Elsevier B.V. All rights reserved.

July 7, 2019

Complete genome of Pseudoalteromonas phenolica KCTC 12086(T) (= O-BC30(T)), a marine bacterium producing polybrominated aromatic compounds.

Pseudoalteromonas phenolica is a Gram-negative, rod-shaped, flagellated, aerobic, antibiotic-producing bacterium that was isolated from seawater off Ogasawara Island, Japan. Here, we report the complete genome of P. phenolica KCTC 12086(T) (= O-BC30(T)), which consists of 4,868,993 bp (G+C content of 40.6%) with two chromosomes, 4168 protein-coding genes, 113 tRNAs and 9 rRNA operons. In addition, several genes related to phenolic anti-methicillin-resistant Staphylococcus aureus substances were detected in the genome, suggesting that the biosynthesis of industrially important polybrominated aromatic compounds could be better understood now that genome data for P. phenolica are available. Copyright © 2015 Elsevier B.V. All rights reserved.

July 7, 2019

hybridSPAdes: an algorithm for hybrid assembly of short and long reads.

Recent advances in single molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long and inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, inexpensive short-read technologies produce accurate but fragmented assemblies. Thus, a hybrid approach that assembles long reads (with low coverage) and short reads has the potential to generate high-quality assemblies at reduced cost. We describe the hybridSPAdes algorithm for assembling short and long reads and benchmark it on a variety of bacterial assembly projects. Our results demonstrate that hybridSPAdes generates accurate assemblies (even in projects with relatively low coverage by long reads), thus reducing the overall cost of genome sequencing. We further present the first complete assembly of a genome from single cells using SMRT reads. hybridSPAdes is implemented in C++ as a part of the SPAdes genome assembler and is publicly available at http://bioinf.spbau.ru/en/spades. Contact: d.antipov@spbu.ru. Supplementary information: supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019

Timing, rates and spectra of human germline mutation.

Germline mutations are a driving force behind genome evolution and genetic disease. We investigated genome-wide mutation rates and spectra in multi-sibling families. The mutation rate increased with paternal age in all families, but the number of additional mutations per year differed by more than twofold between families. Meta-analysis of 6,570 mutations showed that germline methylation influences mutation rates. In contrast to somatic mutations, we found remarkable consistency in germline mutation spectra between the sexes and at different paternal ages. In parental germ line, 3.8% of mutations were mosaic, resulting in 1.3% of mutations being shared by siblings. The number of these shared mutations varied significantly between families. Our data suggest that the mutation rate per cell division is higher during both early embryogenesis and differentiation of primordial germ cells but is reduced substantially during post-pubertal spermatogenesis. These findings have important consequences for the recurrence risks of disorders caused by de novo mutations.

July 7, 2019

The complete chloroplast genome of Gentiana straminea (Gentianaceae), an endemic species to the Sino-Himalayan subregion.

Endemic to the Sino-Himalayan subregion, the medicinal alpine plant Gentiana straminea is a threatened species. Genetic and molecular data on it are scarce. Here we report the complete chloroplast (cp) genome sequence of G. straminea, the first sequenced member of the family Gentianaceae. The cp genome is 148,991 bp in length, including a large single copy (LSC) region of 81,240 bp, a small single copy (SSC) region of 17,085 bp and a pair of inverted repeats (IRs) of 25,333 bp each. It contains 112 unique genes, including 78 protein-coding genes, 30 tRNAs and 4 rRNAs. The rps16 gene lacks exon 2 between trnK-UUU and trnQ-UUG, making it the first rps16 pseudogene found in the nonparasitic plants of the Asterids clade. Sequence analysis revealed the presence of 13 forward repeats, 13 palindrome repeats and 39 simple sequence repeats (SSRs). A whole-cp-genome comparison of G. straminea and four other species in Gentianales was carried out. Phylogenetic analyses using maximum likelihood (ML) and maximum parsimony (MP) were performed based on 69 protein-coding genes from 36 species of Asterids. The results strongly supported the position of Gentianaceae as one member of the order Gentianales. The complete chloroplast genome sequence will provide intragenic information for its conservation and contribute to research on the genetic and phylogenetic analyses of Gentianales and Asterids. Copyright © 2015 Elsevier B.V. All rights reserved.
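The region sizes and gene counts reported for the G. straminea chloroplast genome can be checked with a few lines of arithmetic. The sketch below assumes the usual quadripartite layout, in which the genome is one LSC, one SSC, and two copies of the inverted repeat; the variable names are illustrative.

```python
# Region lengths in bp, taken from the abstract.
lsc = 81_240   # large single copy region
ssc = 17_085   # small single copy region
ir = 25_333    # one inverted repeat; the genome carries two copies

total = lsc + ssc + 2 * ir
print(total)  # 148991, matching the reported genome length

# Gene counts from the abstract: protein-coding + tRNA + rRNA.
unique_genes = 78 + 30 + 4
print(unique_genes)  # 112, matching the reported number of unique genes

# Share of the genome occupied by each region type.
for name, length in [("LSC", lsc), ("SSC", ssc), ("IRs (both)", 2 * ir)]:
    print(f"{name}: {100 * length / total:.1f}%")
```

The totals agree with the abstract only if each inverted repeat is 25,333 bp long, which supports reading the reported IR size as the length of a single copy.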

July 7, 2019

Complete genome sequence of an aromatic compound degrader Arthrobacter sp. YC-RL1.

Arthrobacter sp. YC-RL1, isolated from a petroleum-contaminated soil, is capable of degrading and utilizing a wide range of aromatic compounds for growth. Here we report the complete genome sequence of strain YC-RL1, which may facilitate the investigation of environmental bioremediation and provide new gene resources for biotechnology and gene engineering. Copyright © 2015 Elsevier B.V. All rights reserved.

July 7, 2019

Complete genome sequence of Agarivorans gilvus WH0801(T), an agarase-producing bacterium isolated from seaweed.

Agarivorans gilvus WH0801(T), an agarase-producing bacterium, was isolated from the surface of seaweed. Here, we present the complete genome sequence, which consists of one circular chromosome of 4,416,600 bp with a GC content of 45.9%. This genetic information will provide insight into biotechnological applications of producing agar for food and industry. Copyright © 2015 Elsevier B.V. All rights reserved.

July 7, 2019

CauloBrowser: A systems biology resource for Caulobacter crescentus.

Caulobacter crescentus is a premier model organism for studying the molecular basis of cellular asymmetry. The Caulobacter community has generated a wealth of high-throughput spatiotemporal databases including data from gene expression profiling experiments (microarrays, RNA-seq, ChIP-seq, ribosome profiling, LC-MS proteomics), gene essentiality studies (Tn-seq), genome-wide protein localization studies, and global chromosome methylation analyses (SMRT sequencing). A major challenge involves the integration of these diverse data sets into one comprehensive community resource. To address this need, we have generated CauloBrowser (www.caulobrowser.org), an online resource for Caulobacter studies. This site provides a user-friendly interface for quickly searching genes of interest and downloading genome-wide results. Search results about individual genes are displayed as tables, graphs of time-resolved expression profiles, and schematics of protein localization throughout the cell cycle. In addition, the site provides a genome viewer that enables customizable visualization of all published high-throughput genomic data. The depth and diversity of data sets collected by the Caulobacter community make CauloBrowser a unique and valuable systems biology resource. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

July 7, 2019

OxyR-dependent formation of DNA methylation patterns in OpvAB(OFF) and OpvAB(ON) cell lineages of Salmonella enterica.

Phase variation of the Salmonella enterica opvAB operon generates a bacterial lineage with standard lipopolysaccharide structure (OpvAB(OFF)) and a lineage with shorter O-antigen chains (OpvAB(ON)). Regulation of OpvAB lineage formation is transcriptional, and is controlled by the LysR-type factor OxyR and by DNA adenine methylation. The opvAB regulatory region contains four sites for OxyR binding (OBSA-D), and four methylatable GATC motifs (GATC1-4). OpvAB(OFF) and OpvAB(ON) cell lineages display opposite DNA methylation patterns in the opvAB regulatory region: (i) in the OpvAB(OFF) state, GATC1 and GATC3 are non-methylated, whereas GATC2 and GATC4 are methylated; (ii) in the OpvAB(ON) state, GATC2 and GATC4 are non-methylated, whereas GATC1 and GATC3 are methylated. We provide evidence that such DNA methylation patterns are generated by OxyR binding. The higher stability of the OpvAB(OFF) lineage may be caused by binding of OxyR to sites that are identical to the consensus (OBSA and OBSC), while the sites bound by OxyR in OpvAB(ON) cells (OBSB and OBSD) are not. In support of this view, amelioration of either OBSB or OBSD locks the system in the ON state. We also show that the GATC-binding protein SeqA and the nucleoid protein HU are ancillary factors in opvAB control. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

July 7, 2019

Analysis of hepatitis C NS5A resistance associated polymorphisms using ultra deep single molecule real time (SMRT) sequencing.

Development of Hepatitis C virus (HCV) resistance against direct-acting antivirals (DAAs), including NS5A inhibitors, is an obstacle to successful treatment of HCV when DAAs are used in sub-optimal combinations. Furthermore, it has been shown that baseline (pre-existing) resistance against DAAs is present in treatment-naïve patients and this will potentially complicate future treatment strategies in different HCV genotypes (GTs). Thus the aim was to detect low levels of NS5A resistance-associated variants (RAVs) in a limited sample set of treatment-naïve patients of HCV GT1a and 3a, since such polymorphisms can display in vitro resistance as high as 60,000-fold. Ultra-deep single molecule real time (SMRT) sequencing with the Pacific Biosciences (PacBio) RSII instrument was used to detect these RAVs. The SMRT sequencing was conducted on ten samples: three of them positive with Sanger sequencing (GT1a Q30H and Y93N, and GT3a Y93H), five GT1a samples, and two GT3a non-positive samples. The same methods were applied to the HCV GT1a H77-plasmid in a dilution series, in order to determine the error rates of replication, which in turn were used to determine the limit of detection (LOD), as defined by mean + 3SD, of minority variants down to 0.24%. We found important baseline NS5A RAVs at levels between 0.24 and 0.5%, which could potentially have clinical relevance. This new method with low-level detection of baseline RAVs could be useful in predicting the most cost-efficient combination of DAA treatment, and reduce the treatment duration for an HCV-infected individual. Copyright © 2015 Elsevier B.V. All rights reserved.
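The limit of detection used in the HCV study is defined as the mean of the control error rates plus three standard deviations. A minimal sketch of that rule is shown below. The error rates and variant frequencies are invented placeholder values, not data from the study, and the use of the sample (rather than population) standard deviation is an assumption, since the abstract does not say which was used.

```python
import statistics


def limit_of_detection(error_rates):
    """Return mean + 3 * SD of the control error rates.

    Uses the sample standard deviation; this choice is an assumption.
    """
    return statistics.mean(error_rates) + 3 * statistics.stdev(error_rates)


def call_variants(frequencies, lod):
    """Keep only variants whose observed frequency exceeds the LOD."""
    return {name: freq for name, freq in frequencies.items() if freq > lod}


# Hypothetical per-position error rates (in percent) from a plasmid control.
control_error_rates = [0.10, 0.12, 0.08, 0.10]
lod = limit_of_detection(control_error_rates)

# Hypothetical minority-variant frequencies (in percent) in a patient sample.
observed = {"variant_a": 0.40, "variant_b": 0.05}
called = call_variants(observed, lod)

print(f"LOD = {lod:.3f}%")
print(called)
```

Setting the threshold three standard deviations above the mean background error means that a frequency is reported as a variant only when sequencing or replication error is unlikely to explain it.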

July 7, 2019

MuffinEc: Error correction for de novo assembly via greedy partitioning and sequence alignment

Error correction is typically the first step of de novo genome assembly from NGS data. This step has an important impact on the quality and speed of the assembly process. However, the majority of available stand-alone error correction solutions can only detect and correct mismatches. Therefore, these solutions only support correcting reads generated by Illumina sequencers. Several solutions support insertions and deletions (indels) and are capable of working with multiple technologies. However, these solutions are limited by correction performance and resource consumption. In this paper, we introduce MuffinEc, an indel-aware multi-technology correction method for NGS data. This method uses a greedy approach to create groups of reads and subsequently corrects them using their consensus. MuffinEc surpasses existing solutions by offering better correction ratios for multiple technologies. This method also exploits parallel processing via OpenMP and uses fewer computational resources than similar programs, thereby being capable of handling large datasets. MuffinEc is open source and freely available at http://muffinec.sourceforge.net.
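The general idea of grouping reads greedily and then correcting each group with its consensus can be sketched as follows. This is a deliberately simplified illustration and not MuffinEc's algorithm: it handles only equal-length reads with substitutions (no indels), compares each read to the first read of a group by Hamming distance, and every name and threshold here is hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def greedy_groups(reads, max_dist=2):
    """Assign each read to the first group whose seed read is close enough."""
    groups = []
    for read in reads:
        for group in groups:
            seed = group[0]
            if len(seed) == len(read) and hamming(seed, read) <= max_dist:
                group.append(read)
                break
        else:
            # No existing group accepted the read, so it seeds a new one.
            groups.append([read])
    return groups


def consensus(group):
    """Majority base at each column of an equal-length read group."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*group))


def correct(reads, max_dist=2):
    """Return one consensus sequence per greedy group."""
    return [consensus(g) for g in greedy_groups(reads, max_dist)]


reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TTTTCCCC"]
print(correct(reads))  # ['ACGTACGT', 'TTTTCCCC']
```

The greedy pass is fast because each read is placed in the first acceptable group instead of the best one, at the cost of depending on read order. An indel-aware method would replace the Hamming comparison with sequence alignment.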

July 7, 2019

Long read and single molecule DNA sequencing simplifies genome assembly and TAL effector gene analysis of Xanthomonas translucens.

The species Xanthomonas translucens encompasses a complex of bacterial strains that cause diseases and yield loss on grass species including important cereal crops. Three pathovars, X. translucens pv. undulosa, X. translucens pv. translucens and X. translucens pv. cerealis, have been described as pathogens of wheat, barley, and oats. However, no complete genome sequence for a strain of this complex is currently available. A complete genome sequence of X. translucens pv. undulosa strain XT4699 was obtained by using PacBio long read, single molecule, real time (SMRT) DNA sequences and Illumina sequences. Draft genome sequences of nineteen additional X. translucens strains, which were collected from wheat or barley in different regions and at different times, were generated by Illumina sequencing. Phylogenetic relationships among different Xanthomonas strains indicate that X. translucens strains are members of a clade distinct from the so-called group 2 xanthomonads and that the three pathovars of this species, undulosa, translucens and cerealis, represent distinct subclades in the group 1 clade. Knockout mutation of the type III secretion system of XT4699 eliminated the ability to cause water-soaking symptoms on wheat and barley and resulted in a reduction in populations on wheat in comparison to the wild type strain. Sequence comparison of X. translucens strains revealed genetic variation in type III effector repertoires among different pathovars and within one pathovar. The full genome sequence of XT4699 reveals the presence of eight members of the Transcription-Activator Like (TAL) effector genes, which are phylogenetically distant from previously known TAL effector genes of group 2 xanthomonads. Microarray and qRT-PCR analyses revealed TAL effector-specific wheat gene expression modulation. PacBio long read sequencing facilitates the assembly of Xanthomonas genomes and the multiple TAL effector genes, which are difficult to assemble from short read platforms. The complete genome sequence of X. translucens pv. undulosa strain XT4699 and draft genome sequences of nineteen additional X. translucens strains provide a resource for further genetic analyses of pathogenic diversity and host range of the X. translucens species complex. TAL effectors of strain XT4699 play roles in modulating wheat host gene expression.
