Bioinformatics Archives

July 7, 2019

DBG2OLC: Efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies.

The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces both of their sequencing requirements. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods without consuming a lot of memory, while saving about half of the sequencing cost.
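Principle (i) can be made concrete with a toy example. The Python sketch below compacts a long read into the ordered list of NGS contig IDs that its exact k-mer matches hit. The k-mer index, k = 4, and the names `build_kmer_index` and `compact_read` are illustrative assumptions, not DBG2OLC's actual implementation. The sketch also ignores reverse-complement reads and the careful error handling a real tool needs.

```python
from typing import Dict, List


def build_kmer_index(contigs: Dict[str, str], k: int) -> Dict[str, str]:
    """Map every k-mer that occurs in exactly one contig to that contig's ID."""
    index: Dict[str, str] = {}
    shared = set()
    for contig_id, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in index and index[kmer] != contig_id:
                shared.add(kmer)  # seen in two different contigs
            index[kmer] = contig_id
    for kmer in shared:
        del index[kmer]  # drop k-mers that cannot identify a single contig
    return index


def compact_read(read: str, index: Dict[str, str], k: int) -> List[str]:
    """Replace a long read by the ordered list of contig IDs its k-mers hit.

    k-mers containing sequencing errors find no match and are skipped;
    consecutive hits on the same contig are collapsed into one entry.
    """
    path: List[str] = []
    for i in range(len(read) - k + 1):
        contig_id = index.get(read[i:i + k])
        if contig_id is not None and (not path or path[-1] != contig_id):
            path.append(contig_id)
    return path


contigs = {"c1": "ACGTTGCA", "c2": "GGATCCAA", "c3": "TTTAGGCC"}
index = build_kmer_index(contigs, k=4)
# A 26-bp "long read" spanning c1, c2 and c3, with a substitution inside c2.
read = "ACGTTGCATGGGTCCAACTTTAGGCC"
print(compact_read(read, index, k=4))  # ['c1', 'c2', 'c3']
```

The base-level substitution inside c2 only removes a few k-mer hits, so the compact form is unchanged. This illustrates principle (ii): base-level errors can be skipped. Overlaps between reads can then be computed on these short ID lists instead of on the raw bases.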

July 7, 2019

CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community.

The increasing availability and decreasing cost of high-throughput sequencing have transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they lack access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.

July 7, 2019

Genomic insight into the host-endosymbiont relationship of Endozoicomonas montiporae CL-33(T) with its coral host.

The bacterial genus Endozoicomonas has been commonly detected in healthy corals in many studies of coral-associated bacteria over the past decade. Although it is likely to be a core member of the coral microbiota, little is known about its ecological roles. To decipher potential interactions between bacteria and their coral hosts, we sequenced and investigated the first culturable endozoicomonal bacterium from coral, E. montiporae CL-33(T). Its genome showed potential signs of ongoing genome erosion and gene exchange with its host. Testosterone degradation and a type III secretion system are commonly present in Endozoicomonas and may play roles in recognizing hosts and delivering effectors to them. Moreover, genes for the eukaryotic ephrin ligand B2 are present in its genome; presumably, this bacterium could move into coral cells via endocytosis after binding to the coral's Eph receptors. In addition, 7,8-dihydro-8-oxoguanine triphosphatase and isocitrate lyase are possible type III secretion effectors that might help the coral prevent mitochondrial dysfunction and promote gluconeogenesis, especially under stress conditions. Based on these findings, we inferred that E. montiporae is a facultative endosymbiont that can recognize, translocate, communicate with and modulate its coral host.

July 7, 2019

Complete genome sequence of thermophilic Bacillus smithii type strain DSM 4216(T).

Bacillus smithii is a facultatively anaerobic, thermophilic bacterium able to use a variety of sugars that can be derived from lignocellulosic feedstocks. Being genetically accessible, it is a potential new host for the biotechnological production of green chemicals from renewable resources. We determined the complete genome sequence of the B. smithii type strain DSM 4216(T), which consists of a 3,368,778 bp chromosome (GenBank accession number CP012024.1) and a 12,514 bp plasmid (GenBank accession number CP012025.1), together encoding 3880 genes. Genome annotation via RAST was complemented by a protein domain analysis. Unique features of B. smithii central metabolism compared with related organisms included the lack of a standard acetate production pathway, with no apparent pyruvate formate lyase, phosphotransacetylase or acetate kinase genes, even though acetate was the second major fermentation product.

July 7, 2019

Complete genome sequence of a Rhodococcus species isolated from the winter skate Leucoraja ocellata.

We report here the genome sequence of Rhodococcus sp. isolate UM008, isolated from the renal/interrenal tissue of the winter skate Leucoraja ocellata. Genome sequence analysis suggests that Rhodococcus bacteria may engage in a novel mutualistic relationship with their elasmobranch host, serving as biocatalysts in the steroidogenic pathway of 1α-hydroxycorticosterone. Copyright © 2016 Wiens et al.

July 7, 2019

Comparative genomic analysis of isoproturon-mineralizing sphingomonads reveals the isoproturon catabolic mechanism.

The worldwide use of the phenylurea herbicide isoproturon (IPU) has raised considerable concern about its environmental fate. Although many microbial metabolites of IPU are known and IPU-mineralizing bacteria have been isolated, the molecular mechanism of IPU catabolism has not yet been elucidated. In this study, the complete set of genes encoding the conserved IPU catabolic pathway was revealed, based on comparative analysis of the genomes of three IPU-mineralizing sphingomonads and subsequent experimental validation. These genes included a novel hydrolase gene, ddhA, which is responsible for cleaving the urea side chain of the IPU demethylated products; a distinct aniline dioxygenase gene cluster, adoQTA1A2BR, which has a broad substrate range; and an inducible catechol meta-cleavage pathway gene cluster, adoXEGKLIJC. Furthermore, the initial mono-N-demethylation genes pdmAB were confirmed to also be involved in the successive N-demethylation of the IPU mono-N-demethylated product. These IPU-catabolic genes were organized into four transcription units and distributed on three plasmids. They were flanked by multiple mobile genetic elements and highly conserved among IPU-mineralizing sphingomonads. The elucidation of the molecular mechanism of IPU catabolism will enhance our understanding of the microbial mineralization of IPU and provide insights into the evolutionary scenario of the conserved IPU-catabolic pathway. © 2016 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.

July 7, 2019

Evidence for an opportunistic and endophytic lifestyle of the Bursaphelenchus xylophilus-associated bacteria Serratia marcescens PWN146 isolated from wilting Pinus pinaster.

Pine wilt disease (PWD) results from the interaction of three elements: the pathogenic nematode, Bursaphelenchus xylophilus; the insect vector, Monochamus sp.; and the host tree, mostly Pinus species. Bacteria isolated from B. xylophilus may be a fourth element in this complex disease. However, the precise role of bacteria in this interaction is unclear, as both plant-beneficial and plant-pathogenic bacteria may be associated with PWD. Using whole genome sequencing and phenotypic characterization, we investigated in more detail the genetic repertoire of Serratia marcescens PWN146, a bacterium associated with B. xylophilus. We show clear evidence that S. marcescens PWN146 is able to withstand and colonize the plant environment without any deleterious effects on a susceptible host (Pinus thunbergii), on B. xylophilus, or on the model nematode C. elegans. This bacterium is able to tolerate growth in the presence of xenobiotic/organic compounds and to use phenylacetic acid as a carbon source. Furthermore, we present a detailed list of the potential means by which S. marcescens PWN146 could interfere with plant metabolism via hormonal pathways and/or nutrient acquisition, and compete against other bacteria and/or fungi through resource acquisition or production of antimicrobial compounds. Further investigation is required to understand the role of bacteria in PWD. We have now reinforced the theory that B. xylophilus-associated bacteria may have a plant origin.

July 7, 2019

Comparative genomics and transcriptomics of Pichia pastoris.

Pichia pastoris has emerged as an important alternative host for producing recombinant biopharmaceuticals, owing to its high cultivation density, low host cell protein burden, and the development of strains with humanized glycosylation. Despite its demonstrated utility, relatively little strain engineering has been performed to improve Pichia, due in part to the limited number and inconsistent frameworks of reported genomes and transcriptomes. Furthermore, the co-mingling of genomic, transcriptomic and fermentation data collected about Komagataella pastoris and Komagataella phaffii, the two species co-branded as Pichia, has generated confusion about host performance for these genetically distinct species. Generation of comparative high-quality genomes and transcriptomes will enable meaningful comparisons between the organisms, and potentially inform distinct biotechnological utilities for each species.

Here, we present a comprehensive and standardized comparative analysis of the genomic features of the three most commonly used strains comprising the tradename Pichia: K. pastoris wild-type, K. phaffii wild-type, and K. phaffii GS115. We used a combination of long-read (PacBio) and short-read (Illumina) sequencing technologies to achieve over 1000X coverage of each genome. Individual genomes were then constructed as gap-free assemblies from as few as seven contigs. We found substantial syntenic rearrangements between the species and characterized a linear plasmid present in K. phaffii. Comparative analyses between K. phaffii genomes enabled the characterization of the mutational landscape of the GS115 strain. We identified and examined 35 non-synonymous coding mutations present in GS115, many of which are likely to impact strain performance. Additionally, we investigated transcriptomic profiles of gene expression for both species during cultivation on various carbon sources. We observed that the most highly transcribed genes in both organisms were consistently highly expressed on all three carbon sources examined. We also observed selective expression of certain genes on each carbon source, including many sequences not previously reported as promoters for expression of heterologous proteins in yeasts.

Our studies establish a foundation for understanding critical relationships between genome structure, cultivation conditions and gene expression. The resources we report here will inform and facilitate rational, organism-wide strain engineering for improved utility as a host for protein production.
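A coding mutation is non-synonymous when it changes the encoded amino acid. The Python sketch below illustrates that test for a single-base substitution. It is a rough illustration, not the authors' pipeline. It translates the reference and mutated codons with the standard genetic code (NCBI table 1) and compares the results. The function name `classify_substitution` is hypothetical. The sketch assumes the codon, in the correct reading frame and strand, is already known from the annotation.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(ref_codon: str, pos: int, alt_base: str) -> str:
    """Label a single-base change at 0-based codon position `pos`."""
    ref_codon, alt_base = ref_codon.upper(), alt_base.upper()
    if ref_codon not in CODON_TABLE or alt_base not in BASES or not 0 <= pos <= 2:
        raise ValueError("expected an A/C/G/T codon, a base, and a position 0-2")
    alt_codon = ref_codon[:pos] + alt_base + ref_codon[pos + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


print(classify_substitution("GAA", 2, "G"))  # synonymous (Glu -> Glu)
print(classify_substitution("GAA", 0, "A"))  # non-synonymous (Glu -> Lys)
print(classify_substitution("TGG", 2, "A"))  # non-synonymous (Trp -> stop)
```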

July 7, 2019

Complete Genome Sequence of Mycobacterium avium, Isolated from Commercial Domestic Pekin Ducks (Anas platyrhynchos domestica), Determined Using PacBio Single-Molecule Real-Time Technology

Mycobacterium avium is an important pathogenic bacterium in birds and, to our knowledge, has never been reported to be isolated from domestic ducks. We present here the complete genome sequence, determined by PacBio single-molecule real-time technology, of a virulent strain of Mycobacterium avium isolated from domestic Pekin ducks for the first time. Copyright © 2016 Song et al.

July 7, 2019

Genome puzzle master (GPM): an integrated pipeline for building and editing pseudomolecules from fragmented sequences.

Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole-genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool, Genome Puzzle Master (GPM), that enables the integration of additional genomic signposts to edit and build ‘new-gen-assemblies’ that result in high-quality ‘annotation-ready’ pseudomolecules.

With GPM, loaded datasets can be connected to each other via their logical relationships, which accomplishes the tasks to ‘group’, ‘merge’, and ‘order and orient’ sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user’s total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS), which can be easily deployed on local servers by any genome research laboratory.

The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS.

Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu

Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
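The ‘order and orient’ step produces contigs placed in a fixed order and on a fixed strand. The Python sketch below shows one way such a layout might be joined into a pseudomolecule. It assumes each placed contig carries a "+" or "-" orientation and that every junction is padded with a fixed run of N's (100 by default, a common placeholder for gaps of unknown size). The `Placement` type and `build_pseudomolecule` function are hypothetical and not part of GPM.

```python
from dataclasses import dataclass
from typing import Dict, List

# Complement table covering upper- and lower-case bases plus N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Placement:
    """One contig in an ordered, oriented layout (hypothetical type)."""
    contig_id: str
    orientation: str  # "+" = as given, "-" = reverse complement


def build_pseudomolecule(contigs: Dict[str, str], layout: List[Placement],
                         gap_size: int = 100) -> str:
    """Join contigs in layout order, padding each junction with gap_size N's."""
    pieces = []
    for p in layout:
        seq = contigs[p.contig_id]
        if p.orientation == "-":
            seq = reverse_complement(seq)
        elif p.orientation != "+":
            raise ValueError(f"bad orientation {p.orientation!r} for {p.contig_id}")
        pieces.append(seq)
    return ("N" * gap_size).join(pieces)


contigs = {"ctg1": "AACG", "ctg2": "TTGC"}
layout = [Placement("ctg2", "-"), Placement("ctg1", "+")]
print(build_pseudomolecule(contigs, layout, gap_size=3))  # GCAANNNAACG
```

Keeping the layout separate from the sequences lets the order or orientation of a contig be edited without touching the underlying data. The pseudomolecule is only regenerated at the end.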

July 7, 2019

Identification of the novel B*27:144 allele in an Irish Individual.

July 7, 2019

Draft genome sequence of Mycobacterium rufum JS14(T), a polycyclic-aromatic-hydrocarbon-degrading bacterium from petroleum-contaminated soil in Hawaii.

Mycobacterium rufum JS14(T) (=ATCC BAA-1377(T), CIP 109273(T), JCM 16372(T), DSM 45406(T)), the type strain of the species Mycobacterium rufum, belonging to the family Mycobacteriaceae, was isolated from polycyclic aromatic hydrocarbon (PAH)-contaminated soil in Hilo (HI, USA) because of its capability to degrade PAHs. Here, we describe the first genome sequence of strain JS14(T), with brief phenotypic characteristics. The genome is composed of 6,176,413 bp with 69.25 % G+C content and contains 5810 protein-coding genes and 54 RNA genes. The genome information on M. rufum JS14(T) will provide a better understanding of the complexity of bacterial catabolic pathways for the degradation of specific chemicals.
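The G+C content quoted in genome reports such as this one is the fraction of G and C bases in the assembled sequence. The Python sketch below computes it. It assumes that ambiguous bases such as N are excluded from the denominator; conventions differ between tools. The function name is illustrative.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


print(round(gc_content("GGCCAT"), 2))  # 66.67
print(gc_content("acgtNN"))            # 50.0
```

For a draft genome made of many contigs, the G and C counts should be summed across all contigs before dividing. Averaging per-contig percentages would over-weight short contigs.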

July 7, 2019

The report of my death was an exaggeration: A review for researchers using microsatellites in the 21st century.

Microsatellites, or simple sequence repeats (SSRs), have long played a major role in genetic studies due to their typically high polymorphism. They have diverse applications, including genome mapping, forensics, ascertaining parentage, population and conservation genetics, identification of the parentage of polyploids, and phylogeography. We compare SSRs and newer methods, such as genotyping by sequencing (GBS) and restriction site associated DNA sequencing (RAD-Seq), and offer recommendations for researchers considering which genetic markers to use. We also review the variety of techniques currently used for identifying microsatellite loci and developing primers, with a particular focus on those that make use of next-generation sequencing (NGS). Additionally, we review software for microsatellite development and report on an experiment to assess the utility of currently available software for SSR development. Finally, we discuss the future of microsatellites and make recommendations for researchers preparing to use microsatellites. We argue that microsatellites still have an important place in the genomic age as they remain effective and cost-efficient markers.
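The simplest kind of microsatellite search that the reviewed tools automate can be sketched in Python as a regular-expression scan. The sketch below looks for perfect tandem repeats of 1 to 6 bp motifs. The minimum repeat count (5) and the function name `find_ssrs` are assumptions. Real SSR-finding software typically applies motif-specific thresholds and also reports imperfect or compound repeats.

```python
import re
from typing import List, Tuple


def find_ssrs(seq: str, min_repeats: int = 5,
              max_motif: int = 6) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy {1,max_motif}? group makes the regex try the shortest motif
    first, so 'ATATAT...' is reported as motif 'AT' rather than 'ATAT'.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


seq = "GC" + "AT" * 6 + "GC" + "CAG" * 5 + "T"
print(find_ssrs(seq))  # [(2, 'AT', 6), (16, 'CAG', 5)]
```

Primer design would then take the flanking sequence on either side of each hit.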

July 7, 2019

High quality draft genome sequence of the type strain of Pseudomonas lutea OK2(T), a phosphate-solubilizing rhizospheric bacterium.

Pseudomonas lutea OK2(T) (=LMG 21974(T), CECT 5822(T)) is the type strain of the species and was isolated from the rhizosphere of grass growing in Spain in 2003, based on its phosphate-solubilizing capacity. To identify the functional significance of phosphate solubilization in Pseudomonas plant growth-promoting rhizobacteria, we describe here the phenotypic characteristics of strain OK2(T) along with its high-quality draft genome sequence, its annotation, and analysis. The genome comprises 5,647,497 bp with 60.15 % G+C content. The sequence includes 4,846 protein-coding genes and 95 RNA genes.

July 7, 2019

Improved hybrid de novo genome assembly of domesticated apple (Malus x domestica).

Domesticated apple (Malus × domestica Borkh) is a popular temperate fruit with high nutrient levels and diverse flavors. In 2012, global apple production accounted for at least one tenth of all harvested fruits. A high-quality apple genome assembly is crucial for the selection and breeding of new cultivars. Currently, a single reference genome is available for apple, assembled from 16.9× genome coverage short reads via Sanger and 454 sequencing technologies. Although a useful resource, this assembly covers only ~89 % of the non-repetitive portion of the genome and has a relatively short (16.7 kb) contig N50 length. These shortcomings make it difficult to apply this reference in transcriptomic or whole-genome re-sequencing analyses.

Here we present an improved hybrid de novo genome assembly of apple (Golden Delicious), obtained from 76 Gb (~102× genome coverage) of Illumina HiSeq data and 21.7 Gb (~29× genome coverage) of PacBio data. The final draft genome is approximately 632.4 Mb, representing ~90 % of the estimated genome. The contig N50 size is 111,619 bp, a roughly 7-fold improvement. Further annotation analyses predicted 53,922 protein-coding genes and 2,765 non-coding RNA genes.

The new apple genome assembly will serve as a valuable resource for investigating complex apple traits at the genomic level. It is suitable not only for genome editing and gene cloning, but also for RNA-seq and whole-genome re-sequencing studies.
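Contig N50 is the statistic behind the 16.7 kb versus 111,619 bp comparison above. It is the largest length L such that contigs of length at least L together contain at least half of the assembled bases. The Python sketch below computes it on made-up contig lengths, then checks the fold-change arithmetic from the abstract. The 16.7 kb figure is rounded in the text, so the ratio is approximate. A variant called NG50 uses the estimated genome size instead of the assembly size as the denominator.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that contigs >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one contig with positive length")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached half of the assembly size
            return length
    raise AssertionError("unreachable: running sum always reaches total")


print(n50([100, 80, 50, 40, 30]))    # 80 (100 + 80 = 180 >= 300 / 2)
print(round(111_619 / 16_700, 1))    # 6.7, i.e. roughly the 7-fold gain reported
```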
