Bioinformatics Archives - Page 215 of 267

July 7, 2019

A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0).

The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly; however, as with most mammalian genomes, the current iteration of the chimpanzee reference genome assembly (Pan_tro_2.1.4) is highly fragmented: the sequence is scattered across more than 183 000 contigs, incorporating more than 159 000 gaps, with a genome-wide contig N50 of 51 Kbp. In this work, we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in bases represented and organized in large scaffolds. We show substantial improvements over Pan_tro_2.1.4 by several metrics, such as contiguity increased by >750% and 300% on contigs and scaffolds, respectively, and closure of 77% of the gaps in the Pan_tro_2.1.4 assembly, spanning >850 Kbp of novel coding sequence based on RNA-Seq data. We further report more than 2700 genes that had putatively erroneous frame-shift predictions relative to human in Pan_tro_2.1.4 and show a substantial increase in the annotation of repetitive elements. We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource for the study of human origins. Furthermore, we produce extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset. © The Author 2017. Published by Oxford University Press.
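The contig N50 quoted above is a standard contiguity metric: the length L such that contigs of length L or longer together cover at least half of the assembled bases. A minimal sketch of that calculation follows; the function name and the toy contig lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (hypothetical, not taken from Pan_tro_2.1.4 or Pan_tro_3.0).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 covers half of 300, so N50 is 70
```

Because the metric depends only on the sorted length distribution, closing gaps between neighbouring contigs raises it even when the total number of assembled bases barely changes.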

July 7, 2019

Trajectories and drivers of genome evolution in surface-associated marine Phaeobacter.

The extent of genome divergence and the evolutionary events leading to speciation of marine bacteria have mostly been studied for (locally) abundant, free-living groups. The genus Phaeobacter is found on different marine surfaces, seems to occupy geographically disjunct habitats, and is involved in different biotic interactions, and was therefore targeted in the present study. The analysis of the chromosomes of 32 closely related but geographically spread Phaeobacter strains revealed an exceptionally large, highly syntenic core genome. The flexible gene pool is constantly but slightly expanding across all Phaeobacter lineages. The horizontally transferred genes mostly originated from bacteria of the Roseobacter group and horizontal transfer most likely was mediated by gene transfer agents. No evidence for geographic isolation and habitat specificity of the different phylogenomic Phaeobacter clades was detected based on the sources of isolation. In contrast, the functional gene repertoire and physiological traits of different phylogenomic Phaeobacter clades were sufficiently distinct to suggest an adaptation to an associated lifestyle with algae, to additional nutrient sources, or toxic heavy metals. Our study reveals that the evolutionary trajectories of surface-associated marine bacteria can differ significantly from free-living marine bacteria or marine generalists. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

July 7, 2019

Complete genome sequence of the new urolithin-producing bacterium Gordonibacter urolithinfaciens DSM 27213T.

Gordonibacter urolithinfaciens DSM 27213T was isolated from human feces and is able to metabolize ellagic acid (a dietary phenolic compound present in various fruits) to urolithins. Here, we report the finished and annotated genome sequence of this organism.

July 7, 2019

The completed PacBio single-molecule real-time sequence of Methylosinus trichosporium strain OB3b reveals the presence of a third large plasmid.

Presented here is the complete genome sequence of the well-studied Rhizobiales methanotroph Methylosinus trichosporium strain OB3b. The assembly contains 5,183,433 bp, corresponding to a chromosome of 4,508,832 bp and three circular plasmids of 285,280 bp, 209,102 bp, and 180,219 bp. Copyright © 2017 Heil et al.
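As a quick consistency check, the replicon sizes quoted in the abstract can be summed and compared with the stated assembly size. The variable names below are illustrative; the numbers are the ones given above.

```python
# Replicon sizes as quoted in the abstract, in base pairs.
chromosome_bp = 4_508_832
plasmids_bp = [285_280, 209_102, 180_219]

total_bp = chromosome_bp + sum(plasmids_bp)
print(f"{total_bp:,} bp")  # 5,183,433 bp, matching the stated assembly size
```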

July 7, 2019

Draft genome sequence of Lactobacillus salivarius SGL 03, a novel potential probiotic strain.

In this work, we report the draft genome sequence of Lactobacillus salivarius SGL 03, a novel potential probiotic strain isolated from healthy infant stools. Antibiotic resistance analysis revealed the presence of a tetracycline resistance gene without elements potentially responsible for interspecific horizontal gene transfer. Copyright © 2017 Federici et al.

July 7, 2019

Complete genome sequence of Bacillus altitudinis P-10, a potential bioprotectant against Xanthomonas oryzae pv. oryzae, isolated from rice rhizosphere in Java, Indonesia.

Bacillus altitudinis P-10 was isolated from the rhizosphere of rice grown in an organic rice field and provides strong antagonism against the bacterial blight caused by Xanthomonas oryzae pv. oryzae in rice. Herein, we provide the complete genome sequence and a possible explanation of the antibiotic function of the P-10 strain.

July 7, 2019

Complete chromosome sequence of a mycolactone-producing mycobacterium, Mycobacterium pseudoshottsii.

Mycobacterium pseudoshottsii is a fish pathogen that produces mycolactone. Here, we report the complete chromosome sequence of a type strain of M. pseudoshottsii (JCM 15466). The sequence will represent essential data for future phylogenetic and comparative genome studies of mycolactone-producing mycobacteria. Copyright © 2017 Yoshida et al.

July 7, 2019

Disease onset in X-linked dystonia-parkinsonism correlates with expansion of a hexameric repeat within an SVA retrotransposon in TAF1.

X-linked dystonia-parkinsonism (XDP) is a neurodegenerative disease associated with an antisense insertion of a SINE-VNTR-Alu (SVA)-type retrotransposon within an intron of TAF1. This unique insertion coincides with six additional noncoding sequence changes in TAF1, the gene that encodes TATA-binding protein-associated factor-1, which appear to be inherited together as an identical haplotype in all reported cases. Here we examined the sequence of this SVA in XDP patients (n = 140) and detected polymorphic variation in the length of a hexanucleotide repeat domain, (CCCTCT)n. The number of repeats in these cases ranged from 35 to 52 and showed a highly significant inverse correlation with age at disease onset. Because other SVAs exhibit intrinsic promoter activity that depends in part on the hexameric domain, we assayed the transcriptional regulatory effects of varying hexameric lengths found in the unique XDP SVA retrotransposon using luciferase reporter constructs. When inserted sense or antisense to the luciferase reading frame, the XDP variants repressed or enhanced transcription, respectively, to an extent that appeared to vary with length of the hexamer. Further in silico analysis of this SVA sequence revealed multiple motifs predicted to form G-quadruplexes, with the greatest potential detected for the hexameric repeat domain. These data directly link sequence variation within the XDP-specific SVA sequence to phenotypic variability in clinical disease manifestation and provide insight into potential mechanisms by which this intronic retroelement may induce transcriptional interference in TAF1 expression. Copyright © 2017 the Author(s). Published by PNAS.
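The abstract does not say which tool was used for the in silico G-quadruplex prediction. As a rough illustration of why a (CCCTCT)n tract is a plausible candidate, the sketch below applies a widely used simple heuristic (four runs of at least three G's separated by loops of 1 to 7 bases) to both strands. The pattern, function names, and test sequences are assumptions made for illustration, not the authors' method.

```python
import re

# Simple heuristic pattern for a putative G-quadruplex: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+.
# This is an assumed, commonly used rule of thumb, not the paper's predictor.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_g4_motif(seq):
    """True if either strand contains a match to the heuristic pattern."""
    seq = seq.upper()
    return bool(G4_PATTERN.search(seq) or G4_PATTERN.search(reverse_complement(seq)))


# The complementary strand of CCCTCT reads AGAGGG, so the repeat supplies
# regularly spaced GGG runs once at least four units are present.
print(reverse_complement("CCCTCT"))  # AGAGGG
print(has_g4_motif("CCCTCT" * 35))   # True (35 is the shortest repeat count reported)
print(has_g4_motif("CCCTCT" * 3))    # False (only three GGG runs)
```

A pattern match only flags candidate sequence; it says nothing about whether a quadruplex actually forms or how stable it would be.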

July 7, 2019

Sex-specific influences of mtDNA mitotype and diet on mitochondrial functions and physiological traits in Drosophila melanogaster.

Here we determine the sex-specific influence of mtDNA type (mitotype) and diet on mitochondrial functions and physiology in two Drosophila melanogaster lines. In many species, males and females differ in aspects of their energy production. These sex-specific influences may be caused by differences in evolutionary history and physiological functions. We predicted the influence of mtDNA mutations should be stronger in males than females as a result of the organelle’s maternal mode of inheritance in the majority of metazoans. In contrast, we predicted the influence of diet would be greater in females due to higher metabolic flexibility. We included four diets that differed in their protein:carbohydrate (P:C) ratios, as these are the two major energy-yielding macronutrients in the fly diet. We assayed four mitochondrial function traits (Complex I oxidative phosphorylation, reactive oxygen species production, superoxide dismutase activity, and mtDNA copy number) and four physiological traits (fecundity, longevity, lipid content, and starvation resistance). Traits were assayed at 11 d and 25 d of age. Consistent with predictions, we observed that the mitotype influenced males more than females, supporting the hypothesis of a sex-specific selective sieve in the mitochondrial genome caused by the maternal inheritance of mitochondria. Also consistent with predictions, we found that the diet influenced females more than males.

July 7, 2019

On the importance of homology in the age of phylogenomics

Homology is perhaps the most central concept of phylogenetic biology. Molecular systematists have traditionally paid due attention to the homology statements that are implied by their alignments of orthologous sequences, but some authors have suggested that manual gene-by-gene curation is not sustainable in the phylogenomics era. Here, we show that there are multiple ways to efficiently screen for and detect homology errors in phylogenomic data sets. Application of these screening approaches to two phylogenomic data sets, one for birds and another for mammals, shows that these data are replete with homology errors including alignments of different exons to each other, alignments of exons to introns, and alignments of paralogues to each other. The extent of these homology errors weakens the conclusions of studies based on these data sets. Despite advances in automated phylogenomic pipelines, we contend that much of the long, difficult, and sometimes tedious work of systematics is still required to guard against pervasive homology errors. This conclusion is underscored by recent studies that show that just a few outlier genes can impact phylogenetic results at short, tightly spaced internodes that are deep in the Tree of Life. The view that widespread DNA sequence alignment errors are not a major concern for rigorous systematic research is not tenable. If a primary goal of phylogenomics is to resolve the most challenging phylogenetic problems with the abundant data that are now available, researchers must employ effective procedures to screen for and correct homology errors prior to performing downstream phylogenetic analyses.

July 7, 2019

COSINE: non-seeding method for mapping long noisy sequences.

Third generation sequencing (TGS) technologies are highly promising, but the long and noisy reads they produce are difficult to align using existing algorithms. Here, we present COSINE, a conceptually new method designed specifically for aligning long reads contaminated by a high level of errors. COSINE computes the context similarity of two stretches of nucleobases given the similarity over distributions of their short k-mers (k = 3-4) along the sequences. The results on simulated and real data show that COSINE achieves high sensitivity and specificity under a wide range of read accuracies. When the error rate is high, COSINE can offer substantial advantages over existing alignment methods. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
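The general idea of comparing two stretches of sequence through their short k-mer distributions can be sketched as follows. This is not COSINE's implementation: the abstract does not specify the similarity measure or how distributions are tracked along the read, so the cosine measure over whole-window 3-mer counts, the function names, and the toy sequences are all assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in seq (k = 3 or 4 in the abstract)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine of the angle between two k-mer count vectors (assumed measure)."""
    dot = sum(count * q[kmer] for kmer, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # a window shorter than k has no k-mers
    return dot / (norm_p * norm_q)


# Toy sequences (hypothetical): a reference window, a copy with one
# substitution, and an unrelated window that shares no 3-mers with it.
reference = "ACGTACGTACGTTTGACCA"
noisy_copy = "ACGTACCTACGTTTGACCA"
unrelated = "GGGGGGCCCCCCGGGGGG"

ref_profile = kmer_profile(reference)
print(cosine_similarity(ref_profile, kmer_profile(noisy_copy)))
print(cosine_similarity(ref_profile, kmer_profile(unrelated)))
```

A single substitution disturbs at most k of the overlapping k-mers, so the profile of a noisy read stays close to that of its source. That robustness is what makes a seed-free comparison attractive at high error rates.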

July 7, 2019

Comparative whole-genomic analysis of an ancient L2 lineage Mycobacterium novel phylogenetic clade and common genetic determinants of hypervirulent strains.

Background: Development of improved therapeutics against tuberculosis (TB) is hindered by an inadequate understanding of the relationship between disease severity and genetic diversity of its causative agent, Mycobacterium tuberculosis. We previously isolated a hypervirulent M. tuberculosis strain H112 from an HIV-negative patient with an aggressive disease progression from pulmonary TB to tuberculous meningitis, the most severe manifestation of tuberculosis. A human macrophage challenge experiment demonstrated that strain H112 exhibited significantly better intracellular survivability and induced a lower level of TNF-α than the reference virulent strain H37Rv and 123 other clinical isolates. Aim: The present study aimed to identify the potential genetic determinants of mycobacterial virulence that were common to strain H112 and hypervirulent M. tuberculosis strains of the same phylogenetic clade isolated in other global regions. Methods: A low-virulent M. tuberculosis strain H54, which belonged to the same phylogenetic lineage (L2) as strain H112, was selected from a collection of 115 clinical isolates. Both H112 and H54 were whole-genome-sequenced using PacBio sequencing technology. A comparative genomics approach was adopted to identify mutations present in strain H112 but absent in strain H54. Subsequently, an extensive phylogenetic analysis was conducted by including all publicly available M. tuberculosis genomes. Single-nucleotide polymorphisms (SNPs) and structural variations (SVs) common to hypervirulent strains in the global collection of genomes were considered as potential genetic determinants of hypervirulence. Results: Sequencing data revealed that both H112 and H54 were members of the same sub-lineage L2.2.1. After excluding the lineage-related mutations shared between H112 and H54, we analyzed the phylogenetic relatedness of H112 with a global collection of M. tuberculosis genomes (n = 4,338), and identified a novel phylogenetic clade in which four hypervirulent strains isolated from geographically diverse regions were clustered together. All hypervirulent strains in the clade shared 12 SNPs and 5 SVs with H112, including those affecting key virulence-associated loci, notably, a deleterious SNP (rv0178 p.D150E) within the mce1 operon and an intergenic deletion (854259_854261delCC) in close proximity to phoP. Conclusion: The present study identified common genetic factors in a novel phylogenetic clade of hypervirulent M. tuberculosis. The causative role of these mutations in mycobacterial virulence should be validated in future studies.

July 7, 2019

A recurrence-based approach for validating structural variation using long-read sequencing technology.

Although numerous algorithms have been developed to identify structural variations (SVs) in genomic sequences, there is a dearth of approaches that can be used to evaluate their results. This is significant as the accurate identification of structural variation is still an outstanding but important problem in genomics. The emergence of new sequencing technologies that generate longer sequence reads can, in theory, provide direct evidence for all types of SVs regardless of the length of the region they span. However, current efforts to use these data in this manner require the use of large computational resources to assemble these sequences as well as visual inspection of each region. Here we present VaPoR, a highly efficient algorithm that autonomously validates large SV sets using long-read sequencing data. We assessed the performance of VaPoR on SVs in both simulated and real genomes and report a high-fidelity rate for overall accuracy across different levels of sequence depths. We show that VaPoR can interrogate a much larger range of SVs while still matching existing methods in terms of false positive validations and providing additional features considering breakpoint precision and predicted genotype. We further show that VaPoR can run quickly and efficiently without requiring a large processing or assembly pipeline. VaPoR provides a long read-based validation approach for genomic SVs that requires relatively low read depth and computing resources and thus will provide utility with targeted or low-pass sequencing coverage for accurate SV assessment. The VaPoR software is available at: https://github.com/mills-lab/vapor. © The Authors 2017. Published by Oxford University Press.

July 7, 2019

Nitrogen fixation genes and nitrogenase activity of the non-heterocystous cyanobacterium Thermoleptolyngbya sp. O-77.

Cyanobacteria are widely distributed in marine, aquatic, and terrestrial ecosystems, and play an important role in the global nitrogen cycle. In the present study, we examined the genome sequence of the thermophilic non-heterocystous N2-fixing cyanobacterium, Thermoleptolyngbya sp. O-77 (formerly known as Leptolyngbya sp. O-77) and characterized its nitrogenase activity. The genome of this cyanobacterial strain O-77 consists of a single chromosome containing a nitrogen fixation gene cluster. A phylogenetic analysis indicated that the NifH amino acid sequence from strain O-77 was clustered with those from a group of mesophilic species: the highest identity was found in Leptolyngbya sp. KIOST-1 (97.9% sequence identity). The nitrogenase activity of O-77 cells was dependent on illumination, whereas a high intensity of light of 40 µmol m⁻² s⁻¹ suppressed the effects of illumination.

July 7, 2019

The draft genome sequence of Pectobacterium carotovorum subsp. actinidiae KKH3 that infects kiwi plant and potential bioconversion applications

Pectobacterium carotovorum subsp. actinidiae KKH3 is an Enterobacteriaceae bacterial pathogen that infects kiwi plants, causing canker-like symptoms that pose a threat to the kiwifruit industry. Because the strain was originally isolated from woody plants and possesses numerous plant cell wall-degrading enzymes, this draft genome report provides insight into possible bioconversion applications, as well as a better understanding of this important plant pathogen.
