Bioinformatics Archives

September 22, 2019

MIRU-profiler: a rapid tool for determination of 24-loci MIRU-VNTR profiles from assembled genomes of Mycobacterium tuberculosis.

Tuberculosis (TB) resulted in an estimated 1.7 million deaths in the year 2016. The disease is caused by members of the Mycobacterium tuberculosis complex, which includes Mycobacterium tuberculosis, Mycobacterium bovis and other closely related TB-causing organisms. In order to understand the epidemiological dynamics of TB, national TB control programs often conduct standardized genotyping at 24 Mycobacterial-Interspersed-Repetitive-Units (MIRU)-Variable-Number-of-Tandem-Repeats (VNTR) loci. With the advent of next generation sequencing technology, whole-genome sequencing (WGS) has been widely used for studying TB transmission. However, open-source software that can connect WGS and MIRU-VNTR typing is currently unavailable, which hinders interlaboratory communication. In this manuscript, we introduce the MIRU-profiler program, which can be used to predict the MIRU-VNTR profile from WGS of M. tuberculosis. The MIRU-profiler is implemented in shell scripting language and depends on EMBOSS software. The in-silico workflow of MIRU-profiler is similar to those described in the laboratory manuals for genotyping M. tuberculosis. Given an input genome sequence, the MIRU-profiler computes alleles at the standard 24 loci based on in-silico PCR amplicon lengths. The final output is a tab-delimited text file detailing the 24-loci MIRU-VNTR pattern of the input sequence. The MIRU-profiler was validated on four datasets: complete genomes from NCBI-GenBank (n = 11), complete genomes for locally isolated strains sequenced using PacBio (n = 4), complete genomes for BCG vaccine strains (n = 2) and draft genomes based on 250 bp paired-end Illumina reads (n = 106). The digital MIRU-VNTR results were identical to the experimental genotyping results for complete genomes of locally isolated strains, BCG vaccine strains and five out of 11 genomes from the NCBI-GenBank.
For draft genomes based on short Illumina reads, 21 out of 24 loci were inferred with a high accuracy, while a number of inaccuracies were recorded for three specific loci (ETRA, QUB11b and QUB26). One of the unique features of the MIRU-profiler is its ability to process multiple genomes in a batch. This feature was tested on all complete M. tuberculosis genomes (n = 157), for which results were successfully obtained in approximately 14 min. The MIRU-profiler is a rapid tool for inference of digital MIRU-VNTR profiles from assembled genome sequences. The tool can accurately infer repeat numbers at the standard 24 or 21/24 MIRU-VNTR loci from complete or draft genomes, respectively. Thus, the tool is expected to bridge the communication gap between the laboratories using WGS and those using conventional MIRU-VNTR typing.
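
As a rough illustration of how an allele might be derived from an in-silico PCR amplicon length, the sketch below converts an amplicon length into a repeat count and writes a tab-delimited profile line. It is not the MIRU-profiler code (which the abstract describes as shell scripts built on EMBOSS); the locus names, flank lengths, repeat unit lengths and the tolerance threshold are all hypothetical values chosen for the example.

```python
from typing import Dict, Optional, Tuple

def repeats_from_amplicon(amplicon_len: int, flank_len: int, unit_len: int,
                          tolerance: float = 0.25) -> Optional[int]:
    """Estimate the repeat count at one locus from an amplicon length.

    Assumes amplicon = flanking sequence + n * repeat unit. Returns None when
    the length does not fit a whole number of units within the tolerance.
    """
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    if amplicon_len < flank_len:
        return None
    exact = (amplicon_len - flank_len) / unit_len
    nearest = round(exact)
    if abs(exact - nearest) > tolerance:
        return None
    return nearest

def profile_line(sample: str, amplicon_lengths: Dict[str, int],
                 loci: Dict[str, Tuple[int, int]]) -> str:
    """Build one tab-delimited line: sample name, then one call per locus."""
    calls = []
    for name, (flank_len, unit_len) in loci.items():
        n = repeats_from_amplicon(amplicon_lengths[name], flank_len, unit_len)
        calls.append("-" if n is None else str(n))
    return "\t".join([sample] + calls)

# Hypothetical loci: name -> (flank length in bp, repeat unit length in bp).
HYPOTHETICAL_LOCI = {"locusA": (100, 53), "locusB": (120, 77)}

if __name__ == "__main__":
    print(profile_line("sampleA", {"locusA": 259, "locusB": 274}, HYPOTHETICAL_LOCI))
```

A call that cannot be resolved to a whole number of repeats is reported as "-" rather than rounded, which mirrors the caution needed for loci that are hard to infer from short-read draft assemblies.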

September 22, 2019

Comparative analysis reveals unexpected genome features of newly isolated Thraustochytrids strains: on ecological function and PUFAs biosynthesis.

Thraustochytrids are unicellular fungal-like marine protists with ubiquitous existence in marine environments. They are well known for their ability to produce high-value omega-3 polyunsaturated fatty acids (ω-3 PUFAs), such as docosahexaenoic acid (DHA), and hydrolytic enzymes. Thraustochytrid biomass has been estimated to surpass that of bacterioplankton in both coastal and oceanic waters, indicating they have an important role in the microbial food web. Nevertheless, the molecular pathway and regulatory network for PUFA production and the molecular mechanisms underlying the ecological functions of thraustochytrids remain largely unknown. The genomes of two thraustochytrid strains (Mn4 and SW8) with the ability to produce DHA were sequenced and assembled with a hybrid sequencing approach utilizing Illumina short paired-end reads and Pacific Biosciences long reads to generate a highly accurate genome assembly. Phylogenomic and comparative genomic analyses found that DHA-producing thraustochytrid strains were highly similar and possessed similar gene content. Analysis of the conventional fatty acid synthesis (FAS) and the polyketide synthase (PKS) systems for PUFA production detected only incomplete and fragmentary pathways in the genomes of these two strains. Surprisingly, secreted carbohydrate active enzymes (CAZymes) were found to be significantly depleted in the genomes of these two strains as compared to other sequenced relatives. Furthermore, these two strains possess an expanded gene repertoire for signal transduction and self-propelled movement, which could be important for their adaptation to dynamic marine environments. Our results demonstrate the possibility of a third PUFA synthesis pathway, besides the previously described FAS and PKS pathways, encoded in the genomes of these two thraustochytrid strains.
Moreover, the lack of a complete set of hydrolytic enzymatic machinery for degrading plant-derived organic materials suggests that these two DHA-producing strains play an important role as a nutritional source rather than a nutrient producer in the marine microbial food web. Results of this study suggest the existence of two types of saprobic thraustochytrids in the world’s ocean. The first group, which does not produce cellulosic enzymes and lives as a ‘left-over’ scavenger of bacterioplankton, serves as a dietary source for the plankton of higher trophic levels, and the other possesses the capacity to live on detrital organic matter in marine ecosystems.

September 22, 2019

Large-scale gene losses underlie the genome evolution of parasitic plant Cuscuta australis.

Dodders (Cuscuta spp., Convolvulaceae) are root- and leafless parasitic plants. The physiology, ecology, and evolution of these obligate parasites are poorly understood. A high-quality reference genome of Cuscuta australis was assembled. Our analyses reveal that Cuscuta experienced accelerated molecular evolution, and Cuscuta and the convolvulaceous morning glory (Ipomoea) shared a common whole-genome triplication event before their divergence. C. australis genome harbors 19,671 protein-coding genes, and importantly, 11.7% of the conserved orthologs in autotrophic plants are lost in C. australis. Many of these gene loss events likely result from its parasitic lifestyle and the massive changes of its body plan. Moreover, comparison of the gene expression patterns in Cuscuta prehaustoria/haustoria and various tissues of closely related autotrophic plants suggests that Cuscuta haustorium formation requires mostly genes normally involved in root development. The C. australis genome provides important resources for studying the evolution of parasitism, regressive evolution, and evo-devo in plant parasites.

September 22, 2019

Evidence of non-tandemly repeated rDNAs and their intragenomic heterogeneity in Rhizophagus irregularis

Arbuscular mycorrhizal fungus (AMF) species are some of the most widespread symbionts of land plants. Our much improved reference genome assembly of a model AMF, Rhizophagus irregularis DAOM-181602 (total contigs = 210), facilitated a discovery of repetitive elements with unusual characteristics. R. irregularis has only ten or 11 copies of complete 45S rDNAs, whereas the general eukaryotic genome has tens to thousands of rDNA copies. R. irregularis rDNAs are highly heterogeneous and lack a tandem repeat structure. These findings provide evidence for the hypothesis that rDNA heterogeneity depends on the lack of tandem repeat structures. RNA-Seq analysis confirmed that all rDNA variants are actively transcribed. Observed rDNA/rRNA polymorphisms may modulate translation by using different ribosomes depending on biotic and abiotic interactions. The non-tandem repeat structure and intragenomic heterogeneity of AMF rDNA/rRNA may facilitate successful adaptation to various environmental conditions, increasing host compatibility of these symbiotic fungi.

September 22, 2019

Diversity among blaKPC-containing plasmids in Escherichia coli and other bacterial species isolated from the same patients.

Carbapenem resistant Enterobacteriaceae are a significant public health concern, and genes encoding the Klebsiella pneumoniae carbapenemase (KPC) have contributed to the global spread of carbapenem resistance. In the current study, we used whole-genome sequencing to investigate the diversity of blaKPC-containing plasmids and antimicrobial resistance mechanisms among 26 blaKPC-containing Escherichia coli, and 13 blaKPC-containing Enterobacter asburiae, Enterobacter hormaechei, K. pneumoniae, Klebsiella variicola, Klebsiella michiganensis, and Serratia marcescens strains, which were isolated from the same patients as the blaKPC-containing E. coli. A blaKPC-containing IncN and/or IncFIIK plasmid was identified in 77% (30/39) of the E. coli and other bacterial species analyzed. Complete genome sequencing and comparative analysis of a blaKPC-containing IncN plasmid from one of the E. coli strains demonstrated that this plasmid is present in the K. pneumoniae and S. marcescens strains from this patient, and is conserved among 13 of the E. coli and other bacterial species analyzed. Interestingly, while both IncFIIK and IncN plasmids were prevalent among the strains analyzed, the IncN plasmids were more often identified in multiple bacterial species from the same patients, demonstrating a contribution of this IncN plasmid to the inter-genera dissemination of the blaKPC genes between the E. coli and other bacterial species analyzed.

September 22, 2019

Genotype-Corrector: improved genotype calls for genetic mapping in F2 and RIL populations.

F2 and recombinant inbred line (RIL) populations are very commonly used in plant genetic mapping studies. Although genome-wide genetic markers like single nucleotide polymorphisms (SNPs) can be readily identified by a wide array of methods, accurate genotype calling remains challenging, especially for heterozygous loci and missing data due to low sequencing coverage per individual. Therefore, we developed Genotype-Corrector, a program that corrects genotype calls and imputes missing data to improve the accuracy of genetic mapping. Genotype-Corrector can be applied in a wide variety of genetic mapping studies that are based on low coverage whole genome sequencing (WGS) or Genotyping-by-Sequencing (GBS) related techniques. Our results show that Genotype-Corrector achieves high accuracy when applied to both synthetic and real genotype data. Compared with using raw or only imputed genotype calls, the linkage groups built from corrected genotype data show much less noise, and significant distortions can be corrected. Additionally, Genotype-Corrector compares favorably to the popular imputation software LinkImpute and Beagle in both F2 and RIL populations. Genotype-Corrector is publicly available on GitHub at https://github.com/freemao/Genotype-Corrector.

September 22, 2019

A mosaic monoploid reference sequence for the highly complex genome of sugarcane.

Sugarcane (Saccharum spp.) is a major crop for sugar and bioenergy production. Its highly polyploid, aneuploid, heterozygous, and interspecific genome poses major challenges for producing a reference sequence. We exploited colinearity with sorghum to produce a BAC-based monoploid genome sequence of sugarcane. A minimum tiling path of 4660 sugarcane BACs that best covers the gene-rich part of the sorghum genome was selected based on whole-genome profiling, sequenced, and assembled into a 382-Mb single tiling path of high-quality sequence. A total of 25,316 protein-coding gene models are predicted, 17% of which display no colinearity with their sorghum orthologs. We show that the two species, S. officinarum and S. spontaneum, involved in modern cultivars differ by their transposable elements and by a few large chromosomal rearrangements, explaining their distinct genome size and distinct basic chromosome numbers while also suggesting that polyploidization arose in both lineages after their divergence.

September 22, 2019

npInv: accurate detection and genotyping of inversions using long read sub-alignment.

Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR) mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using long read sub-alignment of long read sequencing data. We benchmark npInv against other tools on both simulated and real data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR mediated inversions in NA12878 with flanking IR less than 7kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of these novel inversions. We show that there is a near linear relationship between the length of flanking IR and the minimum inversion size, without inverted repeats. The application of npInv shows high accuracy on both simulated and real data. The results give deeper insight into understanding inversion.

September 22, 2019

Molecular characterization of invasive meningococcal isolates in Burkina Faso as the relative importance of serogroups X and W increases, 2008-2012.

Neisseria meningitidis serogroup A disease in Burkina Faso has greatly decreased following introduction of a meningococcal A conjugate vaccine in 2010, yet other serogroups continue to pose a risk of life-threatening disease. Capsule switching among epidemic-associated serogroup A N. meningitidis strains could allow these lineages to persist despite vaccination. The introduction of new strains at the national or sub-national levels could affect the epidemiology of disease. Isolates collected from invasive meningococcal disease in Burkina Faso between 2008 and 2012 were characterized by serogrouping and molecular typing. Genome sequences from a subset of isolates were used to infer phylogenetic relationships. The ST-5 clonal complex (CC5) was identified only among serogroup A isolates, which were rare after 2010. CC181 and CC11 were the most common clonal complexes after 2010, having serogroup X and W isolates, respectively. Whole-genome phylogenetic analysis showed that the CC181 isolates collected during and after the epidemic of 2010 formed a single clade that was closely related to isolates collected in Niger during 2005 and Burkina Faso during 2007. Geographic population structure was identified among the CC181 isolates, where pairs of isolates collected from the same region of Burkina Faso within a single year had less phylogenetic diversity than the CC181 isolate collection as a whole. However, the reduction of phylogenetic diversity within a region did not extend across multiple years. Instead, CC181 isolates collected during the same year had lower than average diversity, even when collected from different regions, indicating geographic mixing of strains across years. The CC11 isolates were primarily collected during the epidemic of 2012, with sparse sampling during 2011. These isolates belong to a clade that includes previously described isolates collected in Burkina Faso, Mali, and Niger from 2011 to 2015.
Similar to CC181, reduced phylogenetic diversity was observed among CC11 isolate pairs collected from the same regions during a single year. The population of disease-associated N. meningitidis strains within Burkina Faso was highly dynamic between 2008 and 2012, reflecting both vaccine-imposed selection against serogroup A strains and potentially complex clonal waves of serogroup X and serogroup W strains.

September 22, 2019

Genome analysis of the ancient tracheophyte Selaginella tamariscina reveals evolutionary features relevant to the acquisition of desiccation tolerance.

Resurrection plants, which are the “gifts” of natural evolution, are ideal models for studying the genetic basis of plant desiccation tolerance. Here, we report a high-quality genome assembly of 301 Mb for the diploid spike moss Selaginella tamariscina, a primitive vascular resurrection plant. We predicted 27 761 protein-coding genes from the assembled S. tamariscina genome, 11.38% (2363) of which showed significant expression changes in response to desiccation. Approximately 60.58% of the S. tamariscina genome was annotated as repetitive DNA, which is an almost 2-fold increase over that in the genome of desiccation-sensitive Selaginella moellendorffii. Genomic and transcriptomic analyses highlight the unique evolution and complex regulation of the desiccation response in S. tamariscina, including species-specific expansion of the oleosin and pentatricopeptide repeat gene families, unique genes and pathways for reactive oxygen species generation and scavenging, and enhanced abscisic acid (ABA) biosynthesis and potentially distinct regulation of ABA signaling and response. Comparative analysis of chloroplast genomes of several Selaginella species revealed a unique structural rearrangement and the complete loss of chloroplast NAD(P)H dehydrogenase (NDH) genes in S. tamariscina, suggesting a link between the absence of the NDH complex and desiccation tolerance. Taken together, our comparative genomic and transcriptomic analyses reveal common and species-specific desiccation tolerance strategies in S. tamariscina, providing significant insights into the desiccation tolerance mechanism and the evolution of resurrection plants. Copyright © 2018 The Author. Published by Elsevier Inc. All rights reserved.

September 22, 2019

Heterogeneous and flexible transmission of mcr-1 in hospital-associated Escherichia coli.

The recent emergence of a transferable colistin resistance mechanism, MCR-1, has gained global attention because of its threat to clinical treatment of infections caused by multidrug-resistant Gram-negative bacteria. However, the possible transmission route of mcr-1 among Enterobacteriaceae species in clinical settings is largely unknown. Here, we present a comprehensive genomic analysis of Escherichia coli isolates collected in a hospital in Hangzhou, China. We found that mcr-1-carrying isolates from clinical infections and feces of inpatients and healthy volunteers were genetically diverse and were not closely related phylogenetically, suggesting that clonal expansion is not involved in the spread of mcr-1. The mcr-1 gene was found on either chromosomes or plasmids, but in most of the E. coli isolates, mcr-1 was carried on plasmids. The genetic context of the plasmids showed considerable diversity as evidenced by the different functional insertion sequence (IS) elements, toxin-antitoxin (TA) systems, heavy metal resistance determinants, and Rep proteins of broad-host-range plasmids. Additionally, the genomic analysis revealed nosocomial transmission of mcr-1 and the coexistence of mcr-1 with other genes encoding β-lactamases and fluoroquinolone resistance in the E. coli isolates. These findings indicate that mcr-1 is heterogeneously disseminated in both commensal and pathogenic strains of E. coli, suggest the high flexibility of this gene in its association with diverse genetic backgrounds of the hosts, and provide new insights into the genome epidemiology of mcr-1 among hospital-associated E. coli strains. IMPORTANCE Colistin represents one of the very few available drugs for treating infections caused by extensively multidrug-resistant Gram-negative bacteria. The recently emergent mcr-1 colistin resistance gene threatens the clinical utility of colistin and has gained global attention.
How mcr-1 spreads in hospital settings remains unknown and was investigated by whole-genome sequencing of mcr-1-carrying Escherichia coli in this study. The findings revealed extraordinary flexibility of mcr-1 in its spread among genetically diverse E. coli hosts and plasmids, nosocomial transmission of mcr-1-carrying E. coli, and the continuous emergence of novel Inc types of plasmids carrying mcr-1 and new mcr-1 variants. Additionally, mcr-1 was found to be frequently associated with other genes encoding β-lactam and fluoroquinolone resistance. These findings provide important information on the transmission and epidemiology of mcr-1 and are of significant public health importance as the information is expected to facilitate the control of this significant antibiotic resistance threat. Copyright © 2018 Shen et al.

September 22, 2019

Hotspots of independent and multiple rounds of LTR-retrotransposon bursts in Brassica species

Long terminal repeat retrotransposons (LTR-RTs) are a predominant group of plant transposable elements (TEs) that are an important component of plant genomes. A large number of LTR-RTs have been annotated in the genomes of the agronomically important oil and vegetable crops of the genus Brassica. Herein, full-length LTR-RTs in the genomes of Brassica and other closely related species were systematically analyzed. The full-length LTR-RT content varied greatly (from 0.43% to 23.4%) between different species, with Gypsy-like LTR-RTs constituting a primary group across these genomes. More importantly, many annotated LTR-RTs (from 10.03% to 33.25% of all detected LTR-RTs) were found to be enriched in localized hotspot regions. Furthermore, all of the analyzed species showed evidence of having experienced at least one round of a LTR-RT burst, with Raphanus sativus experiencing three or more. Moreover, these relatively ancient LTR-RT amplifications exhibited a clear expansion at specific time points. To gain a further understanding of this timing, Brassica rapa, B. oleracea, and R. sativus were examined for the presence of syntenic regions, but none were present. These findings indicate that these LTR-RT burst events were not inherited from a common ancestor, but instead were species-specific bursts that occurred after the divergence of Brassica species. This study further exemplifies the complexities of TE amplifications during the evolution of plant genomes and suggests that these LTR-RT bursts play an important role in genome expansion and divergence in Brassica species.

September 22, 2019

Strand-seq enables reliable separation of long reads by chromosome via expectation maximization.

Current sequencing technologies are able to produce reads orders of magnitude longer than ever possible before. Such long reads have sparked a new interest in de novo genome assembly, which removes reference biases inherent to re-sequencing approaches and allows for a direct characterization of complex genomic variants. However, even with the latest algorithmic advances, assembling a mammalian genome from long error-prone reads incurs a significant computational burden and does not preclude occasional misassemblies. Both problems could potentially be mitigated if assembly could commence for each chromosome separately. To address this, we show how single-cell template strand sequencing (Strand-seq) data can be leveraged for this purpose. We introduce a novel latent variable model and a corresponding Expectation Maximization algorithm, termed SaaRclust, and demonstrate its ability to reliably cluster long reads by chromosome. For each long read, this approach produces a posterior probability distribution over all chromosomes of origin and read directionalities. In this way, it allows assessment of the amount of uncertainty inherent to sparse Strand-seq data on the level of individual reads. Among the reads that our algorithm confidently assigns to a chromosome, we observed more than 99% correct assignments on a subset of Pacific Biosciences reads with 30.1× coverage. To our knowledge, SaaRclust is the first approach for the in silico separation of long reads by chromosome prior to assembly. https://github.com/daewoooo/SaaRclust

September 22, 2019

GC content elevates mutation and recombination rates in the yeast Saccharomyces cerevisiae.

The chromosomes of many eukaryotes have regions of high GC content interspersed with regions of low GC content. In the yeast Saccharomyces cerevisiae, high-GC regions are often associated with high levels of meiotic recombination. In this study, we constructed URA3 genes that differ substantially in their base composition [URA3-AT (31% GC), URA3-WT (43% GC), and URA3-GC (63% GC)] but encode proteins with the same amino acid sequence. The strain with URA3-GC had an approximately sevenfold elevated rate of ura3 mutations compared with the strains with URA3-WT or URA3-AT. About half of these mutations were single-base substitutions and were dependent on the error-prone DNA polymerase ζ. About 30% were deletions or duplications between short (5-10 base) direct repeats resulting from DNA polymerase slippage. The URA3-GC gene also had elevated rates of meiotic and mitotic recombination relative to the URA3-AT or URA3-WT genes. Thus, base composition has a substantial effect on the basic parameters of genome stability and evolution. Copyright © 2018 the Author(s). Published by PNAS.
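
To make the idea of genes that differ in base composition but encode the same protein concrete, the sketch below computes GC fraction for two short synonymous sequences. The two six-base sequences are illustrative toy examples, not taken from the URA3 constructs in the study; both encode leucine followed by lysine under the standard genetic code.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)

# Toy synonymous pair (standard genetic code): both encode Leu-Lys.
AT_RICH = "TTAAAA"  # TTA = Leu, AAA = Lys
GC_RICH = "CTGAAG"  # CTG = Leu, AAG = Lys

if __name__ == "__main__":
    print(f"AT-rich version: {gc_fraction(AT_RICH):.0%} GC")
    print(f"GC-rich version: {gc_fraction(GC_RICH):.0%} GC")
```

Choosing among synonymous codons in this way shifts GC content without changing the encoded amino acids, which is the kind of recoding the abstract describes at whole-gene scale.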

September 22, 2019

Genomic variation among and within six Juglans species.

Genomic analysis in Juglans (walnuts) is expected to transform the breeding and agricultural production of both nuts and lumber. To that end, we report here the determination of reference sequences for six additional relatives of Juglans regia: Juglans sigillata (also from section Dioscaryon), Juglans nigra, Juglans microcarpa, Juglans hindsii (from section Rhysocaryon), Juglans cathayensis (from section Cardiocaryon), and the closely related Pterocarya stenoptera. While these are ‘draft’ genomes, ranging in size between 640Mbp and 990Mbp, their contiguities and accuracies can support powerful annotations of genomic variation that are often the foundation of new avenues of research and breeding. We annotated nucleotide divergence and synteny by creating complete pairwise alignments of each reference genome to the remaining six. In addition, we have re-sequenced a sample of accessions from four Juglans species (including regia). The variation discovered in these surveys comprises a critical resource for experimentation and breeding, as well as a solid complementary annotation. To demonstrate the potential of these resources, the structural and sequence variation in and around the polyphenol oxidase loci, PPO1 and PPO2, were investigated. As reported for other seed crops, variation in this gene is implicated in the domestication of walnuts. The apparently Juglandaceae-specific PPO1 duplicate shows accelerated divergence and an excess of amino acid replacement on the lineage leading to accessions of the domesticated nut crop species, Juglans regia and sigillata. Copyright © 2018 Stevens et al.
