Bioinformatics Archives

July 7, 2019

FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets.

High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms. Background contaminating organisms are observed in sequenced mixtures of organisms. In silico samples provide exact truth. To create the best value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible. FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with specific error profiles obtained in the characterization step. FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQSim tool hold several advantages over natural datasets: they are sequencing platform independent, extremely well characterized, and less expensive to generate. Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge datasets for detecting genetic engineering toolmarks.
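The characterization step described above (distributions of read length and quality scores) can be illustrated with a minimal Python sketch. This is not FASTQSim's own code: the function name `characterize_fastq`, the assumption of strict four-line FASTQ records, and the Phred+33 quality encoding are all assumptions made for illustration.

```python
from collections import Counter
from io import StringIO


def characterize_fastq(handle, phred_offset=33):
    """Return (read-length histogram, per-read mean qualities).

    Assumes strict 4-line FASTQ records and Phred+33 encoding;
    both are illustrative assumptions, not FASTQSim behaviour.
    """
    length_hist = Counter()
    mean_quals = []
    while True:
        header = handle.readline()
        if not header:  # end of input
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        if not seq or not qual:  # skip empty or truncated records
            continue
        length_hist[len(seq)] += 1
        scores = [ord(c) - phred_offset for c in qual]
        mean_quals.append(sum(scores) / len(scores))
    return length_hist, mean_quals


# Two toy reads: one of length 4 with quality 'I' (Q40),
# one of length 6 with quality '!' (Q0).
toy = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\n!!!!!!\n")
hist, quals = characterize_fastq(toy)
print(dict(hist), quals)
```

A real characterization would also need alignments against a reference to estimate indel and substitution rates, which this sketch does not attempt.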

July 7, 2019

vanG element insertions within a conserved chromosomal site conferring vancomycin resistance to Streptococcus agalactiae and Streptococcus anginosus.

Three vancomycin-resistant streptococcal strains carrying vanG elements (two invasive Streptococcus agalactiae isolates [GBS-NY and GBS-NM, both serotype II and multilocus sequence type 22] and one Streptococcus anginosus [Sa]) were examined. The 45,585-bp elements found within Sa and GBS-NY were nearly identical (together designated vanG-1) and shared near-identity over an ~15-kb overlap with a previously described vanG element from Enterococcus faecalis. Unexpectedly, vanG-1 shared much less homology with the 49,321-bp vanG-2 element from GBS-NM, with widely different levels (50% to 99%) of sequence identity shared among 44 related open reading frames. Immediately adjacent to both vanG-1 and vanG-2 were 44,670-bp and 44,680-bp integrative conjugative element (ICE)-like sequences, designated ICE-r, that were nearly identical in the two group B streptococcal (GBS) strains. The dual vanG and ICE-r elements from both GBS strains were inserted at the same position, between bases 1328 and 1329, within the identical RNA methyltransferase (rumA) genes. A GenBank search revealed that although most GBS strains contained insertions within this specific site, only sequence type 22 (ST22) GBS strains contained highly related ICE-r derivatives. The vanG-1 element in Sa was also inserted within this position corresponding to its rumA homolog adjacent to an ICE-r derivative. vanG-1 insertions were previously reported within the same relative position in the E. faecalis rumA homolog. An ICE-r sequence perfectly conserved with respect to its counterpart in GBS-NY was apparent within the same site of the rumA homolog of a Streptococcus dysgalactiae subsp. equisimilis strain. Additionally, homologous vanG-like elements within the conserved rumA target site were evident in Roseburia intestinalis. Importance: These three streptococcal strains represent the first known vancomycin-resistant strains of their species. The collective observations made from these strains reveal a specific hot spot for insertional elements that is conserved between streptococci and different Gram-positive species. The two GBS strains potentially represent a GBS lineage that is predisposed to insertion of vanG elements. Copyright © 2014 Srinivasan et al.

July 7, 2019

Site-specific genetic engineering of the Anopheles gambiae Y chromosome.

Despite its function in sex determination and its role in driving genome evolution, the Y chromosome remains poorly understood in most species. Y chromosomes are gene-poor, repeat-rich and largely heterochromatic and therefore represent a difficult target for genetic engineering. The Y chromosome of the human malaria vector Anopheles gambiae appears to be involved in sex determination although very little is known about both its structure and function. Here, we characterize a transgenic strain of this mosquito species, obtained by transposon-mediated integration of a transgene construct onto the Y chromosome. Using meganuclease-induced homologous repair we introduce a site-specific recombination signal onto the Y chromosome and show that the resulting docking line can be used for secondary integration. To demonstrate its utility, we study the activity of a germ-line-specific promoter when located on the Y chromosome. We also show that Y-linked fluorescent transgenes allow automated sex separation of this important vector species, providing the means to generate large single-sex populations. Our findings will aid studies of sex chromosome function and enable the development of male-exclusive genetic traits for vector control.

July 7, 2019

Genomic reconnaissance of clinical isolates of emerging human pathogen Mycobacterium abscessus reveals high evolutionary potential.

Mycobacterium abscessus (Ma) is an emerging human pathogen that causes both soft tissue infections and systemic disease. We present the first comparative whole-genome study of Ma strains isolated from patients of wide geographical origin. We found a high proportion of accessory strain-specific genes indicating an open, non-conservative pan-genome structure, and clear evidence of rapid phage-mediated evolution. Although we found fewer virulence factors in Ma compared to M. tuberculosis, our data indicated that Ma evolves rapidly and therefore should be monitored closely for the acquisition of more pathogenic traits. This comparative study provides a better understanding of Ma and forms the basis for future functional work on this important pathogen.

July 7, 2019

Molecular and biological characterization of a new isolate of guinea pig cytomegalovirus.

Development of a vaccine against congenital infection with human cytomegalovirus is complicated by the issue of re-infection, with subsequent vertical transmission, in women with pre-conception immunity to the virus. The study of experimental therapeutic prevention of re-infection would ideally be undertaken in a small animal model, such as the guinea pig cytomegalovirus (GPCMV) model, prior to human clinical trials. However, the ability to model re-infection in the GPCMV model has been limited by availability of only one strain of virus, the 22122 strain, isolated in 1957. In this report, we describe the isolation of a new GPCMV strain, the CIDMTR strain. This strain demonstrated morphological characteristics of a typical Herpesvirinae by electron microscopy. Illumina and PacBio sequencing demonstrated a genome of 232,778 nt. Novel open reading frames (ORFs) not found in reference strain 22122 included an additional MHC Class I homolog near the right genome terminus. The CIDMTR strain was capable of dissemination in immunocompromised guinea pigs, and was found to be capable of congenital transmission in GPCMV-immune dams previously infected with salivary gland-adapted strain 22122 virus. The availability of a new GPCMV strain should facilitate study of re-infection in this small animal model.

July 7, 2019

Genomic insights into the taxonomic status of the three subspecies of Bacillus subtilis.

Bacillus subtilis contains three subspecies, i.e., subspecies subtilis, spizizenii, and inaquosorum. As these subspecies are phenotypically indistinguishable, their differentiation has relied on phylogenetic analysis of multiple protein-coding gene sequences. B. subtilis subsp. inaquosorum is a recently proposed taxon that encompasses strain KCTC 13429(T) and related strains, which were previously classified as members of subspecies spizizenii. However, DNA-DNA hybridization (DDH) values among the three subspecies raised a question as to their independence. Thus, we evaluated the taxonomic status of subspecies inaquosorum using genome-based comparative analysis. In contrast to the previous experimental values of DDH, the inter-genomic relatedness inferred by average nucleotide identity (ANI) values indicated that subspecies inaquosorum and spizizenii were sufficiently different from subspecies subtilis and hence raised the possibility that the former two could be classified as separate species from B. subtilis. The genome-based tree also supported the separation of the two subspecies from B. subtilis. The exclusive presence of a subtilin synthesis system in subspecies spizizenii was a remarkable genetic characteristic that could even distinguish subspecies spizizenii from subspecies inaquosorum in addition to the low ANI values (<95%). In conclusion, the genome-based data obtained in this study demonstrate that subspecies inaquosorum and spizizenii are clearly distinguished from subspecies subtilis, and raise the possibility that these two subspecies could be classified as separate species from B. subtilis. In addition, the low ANI values between subspecies inaquosorum and spizizenii and the exclusive presence of subtilin synthesis genes in subspecies spizizenii also suggest circumscription of these two subspecies at the species level. Copyright © 2013 Elsevier GmbH. All rights reserved.
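The ANI comparison against the 95% cutoff mentioned above can be sketched roughly in Python. This is a deliberately simplified illustration and not the procedure used in the study: it assumes the genome fragments have already been paired and aligned to equal length without gaps, whereas real ANI tools perform the fragmenting, alignment, and filtering themselves. The function names and toy sequences are hypothetical.

```python
def fragment_identity(a, b):
    """Percent identity between two pre-aligned, equal-length fragments."""
    if not a or len(a) != len(b):
        raise ValueError("fragments must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def average_nucleotide_identity(fragment_pairs):
    """Mean percent identity over all aligned fragment pairs."""
    identities = [fragment_identity(a, b) for a, b in fragment_pairs]
    return sum(identities) / len(identities)


# Toy data: one identical pair and one pair with a single mismatch in 10 bases.
pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),
    ("ACGTACGTAC", "ACGTACGTAA"),
]
ani = average_nucleotide_identity(pairs)
print(f"ANI = {ani:.1f}%")
print("below the 95% cutoff" if ani < 95.0 else "at or above the 95% cutoff")
```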

July 7, 2019

The effects of read length, quality and quantity on microsatellite discovery and primer development: from Illumina to PacBio.

The advent of next-generation sequencing (NGS) technologies has transformed the way microsatellites are isolated for ecological and evolutionary investigations. Recent attempts to employ NGS for microsatellite discovery have used the 454, Illumina, and Ion Torrent platforms, but other methods including single-molecule real-time DNA sequencing (Pacific Biosciences or PacBio) remain viable alternatives. We outline a workflow from sequence quality control to microsatellite marker validation in three plant species using PacBio circular consensus sequencing (CCS). We then evaluate the performance of PacBio CCS in comparison with other NGS platforms for microsatellite isolation, through simulations that focus on variations in read length, read quantity and sequencing error rate. Although quality control of CCS reads reduced microsatellite yield by around 50%, hundreds of microsatellite loci that are expected to have improved conversion efficiency to functional markers were retrieved for each species. The simulations quantitatively validate the advantages of long reads and emphasize the detrimental effects of sequencing errors on NGS-enabled microsatellite development. In view of the continuing improvement in read length on NGS platforms, sequence quality and the corresponding strategies of quality control will become the primary factors to consider for effective microsatellite isolation. Among current options, PacBio CCS may be optimal for rapid, small-scale microsatellite development due to its flexibility in scaling sequencing effort, while platforms such as Illumina MiSeq will provide cost-efficient solutions for multispecies microsatellite projects. © 2014 John Wiley & Sons Ltd.
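The microsatellite discovery step can be illustrated with a small Python sketch that scans a read for perfect tandem repeats. It is not the workflow from the study: the function name `find_microsatellites`, the unit sizes (2 to 6 bp), and the minimum of five repeats are assumptions chosen for illustration, and imperfect repeats are ignored.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest unit first,
    so 'ACACACACAC' is reported as (AC)x5 rather than a longer unit.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:  # skip homopolymer runs such as AAAA...
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


read = "TTT" + "AC" * 6 + "GGG"
print(find_microsatellites(read))
```

Longer reads leave more flanking sequence on either side of a repeat for primer design, which is one way to see why read length matters in the comparison above.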

July 7, 2019

Draft genome sequence of Kluyveromyces marxianus strain DMB1, isolated from sugarcane bagasse hydrolysate.

We determined the genome sequence of a thermotolerant yeast, Kluyveromyces marxianus strain DMB1, isolated from sugarcane bagasse hydrolysate, and the sequence provides further insights into the genomic differences between this strain and other reported K. marxianus strains. The genome described here is composed of 11,165,408 bases and has 4,943 protein-coding genes. Copyright © 2014 Suzuki et al.

July 7, 2019

A fault-tolerant method for HLA typing with PacBio data.

Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases and infectious diseases. The gene family contains the most polymorphic genes in humans and the difference between two alleles is only a single base pair substitution in many cases. Next generation sequencing (NGS) technologies could be used for high throughput HLA typing but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454 and Ion Torrent, based on the characteristics of the reads they generate. However, methods for PacBio reads have received less attention, probably owing to the platform's high error rates. The PacBio system has the longest read length among available NGS platforms, and therefore is the only platform capable of having exon 2 and exon 3 of HLA genes on the same read to unequivocally solve the ambiguity problem caused by the “phasing” issue. We proposed a new method, BayesTyping1, to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes’ theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads. The BayesTyping1 method could overcome, to some extent, the problems of HLA typing using PacBio reads, which mostly arise from sequencing errors of PacBio reads and the divergence of HLA genes.
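The general idea of assigning an allele with Bayes' theorem can be sketched in Python as follows. This is not the BayesTyping1 algorithm, whose details are not given here. The sketch assumes a uniform prior over candidate alleles, reads already aligned to the full allele length, and a substitution-only error model with a fixed per-base error rate; the allele names and sequences are hypothetical toy values.

```python
import math


def allele_posteriors(reads, alleles, error_rate=0.01):
    """Posterior probability of each candidate allele given aligned reads.

    Likelihood model (an assumption for illustration): each base matches
    the allele with probability 1 - error_rate, and is one of the three
    other bases with probability error_rate / 3.
    """
    log_post = {}
    for name, ref in alleles.items():
        total = math.log(1.0 / len(alleles))  # uniform prior
        for read in reads:
            for observed, expected in zip(read, ref):
                if observed == expected:
                    total += math.log(1.0 - error_rate)
                else:
                    total += math.log(error_rate / 3.0)
        log_post[name] = total
    # Normalize in log space to avoid underflow with many reads.
    peak = max(log_post.values())
    weights = {n: math.exp(v - peak) for n, v in log_post.items()}
    norm = sum(weights.values())
    return {n: w / norm for n, w in weights.items()}


# Two hypothetical alleles differing by a single base at the last position.
alleles = {"allele_1": "ACGTACGT", "allele_2": "ACGTACGA"}
# Three reads from allele_1, the third carrying one sequencing error.
reads = ["ACGTACGT", "ACGTACGT", "ACGAACGT"]
post = allele_posteriors(reads, alleles)
print(max(post, key=post.get), post)
```

Because every read contributes evidence, an isolated sequencing error shifts the posterior only slightly, which is the sense in which a Bayesian assignment can tolerate noisy reads.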

July 7, 2019

Complete closed genome sequences of three Bibersteinia trehalosi nasopharyngeal isolates from cattle with shipping fever.

Bibersteinia trehalosi is a respiratory pathogen affecting cattle and related ruminants worldwide. B. trehalosi is closely related to Mannheimia haemolytica and is often associated with bovine respiratory disease complex (BRDC), a polymicrobial multifactorial disease. We present three complete closed genome sequences of this species generated using an automated assembly pipeline.

July 7, 2019

Draft genome sequence of Kitasatospora cheerisanensis KCTC 2395, which produces plecomacrolide against phytopathogenic fungi.

Kitasatospora cheerisanensis KCTC 2395, which produces antifungal metabolites with bafilomycin derivatives, including bafilomycin C1-amide, was isolated from a soil sample at Mt. Jiri, South Korea. Here, we report its draft genome sequence, which contains 8.04 Mb with 73.6% G+C content and 7,810 protein-coding genes. Copyright © 2014 Hwang et al.

July 7, 2019

The oxygen-independent metabolism of cyclic monoterpenes in Castellaniella defragrans 65Phen.

The facultatively anaerobic betaproteobacterium Castellaniella defragrans 65Phen utilizes acyclic, monocyclic and bicyclic monoterpenes as sole carbon source under oxic as well as anoxic conditions. A biotransformation pathway of the acyclic β-myrcene required linalool dehydratase-isomerase as initial enzyme acting on the hydrocarbon. An in-frame deletion mutant did not use myrcene, but was able to grow on monocyclic monoterpenes. The genome sequence and a comparative proteome analysis together with a random transposon mutagenesis were conducted to identify genes involved in the monocyclic monoterpene metabolism. Metabolites accumulating in cultures of transposon and in-frame deletion mutants disclosed the degradation pathway. Castellaniella defragrans 65Phen oxidizes the monocyclic monoterpene limonene at the primary methyl group forming perillyl alcohol. The genome of 3.95 Mb contained a 70 kb genome island coding for over 50 proteins involved in the monoterpene metabolism. This island showed higher homology to genes of another monoterpene-mineralizing betaproteobacterium, Thauera terpenica 58EuT, than to genomes of the family Alcaligenaceae, which harbors the genus Castellaniella. A collection of 72 transposon mutants unable to grow on limonene contained 17 inactivated genes, with 46 mutants located in the two genes ctmAB (cyclic terpene metabolism). CtmA and ctmB were annotated as FAD-dependent oxidoreductases and clustered together with ctmE, a 2Fe-2S ferredoxin gene, and ctmF, coding for a NADH:ferredoxin oxidoreductase. Transposon mutants of ctmA, B or E did not grow aerobically or anaerobically on limonene, but on perillyl alcohol. The next steps in the pathway are catalyzed by the geraniol dehydrogenase GeoA and the geranial dehydrogenase GeoB, yielding perillic acid. Two transposon mutants had inactivated genes of the monoterpene ring cleavage (mrc) pathway. 2-Methylcitrate synthase and 2-methylcitrate dehydratase were also essential for the monoterpene metabolism but not for growth on acetate. The genome of Castellaniella defragrans 65Phen is related to other genomes of Alcaligenaceae, but contains a genomic island with genes of the monoterpene metabolism. Castellaniella defragrans 65Phen degrades limonene via a limonene dehydrogenase and the oxidation of perillyl alcohol. The initial oxidation at the primary methyl group is independent of molecular oxygen.

July 7, 2019

Genome sequence of the ε-poly-L-lysine-producing strain Streptomyces albulus NK660, isolated from soil in Gutian, Fujian Province, China.

We determined the complete genome sequence of a soil bacterium, Streptomyces albulus NK660. It can produce ε-poly-L-lysine, which has antimicrobial activity against a spectrum of microorganisms. The genome of S. albulus NK660 contains a 9,360,281-bp linear chromosome and a 12,120-bp linear plasmid. Copyright © 2014 Gu et al.

July 7, 2019

Improved draft genome sequence of Clostridium pasteurianum strain ATCC 6013 (DSM 525) using a hybrid next-generation sequencing approach.

We present an improved draft genome sequence for Clostridium pasteurianum strain ATCC 6013 (DSM 525), the type strain of the species and an important solventogenic bacterium with industrial potential. Availability of a near-complete genome sequence will enable strain engineering of this promising bacterium. Copyright © 2014 Pyne et al.

July 7, 2019

Compact genome of the Antarctic midge is likely an adaptation to an extreme environment.

The midge, Belgica antarctica, is the only insect endemic to Antarctica, and thus it offers a powerful model for probing responses to extreme temperatures, freeze tolerance, dehydration, osmotic stress, ultraviolet radiation and other forms of environmental stress. Here we present the first genome assembly of an extremophile, the first dipteran in the family Chironomidae, and the first Antarctic eukaryote to be sequenced. At 99 megabases, B. antarctica has the smallest insect genome sequenced thus far. Although it has a similar number of genes as other Diptera, the midge genome has very low repeat density and a reduction in intron length. Environmental extremes appear to constrain genome architecture, not gene content. The few transposable elements present are mainly ancient, inactive retroelements. An abundance of genes associated with development, regulation of metabolism and responses to external stimuli may reflect adaptations for surviving in this harsh environment.
