July 7, 2019

Improve homology search sensitivity of PacBio data by correcting frameshifts.

Single-molecule, real-time (SMRT) sequencing, developed by Pacific Biosciences, produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assemblies, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data have a high sequencing error rate, and most of the errors are insertions or deletions. During alignment-based homology search, insertion or deletion errors in genes cause frameshifts, which may yield only marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random ones, and this ambiguity leads to errors in structural and functional annotation. Existing frameshift correction tools are designed for data with much lower error rates and are not optimized for PacBio data. As an increasing number of groups use SMRT sequencing, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets with low sequencing coverage. In addition, it corrects more errors than a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/ (contact: yannisun@msu.edu). © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
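To make the frameshift argument concrete, the sketch below translates a short toy gene before and after a single-base deletion, then brute-forces every single-base insertion or deletion and keeps the candidate whose translation best matches a known protein. It is a hypothetical illustration only: the gene, the reference protein and all helper names are invented, and the naive position-by-position identity score stands in for the profile alignment that Frame-Pro actually performs.

```python
# Standard genetic code, built from the usual TCAG codon ordering.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate in frame 0; a trailing partial codon is ignored."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def identity_score(protein: str, reference: str) -> int:
    """Count positions where the translated protein matches the reference."""
    return sum(a == b for a, b in zip(protein, reference))


def correct_single_indel(dna: str, reference: str) -> tuple[str, int]:
    """Try every single-base insertion and deletion; keep the best-scoring sequence.

    Toy stand-in for frameshift correction: real tools score candidates
    against protein family profiles, not a single reference sequence.
    """
    best_seq, best_score = dna, identity_score(translate(dna), reference)
    candidates = [dna[:i] + b + dna[i:] for i in range(len(dna) + 1) for b in "ACGT"]
    candidates += [dna[:i] + dna[i + 1:] for i in range(len(dna))]
    for cand in candidates:
        score = identity_score(translate(cand), reference)
        if score > best_score:
            best_seq, best_score = cand, score
    return best_seq, best_score


true_gene = "ATGGCTGCTAAAGGTGAATTC"           # hypothetical gene
read = true_gene[:4] + true_gene[5:]           # one-base deletion, as in a noisy read
print(translate(true_gene))                    # MAAKGEF
print(translate(read))                         # MVLKVN: the frame is lost after the indel
print(correct_single_indel(read, "MAAKGEF"))   # ('ATGGCTGCTAAAGGTGAATTC', 7)
```

A single deleted base changes every downstream codon, which is why the frameshifted read shares only two residues with the true protein and would give a weak alignment.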


July 7, 2019

Epigenetic mechanisms in microbial members of the human microbiota: current knowledge and perspectives.

The human microbiota and epigenetic processes have both been shown to play a crucial role in health and disease. However, information on epigenetic modulation of microbiota members is extremely scarce, except for a few pathogens. Most of what is known concerns DNA adenine methylation, which has been described extensively as a modulator of virulence, particularly in pathogenic bacteria. It would thus appear likely that such mechanisms are widespread among bacterial members of the microbiota. This review briefly presents the current knowledge on epigenetic processes in bacteria, gives examples of known methylation processes in microbial members of the human microbiota, and summarizes the knowledge on regulation of host epigenetic processes by the human microbiota.


July 7, 2019

Genome-guided design of a defined mouse microbiota that confers colonization resistance against Salmonella enterica serovar Typhimurium.

Protection against enteric infections, also termed colonization resistance, results from mutualistic interactions of the host and its indigenous microbes. The gut microbiota of humans and mice is highly diverse and it is therefore challenging to assign specific properties to its individual members. Here, we have used a collection of murine bacterial strains and a modular design approach to create a minimal bacterial community that, once established in germ-free mice, provided colonization resistance against the human enteric pathogen Salmonella enterica serovar Typhimurium (S. Tm). Initially, a community of 12 strains, termed Oligo-Mouse-Microbiota (Oligo-MM12), representing members of the major bacterial phyla in the murine gut, was selected. This community was stable over consecutive mouse generations and provided colonization resistance against S. Tm infection, albeit not to the degree of a conventional complex microbiota. Comparative (meta)genome analyses identified functions represented in a conventional microbiome but absent from the Oligo-MM12. By genome-informed design, we created an improved version of the Oligo-MM community harbouring three facultative anaerobic bacteria from the mouse intestinal bacterial collection (miBC) that provided conventional-like colonization resistance. In conclusion, we have established a highly versatile experimental system that showed efficacy in an enteric infection model. Thus, in combination with exhaustive bacterial strain collections and systems-based approaches, genome-guided design can be used to generate insights into microbe-microbe and microbe-host interactions for the investigation of ecological and disease-relevant mechanisms in the intestine.
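The comparative step described above amounts to a set difference between the functions of a conventional microbiome and those of the defined community. The sketch below shows that step and then one plausible way to pick strains that fill the gap: a greedy set cover. The function labels and strain names are placeholders, and greedy set cover is an assumption made for illustration, not the authors' documented selection procedure; `max_strains=3` merely echoes the three strains added in the study.

```python
def missing_functions(conventional: set[str], community: set[str]) -> set[str]:
    """Functions found in the conventional microbiome but absent from the defined community."""
    return conventional - community


def greedy_fill(missing: set[str], candidates: dict[str, set[str]], max_strains: int):
    """Repeatedly add the candidate strain that covers the most still-missing functions."""
    remaining = set(missing)
    chosen: list[str] = []
    while remaining and len(chosen) < max_strains:
        name, funcs = max(candidates.items(), key=lambda kv: len(kv[1] & remaining))
        gain = funcs & remaining
        if not gain:  # no candidate contributes anything new
            break
        chosen.append(name)
        remaining -= gain
    return chosen, remaining


conventional = {"F1", "F2", "F3", "F4", "F5"}   # hypothetical function labels
oligo = {"F1", "F2"}
pool = {"strain_A": {"F3"}, "strain_B": {"F3", "F4"}, "strain_C": {"F5"}}
gap = missing_functions(conventional, oligo)
print(greedy_fill(gap, pool, max_strains=3))    # (['strain_B', 'strain_C'], set())
```

Greedy selection is simple and usually close to optimal for small pools, but it ignores ecological constraints such as whether a strain can actually establish in the community.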


July 7, 2019

Genomic sequencing-based mutational enrichment analysis identifies motility genes in a genetically intractable gut microbe.

A major roadblock to understanding how microbes in the gastrointestinal tract colonize and influence the physiology of their hosts is our inability to genetically manipulate new bacterial species and experimentally assess the function of their genes. We describe the application of population-based genomic sequencing after chemical mutagenesis to map bacterial genes responsible for motility in Exiguobacterium acetylicum, a representative intestinal Firmicutes bacterium that is intractable to molecular genetic manipulation. We derived strong associations between mutations in 57 E. acetylicum genes and impaired motility. Surprisingly, less than half of these genes were annotated as motility-related based on sequence homologies. We confirmed the genetic link between individual mutations and loss of motility for several of these genes by performing a large-scale analysis of spontaneous suppressor mutations. In the process, we reannotated genes belonging to a broad family of diguanylate cyclases and phosphodiesterases to highlight their specific role in motility and assigned functions to uncharacterized genes. Furthermore, we generated isogenic strains that allowed us to establish that Exiguobacterium motility is important for the colonization of its vertebrate host. These results indicate that genetic dissection of a complex trait, functional annotation of new genes, and the generation of mutant strains to define the role of genes in complex environments can be accomplished in bacteria without the development of species-specific molecular genetic tools.
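One simple way to turn mutation counts into per-gene associations with impaired motility is a one-sided Fisher's exact test comparing non-motile and motile isolates. The sketch below implements that test with the standard library; the 2x2 table layout, the counts and the helper names are hypothetical, and the statistic used in the study may differ.

```python
from math import comb


def fisher_right_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment in the 2x2 table

                      mutated in gene   not mutated
        non-motile          a                b
        motile              c                d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    n1, n2 = a + b, c + d          # non-motile and motile isolates
    k = a + c                      # isolates carrying a mutation in this gene
    total = comb(n1 + n2, k)
    tail = sum(comb(n1, x) * comb(n2, k - x) for x in range(a, min(n1, k) + 1))
    return tail / total


def rank_genes(counts: dict[str, tuple[int, int, int, int]]) -> list[tuple[str, float]]:
    """Sort genes by enrichment p-value, most strongly associated first."""
    return sorted(((g, fisher_right_tail(*t)) for g, t in counts.items()), key=lambda gp: gp[1])


# Hypothetical counts for 20 non-motile and 20 motile isolates.
example = {"gene_1": (8, 12, 0, 20), "gene_2": (2, 18, 2, 18)}
for gene, p in rank_genes(example):
    print(gene, f"{p:.2e}")
```

With many genes tested, the p-values would also need a multiple-testing correction before calling associations.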


July 7, 2019

D1FHS, the type strain of the ammonia-oxidizing bacterium Nitrosococcus wardiae spec. nov.: enrichment, isolation, phylogenetic, and growth physiological characterization.

An ammonia-oxidizing bacterium, strain D1FHS, was enriched and isolated in pure culture from a sediment sample retrieved in Jiaozhou Bay, a hyper-eutrophic semi-closed water body adjacent to the metropolitan area of Qingdao, China. Based on initial 16S rRNA gene sequence analysis, strain D1FHS was classified in the genus Nitrosococcus, family Chromatiaceae, order Chromatiales, class Gammaproteobacteria; the 16S rRNA gene sequence with the highest identity to that of D1FHS was obtained from Nitrosococcus halophilus Nc4T. The average nucleotide identity between the genomes of strain D1FHS and N. halophilus strain Nc4 is 89.5%. Known species in the genus Nitrosococcus are obligately aerobic, chemolithotrophic ammonia-oxidizing bacteria adapted and restricted to marine environments. The optimum growth (maximum nitrite production) conditions for D1FHS in a minimal salts medium are 50 mM ammonium and 700 mM NaCl at pH 7.5 to 8.0 and 37 °C in the dark. Because the corresponding conditions for other studied Nitrosococcus spp. are 100–200 mM ammonium and <700 mM NaCl at pH 7.5 to 8.0 and 28–32 °C, D1FHS is physiologically distinct from other Nitrosococcus spp. in terms of substrate, salt, and thermal tolerance.
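The 89.5% average nucleotide identity (ANI) matters because it falls well below the roughly 95–96% ANI level commonly used as a prokaryotic species boundary, which is consistent with D1FHS representing a separate species. The sketch below shows the usual shape of a BLAST-based ANI calculation: average the percent identity of genome fragments whose best hit passes identity and coverage filters. The input format, the default thresholds and the example values are assumptions for illustration, not the study's pipeline or data.

```python
SPECIES_ANI_THRESHOLD = 95.0  # commonly used ~95-96% species boundary (assumed convention)


def average_nucleotide_identity(hits, min_identity=30.0, min_aligned_fraction=0.7):
    """Mean percent identity over genome fragments whose best hit passes both filters.

    hits: iterable of (percent_identity, aligned_length, fragment_length),
    one entry per query-genome fragment (hypothetical input format).
    """
    kept = [
        pid for pid, aligned, length in hits
        if pid > min_identity and aligned / length > min_aligned_fraction
    ]
    if not kept:
        raise ValueError("no fragment passed the identity/coverage filters")
    return sum(kept) / len(kept)


# Hypothetical fragment hits: the last two fail the identity or coverage filter.
fragments = [(92.0, 1000, 1020), (88.0, 990, 1020), (25.0, 1020, 1020), (99.0, 300, 1020)]
ani = average_nucleotide_identity(fragments)
print(ani, ani < SPECIES_ANI_THRESHOLD)  # 90.0 True
```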


July 7, 2019

Genomic and transcriptomic analyses reveal the characterization of a crude oil degrading bacterial strain: Pedobacter steynii DX4

Pedobacter steynii DX4, isolated from the Qinghai-Tibet Plateau, effectively degrades crude oil at low temperature. To elucidate its biodegradation mechanism, whole-genome and transcriptome sequencing were performed. This is the first genome of a crude-oil-degrading strain in the genus Pedobacter. The P. steynii DX4 genome consists of a single circular chromosome of 6,581,659 bp with an average G+C content of 41.31% and encodes 5,464 genes in total. Genomic islands (GIs) were predicted, and a comparative analysis was performed against related species. Genome annotation predicted several hydrocarbon oxygenases, chemotaxis proteins and biosurfactant synthetases. Transcriptome profiling revealed many differentially expressed genes between cells grown on crude oil and cells grown on pyruvate medium. Crude oil significantly stimulated the expression of genes related to hydrocarbon oxidation and the respiratory chain. Genomic and transcriptomic analyses of P. steynii DX4 have revealed the mechanism of crude oil degradation in this strain and provide a valuable knowledge base for developing effective strategies to mitigate the ecological damage caused by crude oil pollution.
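Comparing expression between the crude oil and pyruvate conditions starts with normalizing read counts for library size and computing fold changes. The sketch below shows counts-per-million (CPM) normalization and log2 fold changes with a pseudocount; the gene labels and counts are hypothetical. A fold change alone is not a significance test: dedicated tools such as DESeq2 or edgeR model replicate variability before calling genes differentially expressed.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts by library size so the two conditions are comparable."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def log2_fold_changes(treated: dict[str, int], control: dict[str, int], pseudocount: float = 1.0):
    """log2(CPM_treated / CPM_control) per gene; the pseudocount avoids division by zero.

    Assumes both dictionaries contain the same genes.
    """
    t, c = counts_per_million(treated), counts_per_million(control)
    return {g: math.log2((t[g] + pseudocount) / (c[g] + pseudocount)) for g in t}


crude_oil = {"oxygenase_1": 800, "housekeeping_1": 200}   # hypothetical counts
pyruvate = {"oxygenase_1": 100, "housekeeping_1": 900}
lfc = log2_fold_changes(crude_oil, pyruvate)
for gene, value in lfc.items():
    print(gene, round(value, 2))
```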


July 7, 2019

Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes.

Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes for this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. Assembling the short reads yielded 387,594 scaffolds, which constitute the draft genome sequence (FES_r1.0). The total length of FES_r1.0 was 1,177,687,305 bp, and the N50 of the scaffolds was 25,109 bp. Gene prediction revealed 286,768 coding sequences (CDSs; FES_r1.0_cds), including those related to transposable elements. The total length of FES_r1.0_cds was 212,917,911 bp, and the N50 was 1,101 bp. Of these, 35,816 CDSs, excluding those of transposable elements, were functionally annotated by BLAST analysis. To demonstrate the utility of the database, we conducted several test analyses using BLAST and keyword searches. Furthermore, we used the draft genome as a reference sequence for NGS-based markers and identified novel candidate genes controlling heteromorphic self-incompatibility in buckwheat. The database and draft genome sequence provide a valuable resource for developing buckwheat cultivars with superior agronomic traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
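The N50 values reported above (25,109 bp for scaffolds, 1,101 bp for CDSs) follow the standard definition: sort sequences from longest to shortest and report the length at which the running total first reaches half of the total assembly length. A minimal sketch of that calculation, using made-up lengths:

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("empty length list")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Here the total is 54, and the three longest sequences (10 + 9 + 8 = 27) are the first to reach half of it, so the N50 is 8.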


July 7, 2019

Microbial metagenomics mock scenario-based sample simulation (M3S3).

Shotgun sequencing is increasingly applied in clinical microbiology for unbiased, culture-independent diagnosis. While software solutions for metagenomics proliferate, integrating metagenomics into clinical care requires method standardisation and validation. Virtual metagenomics samples could underpin validation by substituting for real samples, so we sought to develop a novel solution for simulating metagenomics samples based on user-defined clinical scenarios. We designed the Microbial Metagenomics Mock Scenario-based Sample Simulation (M3S3) workflow, which allows users to generate virtual samples from raw reads or assemblies. The M3S3 output is a mock sample in FASTQ or FASTA format. M3S3 was tested by generating virtual samples for ten challenging infectious disease scenarios, each involving a background matrix 'spiked' in silico with pathogens, including mixtures. Replicate samples (seven per scenario) were used to represent different compositional ratios. Virtual samples were analysed using Taxonomer and Kraken db. The ten challenge scenarios were successfully applied, generating 80 samples. For all tested scenarios, the virtual samples showed the sequence compositions predicted from the user input. Spiked pathogen sequences were identified in the majority of the replicates, and most exhibited acceptable abundance (deviation between expected and observed abundance of spiked pathogens), with slight differences observed between software tools. Despite this proof of concept, integrating clinical metagenomics into routine microbiology remains a substantial challenge. M3S3 can produce virtual samples on demand, simulating a spectrum of clinical diagnostic scenarios of varying complexity. The M3S3 tool can therefore support the development and validation of standardised metagenomics applications. Copyright © 2017. Published by Elsevier Ltd.
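Spiking a background matrix in silico comes down to deciding how many reads to draw from each source for a target composition, and the abundance check compares observed with expected proportions. The sketch below illustrates both steps; it is not M3S3's code, and the source names, proportions and the largest-remainder rounding rule are assumptions.

```python
import math


def reads_per_source(proportions: dict[str, float], total_reads: int) -> dict[str, int]:
    """Split a read budget across sources, using largest remainders so counts sum exactly."""
    if not math.isclose(sum(proportions.values()), 1.0):
        raise ValueError("proportions must sum to 1")
    raw = {s: p * total_reads for s, p in proportions.items()}
    counts = {s: math.floor(v) for s, v in raw.items()}
    shortfall = total_reads - sum(counts.values())
    # Give the leftover reads to the sources with the largest fractional parts.
    for s in sorted(raw, key=lambda src: raw[src] - counts[src], reverse=True)[:shortfall]:
        counts[s] += 1
    return counts


def abundance_deviation(expected: dict[str, float], observed: dict[str, float]) -> dict[str, float]:
    """Observed minus expected relative abundance for each spiked source."""
    return {s: observed.get(s, 0.0) - p for s, p in expected.items()}


plan = {"background": 0.97, "pathogen_A": 0.02, "pathogen_B": 0.01}  # hypothetical scenario
print(reads_per_source(plan, 1000))  # {'background': 970, 'pathogen_A': 20, 'pathogen_B': 10}
print(abundance_deviation({"pathogen_A": 0.02}, {"pathogen_A": 0.018}))
```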


July 7, 2019

ReMILO: reference assisted misassembly detection algorithm using short and long reads.

Contigs assembled from second-generation short reads may contain misassemblies, which complicate downstream analysis or even lead to incorrect results. Fortunately, as more and more species are sequenced, it becomes possible to use the reference genome of a closely related species to detect misassemblies. In addition, long reads from third-generation sequencing technologies are increasingly widely used and can also help detect misassemblies. Here, we introduce ReMILO, a reference-assisted misassembly detection algorithm that uses both short reads and PacBio SMRT long reads. ReMILO aligns the initial short reads to both the contigs and the reference genome, and then constructs a novel data structure called a red-black multipositional de Bruijn graph to detect misassemblies. In addition, ReMILO aligns the contigs to long reads and finds their differences from the long reads to detect more misassemblies. In our performance test on short-read assemblies of human chromosome 14 data, ReMILO detected 41.8–77.9% of extensive misassemblies and 33.6–54.5% of local misassemblies. On hybrid short- and long-read assemblies of S. pastorianus data, ReMILO detected 60.6–70.9% of extensive misassemblies and 28.6–54.0% of local misassemblies. The ReMILO software can be downloaded for free under the Artistic License 2.0 from https://github.com/songc001/remilo (contact: baoe@bjtu.edu.cn). Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
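As background on the data structure, the sketch below builds a plain de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers, recording for each edge whether it comes from the contig, the reference, or both; contig-only edges then mark where the contig departs from the reference. This is a simplified two-colour illustration, not ReMILO's red-black multipositional de Bruijn graph, and the sequences and helper names are hypothetical. A departure flagged this way may be true variation rather than a misassembly, which is why ReMILO also uses read alignments.

```python
from collections import defaultdict


def build_coloured_dbg(sequences: dict[str, str], k: int):
    """De Bruijn graph on (k-1)-mers; each edge (one k-mer) records which inputs contain it."""
    edges: dict[tuple[str, str], set[str]] = defaultdict(set)
    for label, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[(kmer[:-1], kmer[1:])].add(label)
    return edges


def contig_only_edges(edges) -> list[tuple[str, str]]:
    """Edges supported by the contig but not the reference: candidate discrepancies."""
    return sorted(e for e, labels in edges.items() if labels == {"contig"})


graph = build_coloured_dbg({"reference": "ACGTACGGA", "contig": "ACGTTCGGA"}, k=4)
print(contig_only_edges(graph))
# [('CGT', 'GTT'), ('GTT', 'TTC'), ('TCG', 'CGG'), ('TTC', 'TCG')]
```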


July 7, 2019

Microbial sequence typing in the genomic era.

Next-generation sequencing (NGS), also known as high-throughput sequencing, is changing the field of microbial genomics research. Compared with traditional automated Sanger capillary sequencing, NGS allows a more comprehensive analysis of the diversity, structure and composition of microbial genes and genomes at a lower cost. NGS strategies have expanded the versatility of standard, widely used typing approaches based on nucleotide variation in several hundred DNA sequences and a few gene fragments (MLST, MLVA, rMLST and cgMLST). NGS can now accommodate variation in thousands or millions of sequences, from selected amplicons to full genomes (WGS, NGMLST and HiMLST). To extract signals from high-dimensional NGS data and make valid statistical inferences, novel analytic and statistical techniques are needed. In this review, we describe standard and new approaches for microbial sequence typing at gene and genome levels and guidelines for subsequent analysis, including methods and computational frameworks. We also present several applications of these approaches in genotyping, phylogenetics and molecular epidemiology. Copyright © 2017 Elsevier B.V. All rights reserved.
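Allele-based typing schemes such as MLST and cgMLST share one core idea: each locus sequence is mapped to an allele number, the tuple of allele numbers forms a profile, and profiles are compared locus by locus (classic MLST usually uses seven housekeeping loci; cgMLST uses hundreds to thousands of core genes). The sketch below shows that idea with three hypothetical loci, an invented allele database and an invented sequence-type table.

```python
LOCI = ("locusA", "locusB", "locusC")  # hypothetical loci


def allele_profile(sample: dict[str, str], allele_db: dict[str, dict[str, int]]):
    """Map each locus sequence to its allele number (None if the allele is novel)."""
    return tuple(allele_db[locus].get(sample[locus]) for locus in LOCI)


def allelic_distance(p, q) -> int:
    """Number of loci whose alleles differ; a novel (None) allele always counts as different."""
    return sum(1 for a, b in zip(p, q) if a is None or b is None or a != b)


allele_db = {
    "locusA": {"ATG": 1, "ATC": 2},
    "locusB": {"GGA": 1, "GGT": 2},
    "locusC": {"TTA": 1, "TTG": 2},
}
st_table = {(1, 2, 1): 101}  # hypothetical sequence-type definitions

p1 = allele_profile({"locusA": "ATG", "locusB": "GGT", "locusC": "TTA"}, allele_db)
p2 = allele_profile({"locusA": "ATC", "locusB": "GGT", "locusC": "CCC"}, allele_db)
print(p1, st_table.get(p1))   # (1, 2, 1) 101
print(p2, st_table.get(p2))   # (2, 2, None) None
print(allelic_distance(p1, p2))  # 2
```

Allelic distances like this are what cgMLST-style analyses typically feed into clustering or minimum spanning trees for outbreak investigation.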


July 7, 2019

Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations.

Mutations, the fuel of evolution, are first manifested as rare DNA changes within a population of cells. Although next-generation sequencing (NGS) technologies have revolutionized the study of genomic variation between species and individual organisms, most have limited ability to accurately detect and quantify rare variants among the different genome copies in heterogeneous mixtures of cells or molecules. We describe the technical challenges in characterizing subclonal variants using conventional NGS protocols and the recent development of error correction strategies, both computational and experimental, including consensus sequencing of single DNA molecules. We also highlight major applications for low-frequency mutation detection in science and medicine, describe emerging methodologies and provide our vision for the future of DNA sequencing.
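Consensus sequencing of single DNA molecules typically tags each molecule with a unique molecular identifier (UMI), sequences it several times, and collapses the reads sharing a UMI so that random sequencing errors are outvoted while true low-frequency mutations, present in every copy, survive. The sketch below shows a simplified single-strand version; the thresholds, the read tuples and the function name are assumptions, and duplex methods additionally require agreement between the two strands.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=3, min_agreement=0.7):
    """Collapse reads sharing a UMI into one consensus sequence.

    reads: iterable of (umi, sequence). Positions where no base reaches
    min_agreement are masked as 'N'; families smaller than min_family_size
    are dropped because their consensus is unreliable.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)
    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus


reads = [
    ("AAA", "ACGT"), ("AAA", "ACGT"), ("AAA", "ACTT"), ("AAA", "ACGT"),  # one read has an error
    ("CCC", "TTGA"), ("CCC", "TTGA"), ("CCC", "TTCA"),                  # ambiguous position
    ("GGG", "ACGA"),                                                    # family too small
]
print(umi_consensus(reads))  # {'AAA': 'ACGT', 'CCC': 'TTNA'}
```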


July 7, 2019

Ten steps to get started in Genome Assembly and Annotation.

As part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to help researchers get started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high-quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR).


July 7, 2019

Complete genome sequence of Granulosicoccus antarcticus type strain IMCC3135T, a marine gammaproteobacterium with a putative dimethylsulfoniopropionate demethylase gene

Granulosicoccus, the only genus of the family Granulosicoccaceae, occupies a distinct phylogenetic position within the order Chromatiales of the Gammaproteobacteria. The genus has been found in various marine regions, especially in association with diverse marine macroalgae. No genomes have been reported for the genus Granulosicoccus thus far, hampering studies of the physiology and lifestyles of this genus. Here we report the complete genome sequence of strain IMCC3135T, the type strain of Granulosicoccus antarcticus, isolated from Antarctic coastal seawater. The genome was 7.78 Mbp long and harbored many genes involved in sulfur metabolism. In particular, a gene for dimethylsulfoniopropionate (DMSP) demethylase was found in the genome, making strain IMCC3135T one of the few marine gammaproteobacteria with the potential for DMSP demethylation.


July 7, 2019

A fast approximate algorithm for mapping long reads to large reference databases.

Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long-read mapping algorithms. Alignment-based seed-and-extend methods demonstrate good accuracy but face limited scalability, while faster alignment-free methods typically trade decreased precision for efficiency. In this article, we combine a fast approximate read mapping algorithm based on minimizers with a novel MinHash identity estimation technique to achieve both scalability and precision. In contrast to prior methods, we develop a mathematical framework that defines the types of mapping targets we uncover, establish probabilistic estimates of p-value and sensitivity, and demonstrate tolerance for alignment error rates up to 20%. With this framework, our algorithm automatically adapts to different minimum length and identity requirements and provides both positional and identity estimates for each mapping reported. For mapping human PacBio reads to the hg38 reference, our method is 290× faster than Burrows-Wheeler Aligner-MEM, with a lower memory footprint and a recall rate of 96%. We further demonstrate the scalability of our method by mapping noisy PacBio reads (each ≥5 kbp in length) to the complete NCBI RefSeq database containing 838 Gbp of sequence and >60,000 genomes.
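The two ingredients named above can be sketched in a few lines: winnowed minimizers (the smallest-hash k-mer in each window of w consecutive k-mers) give a compact sample of a sequence, and a Jaccard estimate between two samples can be converted into an identity estimate. For the conversion, this sketch borrows the Mash distance formula, d = -ln(2j / (1 + j)) / k, as a stand-in; the article's own MinHash estimator, p-value and sensitivity framework are not reproduced, the Jaccard of minimizer sets only approximates the k-mer Jaccard, and the sequences and parameters are hypothetical.

```python
import math
import zlib


def kmer_hash(kmer: str) -> int:
    """Deterministic hash (CRC32); Python's built-in hash() is salted per process."""
    return zlib.crc32(kmer.encode())


def minimizers(seq: str, k: int, w: int) -> set[str]:
    """Smallest-hash k-mer in every window of w consecutive k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return {min(kmers[i:i + w], key=kmer_hash) for i in range(len(kmers) - w + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_identity(j: float, k: int) -> float:
    """Identity estimate 1 - d, with d the Mash distance -ln(2j/(1+j))/k."""
    if j <= 0:
        return 0.0
    return max(0.0, 1.0 + math.log(2 * j / (1 + j)) / k)


read = "ACGTTGCAAGCTTACGGATCCATGCAGTCGATCGGATAC"   # hypothetical read
target = read[:20] + "T" + read[21:]              # same region with one substitution
j = jaccard(minimizers(read, k=5, w=4), minimizers(target, k=5, w=4))
print(j, mash_identity(j, k=5))
```

Sampling only minimizers keeps the index small enough to scale to very large references, at the cost of turning identity into a statistical estimate rather than an alignment result.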


July 7, 2019

Rhodobacter sp. Rb3, an aerobic anoxygenic phototroph which thrives in the polyextreme ecosystem of the Salar de Huasco, in the Chilean Altiplano.

The Salar de Huasco is an evaporitic basin located in the Chilean Altiplano that presents extreme environmental conditions for life, i.e. high altitude (3,800 m a.s.l.), negative water balance, a wide salinity range, large daily temperature changes and the highest solar radiation registered on the planet (>1200 W m⁻²). This ecosystem is considered a natural laboratory for understanding how microorganisms adapt to extreme conditions. Rhodobacter, an anoxygenic aerobic phototrophic bacterial genus, is one of the most abundant groups reported in taxonomic diversity surveys of this ecosystem. The bacterial mat isolate Rhodobacter sp. strain Rb3 was used to study adaptation mechanisms to stress-inducing factors that could explain its success in a polyextreme ecosystem. We found that the Rhodobacter sp. Rb3 genome was characterized by a high abundance of genes involved in stress tolerance and adaptation strategies, among which DNA repair and oxidative stress were the most conspicuous. Moreover, many other molecular mechanisms associated with oxidative stress, photooxidation and antioxidants; DNA repair and protection; motility, chemotaxis and biofilm synthesis; osmotic stress, metal, metalloid and toxic anion resistance; antimicrobial resistance and multidrug pumps; sporulation; cold shock and heat shock stress; mobile genetic elements and toxin-antitoxin systems were detected and identified as potential survival features in Rhodobacter sp. Rb3. Taken together, these results reveal a wide set of strategies used by the isolate to adapt and thrive under environmental stress, making it a model of a polyextreme environmental resistome.

