Long-read sequencing is emerging as a promising sequencing technology because it can overcome the read-length limitation of second-generation sequencing, which has dominated the sequencing market in recent years. However, it has substantially higher error rates than short-read sequencing (e.g., 13% vs. 0.1%), and its cost per base is typically higher than that of short-read sequencing. To address these limitations, we present ParLECH, a distributed hybrid error correction framework for PacBio long reads that is scalable and cost-efficient. To correct errors in the long reads, ParLECH uses Illumina short reads, which have a low error rate and can be obtained at high coverage for low cost. To efficiently analyze the high-throughput Illumina short reads, ParLECH is built on Hadoop and a distributed NoSQL system. To further improve accuracy, ParLECH uses the k-mer coverage information of the Illumina short reads. Specifically, we develop a distributed version of the widest path algorithm, which maximizes the minimum k-mer coverage along a path in the de Bruijn graph constructed from the Illumina short reads. We then replace each error region in a long read with its corresponding widest path. Our experimental results show that ParLECH can handle large-scale real-world datasets in a scalable and accurate manner. Using ParLECH, we can process a 312 GB human genome PacBio dataset, together with a 452 GB Illumina dataset, on 128 nodes in less than 29 hours.
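The widest-path idea can be illustrated with a single-machine sketch. This is not ParLECH's distributed Hadoop/NoSQL implementation. It assumes exact k-mer counting on the forward strand only, and it assumes the source and target are solid k-mers flanking an error region of a long read. All function names are hypothetical. The search is a Dijkstra variant: for each k-mer it keeps the largest achievable bottleneck, meaning the minimum coverage along the best path from the source.

```python
import heapq
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer in the short reads (the k-mer coverage)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_successors(counts):
    """Connect k-mers whose (k-1)-suffix equals another k-mer's (k-1)-prefix."""
    by_prefix = defaultdict(list)
    for kmer in counts:
        by_prefix[kmer[:-1]].append(kmer)
    return {kmer: by_prefix[kmer[1:]] for kmer in counts}


def widest_path(counts, source, target):
    """Return (sequence, bottleneck) of the path maximizing the minimum k-mer coverage."""
    succ = de_bruijn_successors(counts)
    best = {source: counts[source]}          # best bottleneck found so far per k-mer
    parent = {source: None}
    heap = [(-counts[source], source)]       # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0):
            continue                         # stale heap entry
        if node == target:
            break                            # target's bottleneck is now final
        for nxt in succ[node]:
            w = min(width, counts[nxt])
            if w > best.get(nxt, 0):
                best[nxt] = w
                parent[nxt] = node
                heapq.heappush(heap, (-w, nxt))
    if target not in parent:
        return None, 0
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    # Spell the sequence: first k-mer plus the last base of each following k-mer.
    return path[0] + "".join(kmer[-1] for kmer in path[1:]), best[target]


reads = ["ACGTTGCA"] * 5 + ["ACGATGCA"]  # toy reads; one carries an error
counts = count_kmers(reads, k=3)
seq, width = widest_path(counts, "ACG", "GCA")
print(seq, width)  # ACGTTGCA 5
```

In the toy data, the branch through `CGA` has coverage 1, so its bottleneck is 1. The path through `CGT` has bottleneck 5 and is chosen instead, which mirrors how a low-coverage erroneous variant is bypassed.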
Development of a Novel Reference Transcriptome for Scleractinian Coral Porites lutea Using Single-Molecule Long-Read Isoform Sequencing (Iso-Seq)
Elevation in seawater temperature associated with global climate change has caused coral bleaching and posed a significant threat to coral health and survival worldwide. Several studies have explored the effects of thermal stress on gene expression levels of both coral hosts and their algal endosymbionts and have provided evidence suggesting that corals could acclimatize to environmental stressors through differential regulation of their gene expression (Desalvo et al., 2008, 2010; Császár et al., 2009; Rodriguez-Lanetty et al., 2009; Polato et al., 2010; Meyer et al., 2011; Kenkel et al., 2013). Such information is crucial for understanding the adaptive capacity of the coral holobionts (Hughes et al., 2003). The availability of transcriptome data from a number of coral species and their associated Symbiodinium allows us to probe the molecular response of these organisms to heat stress (Traylor-Knowles et al., 2011; Moya et al., 2012; Kenkel et al., 2013; Shinzato et al., 2014; Kitchen et al., 2015; Anderson et al., 2016; Davies et al., 2016). Here, we report the first reference transcriptome for the scleractinian coral Porites lutea, one of the dominant reef-builders in the Indo-West Pacific (Yeemin et al., 2009). We applied both short-read Ion S5 RNA sequencing and long-read Pacific Biosciences (PacBio) isoform sequencing (Iso-Seq) to generate transcriptome sequences of P. lutea under normal and heat stress conditions. The key advantage of PacBio’s Iso-Seq technology lies in its ability to capture full-length mRNA sequences. These full-length transcripts enable the identification of novel genes and isoforms and the detection of alternative splice variants, which have been shown to be overrepresented in stress responses (Iida et al., 2004; Reddy et al., 2013; Liu and Guo, 2017).
We envision that this reference transcriptome will provide the coral research community with a valuable resource for investigating changes in gene expression under various biotic and abiotic stress conditions.
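To make the notion of a full-length Iso-Seq read concrete, the sketch below treats a read as full-length when it carries the 5′ primer at one end and a poly(A) tail followed by the 3′ primer at the other end. Both orientations are checked. This simplified check is not PacBio's Iso-Seq classification software. The primer sequences and the 15-base poly(A) threshold are hypothetical. Matching is exact, whereas real tools tolerate sequencing errors.

```python
# Hypothetical primer sequences, written in the transcript's sense orientation.
FIVE_PRIME = "AGCTTG"
THREE_PRIME = "TTCAGG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def full_length_insert(read, five=FIVE_PRIME, three=THREE_PRIME, min_polya=15):
    """Return the transcript sequence if the read looks full-length, else None.

    Expected sense layout: 5' primer + transcript + poly(A) + 3' primer.
    Both the read and its reverse complement are tried.
    """
    for seq in (read, revcomp(read)):
        if seq.startswith(five) and seq.endswith(three):
            insert = seq[len(five):len(seq) - len(three)]
            transcript = insert.rstrip("A")  # also trims any A's that end the transcript itself
            if len(insert) - len(transcript) >= min_polya:
                return transcript
    return None


read = FIVE_PRIME + "CATGCC" + "A" * 20 + THREE_PRIME
print(full_length_insert(read))           # CATGCC
print(full_length_insert(revcomp(read)))  # CATGCC
```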
Coral-associated microorganisms play an important role in host fitness and survival. A number of studies have demonstrated connections between thermal tolerance in corals and the type and relative abundance of Symbiodinium they harbor. More recently, shifts in coral-associated bacterial profiles have also been linked to patterns of coral heat tolerance. Here, we investigated the dynamics of Porites lutea-associated bacterial and algal communities throughout a natural bleaching event, using full-length 16S rRNA and internal transcribed spacer (ITS) sequences obtained from PacBio circular consensus sequencing. We provide evidence of significant changes in the structure and diversity of coral-associated microbiomes during thermal stress. Following the bleaching event, the balance of the symbiosis shifted from a predominant association between corals and Gammaproteobacteria to a predominance of Alphaproteobacteria and, to a lesser extent, Betaproteobacteria. In contrast, the composition and diversity of Symbiodinium communities remained unaltered throughout the bleaching event. It appears that switching and/or shuffling of Symbiodinium types may not be the primary mechanism used by P. lutea to cope with increasing seawater temperature. Shifts in the structure and diversity of associated bacterial communities may contribute more to the survival of the coral holobiont under heat stress. © 2018 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
High resolution profiling of coral-associated bacterial communities using full-length 16S rRNA sequence data from PacBio SMRT sequencing system.
Coral reefs are complex ecosystems consisting of coral animals and a vast array of associated symbionts, including the dinoflagellate Symbiodinium, fungi, viruses and bacteria. Several studies have highlighted the importance of coral-associated bacteria and their fundamental roles in the fitness and survival of the host animal. The scleractinian coral Porites lutea is one of the dominant reef-builders in the Indo-West Pacific. Currently, very little is known about the composition and structure of bacterial communities across P. lutea reefs. The purpose of this study is twofold: to demonstrate the advantages of using PacBio circular consensus sequencing technology in microbial community studies, and to investigate the diversity and structure of the P. lutea-associated microbiome in the Indo-Pacific. This is the first metagenomic study of marine environmental samples to use the PacBio sequencing system to capture full-length 16S rRNA sequences. We observed geographically distinct coral-associated microbial profiles between samples from the Gulf of Thailand and the Andaman Sea. Despite geographical and environmental influences on coral-host interactions, we identified a conserved community of bacteria that were consistently present across diverse reef habitats. Finally, we demonstrated the superior performance of full-length 16S rRNA sequences in resolving taxonomic uncertainty of coral associates at the species level.
Forty years ago, the advent of Sanger sequencing was revolutionary, as it allowed complete genome sequences to be deciphered for the first time. A second revolution came when next-generation sequencing (NGS) technologies appeared, which made genome sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation/long-read methods have appeared, which can produce genome assemblies of unprecedented quality. Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology. Here we review and compare the various long-read methods. We discuss their applications and their respective strengths and weaknesses, and provide future perspectives. Copyright © 2018 Elsevier Ltd. All rights reserved.
A comprehensive fungi-specific 18S rRNA gene sequence primer toolkit suited for diverse research issues and sequencing platforms.
Several fungi-specific primers target the 18S rRNA gene sequence, one of the prominent markers for fungal classification. The design of most primers dates back to previous decades. Since then, the number of sequences in public databases has increased, leading to the discovery of new fungal groups and changes in fungal taxonomy. However, no reevaluation of the primers has been carried out, and relevant information on most primers is missing. With this study, we aimed to develop an 18S rRNA gene sequence primer toolkit that allows easy selection of the best primer pair for different sequencing platforms, research aims (biodiversity assessment versus isolate classification) and target groups. We performed an extensive literature search, reshuffled existing primers into new pairs, and designed new Illumina primers and annealing-blocking oligonucleotides. A total of 439 primer pairs were subjected to in silico PCRs. The best primer pairs were selected and experimentally tested. The most promising primer pair with a small amplicon size, nu-SSU-1333-5′/nu-SSU-1647-3′ (FF390/FR-1), successfully described fungal communities by Illumina sequencing. Results were confirmed by a simultaneous metagenomics and eukaryote-specific primer approach. Co-amplification occurred in all sample types but was effectively reduced by blocking oligonucleotides. The compiled data revealed an enormous diversity of fungal 18S rRNA gene primer pairs in terms of fungal coverage, phylum spectrum and co-amplification. Therefore, the primer pair has to be carefully selected to fulfill the requirements of individual research projects. The presented primer toolkit offers comprehensive lists of 164 primers, 439 primer combinations, 4 blocking oligonucleotides, and top primer pairs, with all relevant information, including primer characteristics and performance, to facilitate primer pair selection.
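The in silico PCR step can be illustrated with a minimal sketch. A primer pair is predicted to yield a product when two conditions hold:

- the forward primer matches the template;
- the reverse primer's binding site (its reverse complement) matches downstream within a length limit.

The abstract does not name the tool the study used, and this sketch is not that tool. It assumes exact matching with IUPAC degenerate bases only (no mismatches, no inosine) and scans only the template's top strand. The primers and template are hypothetical.

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, including degenerate IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(primer, template):
    """Start positions where every primer position matches the template (IUPAC-aware, no mismatches)."""
    n = len(primer)
    return [
        i for i in range(len(template) - n + 1)
        if all(base in IUPAC[p] for p, base in zip(primer, template[i:i + n]))
    ]


def in_silico_pcr(template, fwd, rev, max_len=2000):
    """Return lengths of predicted amplicons on the template's top strand."""
    rev_site = revcomp(rev)  # top-strand site recognized by the reverse primer
    products = []
    for start in find_sites(fwd, template):
        for end in find_sites(rev_site, template):
            length = end + len(rev_site) - start
            if end >= start + len(fwd) and length <= max_len:
                products.append(length)
    return products


# Hypothetical template and degenerate primers (M = A/C, R = A/G).
template = "TTTT" + "ACGTAC" + "GGGGGGGG" + "CATGCA" + "TTTT"
print(in_silico_pcr(template, fwd="ACGTMC", rev="TGCATR"))  # [20]
```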
Effects of metal and metalloid pollutants on the microbiota composition of feces obtained from twelve commercial pig farms across China.
Understanding metal and metalloid contamination and the microbiota composition of pig feces is an important step in designing and implementing effective pollution control and prevention strategies. A survey was conducted at 12 locations across China to investigate the metal and metalloid content and the main composition of the microbial communities in feces of commercially reared pigs during two growth periods, defined as the early (Q group) and later fattening growth phases (H group). These data showed widespread Al, Mn, Cu, Zn, and Fe pollution in pig feces. The concentration of Zn in Q group feces was nearly two times higher than in the H group. The microbial communities of the Q group exhibited greater richness of operational taxonomic units (OTUs) and fewer bacteria associated with zoonotic diseases than those of the H group. Spearman rank correlation analysis showed that Cu concentration and latitude were significantly positively correlated with the richness of bacterial communities in pig feces. Based on canonical correspondence analysis, Zn and Cd had the greatest impact on microbial community composition. Functional metagenomic prediction indicated that about 0.8% of genes in the pig fecal bacterial community are related to human diseases, and significantly more predicted pathogenic genes were detected in the H group than in the Q group. These results support the need to monitor heavy metal contamination and to control zoonotic pathogens disseminated from pig feces on Chinese pig farms. Copyright © 2018. Published by Elsevier B.V.
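Spearman's rank correlation, used above to relate Cu and latitude to bacterial richness, is the Pearson correlation of the ranked values. It therefore captures any monotonic association, not only linear ones. Below is a minimal standard-library sketch: tied values receive averaged ranks, and no p-value is computed. The input values are hypothetical, not the survey's data.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the group of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-farm values, for illustration only.
cu_mg_per_kg = [12.0, 35.5, 20.1, 50.3]
otu_richness = [410, 520, 455, 610]
print(round(spearman_rho(cu_mg_per_kg, otu_richness), 3))  # 1.0
```

Only ranks matter, so rho would be unchanged if the Cu values were log-transformed or otherwise monotonically rescaled.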
Evolution of novel traits is a challenging subject in biological research. Several snake lineages developed elaborate venom systems to deliver complex protein mixtures for prey capture. To understand the mechanisms involved in snake venom evolution, here we decoded the ~1.4-Gb genome of a habu, Protobothrops flavoviridis. We identified 60 snake venom protein genes (SV) and 224 non-venom paralogs (NV), belonging to 18 gene families. Molecular phylogeny reveals early divergence of SV and NV genes, suggesting that one of the four copies generated through two rounds of whole-genome duplication was modified for use as a toxin. Among them, both SV and NV genes in four major components were extensively duplicated after their diversification, but accelerated evolution is evident exclusively in the SV genes. Both venom-related SV and NV genes are significantly enriched on microchromosomes. The present study thus provides a genetic background for the evolution of snake venom composition.
Genome mining has become an increasingly powerful, scalable, and economically accessible tool for the study of natural product biosynthesis and drug discovery. However, important biological and practical problems remain that can complicate or obscure biosynthetic analysis in genomic and metagenomic sequencing projects. Here, we focus on the limitations of available technology as well as computational and experimental strategies to overcome them. We review the unique challenges and approaches in the study of symbiotic and uncultured systems, as well as those associated with biosynthetic gene cluster (BGC) assembly and product prediction. Finally, to explore sequencing parameters that affect the recovery and contiguity of large and repetitive BGCs assembled de novo, we simulate Illumina and PacBio sequencing of the Salinispora tropica genome, focusing on assembly of the salinilactam (slm) BGC.
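The abstract does not say which simulators or error models were used for the Illumina and PacBio simulations. As a toy illustration of such a simulation, the sketch samples reads from a genome string and adds errors under two assumed profiles:

- Illumina-like reads: substitution-only errors at a low rate.
- PacBio-like reads: a high error rate dominated by indels.

The 0.1% and 13% rates, the 90% indel share, and all names are assumptions for illustration only. Real simulators also model quality profiles, read-length distributions, and both strands.

```python
import random


def mutate(seq, rate, rng, indel_fraction):
    """Copy seq, introducing an error at each base with probability `rate`.

    A fraction `indel_fraction` of errors are indels (split evenly between
    deletions and insertions); the remaining errors are substitutions.
    """
    out = []
    for base in seq:
        if rng.random() >= rate:
            out.append(base)  # no error at this base
            continue
        r = rng.random()
        if r < indel_fraction / 2:
            continue  # deletion: drop the base
        if r < indel_fraction:
            out.append(base)
            out.append(rng.choice("ACGT"))  # insertion after the base
        else:
            out.append(rng.choice([b for b in "ACGT" if b != base]))  # substitution
    return "".join(out)


def simulate_reads(genome, n_reads, read_len, error_rate, indel_fraction, seed=0):
    """Sample reads uniformly from the forward strand and add errors."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        fragment = genome[start:start + read_len]
        reads.append(mutate(fragment, error_rate, rng, indel_fraction))
    return reads


# Hypothetical 20 kb random genome standing in for a real assembly.
genome_rng = random.Random(1)
genome = "".join(genome_rng.choice("ACGT") for _ in range(20_000))

# Assumed error profiles: ~0.1% substitutions vs ~13% errors, mostly indels.
short_reads = simulate_reads(genome, n_reads=1000, read_len=150, error_rate=0.001, indel_fraction=0.0)
long_reads = simulate_reads(genome, n_reads=20, read_len=8000, error_rate=0.13, indel_fraction=0.9)
```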
Caught in the middle with multiple displacement amplification: the myth of pooling for avoiding multiple displacement amplification bias in a metagenome.
Shotgun metagenomics has become an important tool for investigating the ecology of microorganisms. Underlying these investigations is the assumption that metagenome sequence data accurately estimate the census of microbial populations. Multiple displacement amplification (MDA) of microbial community DNA is often used in cases where it is difficult to obtain enough DNA for sequencing; however, MDA can introduce amplification biases that may affect subsequent estimates of population census from metagenome data. Some have posited that pooling replicate MDA reactions negates these biases and restores the accuracy of population analyses. This assumption has not been empirically tested. Using mock viral communities, we examined the influence of pooling on population-scale analyses. In both pooled and single-reaction MDA treatments, sequence coverage of viral populations was highly variable and coverage patterns across viral genomes were nearly identical, indicating that initial priming biases were reproducible and that pooling did not alleviate biases. In contrast, control unamplified sequence libraries showed relatively even coverage across phage genomes. MDA should be avoided for metagenomic investigations that require quantitative estimates of microbial taxa and gene functional groups. While MDA is an indispensable technique in applications such as single-cell genomics, amplification biases cannot be overcome by combining replicate MDA reactions. Alternative library preparation techniques should be used for quantitative microbial ecology studies that rely on metagenomic sequencing.
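The abstract describes coverage as "highly variable" and coverage patterns as "nearly identical" without naming the statistics behind these descriptions. One simple way to quantify both is assumed here for illustration:

- the coefficient of variation (CV) of per-base depth measures unevenness;
- the Pearson correlation between two replicates' depth profiles measures how reproducible the bias is.

The alignment intervals below are hypothetical.

```python
from statistics import mean, pstdev


def coverage_profile(intervals, genome_len):
    """Per-base depth from half-open (start, end) read alignments, via a difference array."""
    diff = [0] * (genome_len + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for i in range(genome_len):
        running += diff[i]
        depth.append(running)
    return depth


def coefficient_of_variation(depth):
    """Standard deviation of depth divided by mean depth (0 means perfectly even)."""
    return pstdev(depth) / mean(depth)


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical alignments: two replicate reactions over-amplify the same first half.
reads = [(0, 5), (0, 5), (0, 5), (5, 10)]
replicate_a = coverage_profile(reads, genome_len=10)
replicate_b = coverage_profile(reads, genome_len=10)
pooled = [a + b for a, b in zip(replicate_a, replicate_b)]

print(coefficient_of_variation(replicate_a))  # 0.5
print(coefficient_of_variation(pooled))       # 0.5
```

Pooling the two replicates sums the depth at each base. This leaves the CV unchanged at 0.5, echoing the point that a reproducible priming bias is not averaged away by pooling.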
Root endophytes and invasiveness: no difference between native and non-native Phragmites in the Great Lakes Region
Microbial interactions could play an important role in plant invasions. If invasive plants associate with relatively more mutualists or fewer pathogens than their native counterparts, then microbial communities could foster plant invasiveness. Studies examining the effects of microbes on invasive plants commonly focus on a single microbial group (e.g., bacteria) or measure only plant responses to microbes, without documenting the specific taxa associating with invaders. We surveyed root microbial communities associated with co-occurring native and non-native lineages of Phragmites australis across Michigan, USA. Our aim was to determine whether (1) plant lineage was a stronger predictor of root microbial community composition than environmental variables and (2) the non-native lineage associated with more mutualistic and/or fewer pathogenic microbes than the native lineage. We used microscopy and culture-independent molecular methods to examine the fungal colonization rate and the community composition of three major microbial groups (bacteria, fungi, and oomycetes) within roots. We also used microbial functional databases to assess the putative functions of the observed microbial taxa. While fungal colonization of roots was significantly higher in non-native Phragmites than in the native lineage, we found no differences in root microbial community composition or potential function between the two Phragmites lineages. Community composition did differ significantly by site, with soil saturation playing a significant role in structuring communities in all three microbial groups. The relative abundance of some specific bacterial taxa did differ between Phragmites lineages at the phylum and genus levels (e.g., Proteobacteria, Firmicutes). The putative functions of root fungi and the respiratory mode of root bacteria also did not differ between native and non-native Phragmites.
We found no evidence that native and non-native Phragmites harbored distinct root microbial communities; nor did those communities differ functionally. Therefore, if the trends revealed at our sites are widespread, it is unlikely that total root microbial communities are driving invasion by non-native Phragmites plants.
Metagenomic binning of a marine sponge microbiome reveals unity in defense but metabolic specialization.
Marine sponges are ancient metazoans that are populated by distinct and highly diverse microbial communities. To obtain deeper insights into the functional gene repertoire of the Mediterranean sponge Aplysina aerophoba, we combined Illumina short-read and PacBio long-read sequencing with untargeted metagenomic binning. We identified a total of 37 high-quality bins representing 11 bacterial phyla and two candidate phyla. Statistical comparison of symbiont genomes with selected reference genomes revealed a significant enrichment in sponge symbionts of genes related to bacterial defense (restriction-modification systems, toxin-antitoxin systems), as well as genes involved in host colonization and extracellular matrix utilization. A comparison among symbiont genomes revealed nutritional specialization of at least two symbiont guilds: one appears to metabolize carnitine and the other sulfated polysaccharides, both of which are abundant molecules in the sponge extracellular matrix. A third guild of symbionts may be viewed as nutritional generalists that perform largely the same metabolic pathways but lack such extraordinary numbers of the relevant genes. This study characterizes the genomic repertoire of sponge symbionts at an unprecedented resolution and provides greater insight into the molecular mechanisms underlying microbe-sponge symbiosis.
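The abstract reports a "significant enrichment" of defense-related genes without naming the test. One common choice for gene-category enrichment is a one-sided Fisher's exact test on a 2×2 table of gene counts. The sketch below computes it from the hypergeometric distribution using only the standard library. The counts are hypothetical, and a real comparison across many gene categories would also need a multiple-testing correction.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test p-value for enrichment in the first row.

    2x2 table of gene counts (hypothetical layout):
                       in category   not in category
        symbionts            a              b
        references           c              d
    Sums the hypergeometric probabilities of all tables with a or more
    symbiont genes in the category, given fixed row and column totals.
    """
    row1, col1 = a + b, a + c
    n = a + b + c + d
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, row1)


print(fisher_enrichment_p(3, 0, 0, 3))  # 0.05
```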
Halobacteriovorax strains are saltwater-adapted predatory bacteria that attack Gram-negative bacteria and may play an important role in shaping microbial communities. To understand how Halobacteriovorax strains affect ecosystems and to develop them as biocontrol agents, it is important to characterize variation in predation phenotypes and to investigate Halobacteriovorax genome evolution. We isolated Halobacteriovorax marinus BE01 from an estuary in Rhode Island using Vibrio from the same site as prey. Small, fast-moving, attack-phase BE01 cells attach to and invade prey cells, consistent with the intraperiplasmic predation strategy of the H. marinus type strain, SJ. BE01 is a prey generalist, forming plaques on Vibrio strains from the estuary, Pseudomonas from soil, and Escherichia coli. Genome analysis revealed extremely high conservation of gene order and amino acid sequences between BE01 and SJ, suggesting strong selective pressure to maintain the genome in this H. marinus lineage. Despite this, we identified two regions of gene content difference that likely resulted from horizontal gene transfer. Analysis of modal codon usage frequencies supports the hypothesis that these regions were acquired from bacteria with codon usage biases different from those of H. marinus. In one of these regions, BE01 and SJ carry different genes associated with mobile genetic elements. Acquired functions in BE01 include the dnd operon, which encodes a pathway for DNA modification, and a suite of genes involved in membrane synthesis and regulation of gene expression that was likely acquired from another Halobacteriovorax lineage. This analysis provides further evidence that horizontal gene transfer plays an important role in genome evolution in predatory bacteria. IMPORTANCE Predatory bacteria attack and digest other bacteria and therefore may play a role in shaping microbial communities.
To investigate phenotypic and genotypic variation in saltwater-adapted predatory bacteria, we isolated Halobacteriovorax marinus BE01 from an estuary in Rhode Island, assayed whether it could attack different prey bacteria, and sequenced and analyzed its genome. We found that BE01 is a prey generalist, attacking bacteria from different phylogenetic groups and environments. Gene order and amino acid sequences are highly conserved between BE01 and the H. marinus type strain, SJ. By comparative genomics, we detected two regions of gene content difference that likely occurred via horizontal gene transfer events. Acquired genes encode functions such as modification of DNA, membrane synthesis and regulation of gene expression. Understanding genome evolution and variation in predation phenotypes among predatory bacteria will inform their development as biocontrol agents and clarify how they impact microbial communities.
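The abstract uses modal codon usage frequencies to argue that the two differing regions came from donors with different codon usage biases. The sketch below is a simpler stand-in, not the modal codon usage method itself. It computes each coding sequence's codon frequency profile and the L1 distance from a genome-wide profile, so genes with atypical usage stand out. The sequences are hypothetical, and each coding sequence is assumed to be in frame from its first base.

```python
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Raises ZeroDivisionError if the sequence contains no complete ACGT codon.
    """
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)
    return {c: counts[c] / total for c in CODONS}


def l1_distance(f, g):
    """Sum of absolute frequency differences between two codon profiles."""
    return sum(abs(f[c] - g[c]) for c in CODONS)


genome_profile = codon_frequencies("AAACCCGGGTTT" * 10)  # hypothetical genome-wide CDS
gene_profile = codon_frequencies("AAAAAAAAATTT")         # hypothetical gene
print(l1_distance(genome_profile, gene_profile))          # 1.0
```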
Clownfishes (or anemonefishes) form an iconic group of coral reef fishes, principally known for their mutualistic interaction with sea anemones. They are characterized by particular life history traits, such as a complex social structure and mating system involving sequential hermaphroditism, coupled with an exceptionally long lifespan. Additionally, clownfishes are considered to be one of the rare groups to have experienced an adaptive radiation in the marine environment. Here, we assembled and annotated the first genome of a clownfish species, the tomato clownfish (Amphiprion frenatus). We obtained 17,801 assembled scaffolds, containing a total of 26,917 genes. The completeness of the assembly and annotation was satisfactory, with 96.5% of the Actinopterygii Benchmarking Universal Single-Copy Orthologs (BUSCOs) retrieved in the A. frenatus assembly. The quality of the resulting assembly is comparable to that of other bony fish assemblies. This resource is valuable for advancing studies of the particular life history traits of clownfishes, as well as for population genetic studies and the development of new phylogenetic markers. It will also open the way to comparative genomics. Indeed, future genomic comparisons among closely related fishes may help identify genes related to the unique adaptations to different sea anemone hosts and better characterize the genomic signatures of an adaptive radiation. © 2018 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.
Comparative genomics and transcriptomics analysis-guided metabolic engineering of Propionibacterium acidipropionici for improved propionic acid production.
Acid stress induced by the accumulation of organic acids during propionibacterial fermentation is a severe limitation in the microbial production of propionic acid (PA). To enhance the acid resistance of strains, the tolerance mechanisms of cells must first be understood. In this study, comparative genomic and transcriptomic analyses were conducted on wild-type and acid-tolerant Propionibacterium acidipropionici to reveal the microbial response to acid stress during fermentation. Combined with the results of previous proteomic and metabolomic studies, several potential acid-resistance mechanisms of P. acidipropionici were analyzed. Energy metabolism and transporter activity were regulated to maintain pH homeostasis by balancing transmembrane transport of protons and ions; excess protons were eliminated by enhancing the metabolism of certain amino acids to maintain a relatively stable intracellular microenvironment; and protective mechanisms for macromolecules were also induced to repair acid damage to proteins and DNA. Transcriptomic data indicated that acetate and lactate synthesis was disfavored in the acid-resistant mutant, in which the expression of the corresponding genes was downregulated 2.21-fold. In addition, metabolomic data suggested that the accumulation of lactic acid and acetic acid reduced the carbon flow to PA and led to a decrease in pH. On this basis, we propose a metabolic engineering strategy to regulate the synthesis of lactic acid and acetic acid that significantly reduces by-products and increases the PA yield by 12.2%, to 10.31 ± 0.84 g/g DCW. The results of this study provide valuable guidance for understanding the response of bacteria to acid stress and for constructing microbial cell factories that produce organic acids by combining systems biology technologies with synthetic biology tools. © 2017 Wiley Periodicals, Inc.
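The reported yield figures imply a baseline that the abstract does not state. The calculation below assumes the 12.2% increase is relative to the mean yield before engineering:

$$
Y_{\text{before}} = \frac{Y_{\text{after}}}{1 + 0.122} = \frac{10.31\ \text{g/g DCW}}{1.122} \approx 9.19\ \text{g/g DCW}
$$

This back-calculation uses only the reported mean and ignores the ± 0.84 g/g DCW spread.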