PacBio customers discuss their applications of PacBio SMRT Sequencing and long reads, including Lemuel Racacho (Children’s Hospital of Eastern Ontario Research Institute), Matthew Blow (JGI), Yuta Suzuki (U. of Tokyo),…
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.
Productivity of ruminant livestock depends on the rumen microbiota, which ferment indigestible plant polysaccharides into nutrients used for growth. Understanding the functions carried out by the rumen microbiota is important for reducing greenhouse gas production by ruminants and for developing biofuels from lignocellulose. We present 410 cultured bacteria and archaea, together with their reference genomes, representing every cultivated rumen-associated archaeal and bacterial family. We evaluate polysaccharide degradation, short-chain fatty acid production and methanogenesis pathways, and assign specific taxa to functions. A total of 336 organisms were present in available rumen metagenomic data sets, and 134 were present in human gut microbiome data sets. Comparison with the human microbiome revealed rumen-specific enrichment for genes encoding de novo synthesis of vitamin B12, ongoing evolution by gene loss and potential vertical inheritance of the rumen microbiome based on underrepresentation of markers of environmental stress. We estimate that our Hungate genome resource represents ~75% of the genus-level bacterial and archaeal taxa present in the rumen.
Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for the benchmarking of new genome sequence analysis methods, including assembly and binning tools. Moreover the validation of new sequencing library protocols and platforms to assess critical components such as sequencing errors and biases relies on such datasets. We here report the next generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes, a range of GC contents, genome sizes, repeat content and encompass a diverse abundance profile. Short read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can aid in improving our current sequence data analysis toolkit and spur interest in the development of new tools.
PCR and omics based techniques to study the diversity, ecology and biology of anaerobic fungi: Insights, challenges and opportunities.
Anaerobic fungi (phylum Neocallimastigomycota) are common inhabitants of the digestive tract of mammalian herbivores, and in the rumen, can account for up to 20% of the microbial biomass. Anaerobic fungi play a primary role in the degradation of lignocellulosic plant material. They also have a syntrophic interaction with methanogenic archaea, which increases their fiber degradation activity. To date, nine anaerobic fungal genera have been described, with further novel taxonomic groupings known to exist based on culture-independent molecular surveys. However, the true extent of their diversity may be even more extensively underestimated as anaerobic fungi continue being discovered in yet unexplored gut and non-gut environments. Additionally many studies are now known to have used primers that provide incomplete coverage of the Neocallimastigomycota. For ecological studies the internal transcribed spacer 1 region (ITS1) has been the taxonomic marker of choice, but due to various limitations the large subunit rRNA (LSU) is now being increasingly used. How the continued expansion of our knowledge regarding anaerobic fungal diversity will impact on our understanding of their biology and ecological role remains unclear; particularly as it is becoming apparent that anaerobic fungi display niche differentiation. As a consequence, there is a need to move beyond the broad generalization of anaerobic fungi as fiber-degraders, and explore the fundamental differences that underpin their ability to exist in distinct ecological niches. Application of genomics, transcriptomics, proteomics and metabolomics to their study in pure/mixed cultures and environmental samples will be invaluable in this process. To date the genomes and transcriptomes of several characterized anaerobic fungal isolates have been successfully generated. In contrast, the application of proteomics and metabolomics to anaerobic fungal analysis is still in its infancy. 
A central problem for all analyses, however, is the limited functional annotation of anaerobic fungal sequence data. There is therefore an urgent need to expand information held within publicly available reference databases. Once this challenge is overcome, along with improved sample collection and extraction, the application of these techniques will be key in furthering our understanding of the ecological role and impact of anaerobic fungi in the wide range of environments they inhabit.
Microbial toluene biosynthesis was reported in anoxic lake sediments more than three decades ago, but the enzyme catalyzing this biochemically challenging reaction has never been identified. Here we report the toluene-producing enzyme PhdB, a glycyl radical enzyme of bacterial origin that catalyzes phenylacetate decarboxylation, and its cognate activating enzyme PhdA, a radical S-adenosylmethionine enzyme, discovered in two distinct anoxic microbial communities that produce toluene. The unconventional process of enzyme discovery from a complex microbial community (>300,000 genes), rather than from a microbial isolate, involved metagenomics- and metaproteomics-enabled biochemistry, as well as in vitro confirmation of activity with recombinant enzymes. This work expands the known catalytic range of glycyl radical enzymes (only seven reaction types had been characterized previously) and aromatic-hydrocarbon-producing enzymes, and will enable first-time biochemical synthesis of an aromatic fuel hydrocarbon from renewable resources, such as lignocellulosic biomass, rather than from petroleum.
Switchgrass (Panicum virgatum L.) is an important bioenergy crop widely used for lignocellulosic research. While extensive transcriptomic analyses have been conducted on this species using short read-based sequencing techniques, very little has been reliably derived regarding alternatively spliced (AS) transcripts. We present an analysis of transcriptomes of six switchgrass tissue types pooled together, sequenced using Pacific Biosciences (PacBio) single-molecular long-read technology. Our analysis identified 105,419 unique transcripts covering 43,570 known genes and 8795 previously unknown genes. 45,168 are novel transcripts of known genes. A total of 60,096 AS transcripts are identified, 45,628 being novel. We have also predicted 1549 transcripts of genes involved in cell wall construction and remodeling, 639 being novel transcripts of known cell wall genes. Most of the predicted transcripts are validated against Illumina-based short reads. Specifically, 96% of the splice junction sites in all the unique transcripts are validated by at least five Illumina reads. Comparisons between genes derived from our identified transcripts and the current genome annotation revealed that among the gene set predicted by both analyses, 16,640 have different exon-intron structures. Overall, a substantial amount of new information is derived from the PacBio RNA data regarding both the transcriptome and the genome of switchgrass.
Conserved genomic and amino acid traits of cold adaptation in subzero-growing Arctic permafrost bacteria.
Permafrost accounts for 27% of all soil ecosystems and harbors diverse microbial communities. Our understanding of microorganisms in permafrost, their activities and adaptations, remains limited. Using five subzero-growing (cryophilic) permafrost bacteria, we examined features of cold adaptation through comparative genomic analyses with mesophilic relatives. The cryophiles possess genes associated with cold adaptation, including cold shock proteins, RNA helicases, and oxidative stress and carotenoid synthesis enzymes. Higher abundances of genes associated with compatible solutes were observed, important for osmoregulation in permafrost brine veins. Most cryophiles in our study have higher transposase copy numbers than mesophiles. We investigated amino acid (AA) modifications in the cryophiles favoring increased protein flexibility at cold temperatures. Although overall there were few differences with the mesophiles, we found evidence of cold adaptation, with significant differences in proline, serine, glycine and aromaticity, in several cryophiles. The use of cold/hot AA ratios of >1, used in previous studies to indicate cold adaptation, was found to be inadequate on its own. Comparing the average of all cryophiles to all mesophiles, we found that overall cryophiles had a higher ratio of cold adapted proteins for serine (more serine), and to a lesser extent, proline and acidic residues (fewer prolines/acidic residues).
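The amino acid comparison described in this abstract can be illustrated with a minimal sketch. It assumes aromaticity is taken as the fraction of Phe, Trp and Tyr residues and that acidic residues are Asp and Glu; the function name `composition` and the example sequence are hypothetical and are not taken from the study, which may have used different definitions.

```python
from collections import Counter

# Assumption: aromaticity is the fraction of F, W and Y residues.
AROMATIC = set("FWY")


def composition(protein: str) -> dict:
    """Return per-residue fractions relevant to cold-adaptation comparisons."""
    seq = protein.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {
        "proline": counts["P"] / n,
        "serine": counts["S"] / n,
        "glycine": counts["G"] / n,
        # Assumption: acidic residues are Asp (D) and Glu (E).
        "acidic": (counts["D"] + counts["E"]) / n,
        "aromaticity": sum(counts[a] for a in AROMATIC) / n,
    }


if __name__ == "__main__":
    # Hypothetical toy peptide, for illustration only.
    print(composition("PSGDEFWYAA"))
```

Averaging such fractions over all proteins of a cryophile and of its mesophilic relative would give the kind of per-residue contrast the abstract reports, although any significance testing is outside this sketch.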
N6-methyldeoxyadenine (6mA) is a noncanonical DNA base modification present at low levels in plant and animal genomes, but its prevalence and association with genome function in other eukaryotic lineages remains poorly understood. Here we report that abundant 6mA is associated with transcriptionally active genes in early-diverging fungal lineages. Using single-molecule long-read sequencing of 16 diverse fungal genomes, we observed that up to 2.8% of all adenines were methylated in early-diverging fungi, far exceeding levels observed in other eukaryotes and more derived fungi. 6mA occurred symmetrically at ApT dinucleotides and was concentrated in dense methylated adenine clusters surrounding the transcriptional start sites of expressed genes; its distribution was inversely correlated with that of 5-methylcytosine. Our results show a striking contrast in the genomic distributions of 6mA and 5-methylcytosine and reinforce a distinct role for 6mA as a gene-expression-associated epigenomic mark in eukaryotes.
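The notion of symmetric 6mA at ApT dinucleotides can be made concrete with a small sketch. It assumes methylation calls are already available as 0-based forward-strand coordinates: `fwd_meth` holds positions of methylated adenines on the forward strand, and `rev_meth` holds positions of forward-strand thymines whose paired reverse-strand adenine is methylated. The function name and input layout are hypothetical and are not the authors' pipeline.

```python
def summarize_6ma(seq: str, fwd_meth: set, rev_meth: set) -> dict:
    """Summarize 6mA calls: overall methylated fraction and ApT symmetry."""
    seq = seq.upper()
    # Sanity checks: calls must sit on an A (forward) or opposite a T (reverse).
    if any(seq[i] != "A" for i in fwd_meth):
        raise ValueError("forward-strand call is not on an adenine")
    if any(seq[i] != "T" for i in rev_meth):
        raise ValueError("reverse-strand call is not opposite a thymine")

    # Adenines on both strands: forward A plus forward T (reverse-strand A).
    total_adenines = seq.count("A") + seq.count("T")
    methylated = len(fwd_meth) + len(rev_meth)

    symmetric = hemi = 0
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "AT":
            fwd = i in fwd_meth        # A of ApT on the forward strand
            rev = (i + 1) in rev_meth  # A of ApT on the reverse strand
            if fwd and rev:
                symmetric += 1
            elif fwd or rev:
                hemi += 1

    return {
        "fraction_methylated": methylated / total_adenines if total_adenines else 0.0,
        "symmetric_apt_sites": symmetric,
        "hemimethylated_apt_sites": hemi,
    }


if __name__ == "__main__":
    # Hypothetical toy input, for illustration only.
    print(summarize_6ma("GATCATTA", {1, 4}, {2}))
```

Because the reverse complement of ApT is again ApT, a site counts as symmetric here only when the adenines on both strands of the same dinucleotide are called as methylated.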
Linking secondary metabolites to gene clusters through genome sequencing of six diverse Aspergillus species.
The fungal genus of Aspergillus is highly interesting, containing everything from industrial cell factories, model organisms, and human pathogens. In particular, this group has a prolific production of bioactive secondary metabolites (SMs). In this work, four diverse Aspergillus species (A. campestris, A. novofumigatus, A. ochraceoroseus, and A. steynii) have been whole-genome PacBio sequenced to provide genetic references in three Aspergillus sections. A. taichungensis and A. candidus also were sequenced for SM elucidation. Thirteen Aspergillus genomes were analyzed with comparative genomics to determine phylogeny and genetic diversity, showing that each presented genome contains 15-27% genes not found in other sequenced Aspergilli. In particular, A. novofumigatus was compared with the pathogenic species A. fumigatus. This suggests that A. novofumigatus can produce most of the same allergens, virulence, and pathogenicity factors as A. fumigatus, suggesting that A. novofumigatus could be as pathogenic as A. fumigatus. Furthermore, SMs were linked to gene clusters based on biological and chemical knowledge and analysis, genome sequences, and predictive algorithms. We thus identify putative SM clusters for aflatoxin, chlorflavonin, and ochrindol in A. ochraceoroseus, A. campestris, and A. steynii, respectively, and novofumigatonin, ent-cycloechinulin, and epi-aszonalenins in A. novofumigatus. Our study delivers six fungal genomes, showing the large diversity found in the Aspergillus genus; highlights the potential for discovery of beneficial or harmful SMs; and supports reports of A. novofumigatus pathogenicity. It also shows how biological, biochemical, and genomic information can be combined to identify genes involved in the biosynthesis of specific SMs.
A novel type pathway-specific regulator and dynamic genome environments of solanapyrone biosynthesis gene cluster in the fungus Ascochyta rabiei.
Secondary metabolite genes are often clustered together and situated in particular genomic regions, like the subtelomere, that can facilitate niche adaptation in fungi. Solanapyrones are toxic secondary metabolites produced by fungi occupying different ecological niches. Full-genome sequencing of the ascomycete Ascochyta rabiei revealed a solanapyrone biosynthesis gene cluster embedded in an AT-rich region proximal to a telomere end and surrounded by Tc1/Mariner-type transposable elements. The highly AT-rich environment of the solanapyrone cluster is likely the product of repeat-induced point mutations. Several secondary metabolism-related genes were found in the flanking regions of the solanapyrone cluster. Although the solanapyrone cluster appears to be resistant to repeat-induced point mutations, a P450 monooxygenase gene adjacent to the cluster has been degraded by such mutations. Among the six solanapyrone cluster genes (sol1 to sol6), sol4 encodes a novel type of Zn(II)2Cys6 zinc cluster transcription factor. Deletion of sol4 resulted in the complete loss of solanapyrone production but did not compromise growth, sporulation, or virulence. Gene expression studies with the sol4 deletion and sol4-overexpressing mutants delimited the boundaries of the solanapyrone gene cluster and revealed that sol4 is likely a specific regulator of solanapyrone biosynthesis and appears to be necessary and sufficient for induction of the solanapyrone cluster genes. Despite the dynamic surrounding genomic regions, the solanapyrone gene cluster has maintained its integrity, suggesting important roles of solanapyrones in fungal biology. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Current overview on the study of bacteria in the rhizosphere by modern molecular techniques: a mini–review
The rhizosphere (soil zone influenced by roots) is a complex environment that harbors diverse bacterial populations, which have an important role in biogeochemical cycling of organic matter and mineral nutrients. Nevertheless, our knowledge of the ecology and role of these bacteria in the rhizosphere is very limited, particularly regarding how indigenous bacteria are able to communicate, colonize root environments, and compete along the rhizosphere microsites. In recent decades, the development and improvement of molecular techniques have provided more accurate knowledge of bacteria in their natural environment, refining microbial ecology and generating new questions about the roles and functions of bacteria in the rhizosphere. Recently, advances in soil post-genomic techniques (metagenomics, metaproteomics and metatranscriptomics) are being applied to improve our understanding of the microbial communities at a higher resolution. Moreover, advantages and limitations of classical and post-genomic techniques must be considered when studying bacteria in the rhizosphere. This review provides an overview of the current knowledge on the study of bacterial community in the rhizosphere by using modern molecular techniques, describing the bias of classical molecular techniques, next generation sequencing platforms and post-genomics techniques.
High-quality draft genome sequences of four lignocellulose-degrading bacteria isolated from Puerto Rican forest soil: Gordonia sp., Paenibacillus sp., Variovorax sp., and Vogesella sp.
Here, we report the high-quality draft genome sequences of four phylogenetically diverse lignocellulose-degrading bacteria isolated from tropical soil (Gordonia sp., Paenibacillus sp., Variovorax sp., and Vogesella sp.) to elucidate the genetic basis of their ability to degrade lignocellulose. These isolates may provide novel enzymes for biofuel production. Copyright © 2017 Woo et al.
Characterization of four endophytic fungi as potential consolidated bioprocessing hosts for conversion of lignocellulose into advanced biofuels.
Recently, several endophytic fungi have been demonstrated to produce volatile organic compounds (VOCs) with properties similar to fossil fuels, called “mycodiesel,” while growing on lignocellulosic plant and agricultural residues. The fact that endophytes are plant symbionts suggests that some may be able to produce lignocellulolytic enzymes, making them capable of both deconstructing lignocellulose and converting it into mycodiesel, two properties that indicate that these strains may be useful consolidated bioprocessing (CBP) hosts for biofuel production. In this study, four endophytes (Hypoxylon sp. CI4A, Hypoxylon sp. EC38, Hypoxylon sp. CO27, and Daldinia eschscholzii EC12) were selected and evaluated for their CBP potential. Analysis of their genomes indicates that these endophytes have a rich reservoir of biomass-deconstructing carbohydrate-active enzymes (CAZymes), which includes enzymes active on both polysaccharides and lignin, as well as terpene synthases (TPSs), enzymes that may produce fuel-like molecules, suggesting that they do indeed have CBP potential. GC-MS analyses of their VOCs when grown on four representative lignocellulosic feedstocks revealed that these endophytes produce a wide spectrum of hydrocarbons, the majority of which are monoterpenes and sesquiterpenes, including some known biofuel candidates. Analysis of their cellulase activity when grown under the same conditions revealed that these endophytes actively produce endoglucanases, exoglucanases, and β-glucosidases. The richness of CAZymes as well as terpene synthases identified in these four endophytic fungi suggests that they are great candidates to pursue for development into platform CBP organisms.
Genome stability in engineered strains of the extremely thermophilic lignocellulose-degrading bacterium Caldicellulosiruptor bescii.
Caldicellulosiruptor bescii is the most thermophilic cellulose degrader known and is of great interest because of its ability to degrade nonpretreated plant biomass. For biotechnological applications, an efficient genetic system is required to engineer it to convert plant biomass into desired products. To date, two different genetically tractable lineages of C. bescii strains have been generated. The first (JWCB005) is based on a random deletion within the pyrimidine biosynthesis genes pyrFA, and the second (MACB1018) is based on the targeted deletion of pyrE, making use of a kanamycin resistance marker. Importantly, an active insertion element, ISCbe4, was discovered in C. bescii when it disrupted the gene for lactate dehydrogenase (ldh) in strain JWCB018, constructed in the JWCB005 background. Additional instances of ISCbe4 movement in other strains of this lineage are presented herein. These observations raise concerns about the genetic stability of such strains and their use as metabolic engineering platforms. In order to investigate genome stability in engineered strains of C. bescii from the two lineages, genome sequencing and Southern blot analyses were performed. The evidence presented shows a dramatic increase in the number of single nucleotide polymorphisms, insertions/deletions, and ISCbe4 elements within the genome of JWCB005, leading to massive genome rearrangements in its daughter strain, JWCB018. Such dramatic effects were not evident in the newer MACB1018 lineage, indicating that JWCB005 and its daughter strains are not suitable for metabolic engineering purposes in C. bescii. Furthermore, a facile approach for assessing genomic stability in C. bescii has been established. IMPORTANCE Caldicellulosiruptor bescii is a cellulolytic extremely thermophilic bacterium of great interest for metabolic engineering efforts geared toward lignocellulosic biofuel and bio-based chemical production. Genetic technology in C. bescii has led to the development of two uracil auxotrophic genetic background strains for metabolic engineering. We show that strains derived from the genetic background containing a random deletion in uracil biosynthesis genes (pyrFA) have a dramatic increase in the number of single nucleotide polymorphisms, insertions/deletions, and ISCbe4 insertion elements in their genomes compared to the wild type. At least one daughter strain of this lineage also contains large-scale genome rearrangements that are flanked by these ISCbe4 elements. In contrast, strains developed from the second background strain developed using a targeted deletion strategy of the uracil biosynthetic gene pyrE have a stable genome structure, making them preferable for future metabolic engineering studies. Copyright © 2017 American Society for Microbiology.