PacBio Sequencing is characterized by very long sequence reads (averaging > 10,000 bases), lack of GC-bias, and high consensus accuracy. These features have allowed the method to provide a new…
Chlorella vulgaris genome assembly and annotation reveals the molecular basis for metabolic acclimation to high light conditions.
Chlorella vulgaris is a fast-growing freshwater microalga cultivated at the industrial scale for applications ranging from food to biofuel production. To advance our understanding of its biology and to establish genetic tools for biotechnological manipulation, we sequenced the nuclear and organelle genomes of Chlorella vulgaris 211/11P by combining next generation sequencing and optical mapping of isolated DNA molecules. This hybrid approach allowed us to assemble the nuclear genome into 14 pseudo-molecules with an N50 of 2.8 Mb and 98.9% of the genome scaffolded. The integration of RNA-seq data obtained at two different growth irradiances (high light, HL, versus low light, LL) enabled the identification of 10,724 nuclear genes, coding for 11,082 transcripts. Moreover, 121 and 48 genes were found in the chloroplast and mitochondrial genomes, respectively. Functional annotation and expression analysis of nuclear, chloroplast and mitochondrial genome sequences revealed peculiar features of Chlorella vulgaris. Evidence of horizontal gene transfers from the chloroplast to the mitochondrial genome was observed. Furthermore, comparative transcriptomic analyses of LL vs. HL provide insights into the molecular basis for the metabolic rearrangement in HL vs. LL conditions that leads to enhanced de novo fatty acid biosynthesis and triacylglycerol accumulation. The occurrence of a cytosolic fatty acid biosynthetic pathway can be predicted, and its upregulation upon HL exposure is observed, consistent with the increased lipid amount under HL. These data provide a rich genetic resource for future genome editing studies, and potential targets for biotechnological manipulation of Chlorella vulgaris or other microalgae species to improve biomass and lipid productivity.
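For readers unfamiliar with the N50 statistic quoted for the nuclear assembly, the following is a minimal sketch of how it is conventionally computed. The function name and the toy scaffold lengths are hypothetical and are not taken from the Chlorella vulgaris assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (illustration only, not real data).
toy_scaffolds = [5.0, 4.0, 3.0, 2.0, 1.0]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)
```

In the toy example the two longest scaffolds already cover more than half of the total length, so the N50 is the length of the second scaffold.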
Complete Genome Sequence of Shewanella sp. Strain TH2012, Isolated from Shrimp in a Cultivation Pond Exhibiting Early Mortality Syndrome.
Here, we present the complete genome sequence of a Shewanella isolate, TH2012, from a shrimp pond in which shrimp exhibited early mortality syndrome (EMS)/acute hepatopancreatic necrosis disease (AHPND). The complete genome of TH2012 has a prophage-like element and a number of potential virulence factors, making TH2012 a possible contributing factor to EMS outbreaks. Copyright © 2019 Wechprasit et al.
Complete Genome Sequence of Halocella sp. Strain SP3-1, an Extremely Halophilic, Glycoside Hydrolase- and Bacteriocin-Producing Bacterium Isolated from a Salt Evaporation Pond.
Halocella sp. strain SP3-1, a cellulose-degrading bacterium, was isolated from a hypersaline evaporation pond in Thailand. Here, we report the first complete genome sequence of strain SP3-1. The genome is 4,035,760 bases long and contains several genes encoding cellulose-, hemicellulose-, and starch-degrading enzymes, as well as bacteriocins.
Complete chloroplast genome sequences of Kaempferia galanga and Kaempferia elegans: Molecular structures and comparative analysis.
Kaempferia galanga and Kaempferia elegans, which belong to the genus Kaempferia in the family Zingiberaceae, are used as a valuable herbal medicine and an ornamental plant, respectively. Chloroplast genomes have been used for molecular marker development, species identification and phylogenetic studies. In this study, the complete chloroplast genome sequences of K. galanga and K. elegans are reported. Results show that the complete chloroplast genome of K. galanga is 163,811 bp long, having a quadripartite structure with a large single copy (LSC) region of 88,405 bp and a small single copy (SSC) region of 15,812 bp separated by inverted repeats (IRs) of 29,797 bp. Similarly, the complete chloroplast genome of K. elegans is 163,555 bp long, having a quadripartite structure in which IRs of 29,773 bp separate an LSC of 88,020 bp and an SSC of 15,989 bp. A total of 111 genes in K. galanga and 113 genes in K. elegans comprised 79 protein-coding genes and 4 ribosomal RNA (rRNA) genes, as well as 28 and 30 transfer RNA (tRNA) genes in K. galanga and K. elegans, respectively. The gene order, GC content and orientation of the two Kaempferia chloroplast genomes exhibited high similarity. The location and distribution of simple sequence repeats (SSRs) and long repeat sequences were determined. Eight highly variable regions between the two Kaempferia species were identified, and 643 mutation events, including 536 single-nucleotide polymorphisms (SNPs) and 107 insertions/deletions (indels), were accurately located. Sequence divergences of the whole chloroplast genomes were calculated among related Zingiberaceae species. The phylogenetic analysis based on SNPs among eleven species strongly supported that K. galanga and K. elegans formed a cluster within Zingiberaceae. This study identified the unique characteristics of the entire K. galanga and K. elegans chloroplast genomes that contribute to our understanding of chloroplast DNA evolution within Zingiberaceae species. It provides valuable information for phylogenetic analysis and species identification within the genus Kaempferia.
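The figures reported for the two chloroplast genomes can be cross-checked with simple arithmetic. The sketch below assumes that the stated IR length refers to each of the two inverted-repeat copies, so that the total is LSC + SSC + 2 × IR; the function and variable names are illustrative only.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) and gene counts as stated in the abstract.
genomes = {
    "K. galanga": {"lsc": 88405, "ssc": 15812, "ir": 29797, "total": 163811,
                   "protein_coding": 79, "rrna": 4, "trna": 28, "genes": 111},
    "K. elegans": {"lsc": 88020, "ssc": 15989, "ir": 29773, "total": 163555,
                   "protein_coding": 79, "rrna": 4, "trna": 30, "genes": 113},
}

for name, g in genomes.items():
    length = quadripartite_length(g["lsc"], g["ssc"], g["ir"])
    gene_sum = g["protein_coding"] + g["rrna"] + g["trna"]
    # Both comparisons should print True if the reported numbers are consistent.
    print(name, length == g["total"], gene_sum == g["genes"])

# The mutation events should likewise be the sum of SNPs and indels.
print(536 + 107 == 643)
```

Under that assumption the region lengths, gene counts and mutation counts in the abstract are all internally consistent.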
Intercellular communication is required for trap formation in the nematode-trapping fungus Duddingtonia flagrans.
Nematode-trapping fungi (NTF) are a large and diverse group of fungi, which may switch from a saprotrophic to a predatory lifestyle if nematodes are present. Different fungi have developed different trapping devices, ranging from adhesive cells to constricting rings. After trapping, fungal hyphae penetrate the worm, secrete lytic enzymes and form a hyphal network inside the body. We sequenced the genome of Duddingtonia flagrans, a biotechnologically important NTF used to control nematode populations in fields. The 36.64 Mb genome encodes 9,927 putative proteins, among which are more than 638 predicted secreted proteins. Most secreted proteins are lytic enzymes, but more than 200 were classified as small secreted proteins (< 300 amino acids). 117 putative effector proteins were predicted, suggesting interkingdom communication during the colonization. As a first step to analyze the function of such proteins or other phenomena at the molecular level, we developed a transformation system, established the fluorescent proteins GFP and mCherry, adapted an assay to monitor protein secretion, and established gene-deletion protocols using homologous recombination or CRISPR/Cas9. One putative virulence effector protein, PefB, was transcriptionally induced during the interaction. We show that the mature protein is able to be imported into nuclei in Caenorhabditis elegans cells. In addition, we studied trap formation and show that cell-to-cell communication is required for ring closure. The availability of the genome sequence and the establishment of many molecular tools will open new avenues to studying this biotechnologically relevant nematode-trapping fungus.
Comprehensive transcriptome analysis reveals genes potentially involved in isoflavone biosynthesis in Pueraria thomsonii Benth.
Pueraria thomsonii Benth is an important medicinal plant. Transcriptome sequencing, unigene assembly, the annotation of transcripts and the study of gene expression profiles play vital roles in gene function research. However, the full-length transcriptome of P. thomsonii remains unknown. Here, we obtained 44,339 nonredundant transcripts of P. thomsonii by using the PacBio RS II Isoform and Illumina sequencing platforms, of which 43,195 were annotated genes. Compared with the expression levels in the plant roots, transcripts with a |fold change| ≥ 4 and FDR < 0.01 in the leaves or stems were designated differentially expressed transcripts (DETs). In total, we found 9,225 DETs, 32 of which came from structural genes that were potentially involved in isoflavone biosynthesis. The expression profiles of 8 structural genes from the RNA-Seq data were validated by qRT-PCR. Using Pearson correlation coefficients (r > 0.8 or r < -0.8), we identified 437 transcription factors (TFs) that were positively or negatively correlated with at least 1 of the structural genes involved in isoflavone biosynthesis. We also identified a total of 32 microRNAs (miRNAs), which targeted 805 transcripts. The targets of these miRNAs were enriched in the functions 'ATP binding', 'defense response', 'ADP binding', and 'signal transduction'. Interestingly, MIR156a potentially promoted isoflavone biosynthesis by repressing SBP, and MIR319 promoted isoflavone biosynthesis by repressing TCP and HB-HD-ZIP. Finally, we identified 2,690 alternative splicing events, including those of the structural genes of trans-cinnamate 4-monooxygenase and pullulanase, which are potentially involved in the biosynthesis of isoflavone and starch, respectively, and of three TFs potentially involved in isoflavone biosynthesis. Together, these results provide us with comprehensive insight into the gene expression and regulation of P. thomsonii.
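The two selection rules described in this abstract (a fold-change and FDR cutoff for DETs, and a Pearson correlation cutoff for TF candidates) can be sketched as follows. This is an illustration only and not the authors' pipeline: the function names are hypothetical, fold change is taken as a signed ratio, and the fold-change cutoff is read as a lower bound on the absolute value, which is an assumption.

```python
import math


def is_det(fold_change, fdr, fc_cutoff=4.0, fdr_cutoff=0.01):
    """Flag a transcript as differentially expressed.

    Assumption: fold_change is signed (negative = down-regulated) and the
    cutoff is a lower bound on its absolute value.
    """
    return abs(fold_change) >= fc_cutoff and fdr < fdr_cutoff


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def is_correlated(r, cutoff=0.8):
    """True for strong positive or strong negative correlation."""
    return r > cutoff or r < -cutoff


# Hypothetical expression values of a TF and a structural gene across samples.
tf_expression = [1.0, 2.0, 3.0, 4.0]
gene_expression = [2.1, 3.9, 6.2, 8.0]
r = pearson_r(tf_expression, gene_expression)
print(is_det(-6.0, 0.005), is_correlated(r))
```

A TF would be retained when `is_correlated` holds for at least one structural gene, which mirrors the "at least 1" wording of the abstract.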
Complete Genome Sequence of Actinosynnema pretiosum X47, An Industrial Strain that Produces the Antibiotic Ansamitocin AP-3.
Ansamitocins are extraordinarily potent antitumor agents. Ansamitocin P-3 (AP-3), which is produced by Actinosynnema pretiosum, has been developed as a cytotoxic drug for breast cancer. Despite its importance, AP-3 is of limited applicability because of the low production yield. A. pretiosum strain X47 was developed from A. pretiosum ATCC 31565 by mutation breeding and shows a relatively high AP-3 yield. Here, we analyzed the A. pretiosum X47 genome, which is ~8.13 Mb in length with 6693 coding sequences, 58 tRNA genes, and 15 rRNA genes. The DNA sequence of the ansamitocin biosynthetic gene cluster is highly similar to that of the corresponding cluster in A. pretiosum ATCC 31565, with 99.9% identity. However, RT-qPCR analysis showed that the expression levels of ansamitocin biosynthetic genes were significantly increased in X47 compared with the levels in the wild-type strain, consistent with the higher yield of AP-3 in X47. The annotated complete genome sequence of this strain will facilitate understanding the molecular mechanisms of ansamitocin biosynthesis and regulation in A. pretiosum and help further genetic engineering studies to enhance the production of AP-3.
Gossypium barbadense and Gossypium hirsutum genomes provide insights into the origin and evolution of allotetraploid cotton.
Allotetraploid cotton is an economically important natural-fiber-producing crop worldwide. After polyploidization, Gossypium hirsutum L. evolved to produce a higher fiber yield and to better survive harsh environments than Gossypium barbadense, which produces superior-quality fibers. The global genetic and molecular bases for these interspecies divergences were unknown. Here we report high-quality de novo-assembled genomes for these two cultivated allotetraploid species with pronounced improvement in repetitive-DNA-enriched centromeric regions. Whole-genome comparative analyses revealed that species-specific alterations in gene expression, structural variations and expanded gene families were responsible for speciation and the evolutionary history of these species. These findings help to elucidate the evolution of cotton genomes and their domestication history. The information generated not only should enable breeders to improve fiber quality and resilience to ever-changing environmental conditions but also can be translated to other crops for better understanding of their domestication history and use in improvement.
Genome sequence of Jatropha curcas L., a non-edible biodiesel plant, provides a resource to improve seed-related traits.
Jatropha curcas (physic nut), a non-edible oilseed crop, represents one of the most promising alternative energy sources due to its high seed oil content, rapid growth and adaptability to various environments. We report a ~339 Mbp draft whole genome sequence of J. curcas var. Chai Nat obtained using both the PacBio and Illumina sequencing platforms. We identified and categorized differentially expressed genes related to the biosynthesis of lipids and toxic compounds among four stages of seed development. Triacylglycerol (TAG), the major component of seed storage oil, is mainly synthesized by phospholipid:diacylglycerol acyltransferase in Jatropha, and continuous high expression of oleosin homologs over seed development contributes to the accumulation of a high level of oil in kernels by preventing the breakdown of TAG. A physical cluster of genes for diterpenoid biosynthetic enzymes, including casbene synthases highly responsible for a toxic compound, phorbol ester, in seed cake, was syntenically highly conserved between Jatropha and castor bean. Transcriptomic analysis of female and male flowers revealed the up-regulation of a dozen families of TFs in female flowers. Additionally, we constructed a robust species tree enabling estimation of divergence times among nine Jatropha species and five commercial crops in the order Malpighiales. Our results will help researchers and breeders increase the energy efficiency of this important oilseed crop by improving yield and oil content, and eliminating toxic compounds in seed cake for animal feed. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
The Genome of Cucurbita argyrosperma (Silver-Seed Gourd) Reveals Faster Rates of Protein-Coding Gene and Long Noncoding RNA Turnover and Neofunctionalization within Cucurbita.
Whole-genome duplications are an important source of evolutionary novelties that change the mode and tempo at which genetic elements evolve within a genome. The Cucurbita genus experienced a whole-genome duplication around 30 million years ago, although the evolutionary dynamics of the coding and noncoding genes in this genus have not yet been scrutinized. Here, we analyzed the genomes of four Cucurbita species, including a newly assembled genome of Cucurbita argyrosperma, and compared the gene contents of these species with those of five other members of the Cucurbitaceae family to assess the evolutionary dynamics of protein-coding and long intergenic noncoding RNA (lincRNA) genes after the genome duplication. We report that Cucurbita genomes have a higher protein-coding gene birth-death rate compared with the genomes of the other members of the Cucurbitaceae family. C. argyrosperma gene families associated with pollination and transmembrane transport had significantly faster evolutionary rates. lincRNA families showed high levels of gene turnover throughout the phylogeny, and 67.7% of the lincRNA families in Cucurbita showed evidence of birth from the neofunctionalization of previously existing protein-coding genes. Collectively, our results suggest that the whole-genome duplication in Cucurbita resulted in faster rates of gene family evolution through the neofunctionalization of duplicated genes. Copyright © 2019 The Author. Published by Elsevier Inc. All rights reserved.
A global survey of full-length transcriptome of Ginkgo biloba reveals transcript variants involved in flavonoid biosynthesis
Ginkgo biloba, which contains flavonoids as bioactive components, is widely used in traditional Chinese medicine. Increasing the flavonoid production of medicinal plants through genetic engineering generally focuses on the key genes involved in flavonoid biosynthesis. However, the molecular mechanisms underlying such biosynthesis are not yet well understood. To understand these mechanisms, a combination of second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing was applied to G. biloba. Eight tissues were sampled for SMRT sequencing to generate a high-quality, full-length transcriptome database. From 23.36 Gb of clean reads, 12,954 alternative polyadenylation events, 12,290 alternative splicing (AS) events, 929 fusion transcripts, 2,286 novel transcripts, and 1,270 lncRNAs were predicted after removing redundant reads. Further analysis identified 7 AS, 5 lncRNA, and 6 fusion gene events in flavonoid biosynthesis. Constructing co-expression networks revealed a total of 12 gene modules containing structural genes and transcription factors (TFs) involved in flavonoid metabolism. Weighted gene coexpression network analysis (WGCNA) reveals that some hub genes, identified among the TFs and structural genes, operate during the biosynthesis. Seven key hub genes were also identified by analyzing the correlation between gene expression level and flavonoid content. The results highlight the importance of SMRT sequencing of the full-length transcriptome in improving genome annotation and elucidating the gene regulation of flavonoid biosynthesis in G. biloba by providing a comprehensive set of reference transcripts.
Transcriptome analysis reveals multiple signal network contributing to the Verticillium wilt resistance in eggplant
Verticillium wilt is a devastating disease in eggplants. To understand the molecular mechanism of disease resistance in eggplants, the transcriptomes of Verticillium wilt-infected eggplants were analyzed. A total of 480, 518, 887 and 1,046 Verticillium wilt-related differentially expressed genes (DEGs) were identified at 6 (V6), 12 (V12), 24 (V24) and 48 h (V48), respectively. COG function classification revealed that most DEGs functioned in “Amino acid transport and metabolism”, “Cytoskeleton” and “Cell motility”. In addition, comparing the control plants (V0) with the infected eggplants (V6-V48) identified a total of 111 common DEGs. Except for “General function prediction only”, most of these DEGs were enriched in “Signal transduction”. DEGs associated with different hormone signals, including GID1B, ROPGAP1, OPT3 and CDPK, were identified throughout the whole infection process. Cross-talk among defense signal pathways plays major roles in Verticillium wilt resistance in eggplants.
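One plausible reading of the "common DEGs" is the set of genes that are differentially expressed at every time point relative to the control, that is, the intersection of the per-time-point DEG sets. The sketch below illustrates that reading only; the abstract does not state how the set was derived, and the gene identifiers are hypothetical placeholders.

```python
def common_degs(degs_by_timepoint):
    """Return the genes present in the DEG set of every time point."""
    sets = [set(genes) for genes in degs_by_timepoint.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical DEG lists per time point (placeholder identifiers, not real data).
toy_degs = {
    "V6": ["geneA", "geneB", "geneC"],
    "V12": ["geneA", "geneC", "geneD"],
    "V24": ["geneA", "geneC", "geneE"],
    "V48": ["geneA", "geneF"],
}
shared = common_degs(toy_degs)
print(sorted(shared))
```

With real data, each list would hold the DEGs from one comparison against V0, and the size of the returned set would correspond to the reported number of common DEGs under this interpretation.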
Genome analysis and genetic transformation of a water surface-floating microalga Chlorococcum sp. FFG039.
Microalgal harvesting and dewatering are the main bottlenecks that need to be overcome to tap the potential of microalgae for production of valuable compounds. Water surface-floating microalgae form robust biofilms, float on the water surface along with gas bubbles entrapped under the biofilms, and have great potential to overcome these bottlenecks. However, little is known about the molecular mechanisms involved in the water surface-floating phenotype. In the present study, we analysed the genome sequence of a water surface-floating microalga Chlorococcum sp. FFG039, with a next generation sequencing technique to elucidate the underlying mechanisms. Comparative genomics study with Chlorococcum sp. FFG039 and other non-floating green microalgae revealed some of the unique gene families belonging to this floating microalga, which may be involved in biofilm formation. Furthermore, genetic transformation of this microalga was achieved with an electroporation method. The genome information and transformation techniques presented in this study will be useful to obtain molecular insights into the water surface-floating phenotype of Chlorococcum sp. FFG039.
Genome-wide analysis of DNA methylation patterns using single molecule real-time DNA sequencing has boosted the number of publicly available methylomes. However, there is a lack of tools coupling methylation patterns and the corresponding methyltransferase genes. Here we demonstrate a high-throughput method for coupling methyltransferases with their respective motifs, using automated cloning and analysing the methyltransferases in vectors carrying a strain-specific cassette containing all potential target sites. To validate the method, we analyse the genomes of the thermophile Moorella thermoacetica and the mesophile Acetobacterium woodii, two acetogenic bacteria having substantially modified genomes with 12 methylation motifs and a total of 23 methyltransferase genes. Using our method, we characterize the 23 methyltransferases, assign motifs to the respective enzymes and verify activity for 11 of the 12 motifs.