Pecan (Carya illinoinensis) and Chinese hickory (C. cathayensis) are important commercially cultivated nut trees in the genus Carya (Juglandaceae), with high nutritional value and substantial health benefits.We obtained >187.22 and 178.87 gigabases of sequence, and ~288× and 248× genome coverage, to a pecan cultivar (“Pawnee”) and a domesticated Chinese hickory landrace (ZAFU-1), respectively. The total assembly size is 651.31 megabases (Mb) for pecan and 706.43 Mb for Chinese hickory. Two genome duplication events before the divergence from walnut were found in these species. Gene family analysis highlighted key genes in biotic and abiotic tolerance, oil, polyphenols, essential amino acids, and B vitamins. Further analyses of reduced-coverage genome sequences of 16 Carya and 2 Juglans species provide additional phylogenetic perspective on crop wild relatives.Cooperative characterization of these valuable resources provides a window to their evolutionary development and a valuable foundation for future crop improvement. © The Author(s) 2019. Published by Oxford University Press.
Identification, expression, alternative splicing and functional analysis of pepper WRKY gene family in response to biotic and abiotic stresses.
WRKY proteins are a large group of plant transcription factors that are involved in various biological processes, including biotic and abiotic stress responses, hormone response, plant development, and metabolism. WRKY proteins have been identified in several plants, but only a few have been identified in Capsicum annuum. Here, we identified a total of 62 WRKY genes in the latest pepper genome. These genes were classified into three groups (Groups 1-3) based on the structural features of their proteins. The structures of the encoded proteins, evolution, and expression under normal growth conditions were analyzed and 35 putative miRNA target sites were predicted in 20 CaWRKY genes. Moreover, the response to cold or CMV treatments of selected WRKY genes were examined to validate the roles under stresses. And alternative splicing (AS) events of some CaWRKYs were also identified under CMV infection. Promoter analysis confirmed that CaWRKY genes are involved in growth, development, and biotic or abiotic stress responses in hot pepper. The comprehensive analysis provides fundamental information for better understanding of the signaling pathways involved in the WRKY-mediated regulation of developmental processes, as well as biotic and abiotic stress responses.
Comprehensive transcriptome analysis reveals genes potentially involved in isoflavone biosynthesis in Pueraria thomsonii Benth.
Pueraria thomsonii Benth is an important medicinal plant. Transcriptome sequencing, unigene assembly, the annotation of transcripts and the study of gene expression profiles play vital roles in gene function research. However, the full-length transcriptome of P. thomsonii remains unknown. Here, we obtained 44,339 nonredundant transcripts of P. thomsonii by using the PacBio RS II Isoform and Illumina sequencing platforms, of which 43,195 were annotated genes. Compared with the expression levels in the plant roots, those of transcripts with a |fold change| = 4 and FDR < 0.01 in the leaves or stems were assigned as differentially expressed transcripts (DETs). In total, we found 9,225 DETs, 32 of which came from structural genes that were potentially involved in isoflavone biosynthesis. The expression profiles of 8 structural genes from the RNA-Seq data were validated by qRT-PCR. We identified 437 transcription factors (TFs) that were positively or negatively correlated with at least 1 of the structural genes involved in isoflavone biosynthesis using Pearson correlation coefficients (r) (r > 0.8 or r < -0.8). We also identified a total of 32 microRNAs (miRNAs), which targeted 805 transcripts. These miRNAs caused enriched function in 'ATP binding', 'defense response', 'ADP binding', and 'signal transduction'. Interestingly, MIR156a potentially promoted isoflavone biosynthesis by repressing SBP, and MIR319 promoted isoflavone biosynthesis by repressing TCP and HB-HD-ZIP. Finally, we identified 2,690 alternative splicing events, including that of the structural genes of trans-cinnamate 4-monooxygenase and pullulanase, which are potentially involved in the biosynthesis of isoflavone and starch, respectively, and of three TFs potentially involved in isoflavone biosynthesis. Together, these results provide us with comprehensive insight into the gene expression and regulation of P. thomsonii.
Genome sequence of Jatropha curcas L., a non-edible biodiesel plant, provides a resource to improve seed-related traits.
Jatropha curcas (physic nut), a non-edible oilseed crop, represents one of the most promising alternative energy sources due to its high seed oil content, rapid growth and adaptability to various environments. We report ~339 Mbp draft whole genome sequence of J. curcas var. Chai Nat using both the PacBio and Illumina sequencing platforms. We identified and categorized differentially expressed genes related to biosynthesis of lipid and toxic compound among four stages of seed development. Triacylglycerol (TAG), the major component of seed storage oil, is mainly synthesized by phospholipid:diacylglycerol acyltransferase in Jatropha, and continuous high expression of homologs of oleosin over seed development contributes to accumulation of high level of oil in kernels by preventing the breakdown of TAG. A physical cluster of genes for diterpenoid biosynthetic enzymes, including casbene synthases highly responsible for a toxic compound, phorbol ester, in seed cake, was syntenically highly conserved between Jatropha and castor bean. Transcriptomic analysis of female and male flowers revealed the up-regulation of a dozen family of TFs in female flower. Additionally, we constructed a robust species tree enabling estimation of divergence times among nine Jatropha species and five commercial crops in Malpighiales order. Our results will help researchers and breeders increase energy efficiency of this important oil seed crop by improving yield and oil content, and eliminating toxic compound in seed cake for animal feed. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Genome Sequence of Jaltomata Addresses Rapid Reproductive Trait Evolution and Enhances Comparative Genomics in the Hyper-Diverse Solanaceae.
Within the economically important plant family Solanaceae, Jaltomata is a rapidly evolving genus that has extensive diversity in flower size and shape, as well as fruit and nectar color, among its ~80 species. Here, we report the whole-genome sequencing, assembly, and annotation, of one representative species (Jaltomata sinuosa) from this genus. Combining PacBio long reads (25×) and Illumina short reads (148×) achieved an assembly of ~1.45?Gb, spanning ~96% of the estimated genome. Ninety-six percent of curated single-copy orthologs in plants were detected in the assembly, supporting a high level of completeness of the genome. Similar to other Solanaceous species, repetitive elements made up a large fraction (~80%) of the genome, with the most recently active element, Gypsy, expanding across the genome in the last 1-2 Myr. Computational gene prediction, in conjunction with a merged transcriptome data set from 11 tissues, identified 34,725 protein-coding genes. Comparative phylogenetic analyses with six other sequenced Solanaceae species determined that Jaltomata is most likely sister to Solanum, although a large fraction of gene trees supported a conflicting bipartition consistent with substantial introgression between Jaltomata and Capsicum after these species split. We also identified gene family dynamics specific to Jaltomata, including expansion of gene families potentially involved in novel reproductive trait development, and loss of gene families that accompanied the loss of self-incompatibility. This high-quality genome will facilitate studies of phenotypic diversification in this rapidly radiating group and provide a new point of comparison for broader analyses of genomic evolution across the Solanaceae.
High Quality Draft Genome of Arogyapacha (Trichopus zeylanicus), an Important Medicinal Plant Endemic to Western Ghats of India.
Arogyapacha, the local name of Trichopus zeylanicus, is a rare, indigenous medicinal plant of India. This plant is famous for its traditional use as an instant energy stimulant. So far, no genomic resource is available for this important plant and hence its metabolic pathways are poorly understood. Here, we report on a high-quality draft assembly of approximately 713.4 Mb genome of T. zeylanicus, first draft genome from the genus Trichopus The assembly was generated in a hybrid approach using Illumina short-reads and Pacbio longer-reads. The total assembly comprised of 22601 scaffolds with an N50 value of 433.3 Kb. We predicted 34452 protein coding genes in T. zeylanicus genome and found that a significant portion of these predicted genes were associated with various secondary metabolite biosynthetic pathways. Comparative genome analysis revealed extensive gene collinearity between T. zeylanicus and its closely related plant species. The present genome and annotation data provide an essential resource to speed-up the research on secondary metabolism, breeding and molecular evolution of T. zeylanicus. Copyright © 2019 Chellappan et al.
Infectious disease is both a major force of selection in nature and a prime cause of yield loss in agriculture. In plants, disease resistance is often conferred by nucleotide-binding leucine-rich repeat (NLR) proteins, intracellular immune receptors that recognize pathogen proteins and their effects on the host. Consistent with extensive balancing and positive selection, NLRs are encoded by one of the most variable gene families in plants, but the true extent of intraspecific NLR diversity has been unclear. Here, we define a nearly complete species-wide pan-NLRome in Arabidopsis thaliana based on sequence enrichment and long-read sequencing. The pan-NLRome largely saturates with approximately 40 well-chosen wild strains, with half of the pan-NLRome being present in most accessions. We chart NLR architectural diversity, identify new architectures, and quantify selective forces that act on specific NLRs and NLR domains. Our study provides a blueprint for defining pan-NLRomes.Copyright © 2019 The Author(s). Published by Elsevier Inc. All rights reserved.
A global survey of full-length transcriptome of Ginkgo biloba reveals transcript variants involved in flavonoid biosynthesis
Ginkgo biloba, which contains flavonoids as bioactive components, is widely used in traditional Chinese medicine. Increasing the flavonoid production of medicinal plants through genetic engineering generally focuses on the key genes involved in flavonoid biosynthesis. However, the molecular mechanisms underlying such biosynthesis are not yet well understood. To understand these mechanisms, a combination of second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing was applied to G. biloba. Eight tissues were sampled for SMRT sequencing to generate a high-quality, full-length transcriptome database. From 23.36 Gb clean reads, 12,954 alternative polyadenylation events, 12,290 alternative splicing events, 929 fusion transcripts, 2,286 novel transcripts, and 1,270 lncRNAs were predicted by removing redundant reads. Further studies reveal that 7 AS, 5 lncRNA, and 6 fusion gene events were identified in flavonoid biosynthesis. A total of 12 gene modules were revealed to be involved in flavonoid metabolism structural genes and transcription factors by constructing co-expression networks. Weighted gene coexpression network analysis (WGCNA) analysis reveals that some hub genes operate during the biosynthesis by identifying transcription factors (TFs) and structure genes. Seven key hub genes were also identified by analyzing the correlation between gene expression level and flavonoids content. The results highlight the importance of SMRT sequencing of the full-length transcriptome in improving genome annotation and elucidating the gene regulation of flavonoid biosynthesis in G. biloba by providing a comprehensive set of reference transcripts.
Normalization of cDNA is widely used to improve the coverage of rare transcripts in analysis of transcriptomes employing next-generation sequencing. Recently, long-read technology has been emerging as a powerful tool for sequencing and construction of transcriptomes, especially for complex genomes containing highly similar transcripts and transcript-spliced isoforms. Here, we analyzed the transcriptome of sugarcane, with a highly polyploidy plant genome, by PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, with and without a normalization step. The results demonstrated that, while the two libraries included many of the same transcripts, many longer transcripts were removed and many new generally shorter transcripts were detected by normalization. For the same input cDNA and the same data yield, the normalized library recovered more total transcript isoforms, number of predicted gene families and orthologous groups, resulting in a higher representation for the sugarcane transcriptome, compared to the non-normalized library. The non-normalized library, on the other hand, included a wider transcript length range with more longer transcripts above ~1.25 kb, more transcript isoforms per gene family and gene ontology terms per transcript. A large proportion of the unique transcripts comprising ~52% of the normalized library were expressed at a lower level than the unique transcripts from the non-normalized library, across three tissue types tested including leaf, stalk and root. About 83% of the total 5,348 predicted long noncoding transcripts was derived from the normalized library, of which ~80% was derived from the lowly expressed fraction. Functional annotation of the unique transcripts suggested that each library enriched different functional transcript fractions. This demonstrated the complementation of the two approaches in obtaining a complete transcriptome of a complex genome at the sequencing depth used in this study.