Scylla paramamosain is an important aquaculture crab, which has great economical and nutritional value. To the best of our knowledge, few full-length crab transcriptomes are available. In this study, a library composed of 12 different tissues including gill, hepatopancreas, muscle, cerebral ganglion, eyestalk, thoracic ganglia, intestine, heart, testis, ovary, sperm reservoir, and hemocyte was constructed and sequenced using Pacific Biosciences single-molecule real-time (SMRT) long-read sequencing technology. A total of 284803 full-length non-chimeric reads were obtained, from which 79005 high-quality unique transcripts were obtained after error correction and sequence clustering and redundant. Additionally, a total of 52544 transcripts were annotated against protein database (NCBI nonredundant, Swiss-Prot, KOG, and KEGG database). A total of 23644 long non-coding RNAs (lncRNAs) and 131561 simple sequence repeats (SSRs) were identified. Meanwhile, the isoforms of many genes were also identified in this study. Our study provides a rich set of full-length cDNA sequences for S. paramamosain, which will greatly facilitate S. paramamosain research.
Comprehensive identification of the full-length transcripts and alternative splicing related to the secondary metabolism pathways in the tea plant (Camellia sinensis).
Flavonoids, theanine and caffeine are the main secondary metabolites of the tea plant (Camellia sinensis), which account for the tea’s unique flavor quality and health benefits. The biosynthesis pathways of these metabolites have been extensively studied at the transcriptional level, but the regulatory mechanisms are still unclear. In this study, to explore the transcriptome diversity and complexity of tea plant, PacBio Iso-Seq and RNA-seq analysis were combined to obtain full-length transcripts and to profile the changes in gene expression during the leaf development. A total of 1,388,066 reads of insert (ROI) were generated with an average length of 1,762?bp, and more than 54% (755,716) of the ROIs were full-length non-chimeric (FLNC) reads. The Benchmarking Universal Single-Copy Orthologue (BUSCO) completeness was 92.7%. A total of 93,883 non-redundant transcripts were obtained, and 87,395 (93.1%) were new alternatively spliced isoforms. Meanwhile, 7,650 differential expression transcripts (DETs) were identified. A total of 28,980 alternative splicing (AS) events were predicted, including 1,297 differential AS (DAS) events. The transcript isoforms of the key genes involved in the flavonoid, theanine and caffeine biosynthesis pathways were characterized. Additionally, 5,777 fusion transcripts and 9,052 long non-coding RNAs (lncRNAs) were also predicted. Our results revealed that AS potentially plays a crucial role in the regulation of the secondary metabolism of the tea plant. These findings enhanced our understanding of the complexity of the secondary metabolic regulation of tea plants and provided a basis for the subsequent exploration of the regulatory mechanisms of flavonoid, theanine and caffeine biosynthesis in tea plants.
Complete Assembly of the Genome of an Acidovorax citrulli Strain Reveals a Naturally Occurring Plasmid in This Species.
Acidovorax citrulli is the causal agent of bacterial fruit blotch (BFB), a serious threat to cucurbit crop production worldwide. Based on genetic and phenotypic properties, A. citrulli strains are divided into two major groups: group I strains have been generally isolated from melon and other non-watermelon cucurbits, while group II strains are closely associated with watermelon. In a previous study, we reported the genome of the group I model strain, M6. At that time, the M6 genome was sequenced by MiSeq Illumina technology, with reads assembled into 139 contigs. Here, we report the assembly of the M6 genome following sequencing with PacBio technology. This approach not only allowed full assembly of the M6 genome, but it also revealed the occurrence of a ~53 kb plasmid. The M6 plasmid, named pACM6, was further confirmed by plasmid extraction, Southern-blot analysis of restricted fragments and obtention of M6-derivative cured strains. pACM6 occurs at low copy numbers (average of ~4.1 ± 1.3 chromosome equivalents) in A. citrulli M6 and contains 63 open reading frames (ORFs), most of which (55.6%) encoding hypothetical proteins. The plasmid contains several genes encoding type IV secretion components, and typical plasmid-borne genes involved in plasmid maintenance, replication and transfer. The plasmid also carries an operon encoding homologs of a Fic-VbhA toxin-antitoxin (TA) module. Transcriptome data from A. citrulli M6 revealed that, under the tested conditions, the genes encoding the components of this TA system are among the highest expressed genes in pACM6. Whether this TA module plays a role in pACM6 maintenance is still to be determined. Leaf infiltration and seed transmission assays revealed that, under tested conditions, the loss of pACM6 did not affect the virulence of A. citrulli M6. We also show that pACM6 or similar plasmids are present in several group I strains, but absent in all tested group II strains of A. citrulli.
Normalization of cDNA is widely used to improve the coverage of rare transcripts in analysis of transcriptomes employing next-generation sequencing. Recently, long-read technology has been emerging as a powerful tool for sequencing and construction of transcriptomes, especially for complex genomes containing highly similar transcripts and transcript-spliced isoforms. Here, we analyzed the transcriptome of sugarcane, with a highly polyploidy plant genome, by PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, with and without a normalization step. The results demonstrated that, while the two libraries included many of the same transcripts, many longer transcripts were removed and many new generally shorter transcripts were detected by normalization. For the same input cDNA and the same data yield, the normalized library recovered more total transcript isoforms, number of predicted gene families and orthologous groups, resulting in a higher representation for the sugarcane transcriptome, compared to the non-normalized library. The non-normalized library, on the other hand, included a wider transcript length range with more longer transcripts above ~1.25 kb, more transcript isoforms per gene family and gene ontology terms per transcript. A large proportion of the unique transcripts comprising ~52% of the normalized library were expressed at a lower level than the unique transcripts from the non-normalized library, across three tissue types tested including leaf, stalk and root. About 83% of the total 5,348 predicted long noncoding transcripts was derived from the normalized library, of which ~80% was derived from the lowly expressed fraction. Functional annotation of the unique transcripts suggested that each library enriched different functional transcript fractions. This demonstrated the complementation of the two approaches in obtaining a complete transcriptome of a complex genome at the sequencing depth used in this study.
Reconstruction of the full-length transcriptome atlas using PacBio Iso-Seq provides insight into the alternative splicing in Gossypium australe.
Gossypium australe F. Mueller (2n?=?2x?=?26, G2 genome) possesses valuable characteristics. For example, the delayed gland morphogenesis trait causes cottonseed protein and oil to be edible while retaining resistance to biotic stress. However, the lack of gene sequences and their alternative splicing (AS) in G. australe remain unclear, hindering to explore species-specific biological morphogenesis.Here, we report the first sequencing of the full-length transcriptome of the Australian wild cotton species, G. australe, using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) from the pooled cDNA of ten tissues to identify transcript loci and splice isoforms. We reconstructed the G. australe full-length transcriptome and identified 25,246 genes, 86 pre-miRNAs and 1468 lncRNAs. Most genes (12,832, 50.83%) exhibited two or more isoforms, suggesting a high degree of transcriptome complexity in G. australe. A total of 31,448 AS events in five major types were found among the 9944 gene loci. Among these five major types, intron retention was the most frequent, accounting for 68.85% of AS events. 29,718 polyadenylation sites were detected from 14,536 genes, 7900 of which have alternative polyadenylation sites (APA). In addition, based on our AS events annotations, RNA-Seq short reads from germinating seeds showed that differential expression of these events occurred during seed germination. Ten AS events that were randomly selected were further confirmed by RT-PCR amplification in leaf and germinating seeds.The reconstructed gene sequences and their AS in G. australe would provide information for exploring beneficial characteristics in G. australe.
Multispecies host-parasite evolution is common, but how parasites evolve after speciating remains poorly understood. Shared evolutionary history and physiology may propel species along similar evolutionary trajectories whereas pursuing different strategies can reduce competition. We test these scenarios in the economically important association between honey bees and ectoparasitic mites by sequencing the genomes of the sister mite species Varroa destructor and Varroa jacobsoni. These genomes were closely related, with 99.7% sequence identity. Among the 9,628 orthologous genes, 4.8% showed signs of positive selection in at least one species. Divergent selective trajectories were discovered in conserved chemosensory gene families (IGR, SNMP), and Halloween genes (CYP) involved in moulting and reproduction. However, there was little overlap in these gene sets and associated GO terms, indicating different selective regimes operating on each of the parasites. Based on our findings, we suggest that species-specific strategies may be needed to combat evolving parasite communities. © The Author(s) 2019.
De novo transcriptome assembly of the cubomedusa Tripedalia cystophora, including the analysis of a set of genes involved in peptidergic neurotransmission.
The phyla Cnidaria, Placozoa, Ctenophora, and Porifera emerged before the split of proto- and deuterostome animals, about 600 million years ago. These early metazoans are interesting, because they can give us important information on the evolution of various tissues and organs, such as eyes and the nervous system. Generally, cnidarians have simple nervous systems, which use neuropeptides for their neurotransmission, but some cnidarian medusae belonging to the class Cubozoa (box jellyfishes) have advanced image-forming eyes, probably associated with a complex innervation. Here, we describe a new transcriptome database from the cubomedusa Tripedalia cystophora.Based on the combined use of the Illumina and PacBio sequencing technologies, we produced a highly contiguous transcriptome database from T. cystophora. We then developed a software program to discover neuropeptide preprohormones in this database. This script enabled us to annotate seven novel T. cystophora neuropeptide preprohormone cDNAs: One coding for 19 copies of a peptide with the structure pQWLRGRFamide; one coding for six copies of a different RFamide peptide; one coding for six copies of pQPPGVWamide; one coding for eight different neuropeptide copies with the C-terminal LWamide sequence; one coding for thirteen copies of a peptide with the RPRAamide C-terminus; one coding for four copies of a peptide with the C-terminal GRYamide sequence; and one coding for seven copies of a cyclic peptide, of which the most frequent one has the sequence CTGQMCWFRamide. We could also identify orthologs of these seven preprohormones in the cubozoans Alatina alata, Carybdea xaymacana, Chironex fleckeri, and Chiropsalmus quadrumanus. Furthermore, using TBLASTN screening, we could annotate four bursicon-like glycoprotein hormone subunits, five opsins, and 52 other family-A G protein-coupled receptors (GPCRs), which also included two leucine-rich repeats containing G protein-coupled receptors (LGRs) in T. cystophora. The two LGRs are potential receptors for the glycoprotein hormones, while the other GPCRs are candidate receptors for the above-mentioned neuropeptides.By combining Illumina and PacBio sequencing technologies, we have produced a new high-quality de novo transcriptome assembly from T. cystophora that should be a valuable resource for identifying the neuronal components that are involved in vision and other behaviors in cubomedusae.
Metaepigenomic analysis reveals the unexplored diversity of DNA methylation in an environmental prokaryotic community.
DNA methylation plays important roles in prokaryotes, and their genomic landscapes-prokaryotic epigenomes-have recently begun to be disclosed. However, our knowledge of prokaryotic methylation systems is focused on those of culturable microbes, which are rare in nature. Here, we used single-molecule real-time and circular consensus sequencing techniques to reveal the ‘metaepigenomes’ of a microbial community in the largest lake in Japan, Lake Biwa. We reconstructed 19 draft genomes from diverse bacterial and archaeal groups, most of which are yet to be cultured. The analysis of DNA chemical modifications in those genomes revealed 22 methylated motifs, nine of which were novel. We identified methyltransferase genes likely responsible for methylation of the novel motifs, and confirmed the catalytic specificities of four of them via transformation experiments using synthetic genes. Our study highlights metaepigenomics as a powerful approach for identification of the vast unexplored variety of prokaryotic DNA methylation systems in nature.
Full-length transcript sequencing and comparative transcriptomic analysis to evaluate the contribution of osmotic and ionic stress components towards salinity tolerance in the roots of cultivated alfalfa (Medicago sativa L.).
Alfalfa is the most extensively cultivated forage legume. Salinity is a major environmental factor that impacts on alfalfa’s productivity. However, little is known about the molecular mechanisms underlying alfalfa responses to salinity, especially the relative contribution of the two important components of osmotic and ionic stress.In this study, we constructed the first full-length transcriptome database for alfalfa root tips under continuous NaCl and mannitol treatments for 1, 3, 6, 12, and 24?h (three biological replicates for each time points, including the control group) via PacBio Iso-Seq. This resulted in the identification of 52,787 full-length transcripts, with an average length of 2551?bp. Global transcriptional changes in the same 33 stressed samples were then analyzed via BGISEQ-500 RNA-Seq. Totals of 8861 NaCl-regulated and 8016 mannitol-regulated differentially expressed genes (DEGs) were identified. Metabolic analyses revealed that these DEGs overlapped or diverged in the cascades of molecular networks involved in signal perception, signal transduction, transcriptional regulation, and antioxidative defense. Notably, several well characterized signalling pathways, such as CDPK, MAPK, CIPK, and PYL-PP2C-SnRK2, were shown to be involved in osmotic stress, while the SOS core pathway was activated by ionic stress. Moreover, the physiological shifts of catalase and peroxidase activity, glutathione and proline content were in accordance with dynamic transcript profiles of the relevant genes, indicating that antioxidative defense system plays critical roles in response to salinity stress.Overall, our study provides evidence that the response to salinity stress in alfalfa includes both osmotic and ionic components. The key osmotic and ionic stress-related genes are candidates for future studies as potential targets to improve resistance to salinity stress via genetic engineering.