September 22, 2019  |  

Global identification of alternative splicing via comparative analysis of SMRT- and Illumina-based RNA-seq in strawberry.

Alternative splicing (AS) is a key post-transcriptional regulatory mechanism, yet little is known about its roles in fruit crops. Here, AS was globally analyzed in the wild strawberry Fragaria vesca genome with RNA-seq data derived from different stages of fruit development. The AS landscape was characterized and compared between the single-molecule, real-time (SMRT) and Illumina RNA-seq platforms. While SMRT has a lower sequencing depth, it identifies more genes undergoing AS (57.67% of detected multiexon genes) than Illumina (33.48%), illustrating the efficacy of SMRT in AS identification. We investigated different modes of AS in the context of fruit development; the percentage of intron retention (IR) is markedly reduced whereas that of alternative acceptor sites (AA) is significantly increased post-fertilization when compared with pre-fertilization. When all the identified transcripts were combined, a total of 66.43% of detected multiexon genes in strawberry undergo AS, some of which lead to a gain or loss of conserved domains in the gene products. The work demonstrates that SMRT sequencing is highly powerful in AS discovery and provides a rich data resource for later functional studies of different isoforms. Further, shifting AS modes may contribute to rapid changes of gene expression during fruit set. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.


September 22, 2019  |  

Elevated expression of a minor isoform of ANK3 is a risk factor for bipolar disorder.

Ankyrin-3 (ANK3) is one of the few genes that have been consistently identified as associated with bipolar disorder by multiple genome-wide association studies. However, the exact molecular basis of the association remains unknown. A rare loss-of-function splice-site SNP (rs41283526*G) in a minor isoform of ANK3 (incorporating exon ENSE00001786716) was recently identified as protective of bipolar disorder and schizophrenia. This suggests that an elevated expression of this isoform may be involved in the etiology of the disorders. In this study, we used novel approaches and data sets to test this hypothesis. First, we strengthen the statistical evidence supporting the allelic association by replicating the protective effect of the minor allele of rs41283526 in three additional large independent samples (meta-analysis p-values: 6.8E-05 for bipolar disorder and 8.2E-04 for schizophrenia). Second, we confirm the hypothesis that both bipolar and schizophrenia patients have a significantly higher expression of this isoform than controls (p-values: 3.3E-05 for schizophrenia and 9.8E-04 for bipolar type I). Third, we determine the transcription start site for this minor isoform by Pacific Biosciences sequencing of full-length cDNA and show that it is primarily expressed in the corpus callosum. Finally, we combine genotype and expression data from a large Norwegian sample of psychiatric patients and controls, and show that the risk alleles in ANK3 identified by bipolar disorder GWAS are located near the transcription start site of this isoform and are significantly associated with its elevated expression. Together, these results point to the likely molecular mechanism underlying ANK3's association with bipolar disorder.


September 22, 2019  |  

Integrated DNA methylome and transcriptome analysis reveals the ethylene-induced flowering pathway genes in pineapple.

Ethylene has long been used to promote flowering in pineapple production. Ethylene-induced flowering is dose dependent, with a critical threshold level of ethylene response factors needed to trigger flowering. The mechanism of ethylene-induced flowering is still unclear. Here, we integrated isoform sequencing (Iso-Seq), Illumina short-read sequencing and whole-genome bisulfite sequencing (WGBS) to explore the early transcriptomic and DNA methylation changes in pineapple following high-concentration ethylene (HE) and low-concentration ethylene (LE) treatment. Iso-Seq produced 122,338 transcripts, including 26,893 alternative splicing isoforms, 8,090 novel transcripts and 12,536 candidate long non-coding RNAs. The WGBS results suggested a decrease in CG methylation and an increase in CHH methylation following HE treatment. The LE and HE treatments induced drastic changes in the transcriptome and DNA methylome, with LE inducing the initial response to flower induction and HE inducing the subsequent response. The dose-dependent induction of FLOWERING LOCUS T-like genes (FTLs) may have contributed to dose-dependent flowering induction in pineapple by ethylene. Alterations in DNA methylation, lncRNAs and multiple genes may be involved in the regulation of FTLs. Our data provide a landscape of the transcriptome and DNA methylome and reveal a candidate network that regulates flowering time in pineapple, which may promote further studies.


September 22, 2019  |  

A survey of transcriptome complexity in Sus scrofa using single-molecule long-read sequencing.

Alternative splicing (AS) and fusion transcripts vastly expand transcriptome and proteome diversity. However, the reliability of these events and the extent of epigenetic mechanisms have not been adequately addressed because of uncertainties about the complete structure of mRNA. Here we combined single-molecule real-time sequencing, Illumina RNA-seq and DNA methylation data to characterize the landscapes of DNA methylation on AS, fusion isoform formation and lncRNA features, and further to unveil the transcriptome complexity of pig. Our analysis identified an unprecedented scale of high-quality full-length isoforms, with over 28,127 novel isoforms from 26,881 novel genes. More than 92,000 novel AS events were detected; intron retention was the predominant AS mode, followed by exon skipping. Interestingly, we found that DNA methylation played an important role in generating various AS isoforms by regulating splicing sites, promoter regions and first exons. Furthermore, we identified a large number of fusion transcripts and novel lncRNAs, and found that DNA methylation of the promoter and gene body could regulate lncRNA expression. Our results significantly improved existing gene models of pig and revealed that pig AS and epigenetic modification are more complex than previously thought.


September 22, 2019  |  

Hybrid sequencing of full-length cDNA transcripts of stems and leaves in Dendrobium officinale.

Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in the polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale also remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two diverse tissue types using second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing technology. The results of this study yielded a digital inventory of gene and mRNA isoform expressions. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Of these genes, one sugars will eventually be exported transporter (SWEET) and one sucrose transporter (SUT) are expressed to a greater extent in D. officinale stems than in leaves. Two glycosyltransferase (GT) and four cellulose synthase (Ces) genes undergo a distinct degree of alternative splicing. In the stems, the content of polysaccharides is twice as much as that in the leaves. The differentially expressed GT and transcription factor (TF) genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem, and are likely to be involved in sugar loading in the phloem.


September 22, 2019  |  

A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing.

Despite the economic importance of sugarcane in sugar and bioenergy production, there is not yet a reference genome available. Most of the sugarcane transcriptomic studies have been based on Saccharum officinarum gene indices (SoGI), expressed sequence tags (ESTs) and de novo assembled transcript contigs from short-reads; hence knowledge of the sugarcane transcriptome is limited in relation to transcript length and number of transcript isoforms. The sugarcane transcriptome was sequenced using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample derived from leaf, internode and root tissues, of different developmental stages, from 22 varieties, to explore the potential for capturing full-length transcript isoforms. A total of 107,598 unique transcript isoforms were obtained, representing about 71% of the total number of predicted sugarcane genes. The majority of this dataset (92%) matched the plant protein database, while just over 2% was novel transcripts, and over 2% was putative long non-coding RNAs. About 56% and 23% of total sequences were annotated against the gene ontology and KEGG pathway databases, respectively. Comparison with de novo contigs from Illumina RNA-Sequencing (RNA-Seq) of the internode samples from the same experiment and public databases showed that the Iso-Seq method recovered more full-length transcript isoforms, had a higher N50 and average length of largest 1,000 proteins; whereas a greater representation of the gene content and RNA diversity was captured in RNA-Seq. Only 62% of PacBio transcript isoforms matched 67% of de novo contigs, while the non-matched proportions were attributed to the inclusion of leaf/root tissues and the normalization in PacBio, and the representation of more gene content and RNA classes in the de novo assembly, respectively.
About 69% of PacBio transcript isoforms and 41% of de novo contigs aligned with the sorghum genome, indicating the high conservation of orthologs in the genic regions of the two genomes. The transcriptome dataset should contribute to improved sugarcane gene models and sugarcane protein predictions; and will serve as a reference database for analysis of transcript expression in sugarcane.


September 22, 2019  |  

Comprehensive profiling of rhizome-associated alternative splicing and alternative polyadenylation in moso bamboo (Phyllostachys edulis).

Moso bamboo (Phyllostachys edulis) represents one of the fastest-spreading plants in the world, due in part to its well-developed rhizome system. However, the post-transcriptional mechanism for the development of the rhizome system in bamboo has not been comprehensively studied. We therefore used a combination of single-molecule long-read sequencing technology and polyadenylation site sequencing (PAS-seq) to re-annotate the bamboo genome, and identify genome-wide alternative splicing (AS) and alternative polyadenylation (APA) in the rhizome system. In total, 145 522 mapped full-length non-chimeric (FLNC) reads were analyzed, resulting in the correction of 2241 mis-annotated genes and the identification of 8091 previously unannotated loci. Notably, more than 42 280 distinct splicing isoforms were derived from 128 667 intron-containing full-length FLNC reads, including a large number of AS events associated with rhizome systems. In addition, we characterized 25 069 polyadenylation sites from 11 450 genes, 6311 of which have APA sites. Further analysis of intronic polyadenylation revealed that LTR/Gypsy and LTR/Copia were two major transposable elements within the intronic polyadenylation region. Furthermore, this study provided a quantitative atlas of poly(A) usage. Several hundred differential poly(A) sites in the rhizome-root system were identified. Taken together, these results suggest that post-transcriptional regulation may potentially have a vital role in the underground rhizome-root system. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.


September 22, 2019  |  

Full-length transcriptome sequences of ephemeral plant Arabidopsis pumila provides insight into gene expression dynamics during continuous salt stress.

Arabidopsis pumila is native to the desert region of northwest China and it is extraordinarily well adapted to the local semi-desert saline soil, thus providing a candidate plant system for environmental adaptation and salt-tolerance gene mining. However, understanding of the salt-adaptation mechanism of this species is limited because of the scarcity of genomic sequences. In the present study, the transcriptome profiles of A. pumila leaf tissues treated with 250 mM NaCl for 0, 0.5, 3, 6, 12, 24 and 48 h were analyzed using a combination of second-generation sequencing (SGS) and third-generation single-molecule real-time (SMRT) sequencing. Correction of SMRT long reads by SGS short reads resulted in 59,328 transcripts. We found 8075 differentially expressed genes (DEGs) between salt-stressed tissues and controls, of which 483 were transcription factors and 1157 were transport proteins. Most DEGs were activated within 6 h of salt stress and their expression stabilized after 48 h; the number of DEGs was greatest within 12 h of salt stress. Gene annotation and functional analyses revealed that expression of genes associated with the osmotic and ionic phases rapidly and coordinately changed during the continuous salt stress in this species, and salt stress-related categories were highly enriched among these DEGs, including oxidation-reduction, transmembrane transport, transcription factor activity and ion channel activity. Orphan, MYB, HB, bHLH, C3H, PHD, bZIP, ARF and NAC TFs were most enriched in DEGs; ABCB1, CLC-A, CPK30, KEA2, KUP9, NHX1, SOS1, VHA-A and VP1 TPs were extensively up-regulated in salt-stressed samples, suggesting that they play important roles in salt tolerance. Importantly, further experimental studies identified a mitogen-activated protein kinase (MAPK) gene MAPKKK18 as continuously up-regulated throughout salt stress, suggesting its crucial role in salt tolerance.
The expression patterns of 24 salt-responsive genes obtained by quantitative real-time PCR were largely consistent with their transcript abundance changes identified by RNA-Seq. The full-length transcripts generated in this study provide a more accurate depiction of gene transcription of A. pumila. We identified potential genes involved in salt tolerance of A. pumila. These data present a genetic resource and facilitate better understanding of the salt-adaptation mechanism of ephemeral plants.


September 22, 2019  |  

Transcriptional fates of human-specific segmental duplications in brain.

Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits. © 2018 Dougherty et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019  |  

Full-length transcriptome of Misgurnus anguillicaudatus provides insights into evolution of genus Misgurnus.

Reconstruction and annotation of transcripts, particularly for a species without a reference genome, plays a critical role in gene discovery, investigation of genomic signatures, and genome annotation in the pre-genomic era. This study generated 33,330 full-length transcripts of diploid M. anguillicaudatus using PacBio SMRT Sequencing. A total of 6,918 gene families were identified with two or more isoforms, and 26,683 complete ORFs with an average length of 1,497 bp were detected. In total, 1,208 high-confidence lncRNAs were identified, and most of these appeared to be precursor transcripts of miRNAs or snoRNAs. A phylogenetic tree of the Misgurnus species was inferred based on 1,905 single-copy orthologous genes. The tetraploid and diploid M. anguillicaudatus grouped into a clade, and M. bipartitus showed a closer relationship with M. anguillicaudatus. The overall evolutionary rates of tetraploid M. anguillicaudatus were significantly higher than those of other Misgurnus species. Meanwhile, 28 positively selected genes were identified in the M. anguillicaudatus clade. These positively selected genes may play critical roles in the adaptation of M. anguillicaudatus to various habitat environments. This study could facilitate further exploration of the genomic signatures of M. anguillicaudatus and provide potential insights into the evolutionary history of tetraploid loach.


September 22, 2019  |  

A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing.

It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, obtaining full-length transcripts remains a huge challenge because of difficulties in short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). In total, we obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not yet been annotated within the current reference genome. Furthermore, about 17% of transcripts are computationally revealed to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events, respectively, are detected within this de novo constructed transcriptome. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of the rabbit genome.


September 22, 2019  |  

Alternative splice variants of AID are not stoichiometrically present at the protein level in chronic lymphocytic leukemia

Activation-induced deaminase (AID) is a DNA-mutating enzyme that mediates class-switch recombination as well as somatic hypermutation of antibody genes in B cells. Due to off-target activity, AID is implicated in lymphoma development by introducing genome-wide DNA damage and initiating chromosomal translocations such as c-myc/IgH. Several alternative splice transcripts of AID have been reported in activated B cells as well as malignant B cells such as chronic lymphocytic leukemia (CLL). As most commercially available antibodies fail to recognize alternative splice variants, their abundance in vivo, and hence their biological significance, has not been determined. In this study, we assessed the protein levels of AID splice isoforms by introducing an AID splice reporter construct into cell lines and primary CLL cells from patients as well as from WT and TCL1(tg) C57BL/6 mice (where TCL1 is T-cell leukemia/lymphoma 1). The splice construct is 5′-fused to a GFP-tag, which is preserved in all splice isoforms and allows detection of translated protein. In summary, we show a thorough quantification of alternatively spliced AID transcripts and demonstrate that the corresponding protein abundances, especially those of splice variants AID-ivs3 and AID-ΔE4, are not stoichiometrically equivalent. Our data suggest that enhanced proteasomal degradation of low-abundance proteins might be causative for this discrepancy. © 2013 The Authors. European Journal of Immunology published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.


September 22, 2019  |  

Molecular mechanisms of acclimatization to phosphorus starvation and recovery underlying full-length transcriptome profiling in barley (Hordeum vulgare L.).

A lack of phosphorus (P) in plants can severely constrain growth and development. Barley, one of the earliest domesticated crops, is extensively planted in poor soil around the world. To date, the molecular mechanisms of enduring low phosphorus, at the transcriptional level, in barley are still unclear. In the present study, two different barley genotypes (GN121 and GN42) with contrasting phosphorus efficiency were used to reveal adaptations to low phosphorus stress, at three time points, at the morphological, physiological, biochemical, and transcriptome level. GN121 growth was less affected by phosphorus starvation and recovery than that of GN42. The biomass and inorganic phosphorus concentration of GN121 and GN42 declined under the low phosphorus-induced stress and increased after recovery with normal phosphorus. However, the range of these parameters was higher in GN42 than in GN121. Subsequently, a more complete genome annotation was obtained by correcting with the data sequenced on the Illumina HiSeq X 10 and PacBio RSII SMRT platforms. A total of 6,182 and 5,270 differentially expressed genes (DEGs) were identified in GN121 and GN42, respectively. The majority of these DEGs were involved in phosphorus metabolism such as phospholipid degradation, hydrolysis of phosphoric enzymes, sucrose synthesis, phosphorylation/dephosphorylation and post-transcriptional regulation; expression of these genes was significantly different between GN121 and GN42. Specifically, six and seven DEGs were annotated as phosphorus transporters in roots and leaves, respectively. Furthermore, a putative model was constructed relying on key metabolic pathways related to phosphorus to illustrate the higher phosphorus efficiency of GN121 compared to GN42 under low phosphorus conditions. Results from this study provide a multi-transcriptome database and candidate genes for further study on phosphorus use efficiency (PUE).


September 22, 2019  |  

Fusion of TTYH1 with the C19MC microRNA cluster drives expression of a brain-specific DNMT3B isoform in the embryonal brain tumor ETMR.

Embryonal tumors with multilayered rosettes (ETMRs) are rare, deadly pediatric brain tumors characterized by high-level amplification of the microRNA cluster C19MC. We performed integrated genetic and epigenetic analyses of 12 ETMR samples and identified, in all cases, C19MC fusions to TTYH1 driving expression of the microRNAs. ETMR tumors, cell lines and xenografts showed a specific DNA methylation pattern distinct from those of other tumors and normal tissues. We detected extreme overexpression of a previously uncharacterized isoform of DNMT3B originating at an alternative promoter that is active only in the first weeks of neural tube development. Transcriptional and immunohistochemical analyses suggest that C19MC-dependent DNMT3B deregulation is mediated by RBL2, a known repressor of DNMT3B. Transfection with individual C19MC microRNAs resulted in DNMT3B upregulation and RBL2 downregulation in cultured cells. Our data suggest a potential oncogenic re-engagement of an early developmental program in ETMR via epigenetic alteration mediated by an embryonic, brain-specific DNMT3B isoform.


September 22, 2019  |  

Genome re-annotation of the wild strawberry Fragaria vesca using extensive Illumina- and SMRT-based RNA-seq datasets

The genome of the wild diploid strawberry species Fragaria vesca, an ideal model system of cultivated strawberry (Fragaria × ananassa, octoploid) and other Rosaceae family crops, was first published in 2011 and followed by a new assembly (Fvb). However, the annotation for Fvb mainly relied on ab initio predictions and included only predicted coding sequences, therefore an improved annotation is highly desirable. Here, a new annotation version named v2.0.a2 was created for the Fvb genome by a pipeline utilizing one PacBio library, 90 Illumina RNA-seq libraries, and 9 small RNA-seq libraries. Altogether, 18,641 genes (55.6% out of 33,538 genes) were augmented with information on the 5′ and/or 3′ UTRs, 13,168 (39.3%) protein-coding genes were modified or newly identified, and 7,370 genes were found to possess alternative isoforms. In addition, 1,938 long non-coding RNAs, 171 miRNAs, and 51,714 small RNA clusters were integrated into the annotation. This new annotation of F. vesca is substantially improved in both accuracy and integrity of gene predictions, beneficial to the gene functional studies in strawberry and to the comparative genomic analysis of other horticultural crops in Rosaceae family.

