September 22, 2019

Isoform sequencing and state-of-art applications for unravelling complexity of plant transcriptomes

Single-molecule real-time (SMRT) sequencing developed by PacBio, also called third-generation sequencing (TGS), offers longer reads than second-generation sequencing (SGS). Given its ability to obtain full-length transcripts without assembly, isoform sequencing (Iso-Seq) of transcriptomes by PacBio is advantageous for genome annotation, identification of novel genes and isoforms, as well as the discovery of long non-coding RNA (lncRNA). In addition, Iso-Seq gives access to the direct detection of alternative splicing, alternative polyadenylation (APA), gene fusion, and DNA modifications. Such applications of Iso-Seq facilitate the understanding of gene structure, post-transcriptional regulatory networks, and subsequently proteomic diversity. In this review, we summarize its applications in plant transcriptome study, specifically pointing out challenges associated with each step in the experimental design and highlighting the development of bioinformatic pipelines. We aim to provide the community with an integrative overview of and comprehensive guidance on Iso-Seq, and thus to promote its applications in plant research.


September 22, 2019

Global identification of alternative splicing via comparative analysis of SMRT- and Illumina-based RNA-seq in strawberry.

Alternative splicing (AS) is a key post-transcriptional regulatory mechanism, yet little is known about its roles in fruit crops. Here, AS was globally analyzed in the wild strawberry Fragaria vesca genome with RNA-seq data derived from different stages of fruit development. The AS landscape was characterized and compared between the single-molecule, real-time (SMRT) and Illumina RNA-seq platforms. While SMRT has a lower sequencing depth, it identifies more genes undergoing AS (57.67% of detected multiexon genes) than Illumina does (33.48%), illustrating the efficacy of SMRT in AS identification. We investigated different modes of AS in the context of fruit development; the percentage of intron retention (IR) is markedly reduced whereas that of alternative acceptor sites (AA) is significantly increased post-fertilization when compared with pre-fertilization. When all the identified transcripts were combined, a total of 66.43% of detected multiexon genes in strawberry undergo AS, some of which lead to a gain or loss of conserved domains in the gene products. The work demonstrates that SMRT sequencing is highly powerful in AS discovery and provides a rich data resource for later functional studies of different isoforms. Further, shifting AS modes may contribute to rapid changes of gene expression during fruit set. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
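To make the intron retention (IR) mode mentioned above concrete, here is a minimal Python sketch of how an IR event can be recognized when two full-length isoforms of the same gene are compared. It is an illustration only and not the pipeline used in the study; the coordinate convention (1-based, inclusive exon intervals) and the function names `introns` and `retained_introns` are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical convention: an exon is (start, end), 1-based and inclusive.
Exon = Tuple[int, int]


def introns(exons: List[Exon]) -> List[Exon]:
    """Return the gaps between consecutive exons of one isoform."""
    ordered = sorted(exons)
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    ]


def retained_introns(isoform_a: List[Exon], isoform_b: List[Exon]) -> List[Exon]:
    """Introns of isoform_a that lie entirely inside a single exon of isoform_b."""
    return [
        (start, end)
        for start, end in introns(isoform_a)
        if any(ex_start <= start and end <= ex_end for ex_start, ex_end in isoform_b)
    ]


# Toy isoforms: the second one keeps the first intron of the first one.
spliced = [(100, 200), (301, 400), (501, 600)]
retaining = [(100, 400), (501, 600)]

print(retained_introns(spliced, retaining))
```

Because long reads cover the whole transcript, both exon chains are observed directly and no assembly step is needed before such a comparison.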


September 22, 2019

Cataloguing over-expressed genes in Epstein Barr Virus immortalized lymphoblastoid cell lines through consensus analysis of PacBio transcriptomes corroborates hypomethylation of chromosome 1

The ability of Epstein Barr Virus (EBV) to transform resting B-cells into immortalized lymphoblastoid cell lines (LCL) provides a continuous source of peripheral blood lymphocytes that are used to model conditions in which these lymphocytes play a key role. Here, the PacBio generated transcriptomes of three LCLs from a parent-daughter trio (SRAid:SRP036136) provided by a previous study [1] were analyzed using a kmer-based version of YeATS (KEATS). The set of over-expressed genes in these cell lines was determined based on a comparison with the PacBio transcriptome of twenty tissues provided by another study (hOPTRS) [2]. MIR155 long non-coding RNA (MIR155HG), Fc fragment of IgE receptor II (FCER2), T-cell leukemia/lymphoma 1A (TCL1A), and germinal center associated signaling and motility (GCSAM) were genes having the highest expression counts in the three LCLs with no expression in hOPTRS. Other over-expressed genes, having low expression in hOPTRS, were membrane spanning 4-domains A1 (MS4A1) and ribosomal protein S2 pseudogene 55 (RPS2P55). While some of these genes are known to be over-expressed in LCLs, this study provides a comprehensive cataloguing of such genes. A recent work involving a patient with EBV-positive large B-cell lymphoma was “unusually lacking various B-cell markers”, but over-expressing CD30 [3] – a gene ranked 79 among uniquely expressed genes here. Hypomethylation of chromosome 1 observed in EBV immortalized LCLs [4, 5] is also corroborated here by mapping the genes to chromosomes. Extending previous work identifying un-annotated genes [6], 80 genes were identified which are expressed in the three LCLs, not in hOPTRS, and missing in the GENCODE, RefSeq and RefSeqGene databases. KEATS introduces a method of determining expression counts based on a partitioning of the known annotated genes, has runtimes of a few hours on a personal workstation and provides detailed reports enabling proper debugging.


September 22, 2019

Characterization of the Rosellinia necatrix transcriptome and genes related to pathogenesis by single-molecule mRNA sequencing.

White root rot disease, caused by the pathogen Rosellinia necatrix, is one of the world’s most devastating plant fungal diseases and affects several commercially important species of fruit trees and crops. Recent global outbreaks of R. necatrix and advances in molecular techniques have both increased interest in this pathogen. However, the lack of information regarding the genomic structure and transcriptome of R. necatrix has been a barrier to the progress of functional genomic research and the control of this harmful pathogen. Here, we identified 10,616 novel full-length transcripts from the filamentous hyphal tissue of R. necatrix (KACC 40445 strain) using PacBio single-molecule sequencing technology. After annotation of the unigene sets, we selected 14 cell cycle-related genes, which are likely either positively or negatively involved in hyphal growth by cell cycle control. The expression of the selected genes was further compared between two strains that displayed different growth rates on nutritional media. Furthermore, we predicted pathogen-related effector genes and cell wall-degrading enzymes from the annotated gene sets. These results provide the most comprehensive transcriptomal resources for R. necatrix, and could facilitate functional genomics and further analyses of this important phytopathogen.


September 22, 2019

Comprehensive genomic analysis of malignant pleural mesothelioma identifies recurrent mutations, gene fusions and splicing alterations.

We analyzed transcriptomes (n = 211), whole exomes (n = 99) and targeted exomes (n = 103) from 216 malignant pleural mesothelioma (MPM) tumors. Using RNA-seq data, we identified four distinct molecular subtypes: sarcomatoid, epithelioid, biphasic-epithelioid (biphasic-E) and biphasic-sarcomatoid (biphasic-S). Through exome analysis, we found BAP1, NF2, TP53, SETD2, DDX3X, ULK2, RYR2, CFAP45, SETDB1 and DDX51 to be significantly mutated (q-score ≥ 0.8) in MPMs. We identified recurrent mutations in several genes, including SF3B1 (~2%; 4/216) and TRAF7 (~2%; 5/216). SF3B1-mutant samples showed a splicing profile distinct from that of wild-type tumors. TRAF7 alterations occurred primarily in the WD40 domain and were, except in one case, mutually exclusive with NF2 alterations. We found recurrent gene fusions and splice alterations to be frequent mechanisms for inactivation of NF2, BAP1 and SETD2. Through integrated analyses, we identified alterations in Hippo, mTOR, histone methylation, RNA helicase and p53 signaling pathways in MPMs.


September 22, 2019

Transgenerational attenuation of opioid self-administration as a consequence of adolescent morphine exposure.

The United States is in the midst of an opiate epidemic, with abuse of prescription and illegal opioids increasing steadily over the past decade. While it is clear that there is a genetic component to opioid addiction, there is a significant portion of heritability that cannot be explained by genetics alone. The current study was designed to test the hypothesis that maternal exposure to opioids prior to pregnancy alters abuse liability in subsequent generations. Female adolescent Sprague Dawley rats were administered morphine at increasing doses (5-25 mg/kg, s.c.) or saline for 10 days (P30-39). During adulthood, animals were bred with drug-naïve colony males. Male and female adult offspring (F1 animals) were tested for morphine self-administration acquisition, progressive ratio, extinction, and reinstatement at three doses of morphine (0.25, 0.75, 1.25 mg/kg/infusion). Grand offspring (F2 animals, from the maternal line) were also examined. Additionally, gene expression changes within the nucleus accumbens were examined with RNA deep sequencing (PacBio) and qPCR. There were dose- and sex-dependent effects on all phases of the self-administration paradigm that indicate decreased morphine reinforcement and attenuated relapse-like behavior. Additionally, genes related to synaptic plasticity, as well as myelin basic protein (MBP), were dysregulated. Some, but not all, effects persisted into the subsequent (F2) generation. The results demonstrate that even limited opioid exposure during adolescence can have lasting effects across multiple generations, which has implications for mechanisms of the transmission of drug abuse liability in humans. Copyright © 2016 Elsevier Ltd. All rights reserved.


September 22, 2019

JAFFA: High sensitivity transcriptome-focused fusion gene detection.

Genomic instability is a hallmark of cancer and, as such, structural alterations and fusion genes are common events in the cancer landscape. RNA sequencing (RNA-Seq) is a powerful method for profiling cancers, but current methods for identifying fusion genes are optimised for short reads. JAFFA (https://github.com/Oshlack/JAFFA/wiki) is a sensitive fusion detection method that outperforms other methods with reads of 100 bp or greater. JAFFA compares a cancer transcriptome to the reference transcriptome, rather than the genome, where the cancer transcriptome is inferred using long reads directly or by de novo assembling short reads.
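The idea of comparing reads to a reference transcriptome rather than a genome can be sketched as follows: if adjacent segments of one read (or assembled contig) align to transcripts of two different genes, the read is a fusion candidate. The Python below is a minimal sketch of that idea only; it is not JAFFA's implementation. The `Hit` record, the `fusion_candidates` function, and the `max_overlap` threshold are hypothetical names and values chosen for this example.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One alignment of a read segment to a reference transcript (hypothetical record)."""
    q_start: int  # start position on the read, 1-based inclusive
    q_end: int    # end position on the read, 1-based inclusive
    gene: str     # gene that owns the matched reference transcript


def fusion_candidates(hits: List[Hit], max_overlap: int = 10) -> List[Tuple[str, str]]:
    """Return gene pairs where neighbouring read segments map to different genes.

    max_overlap is an assumed tolerance (in bases) for how far two
    neighbouring segments may overlap on the read at the breakpoint.
    """
    ordered = sorted(hits, key=lambda h: h.q_start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        overlap = left.q_end - right.q_start + 1
        if left.gene != right.gene and overlap <= max_overlap:
            pairs.append((left.gene, right.gene))
    return pairs


# Toy read whose first 500 bases match GENE_A and the rest match GENE_B.
read_hits = [Hit(1, 500, "GENE_A"), Hit(495, 1200, "GENE_B")]
print(fusion_candidates(read_hits))
```

A real tool would also need filtering steps (for example, against paralogous genes and alignment artefacts) that this sketch leaves out.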


September 22, 2019

Transcript profiling of a bitter variety of narrow-leafed lupin to discover alkaloid biosynthetic genes.

Lupins (Lupinus spp.) are nitrogen-fixing legumes that accumulate toxic alkaloids in their protein-rich beans. These anti-nutritional compounds belong to the family of quinolizidine alkaloids (QAs), which are of interest to the pharmaceutical and chemical industries. To unleash the potential of lupins as protein crops and as sources of QAs, a thorough understanding of the QA pathway is needed. However, only the first enzyme in the pathway, lysine decarboxylase (LDC), is known. Here, we report the transcriptome of a high-QA variety of narrow-leafed lupin (L. angustifolius), obtained using eight different tissues and two different sequencing technologies. In addition, we present a list of 33 genes that are closely co-expressed with LDC and that represent strong candidates for involvement in lupin alkaloid biosynthesis. One of these genes encodes a copper amine oxidase able to convert the product of LDC, cadaverine, into 1-piperideine, as shown by heterologous expression and enzyme assays. Kinetic analysis revealed a low KM value for cadaverine, supporting a role as the second enzyme in the QA pathway. Our transcriptomic data set represents a crucial step towards the discovery of enzymes, transporters, and regulators involved in lupin alkaloid biosynthesis. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology.


September 22, 2019

Neural circular RNAs are derived from synaptic genes and regulated by development and plasticity.

Circular RNAs (circRNAs) have re-emerged as an interesting RNA species. Using deep RNA profiling in different mouse tissues, we observed that circRNAs were substantially enriched in brain and a disproportionate fraction of them were derived from host genes that encode synaptic proteins. Moreover, on the basis of separate profiling of the RNAs localized in neuronal cell bodies and neuropil, circRNAs were, on average, more enriched in the neuropil than their host gene mRNA isoforms. Using high-resolution in situ hybridization, we visualized circRNA punctae in the dendrites of neurons. Consistent with the idea that circRNAs might regulate synaptic function during development, many circRNAs changed their abundance abruptly at a time corresponding to synaptogenesis. In addition, following a homeostatic downscaling of neuronal activity many circRNAs exhibited substantial up- or downregulation. Together, our data indicate that brain circRNAs are positioned to respond to and regulate synaptic function.


September 22, 2019

The industrial melanism mutation in British peppered moths is a transposable element.

Discovering the mutational events that fuel adaptation to environmental change remains an important challenge for evolutionary biology. The classroom example of a visible evolutionary response is industrial melanism in the peppered moth (Biston betularia): the replacement, during the Industrial Revolution, of the common pale typica form by a previously unknown black (carbonaria) form, driven by the interaction between bird predation and coal pollution. The carbonaria locus has been coarsely localized to a 200-kilobase region, but the specific identity and nature of the sequence difference controlling the carbonaria-typica polymorphism, and the gene it influences, are unknown. Here we show that the mutation event giving rise to industrial melanism in Britain was the insertion of a large, tandemly repeated, transposable element into the first intron of the gene cortex. Statistical inference based on the distribution of recombined carbonaria haplotypes indicates that this transposition event occurred around 1819, consistent with the historical record. We have begun to dissect the mode of action of the carbonaria transposable element by showing that it increases the abundance of a cortex transcript, the protein product of which plays an important role in cell-cycle regulation, during early wing disc development. Our findings fill a substantial knowledge gap in the iconic example of microevolutionary change, adding a further layer of insight into the mechanism of adaptation in response to natural selection. The discovery that the mutation itself is a transposable element will stimulate further debate about the importance of ‘jumping genes’ as a source of major phenotypic novelty.


September 22, 2019

A novel enrichment strategy reveals unprecedented number of novel transcription start sites at single base resolution in a model prokaryote and the gut microbiome.

The initiating nucleotide found at the 5′ end of primary transcripts has a distinctive triphosphorylated end that distinguishes these transcripts from all other RNA species. Recognizing this distinction is key to deconvoluting the primary transcriptome from the plethora of processed transcripts that confound analysis of the transcriptome. The currently available methods do not use targeted enrichment for the 5′ end of primary transcripts, but rather attempt to deplete non-targeted RNA. We developed a method, Cappable-seq, for directly enriching for the 5′ end of primary transcripts and enabling determination of transcription start sites at single base resolution. This is achieved by enzymatically modifying the 5′ triphosphorylated end of RNA with a selectable tag. We first applied Cappable-seq to E. coli, achieving up to 50-fold enrichment of primary transcripts and identifying an unprecedented 16539 transcription start sites (TSS) genome-wide at single base resolution. We also applied Cappable-seq to a mouse cecum sample and identified TSS in a microbiome. Cappable-seq allows for the first time the capture of the 5′ end of primary transcripts. This enables a unique robust TSS determination in bacteria and microbiomes. In addition to and beyond TSS determination, Cappable-seq depletes ribosomal RNA and reduces the complexity of the transcriptome to a single quantifiable tag per transcript enabling digital profiling of gene expression in any microbiome.


September 22, 2019

The new world of isoform sequencing

Not too long ago, the life sciences community was still debating whether sequencers would ever overtake microarrays as the preferred means of measuring gene expression. Today, not only have sequencers become the standard workhorse for gene expression studies, but newer sequencing technology has delivered the ability to generate novel expression data even in the most well-characterized cells or organisms. Truly, it is a remarkable time for comprehensive studies of which genes are being transcribed, with the goal of providing functional insight into various biological processes. The key advantage sequencing holds over microarrays is its ability to deeply survey an entire transcriptome, while microarrays are limited to interrogating known genes using probes designed from a reference genome assembly. As next-generation sequencing became more affordable, scientists were eager to switch to this approach, which became known as RNA sequencing or simply RNA-seq. © Mary Ann Liebert, Inc.


September 22, 2019

MHC class I diversity of olive baboons (Papio anubis) unravelled by next-generation sequencing.

The olive baboon represents an important model system to study various aspects of human biology and health, including the origin and diversity of the major histocompatibility complex. After screening of a group of related animals for polymorphisms associated with a well-defined microsatellite marker, subsequent MHC class I typing of a selected population of 24 animals was performed on two distinct next-generation sequencing (NGS) platforms. A substantial number of 21 A and 80 B transcripts were discovered, about half of which had not been previously reported. Per animal, from one to four highly transcribed A alleles (majors) were observed, in addition to ones characterised by low transcription levels (minors), such as members of the A*14 lineage. Furthermore, in one animal, up to 13 B alleles with differential transcription level profiles may be present. Based on segregation profiles, 16 Paan-AB haplotypes were defined. A haplotype encodes in general one or two major A and three to seven B transcripts, respectively. A further peculiarity is the presence of at least one copy of a B*02 lineage on nearly every haplotype, which indicates that B*02 represents a separate locus with probably a specialistic function. Haplotypes appear to be generated by recombination-like events, and the breakpoints map not only between the A and B regions but also within the B region itself. Therefore, the genetic makeup of the olive baboon MHC class I region appears to have been subject to a similar or even more complex expansion process than the one documented for macaque species.


September 22, 2019

Dual platform long-read RNA-sequencing dataset of the human Cytomegalovirus Lytic transcriptome

RNA-sequencing has revolutionized transcriptomics and the way we measure gene expression (Wang et al., 2009). As of today, short-read RNA sequencing is more widely used, and due to its low price and high throughput, is the preferred tool for the quantitative analysis of gene expression. However, the annotation of transcript isoforms is rather difficult using only short-read sequencing data, because the reads are shorter than most transcripts (Steijger et al., 2013). Long-read sequencing, on the other hand, can provide full contig information about transcripts, including exon-connectivity, and its merits in transcriptome profiling are being increasingly acknowledged (Sharon et al., 2013; Abdel-Ghany et al., 2016; Wang et al., 2016; Kuo et al., 2017). Due to the relatively low throughput of current long-read sequencing technologies, they can only characterize smaller transcriptomes in high-depth (Weirather et al., 2017). The Human cytomegalovirus (HCMV) is a ubiquitous betaherpesvirus, which can cause mononucleosis-like symptoms in adults (Cohen and Corey, 1985), and severe life-threatening infections in newborns (Wen et al., 2002). Latent HCMV infection has recently been implicated to affect cancer formation (Dziurzynski et al., 2012; Jin et al., 2014). Examining the transcriptome of the virus can go a long way in helping understand its molecular biology. Short-read RNA sequencing studies have discovered splice junctions and non-coding transcripts (Gatherer et al., 2011) and have shown that the most abundant HCMV transcripts are similarly expressed in different cell types (Cheng et al., 2017). Our long-read RNA sequencing experiments using the Pacific Biosciences (PacBio) RSII platform revealed a great number of transcript isoforms, polycistronic RNAs and transcriptional overlaps (Balázs et al., 2017a).


September 22, 2019

Computational analysis of alternative splicing in plant genomes.

Computational analyses play crucial roles in characterizing splicing isoforms in plant genomes. In this review, we provide a survey of computational tools used in recently published, genome-scale splicing analyses in plants. We summarize the commonly used software and pipelines for read mapping, isoform reconstruction, isoform quantification, and differential expression analysis. We also discuss methods for analyzing long reads and the strategies to combine long and short reads in identifying splicing isoforms. We review several tools for characterizing local splicing events, splicing graphs, coding potential, and visualizing splicing isoforms. We further discuss the procedures for identifying conserved splicing isoforms across plant species. Finally, we discuss the outlook of integrating other genomic data with splicing analyses to identify regulatory mechanisms of AS on a genome-wide scale. Copyright © 2018 Elsevier B.V. All rights reserved.

