September 22, 2019

Shannon: an information-optimal de novo RNA-Seq assembler

De novo assembly of short RNA-Seq reads into transcripts is challenging due to sequence similarities in transcriptomes arising from gene duplications and alternative splicing of transcripts. We present Shannon, an RNA-Seq assembler with an optimality guarantee derived from principles of information theory: Shannon reconstructs nearly all information-theoretically reconstructable transcripts. Shannon is based on a theory we develop for de novo RNA-Seq assembly that reveals differing abundances among transcripts to be the key, rather than the barrier, to effective assembly. The assembly problem is formulated as a sparsest-flow problem on a transcript graph, and the heart of Shannon is a novel iterative flow-decomposition algorithm. This algorithm provably solves the information-theoretically reconstructable instances in linear time, even though the general sparsest-flow problem is NP-hard. Shannon also incorporates several additional algorithmic advances: a new error-correction algorithm based on successive cancelation, a multi-bridging algorithm that carefully utilizes read information in the k-mer de Bruijn graph, and an approximate graph partitioning algorithm to split the transcriptome de Bruijn graph into smaller components. In tests on large RNA-Seq datasets, Shannon obtains significant increases in sensitivity along with improvements in specificity in comparison to state-of-the-art assemblers.


September 22, 2019

LSCplus: a fast solution for improving long read accuracy by short read alignment.

The single molecule, real time (SMRT) sequencing technology of Pacific Biosciences enables the acquisition of transcripts from end to end due to its ability to produce extraordinarily long reads (>10 kb). This new method of transcriptome sequencing has been applied to several projects on humans and model organisms. However, the raw data from SMRT sequencing are of relatively low quality, with a random error rate of approximately 15%, for which error correction using next-generation sequencing (NGS) short reads is typically necessary. Few tools have been designed that apply a hybrid sequencing approach that combines NGS and SMRT data, and the most popular existing tool for error correction, LSC, has computing resource requirements that are too intensive for most laboratory and research groups. These shortcomings severely limit the application of SMRT long reads for transcriptome analysis. Here, we report an improved tool (LSCplus) for error correction with the LSC program as a reference. LSCplus overcomes the disadvantage of LSC’s time consumption and improves quality. Only 1/3-1/4 of the time and 1/20-1/25 of the error correction time is required using LSCplus compared with that required for using LSC. LSCplus is freely available at http://www.herbbol.org:8001/lscplus/. Sample calculations are provided illustrating the precision and efficiency of this method regarding error correction and isoform detection.


September 22, 2019

Tracking alternatively spliced isoforms from long reads by SpliceHunter.

Alternative splicing increases the functional complexity of a genome by generating multiple isoforms and potentially proteins from the same gene. Vast numbers of alternative splicing events are routinely detected by short-read deep sequencing technologies, but their functional interpretation is hampered by an uncertain transcript context. Emerging long-read sequencing technologies provide a more complete picture of full-length transcript sequences. We introduce SpliceHunter, a tool for the computational interpretation of long reads generated by, for example, Pacific Biosciences instruments. SpliceHunter defines and tracks isoforms and novel transcription units across time points, compares their splicing pattern to a reference annotation, and translates them into potential protein sequences.


September 22, 2019

Transcriptome characterization of moso bamboo (Phyllostachys edulis) seedlings in response to exogenous gibberellin applications.

Moso bamboo (Phyllostachys edulis) is a well-known bamboo species of high economic value in the textile industry due to its rapid growth. Phytohormones, which are master regulators of growth and development, serve as important endogenous signals. However, the mechanisms through which phytohormones regulate growth in moso bamboo remain unknown to date. Here, we report that exogenous gibberellin (GA) applications resulted in significantly increased internode length and lignin condensation. Transcriptome sequencing revealed that photosynthesis-related genes were enriched in the GA-repressed gene class, which was consistent with the decrease in leaf chlorophyll concentrations and the lower rate of photosynthesis following GA treatment. Exogenous GA applications on seedlings are relatively easy to perform; thus, we used 4-week-old whole bamboo seedlings for GA treatment followed by high-throughput sequencing. In this study, we identified 932 cis-natural antisense transcripts (cis-NATs) and 22,196 alternative splicing (AS) events in total. Among them, 42 cis-NATs and 442 AS events were differentially expressed upon exposure to exogenous GA3, suggesting that post-transcriptional regulation might also be involved in the GA3 response. Targets of the differentially expressed cis-NATs included genes involved in hormone receptor, photosynthesis and cell wall biogenesis. For example, LAC4 and its corresponding cis-NATs were GA3-induced, and may be involved in the accumulation of lignin, thus affecting cell wall composition. This study provides novel insights illustrating how GA alters post-transcriptional regulation and will shed light on the underlying mechanism of growth modulated by GA in moso bamboo.


September 22, 2019

Global identification of the full-length transcripts and alternative splicing related to phenolic acid biosynthetic genes in Salvia miltiorrhiza.

Salvianolic acids are among the main bioactive components in Salvia miltiorrhiza, and their biosynthesis has attracted widespread interest. However, previous studies on the biosynthesis of phenolic acids using next-generation sequencing platforms are limited with regard to the assembly of full-length transcripts. Based on hybrid-seq (next-generation and single-molecule real-time sequencing) of the S. miltiorrhiza root transcriptome, we experimentally identified 15 full-length transcripts and four alternative splicing events of enzyme-coding genes involved in the biosynthesis of rosmarinic acid. Moreover, we herein demonstrate that lithospermic acid B accumulates in the phloem and xylem of roots, in agreement with the expression patterns of the identified key genes related to rosmarinic acid biosynthesis. According to co-expression patterns, we predicted that six candidate cytochrome P450s and five candidate laccases participate in the salvianolic acid pathway. Our results provide a valuable resource for further investigation into the synthetic biology of phenolic acids in S. miltiorrhiza.


September 22, 2019

Single-cell RNAseq for the study of isoforms: how is that possible?

Single-cell RNAseq and alternative splicing studies have recently become two of the most prominent applications of RNAseq. However, the combination of both is still challenging, and few research efforts have been dedicated to the intersection between them. Cell-level insight on isoform expression is required to fully understand the biology of alternative splicing, but it is still an open question to what extent isoform expression analysis at the single-cell level is actually feasible. Here, we establish a set of four conditions that are required for a successful single-cell-level isoform study and evaluate how these conditions are met by these technologies in published research.


September 22, 2019

Long-read sequencing of human cytomegalovirus transcriptome reveals RNA isoforms carrying distinct coding potentials.

The human cytomegalovirus (HCMV) is a ubiquitous, human pathogenic herpesvirus. The complete viral genome is transcriptionally active during infection; however, a large part of its transcriptome has yet to be annotated. In this work, we applied the amplified isoform sequencing technique from Pacific Biosciences to characterize the lytic transcriptome of HCMV strain Towne varS. We developed a pipeline for transcript annotation using long-read sequencing data. We identified 248 transcriptional start sites, 116 transcriptional termination sites and 80 splicing events. Using this information, we have annotated 291 previously undescribed or only partially annotated transcript isoforms, including eight novel antisense transcripts and their isoforms, as well as a novel transcript (RS2) in the short repeat region, partially antisense to RS1. Similarly to other organisms, we discovered a high transcriptional diversity in HCMV, with many transcripts only slightly differing from one another. Comparing our transcriptome profiling results to an earlier ribosome footprint analysis, we have concluded that the majority of the transcripts contain multiple translationally active ORFs, and also that most isoforms contain unique combinations of ORFs. Based on these results, we propose that one important function of this transcriptional diversity may be to provide a regulatory mechanism at the level of translation.


September 22, 2019

High resolution annotation of zebrafish transcriptome using long-read sequencing.

With the emergence of zebrafish as an important model organism, a concerted effort has been made to study its transcriptome. This effort is limited, however, by gaps in zebrafish annotation, which are especially pronounced concerning transcripts dynamically expressed during zygotic genome activation (ZGA). To date, short-read sequencing has been the principal technology for zebrafish transcriptome annotation. In part because these sequence reads are too short for assembly methods to resolve the full complexity of the transcriptome, the current annotation is rudimentary. By providing direct observation of full-length transcripts, recently refined long-read sequencing platforms can dramatically improve annotation coverage and accuracy. Here, we leveraged the SMRT platform to study the transcriptome of zebrafish embryos before and after ZGA. Our analysis revealed additional novelty and complexity in the zebrafish transcriptome, including high-confidence novel transcripts that originated from previously unannotated loci and 1835 high-confidence new isoforms in previously annotated genes. We validated these findings using a suite of computational approaches including structural prediction, sequence homology, and functional conservation analyses, as well as by confirmatory transcript quantification with short-read sequencing data. Our analyses provided insight into new homologs and paralogs of functionally important proteins and noncoding RNAs, isoform switching occurrences, and different classes of novel splicing events. Several novel isoforms representing distinct splicing events were validated through PCR experiments, including the discovery and validation of a novel 8-kb transcript spanning multiple mir-430 elements, an important driver of early development. Our study provides a significantly improved zebrafish transcriptome annotation resource. © 2018 Nudelman et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Single molecule RNA sequencing uncovers trans-splicing and improves annotations in Anopheles stephensi.

Single molecule real-time (SMRT) sequencing has recently been used to obtain full-length cDNA sequences that improve genome annotation and reveal RNA isoforms. Here, we used one such method called isoform sequencing from Pacific Biosciences (PacBio) to sequence a cDNA library from the Asian malaria mosquito Anopheles stephensi. More than 600 000 full-length cDNAs, referred to as reads of insert, were identified. Owing to the inherently high error rate of PacBio sequencing, we tested different approaches for error correction. We found that error correction using Illumina RNA sequencing (RNA-seq) generated more data than using the default SMRT pipeline. The full-length error-corrected PacBio reads greatly improved the gene annotation of Anopheles stephensi: 4867 gene models were updated and 1785 alternatively spliced isoforms were added to the annotation. In addition, six trans-splicing events, where exons from different primary transcripts were joined together, were identified in An. stephensi. All six trans-splicing events appear to be conserved in Culicidae, as they are also found in Anopheles gambiae and Aedes aegypti. The proteins encoded by trans-splicing events are also highly conserved and the orthologues of these proteins are cis-spliced in outgroup species, indicating that trans-splicing may arise as a mechanism to rescue genes that broke up during evolution. © 2017 The Royal Entomological Society.


September 22, 2019

The hardy rubber tree genome provides insights into the evolution of polyisoprene biosynthesis.

Eucommia ulmoides, also called hardy rubber tree, is an economically important tree; however, the lack of its genome sequence restricts the fundamental biological research and applied studies of this plant species. Here, we present a high-quality assembly of its ~1.2-Gb genome (scaffold N50 = 1.88 Mb) with at least 26 723 predicted genes for E. ulmoides, the first sequenced genome of the order Garryales, which was obtained using an integrated strategy combining Illumina sequencing, PacBio sequencing, and BioNano mapping. As a sister taxon to lamiids and campanulids, E. ulmoides underwent an ancient genome triplication shared by core eudicots but no further whole-genome duplication in the last ~125 million years. E. ulmoides exhibits high expression levels and/or gene number expansion for multiple genes involved in stress responses and the biosynthesis of secondary metabolites, which may account for its considerable environmental adaptability. In contrast to the rubber tree (Hevea brasiliensis), which produces cis-polyisoprene, E. ulmoides has evolved to synthesize long-chain trans-polyisoprene via farnesyl diphosphate synthases (FPSs). Moreover, FPS and rubber elongation factor/small rubber particle protein gene families were expanded independently from the H. brasiliensis lineage. These results provide new insights into the biology of E. ulmoides and the origin of polyisoprene biosynthesis. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.


September 22, 2019

Genetic and molecular basis of the immune system in the brachiopod Lingula anatina.

The extension of comparative immunology to non-model systems, such as mollusks and annelids, has revealed an unexpected diversity in the complement of immune receptors and effectors among evolutionary lineages. However, several lophotrochozoan phyla remain unexplored mainly due to the lack of genomic resources. The increasing accessibility of high-throughput sequencing technologies offers unique opportunities for extending genome-wide studies to non-model systems. As a result, the genome-based study of the immune system in brachiopods allows a better understanding of the alternative survival strategies developed by these immunologically neglected phyla. Here we present a detailed overview of the molecular components of the immune system identified in the genome of the brachiopod Lingula anatina. Our findings reveal conserved intracellular signaling pathways as well as unique strategies for pathogen detection and killing in brachiopods. Copyright © 2017 Elsevier Ltd. All rights reserved.


September 22, 2019

A gene-rich fraction analysis of the Passiflora edulis genome reveals highly conserved microsyntenic regions with two related Malpighiales species.

Passiflora edulis is the most widely cultivated species of passionflowers, cropped mainly for industrialized juice production and fresh fruit consumption. Despite its commercial importance, little is known about the genome structure of P. edulis. To fill this gap in our knowledge, a genomic library was built, and over 100 large inserts have now been completely sequenced. Sequencing data were assembled from long sequence reads, and structural sequence annotation resulted in the prediction of about 1,900 genes, providing data for subsequent functional analysis. The richness of repetitive elements was also evaluated. Microsyntenic regions of P. edulis common to Populus trichocarpa and Manihot esculenta, two related Malpighiales species with available fully sequenced genomes, were examined. Overall, gene order was well conserved, with some disruptions of collinearity identified as rearrangements, such as inversion and translocation events. The microsynteny level observed between the P. edulis sequences and the compared genomes is surprising, given the long divergence time that separates them from the common ancestor. P. edulis gene-rich segments are more compact than those of the other two species, even though its genome is much larger. This study provides a first accurate gene set for P. edulis, opening the way for new studies on the evolutionary issues in Malpighiales genomes.

