September 22, 2019

Single-Molecule Long-Read Sequencing of Zanthoxylum bungeanum Maxim. Transcriptome: Identification of Aroma-Related Genes

Zanthoxylum bungeanum Maxim. is an economically important tree species that is resistant to drought and infertility, and has potential medicinal and edible value. However, comprehensive genomic data are not yet available for this species, limiting its potential utility for medicinal use, breeding programs, and cultivation. Transcriptome sequencing provides an effective approach to remedying this shortcoming. Herein, single-molecule long-read sequencing and next-generation sequencing approaches were used in parallel to obtain transcript isoform structure and gene functional information in Z. bungeanum. In total, 282,101 reads of inserts (ROIs) were identified, including 134,074 full-length non-chimeric reads, among which 65,711 open reading frames (ORFs), 50,135 simple sequence repeats (SSRs), and 1492 long non-coding RNAs (lncRNAs) were detected. Functional annotation revealed metabolic pathways related to aroma components and color characteristics in Z. bungeanum. Unexpectedly, 30 transcripts were annotated as genes involved in regulating the pathogenesis of breast and colorectal cancers. This work provides a comprehensive transcriptome resource for Z. bungeanum, and lays a foundation for the further investigation and utilization of Zanthoxylum resources.


September 22, 2019

Global identification of alternative splicing via comparative analysis of SMRT- and Illumina-based RNA-seq in strawberry.

Alternative splicing (AS) is a key post-transcriptional regulatory mechanism, yet little information is known about its roles in fruit crops. Here, AS was globally analyzed in the wild strawberry Fragaria vesca genome with RNA-seq data derived from different stages of fruit development. The AS landscape was characterized and compared between the single-molecule, real-time (SMRT) and Illumina RNA-seq platform. While SMRT has a lower sequencing depth, it identifies more genes undergoing AS (57.67% of detected multiexon genes) when it is compared with Illumina (33.48%), illustrating the efficacy of SMRT in AS identification. We investigated different modes of AS in the context of fruit development; the percentage of intron retention (IR) is markedly reduced whereas that of alternative acceptor sites (AA) is significantly increased post-fertilization when compared with pre-fertilization. When all the identified transcripts were combined, a total of 66.43% detected multiexon genes in strawberry undergo AS, some of which lead to a gain or loss of conserved domains in the gene products. The work demonstrates that SMRT sequencing is highly powerful in AS discovery and provides a rich data resource for later functional studies of different isoforms. Further, shifting AS modes may contribute to rapid changes of gene expression during fruit set. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.


September 22, 2019

Transcript profiling of a bitter variety of narrow-leafed lupin to discover alkaloid biosynthetic genes.

Lupins (Lupinus spp.) are nitrogen-fixing legumes that accumulate toxic alkaloids in their protein-rich beans. These anti-nutritional compounds belong to the family of quinolizidine alkaloids (QAs), which are of interest to the pharmaceutical and chemical industries. To unleash the potential of lupins as protein crops and as sources of QAs, a thorough understanding of the QA pathway is needed. However, only the first enzyme in the pathway, lysine decarboxylase (LDC), is known. Here, we report the transcriptome of a high-QA variety of narrow-leafed lupin (L. angustifolius), obtained using eight different tissues and two different sequencing technologies. In addition, we present a list of 33 genes that are closely co-expressed with LDC and that represent strong candidates for involvement in lupin alkaloid biosynthesis. One of these genes encodes a copper amine oxidase able to convert the product of LDC, cadaverine, into 1-piperideine, as shown by heterologous expression and enzyme assays. Kinetic analysis revealed a low KM value for cadaverine, supporting a role as the second enzyme in the QA pathway. Our transcriptomic data set represents a crucial step towards the discovery of enzymes, transporters, and regulators involved in lupin alkaloid biosynthesis. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology.


September 22, 2019

The new world of isoform sequencing

Not too long ago, the life sciences community was still debating whether sequencers would ever overtake microarrays as the preferred means of measuring gene expression. Today, not only have sequencers become the standard workhorse for gene expression studies, but newer sequencing technology has delivered the ability to generate novel expression data even in the most well-characterized cells or organisms. Truly, it is a remarkable time for comprehensive studies of which genes are being transcribed, with the goal of providing functional insight into various biological processes. The key advantage sequencing holds over microarrays is its ability to deeply survey an entire transcriptome, while microarrays are limited to interrogating known genes using probes designed from a reference genome assembly. As next-generation sequencing became more affordable, scientists were eager to switch to this approach, which became known as RNA sequencing or simply RNA-seq. © Mary Ann Liebert, Inc.


September 22, 2019

Computational analysis of alternative splicing in plant genomes.

Computational analyses play crucial roles in characterizing splicing isoforms in plant genomes. In this review, we provide a survey of computational tools used in recently published, genome-scale splicing analyses in plants. We summarize the commonly used software and pipelines for read mapping, isoform reconstruction, isoform quantification, and differential expression analysis. We also discuss methods for analyzing long reads and the strategies to combine long and short reads in identifying splicing isoforms. We review several tools for characterizing local splicing events, splicing graphs, coding potential, and visualizing splicing isoforms. We further discuss the procedures for identifying conserved splicing isoforms across plant species. Finally, we discuss the outlook of integrating other genomic data with splicing analyses to identify regulatory mechanisms of AS on genome-wide scale. Copyright © 2018 Elsevier B.V. All rights reserved.


September 22, 2019

G&T-seq: parallel sequencing of single-cell genomes and transcriptomes.

The simultaneous sequencing of a single cell’s genome and transcriptome offers a powerful means to dissect genetic variation and its effect on gene expression. Here we describe G&T-seq, a method for separating and sequencing genomic DNA and full-length mRNA from single cells. By applying G&T-seq to over 220 single cells from mice and humans, we discovered cellular properties that could not be inferred from DNA or RNA sequencing alone.


September 22, 2019

Integrative analysis of three RNA sequencing methods identifies mutually exclusive exons of MADS-box isoforms during early bud development in Picea abies.

Recent efforts to sequence the genomes and transcriptomes of several gymnosperm species have revealed an increased complexity in certain gene families in gymnosperms as compared to angiosperms. One example of this is the gymnosperm sister clade to angiosperm TM3-like MADS-box genes, which at least in the conifer lineage has expanded in number of genes. We have previously identified a member of this sub-clade, the conifer gene DEFICIENS AGAMOUS LIKE 19 (DAL19), as being specifically upregulated in cone-setting shoots. Here, we show through Sanger sequencing of mRNA-derived cDNA and mapping to assembled conifer genomic sequences that DAL19 produces six mature mRNA splice variants in Picea abies. These splice variants use alternate first and last exons, while their four central exons constitute a core region present in all six transcripts. Thus, they are likely to be transcript isoforms. Quantitative Real-Time PCR revealed that two mutually exclusive first DAL19 exons are differentially expressed across meristems that will form either male or female cones, or vegetative shoots. Furthermore, mRNA in situ hybridization revealed that two mutually exclusive last DAL19 exons were expressed in a cell-specific pattern within bud meristems. Based on these findings in DAL19, we developed a sensitive approach to transcript isoform assembly from short-read sequencing of mRNA. We applied this method to 42 putative MADS-box core regions in P. abies, from which we assembled 1084 putative transcripts. We manually curated these transcripts to arrive at 933 assembled transcript isoforms of 38 putative MADS-box genes. 152 of these isoforms, which we assign to 28 putative MADS-box genes, were differentially expressed across eight female, male, and vegetative buds. We further provide evidence of the expression of 16 out of the 38 putative MADS-box genes by mapping PacBio Iso-Seq circular consensus reads derived from pooled sample sequencing to assembled transcripts. 
In summary, our analyses reveal the use of mutually exclusive exons of MADS-box gene isoforms during early bud development in P. abies, and we find that the large number of identified MADS-box transcripts in P. abies results not only from expansion of the gene family through gene duplication events but also from the generation of numerous splice variants.


September 22, 2019

SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification.

High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that an important number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes. © 2018 Tardaguila et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing.

The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at transcriptional and post-transcriptional level. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing. In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, both in proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtained evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells. Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provides a basis to reveal largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.


September 22, 2019

Long-read based assembly and annotation of a Drosophila simulans genome

Long-read sequencing technologies enable high-quality, contiguous genome assemblies. Here we used SMRT sequencing to assemble the genome of a Drosophila simulans strain originating from Madagascar, the ancestral range of the species. We generated 8 Gb of raw data (~50x coverage) with a mean read length of 6,410 bp, a NR50 of 9,125 bp and the longest subread at 49 kb. We benchmarked six different assemblers and merged the best two assemblies from Canu and Falcon. Our final assembly was 127.41 Mb with a N50 of 5.38 Mb and 305 contigs. We anchored more than 4 Mb of novel sequence to the major chromosome arms, and significantly improved the assembly of peri-centromeric and telomeric regions. Finally, we performed full-length transcript sequencing and used this data in conjunction with short-read RNAseq data to annotate 13,422 genes in the genome, improving the annotation in regions with complex, nested gene structures.
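
The N50 quoted in the abstract above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly. As a minimal sketch of that arithmetic, assuming a plain list of contig lengths (the function name `n50` and the toy lengths are illustrative and are not taken from the paper):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together account for at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical values, not the D. simulans assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

The same calculation applied to read lengths rather than contig lengths gives a read-length N50.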


September 22, 2019

A single-molecule long-read survey of the human transcriptome.

Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5′ to 3′ end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5′ ends. For longer RNA molecules more 5′ nucleotides are missing, but complete intron structures are often preserved. In total, we identify ~14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.


September 22, 2019

Comprehensive transcriptome analysis of Sarcophaga peregrina, a forensically important fly species.

Sarcophaga peregrina (flesh fly) is a frequently found fly species in Palaearctic, Oriental, and Australasian regions that can be used to estimate minimal postmortem intervals important for forensic investigations. Despite its forensic importance, the genome information of S. peregrina has not been fully described. Therefore, we generated a comprehensive gene expression dataset using RNA sequencing and carried out de novo assembly to characterize the S. peregrina transcriptome. We obtained precise sequence information for RNA transcripts using two different methods. Based on primary sequence information, we identified sets of assembled unigenes and predicted coding sequences. Functional annotation of the aligned unigenes was performed using the UniProt, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes databases. As a result, 26,580,352 and 83,221 raw reads were obtained using the Illumina MiSeq and Pacbio RS II Iso-Seq sequencing applications, respectively. From these reads, 55,730 contigs were successfully annotated. The present study provides the resulting genome information of S. peregrina, which is valuable for forensic applications.


September 22, 2019

Defining cell identity with single cell omics.

Cells are a fundamental unit of life, and the ability to study the phenotypes and behaviors of individual cells is crucial to understanding the workings of complex biological systems. Cell phenotypes (epigenomic, transcriptomic, proteomic, and metabolomic) exhibit dramatic heterogeneity between and within the different cell types and states underlying cellular functional diversity. Cell genotypes can also display heterogeneity throughout an organism, in the form of somatic genetic variation, most notably in the emergence and evolution of tumors. Recent technical advances in single-cell isolation and the development of omics approaches sensitive enough to reveal these aspects of cell identity have enabled a revolution in the study of multicellular systems. In this review, we discuss the technologies available to resolve the genomes, epigenomes, transcriptomes, proteomes, and metabolomes of single cells from a wide variety of living systems. © 2018 The Authors. Proteomics Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.


September 22, 2019

Characterization of the human ESC transcriptome by hybrid sequencing.

Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.


September 22, 2019

Long-read sequencing of nascent RNA reveals coupling among RNA processing events.

Pre-mRNA splicing is accomplished by the spliceosome, a megadalton complex that assembles de novo on each intron. Because spliceosome assembly and catalysis occur cotranscriptionally, we hypothesized that introns are removed in the order of their transcription in genomes dominated by constitutive splicing. Remarkably little is known about splicing order and the regulatory potential of nascent transcript remodeling by splicing, due to the limitations of existing methods that focus on analysis of mature splicing products (mRNAs) rather than substrates and intermediates. Here, we overcome this obstacle through long-read RNA sequencing of nascent, multi-intron transcripts in the fission yeast Schizosaccharomyces pombe. Most multi-intron transcripts were fully spliced, consistent with rapid cotranscriptional splicing. However, an unexpectedly high proportion of transcripts were either fully spliced or fully unspliced, suggesting that splicing of any given intron is dependent on the splicing status of other introns in the transcript. Supporting this, mild inhibition of splicing by a temperature-sensitive mutation in prp2, the homolog of vertebrate U2AF65, increased the frequency of fully unspliced transcripts. Importantly, fully unspliced transcripts displayed transcriptional read-through at the polyA site and were degraded cotranscriptionally by the nuclear exosome. Finally, we show that cellular mRNA levels were reduced in genes with a high number of unspliced nascent transcripts during caffeine treatment, showing regulatory significance of cotranscriptional splicing. Therefore, overall splicing of individual nascent transcripts, 3′ end formation, and mRNA half-life depend on the splicing status of neighboring introns, suggesting crosstalk among spliceosomes and the polyA cleavage machinery during transcription elongation. © 2018 Herzel et al.; Published by Cold Spring Harbor Laboratory Press.

