September 22, 2019

Single-molecule real-time transcript sequencing facilitates common wheat genome annotation and grain transcriptome research.

The large and complex hexaploid genome has greatly hindered genomics studies of common wheat (Triticum aestivum, AABBDD). Here, we investigated transcripts in common wheat developing caryopses using the emerging single-molecule real-time (SMRT) sequencing technology PacBio RSII, and assessed the resultant data for improving common wheat genome annotation and grain transcriptome research. We obtained 197,709 full-length non-chimeric (FLNC) reads, 74.6% of which were estimated to carry a complete open reading frame. A total of 91,881 high-quality FLNC reads were identified and mapped to 16,188 chromosomal loci, corresponding to 13,162 known genes and 3026 new genes not annotated previously. Although some FLNC reads could not be unambiguously mapped to the current draft genome sequence, many of them are likely useful for studying highly similar homoeologous or paralogous loci or for improving chromosomal contig assembly in further research. The 91,881 high-quality FLNC reads represented 22,768 unique transcripts, 9591 of which were newly discovered. We found 180 transcripts each spanning two or three previously annotated adjacent loci, suggesting that these loci should be merged to form correct gene models. Finally, our data facilitated the identification of 6030 genes differentially regulated during caryopsis development, and of full-length transcripts for 72 transcribed gluten gene members that are important for the end-use quality control of common wheat. Our work demonstrated the value of PacBio transcript sequencing for improving common wheat genome annotation by uncovering loci and full-length transcripts not discovered previously. The resource obtained may aid further structural genomics and grain transcriptome studies of common wheat.


September 22, 2019

Quantitative isoform-profiling of highly diversified recognition molecules.

Complex biological systems rely on cell surface cues that govern cellular self-recognition and selective interactions with appropriate partners. Molecular diversification of cell surface recognition molecules through DNA recombination and complex alternative splicing has emerged as an important principle for encoding such interactions. However, the lack of tools to specifically detect and quantify receptor protein isoforms is a major impediment to functional studies. Here, we developed a workflow for targeted mass spectrometry by selected reaction monitoring (SRM) that permits quantitative assessment of highly diversified protein families. We apply this workflow to dissect the molecular diversity of the neuronal neurexin receptors and uncover an alternative splicing-dependent recognition code for synaptic ligands.


September 22, 2019

Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing.

Zea mays is an important genetic model for elucidating transcriptional networks. Uncertainties about the complete structure of mRNA transcripts limit the progress of research in this system. Here, using single-molecule sequencing technology, we produce 111,151 transcripts from six tissues, capturing ~70% of the genes annotated in the maize RefGen_v3 genome. A large proportion of transcripts (57%) represent novel, sometimes tissue-specific, isoforms of known genes, and 3% correspond to novel gene loci. In other cases, the identified transcripts have improved existing gene models. Averaging across all six tissues, 90% of the splice junctions are supported by short reads from matched tissues. In addition, we identified a large number of novel long non-coding RNAs and fusion transcripts and found that DNA methylation plays an important role in generating various isoforms. Our results show that characterization of the maize B73 transcriptome is far from complete, and that maize gene expression is more complex than previously thought.


September 22, 2019

Generation and comparative analysis of full-length transcriptomes in sweetpotato and its putative wild ancestor I. trifida.

Sweetpotato [Ipomoea batatas (L.) Lam.] is one of the most important crops in many developing countries and provides a candidate source of bioenergy. However, both a high-quality reference genome and large-scale full-length cDNA sequences for this outcrossing hexaploid are still lacking, which in turn impedes progress in sweetpotato functional genomics and molecular breeding. In this study, we apply a combination of second- and third-generation sequencing technologies to sequence full-length transcriptomes in sweetpotato and its putative ancestor I. trifida. In total, we obtained 53,861/51,184 high-quality transcripts, including 34,963/33,637 putative full-length cDNA sequences, from sweetpotato/I. trifida. Among these, we identified 104,540/94,174 open reading frames, 1476/1475 transcription factors, 25,315/27,090 simple sequence repeats, and 417/531 long non-coding RNAs in the sweetpotato/I. trifida datasets. Using publicly available genomic contigs, we analyzed the gene features (including exon number, exon size, intron number, intron size, and exon-intron structure) of 33,119 and 32,793 full-length transcripts in sweetpotato and I. trifida, respectively. Furthermore, comparative analysis between our transcript datasets and other large-scale cDNA datasets from different plant species enabled us to assess the quality of public datasets, estimate the genetic similarity across related species, and survey the evolutionary patterns of genes. Overall, our study provides, for the first time, fundamental resources of large-scale full-length transcripts in sweetpotato and its putative ancestor, and should facilitate structural, functional and comparative genomics studies in this important crop.


September 22, 2019

Isoform evolution in primates through independent combination of alternative RNA processing events.

Recent RNA-seq technology revealed thousands of splicing events that are under rapid evolution in primates, but the reliability of these events, as well as their combination at the isoform level, has not been adequately addressed because of the limited read length of RNA-seq. Here, we performed comparative transcriptome analyses in human and rhesus macaque cerebellum using single-molecule long-read sequencing (Iso-seq) and matched RNA-seq. Besides 359 million RNA-seq reads, 4,165,527 Iso-seq reads were generated with a mean length of 14,875 bp, covering 11,466 human genes and 10,159 macaque genes. With Iso-seq data, we substantially expanded the repertoire of alternative RNA processing events in primates, and found that intron retention and alternative polyadenylation are surprisingly more prevalent in primates than previously estimated. We then investigated the combinatorial mode of these alternative events at the whole-transcript level, and found that the combination of these events is largely independent along the transcript, leading to thousands of novel isoforms missed by current annotations. Notably, these novel isoforms are selectively constrained in general, and 1,119 isoforms have even higher expression than the previously annotated major isoforms in human, indicating that the complexity of the human transcriptome is still significantly underestimated. Comparative transcriptome analysis further revealed 502 genes encoding selectively constrained, lineage-specific isoforms in human but not in rhesus macaque, linking them to some lineage-specific functions. Overall, we propose that the independent combination of alternative RNA processing events has contributed to complex isoform evolution in primates, which provides a new foundation for the study of phenotypic differences among primates. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

