A new review article offers a nice overview of attempts to characterize the transcriptome of human stem cells using RNA-seq, the Iso-Seq™ method, and more. Kin Fai Au and Vittorio Sebastiano, scientists at the University of Iowa and Stanford University, respectively, contributed the review to Current Opinion in Genetics & Development.
“The introduction of the RNA-Seq technology based on [second-generation sequencing technology] has provided a remarkable step forward providing a fast and inexpensive way to determine the transcriptome of a given cell type and several remarkable works have been done using this type of approach,” Au and Sebastiano write. “Nonetheless tasks like de novo discovery of genes, gene isoforms assembly or transcript and isoform abundance determination are still challenging and far from being achieved.”
They report on a previous paper from Au in which Single Molecule, Real-Time (SMRT®) Sequencing was combined with short-read sequencing to detect isoforms in a well-characterized human embryonic stem cell line. Long reads led to the detection of hundreds of novel isoforms and long noncoding RNAs. Long intergenic noncoding RNAs (lincRNAs) are a topic of interest in the review article, where Au and Sebastiano note that they “have a very high degree of repetitive elements and it is therefore extremely challenging to determine the correct gene annotation and the abundance due to the difficulties in aligning short read data to the genome.” With long-read sequencing, they add, sequence data spans unique sections of the lincRNAs and makes it possible to accurately map reads to the correct region.
The authors cite recent studies demonstrating more transcriptional activity in the human genome than had previously been expected. “Transcription occurs across 80–90% of the human genome, in contrast with the assumption that only 3% (or less) of the genome is actually coding for proteins,” they write. LincRNAs and other noncoding RNAs may explain the difference between those numbers.
Au and Sebastiano call for more stem cell studies using long-read sequencing technology, both to establish a better view of transcriptional activity in these important cells and to accurately detect noncoding RNAs characterized by highly repetitive sequence. “Given such complexity of the epigenetic status for most of the genes, it is essential to identify the transcripts and the isoforms that are indeed functionally relevant (even if expressed at low levels) in [pluripotent stem cells],” they write.