April 21, 2020  

RNA sequencing: the teenage years.

Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions such as when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.

April 21, 2020  

Full-length mRNA sequencing and gene expression profiling reveal broad involvement of natural antisense transcript gene pairs in pepper development and response to stresses.

Pepper is an important vegetable with great economic value and unique biological features. In the past few years, significant progress has been made towards understanding the huge, complex pepper genome; however, pepper functional genomics has not been well studied. To better understand pepper gene structure and regulation, we conducted full-length mRNA sequencing by PacBio sequencing and obtained 57862 high-quality full-length mRNA sequences derived from 18362 previously annotated and 5769 newly detected genes. New gene models were built that combined the full-length mRNA sequences and corrected approximately 500 fragmented gene models from previous annotations. Based on the full-length mRNA, we identified 4114 and 5880 pepper genes forming natural antisense transcript (NAT) gene pairs in cis and in trans, respectively. Most of these genes accumulate small RNAs in their overlapping regions. By analyzing the expression patterns of these NAT genes in our transcriptome data, we identified many NAT pairs responsive to a variety of biological processes in pepper. Pepper formate dehydrogenase 1 (FDH1), which is required for R-gene-mediated disease resistance, may be regulated by nat-siRNAs and participate in a positive feedback loop in salicylic acid biosynthesis during resistance responses. Several cis-NAT pairs and subgroups of trans-NAT genes were responsive to pepper pericarp and placenta development and may play roles in capsanthin and capsaicin biosynthesis. Using a comparative genomics approach, we investigated the evolutionary mechanisms of cis-NATs and found that an increase in intergenic sequences accounted for the loss of most cis-NATs, while transposon insertion contributed to the formation of most new cis-NATs.

April 21, 2020  

FLAM-seq: full-length mRNA sequencing reveals principles of poly(A) tail length control.

Although messenger RNAs are key molecules for understanding life, until now, no method has existed to determine the full-length sequence of endogenous mRNAs including their poly(A) tails. Moreover, although non-A nucleotides can be incorporated in poly(A) tails, there also exists no method to accurately sequence them. Here, we present full-length poly(A) and mRNA sequencing (FLAM-seq), a rapid and simple method for high-quality sequencing of entire mRNAs. We report a complementary DNA library preparation method coupled to single-molecule sequencing to perform FLAM-seq. Using human cell lines, brain organoids and Caenorhabditis elegans we show that FLAM-seq delivers high-quality full-length mRNA sequences for thousands of different genes per sample. We find that 3′ untranslated region length is correlated with poly(A) tail length, that alternative polyadenylation sites and alternative promoters for the same gene are linked to different tail lengths, and that tails contain a substantial number of cytosines.

April 21, 2020  

Full-length mRNA sequencing in Saccharina japonica and identification of carbonic anhydrase genes

The carbonic anhydrases (CAs) are a group of enzymes that play an important role in the absorption and transportation of CO2 in Saccharina japonica. They are encoded by a superfamily of genes with seven subtypes that are unrelated in sequence but share a conserved function in catalyzing the reversible interconversion of CO2 and HCO3-. Here we have characterized the CA members in the transcriptome of S. japonica using single-molecule real-time (SMRT) sequencing technology. Approximately 9830.4 megabases from 5,028,003 quality subreads were generated, and they were assembled into 326,512 full-length non-chimeric (FLNC) reads, with an average FLNC read length of 2181 bp. After removing redundant sequences, 79,010 unique transcripts were obtained, of which 38,039 were successfully annotated. From the full-length transcriptome, we have identified 7 full-length cDNA sequences for CA genes (4 α-CAs, 1 β-CA and 2 γ-CAs) and assessed their potential functions based on phylogenetic analysis. Characterization of these CAs will lay the groundwork for future studies to determine the involvement of CAs in inorganic carbon absorption and transportation in S. japonica.
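
As a rough sanity check on the sequencing statistics quoted above, the short Python sketch below derives two figures that the abstract does not state directly: the mean subread length and the fraction of unique transcripts that were annotated. It assumes that one megabase means 10^6 bases; if the figure was instead converted with a factor of 1024 (9830.4 = 9.6 x 1024), the mean subread length would come out somewhat higher. The variable names are illustrative only.

```python
# Figures quoted in the abstract above.
total_megabases = 9830.4          # "approximately 9830.4 megabases"
subreads = 5_028_003              # quality subreads
unique_transcripts = 79_010       # after removing redundant sequences
annotated_transcripts = 38_039    # successfully annotated

# ASSUMPTION: 1 megabase = 1e6 bases (the abstract does not say).
total_bases = total_megabases * 1e6

# Derived values (not reported in the abstract itself).
mean_subread_length = total_bases / subreads
annotated_fraction = annotated_transcripts / unique_transcripts

print(f"mean subread length: {mean_subread_length:.0f} bp")
print(f"annotated fraction:  {annotated_fraction:.1%}")
```

Under this assumption the mean subread length is a little under 2 kb, which is of the same order as the reported average FLNC read length of 2181 bp, and slightly under half of the unique transcripts received an annotation.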

April 21, 2020  

The Impact of cDNA Normalization on Long-Read Sequencing of a Complex Transcriptome

Normalization of cDNA is widely used to improve the coverage of rare transcripts in analysis of transcriptomes employing next-generation sequencing. Recently, long-read technology has been emerging as a powerful tool for sequencing and construction of transcriptomes, especially for complex genomes containing highly similar transcripts and transcript-spliced isoforms. Here, we analyzed the transcriptome of sugarcane, a plant with a highly polyploid genome, by PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, with and without a normalization step. The results demonstrated that, while the two libraries included many of the same transcripts, many longer transcripts were removed and many new, generally shorter transcripts were detected by normalization. For the same input cDNA and the same data yield, the normalized library recovered more total transcript isoforms, predicted gene families and orthologous groups, resulting in a higher representation of the sugarcane transcriptome compared to the non-normalized library. The non-normalized library, on the other hand, included a wider transcript length range with more long transcripts above ~1.25 kb, more transcript isoforms per gene family and more gene ontology terms per transcript. A large proportion of the unique transcripts, comprising ~52% of the normalized library, were expressed at a lower level than the unique transcripts from the non-normalized library across the three tissue types tested (leaf, stalk and root). About 83% of the total 5,348 predicted long noncoding transcripts were derived from the normalized library, of which ~80% were derived from the lowly expressed fraction. Functional annotation of the unique transcripts suggested that each library enriched different functional transcript fractions. This demonstrated that the two approaches complement each other in obtaining a complete transcriptome of a complex genome at the sequencing depth used in this study.
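
To make the long noncoding transcript percentages above more concrete, the following Python sketch converts them into approximate counts. The abstract gives only rounded percentages ("about 83%", "~80%"), so the resulting numbers are estimates, not values reported by the authors. The variable names are illustrative.

```python
# Figures quoted in the abstract above (percentages are approximate).
total_lnc_transcripts = 5348
share_from_normalized = 0.83     # "about 83%" came from the normalized library
share_lowly_expressed = 0.80     # "~80%" of those came from the lowly expressed fraction

# Estimated counts, rounded to whole transcripts.
lnc_from_normalized = round(total_lnc_transcripts * share_from_normalized)
lnc_lowly_expressed = round(lnc_from_normalized * share_lowly_expressed)

print(f"lncRNA transcripts from the normalized library: ~{lnc_from_normalized}")
print(f"of which lowly expressed: ~{lnc_lowly_expressed}")
```

By this estimate, roughly two thirds of all predicted long noncoding transcripts were lowly expressed transcripts recovered by the normalized library.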

April 21, 2020  

Long-Read Sequencing Emerging in Medical Genetics

The wide implementation of next-generation sequencing (NGS) technologies has revolutionized the field of medical genetics. However, the short read lengths of currently used sequencing approaches pose a limitation for identification of structural variants, sequencing repetitive regions, phasing alleles and distinguishing highly homologous genomic regions. These limitations may significantly contribute to the diagnostic gap in patients with genetic disorders who have undergone standard NGS, like whole exome or even genome sequencing. Now, the emerging long-read sequencing (LRS) technologies may offer improvements in the characterization of genetic variation and regions that are difficult to assess with the currently prevailing NGS approaches. LRS has so far mainly been used to investigate genetic disorders with previously known or strongly suspected disease loci. While these targeted approaches already show the potential of LRS, it remains to be seen whether LRS technologies can soon enable true whole genome sequencing routinely. Ultimately, this could allow the de novo assembly of individual whole genomes used as a generic test for genetic disorders. In this article, we summarize the current LRS-based research on human genetic disorders and discuss the potential of these technologies to facilitate the next major advancements in medical genetics.

