Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions such as when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.
TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts.
Long-read, single-molecule sequencing platforms hold great potential for isoform discovery and characterization of multi-exon transcripts. However, their high error rates are an obstacle to distinguishing novel transcript isoforms from sequencing artifacts. Therefore, we developed the package TranscriptClean to correct mismatches, microindels and noncanonical splice junctions in mapped transcripts using the reference genome while preserving known variants.Our method corrects nearly all mismatches and indels present in a publically available human PacBio Iso-seq dataset, and rescues 39% of noncanonical splice junctions.All Python and R scripts used in this paper are available at https://github.com/dewyman/TranscriptClean.
Hybrid sequencing-based personal full-length transcriptomic analysis implicates proteostatic stress in metastatic ovarian cancer.
Comprehensive molecular characterization of myriad somatic alterations and aberrant gene expressions at personal level is key to precision cancer therapy, yet limited by current short-read sequencing technology, individualized catalog of complete genomic and transcriptomic features is thus far elusive. Here, we integrated second- and third-generation sequencing platforms to generate a multidimensional dataset on a patient affected by metastatic epithelial ovarian cancer. Whole-genome and hybrid transcriptome dissection captured global genetic and transcriptional variants at previously unparalleled resolution. Particularly, single-molecule mRNA sequencing identified a vast array of unannotated transcripts, novel long noncoding RNAs and gene chimeras, permitting accurate determination of transcription start, splice, polyadenylation and fusion sites. Phylogenetic and enrichment inference of isoform-level measurements implicated early functional divergence and cytosolic proteostatic stress in shaping ovarian tumorigenesis. A complementary imaging-based high-throughput drug screen was performed and subsequently validated, which consistently pinpointed proteasome inhibitors as an effective therapeutic regime by inducing protein aggregates in ovarian cancer cells. Therefore, our study suggests that clinical application of the emerging long-read full-length analysis for improving molecular diagnostics is feasible and informative. An in-depth understanding of the tumor transcriptome complexity allowed by leveraging the hybrid sequencing approach lays the basis to reveal novel and valid therapeutic vulnerabilities in advanced ovarian malignancies.
SMRT sequencing reveals differential patterns of methylation in two O111:H- STEC isolates from a hemolytic uremic syndrome outbreak in Australia.
In 1995 a severe haemolytic-uremic syndrome (HUS) outbreak in Adelaide occurred. A recent genomic analysis of Shiga toxigenic Escherichia coli (STEC) O111:H- strains 95JB1 and 95NR1 from this outbreak found that the more virulent isolate, 95NR1, harboured two additional copies of the Shiga toxin 2 (Stx2) genes encoded within prophage regions. The structure of the Stx2-converting prophages could not be fully resolved using short-read sequence data alone and it was not clear if there were other genomic differences between 95JB1 and 95NR1. In this study we have used Pacific Biosciences (PacBio) single molecule real-time (SMRT) sequencing to characterise the genome and methylome of 95JB1 and 95NR1. We completely resolved the structure of all prophages including two, tandemly inserted, Stx2-converting prophages in 95NR1 that were absent from 95JB1. Furthermore we defined all insertion sequences and found an additional IS1203 element in the chromosome of 95JB1. Our analysis of the methylome of 95NR1 and 95JB1 identified hemi-methylation of a novel motif (5′-CTGCm6AG-3′) in more than 4000 sites in the 95NR1 genome. These sites were entirely unmethylated in the 95JB1 genome, and included at least 177 potential promoter regions that could contribute to regulatory differences between the strains. IS1203 mediated deactivation of a novel type IIG methyltransferase in 95JB1 is the likely cause of the observed differential patterns of methylation between 95NR1 and 95JB1. This study demonstrates the capability of PacBio SMRT sequencing to resolve complex prophage regions and reveal the genetic and epigenetic heterogeneity within a clonal population of bacteria.