Scalable RNA Isoform Sequencing using Intramolecular Multiplexed cDNAs
While RNA-sequencing has dramatically accelerated our understanding of biology, quantitation and discovery of full-length RNA isoforms resulting from alternative splicing remain poorly resolved. Alternative splicing is a core regulatory process that modulates the structure, expression, and localization of expressed proteins through differential exon and/or UTR splicing during transcript maturation. Beyond being an integral component of cellular development and homeostatic maintenance, RNA splicing is implicated in a wide range of pathologies, with hallmark isoforms linked to cardiovascular, neurological, and immunological diseases. Current limitations in isoform quantitation and discovery arise from the inability of existing sequencing platforms to scalably sequence full-length mRNAs: short-read approaches are unable to span most successive splice sites, while long-read approaches are constrained by limited throughput. To enable scalable full-length RNA sequencing, we developed Multiplexed Arrays sequencing (MAS-seq), a method that maximizes the sequencing potential of isoforms on the PacBio platform. Through the use of deoxy-uracil digestion followed by barcode-directed ligation of cDNAs, MAS-seq generates long multiplexed cDNA arrays with a length distribution that allows for both accurate consensus sequencing and optimal capacity utilization of PacBio sequencers. In combination with upstream artifact depletion measures, MAS-seq boosts the sequencing throughput to approximately 40 million full-length transcripts per SMRT Cell 8M, a >20-fold increase compared to PacBio’s scIso-Seq workflow. Sequencing synthetic RNA isoform standards (Lexogen), we demonstrate a 99.8% isoform identification accuracy using MAS-seq, far exceeding the 56.8% accuracy of Smart-seq3, the best-in-class short-read isoform identification protocol.
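Because each MAS-seq read is an array of several cDNAs joined at barcoded junctions, downstream analysis presumably has to split the read back into its constituent transcripts. The following is a minimal sketch of that idea only, not the authors' pipeline. The adapter sequences, the `ADAPTERS` table, and the `split_array` helper are all hypothetical, and exact string matching stands in for the error-tolerant alignment a real long-read tool would need.

```python
from typing import Dict, List, Tuple

# Hypothetical junction barcodes for illustration; NOT the real MAS-seq adapters.
ADAPTERS: Dict[str, str] = {
    "A": "ACGTACGTAC",
    "B": "TTGGCCAATT",
    "C": "GATCGATCGA",
}


def split_array(read: str, adapters: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Split a concatenated read into segments flanked by two adapters.

    Returns (left_adapter_name, segment_sequence, right_adapter_name) tuples
    in read order. Assumes exact adapter matches (a simplification).
    """
    # Collect every exact adapter occurrence as (start, end, name).
    hits: List[Tuple[int, int, str]] = []
    for name, seq in adapters.items():
        start = read.find(seq)
        while start != -1:
            hits.append((start, start + len(seq), name))
            start = read.find(seq, start + 1)
    hits.sort()

    # The sequence between each pair of consecutive adapters is one segment.
    segments: List[Tuple[str, str, str]] = []
    for (_, end_left, left), (start_right, _, right) in zip(hits, hits[1:]):
        if start_right > end_left:
            segments.append((left, read[end_left:start_right], right))
    return segments


# Toy array: two placeholder "cDNAs" (AAAA and CCCC) between three adapters.
toy_read = ADAPTERS["A"] + "AAAA" + ADAPTERS["B"] + "CCCC" + ADAPTERS["C"]
toy_segments = split_array(toy_read, ADAPTERS)
```

Recording the flanking adapter names alongside each segment lets a caller check that junctions appear in the expected order, which is one plausible way to flag malformed arrays.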
MAS-seq-enabled single-cell RNA isoform sequencing of tumor-infiltrating CD8+ T cells robustly identifies the canonical CD45 (PTPRC) isoforms associated with the range of observed T cell states, a finding additionally validated at the protein level via CITE-seq. Further, we demonstrate the impact of the long-read throughput gains enabled by MAS-seq on single-cell isoform analysis: a 44% increase in single-cell clustering capacity (measured by the adjusted Rand index) and a 34-fold gain in identifying differentially spliced genes among the CD8+ T cell subtypes. MAS-seq is a streamlined and cost-effective approach that enables scalable bulk and single-cell RNA isoform sequencing.
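The adjusted Rand index (ARI) used above as the clustering metric compares two partitions of the same cells and corrects for chance agreement: 1.0 means identical partitions, and values near 0 mean agreement no better than random. The sketch below computes the standard ARI from a contingency table. The function name `adjusted_rand_index` and the toy labels are illustrative assumptions and are not taken from the study's analysis code.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Standard adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions agree trivially

    # Pairs of items co-clustered in both labelings, in A only, and in B only.
    sum_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2         # agreement for identical partitions
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (sum_both - expected) / (max_index - expected)


# Toy example: hypothetical reference labels versus a clustering of four cells.
reference = ["naive", "naive", "exhausted", "exhausted"]
clustered = [0, 0, 1, 2]
ari = adjusted_rand_index(reference, clustered)  # equals 4/7 for this toy case
```

Cluster names are arbitrary, so relabeling one partition (for example swapping 0 and 1) leaves the score unchanged; only the grouping of cells matters.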