Cellular cartography has been just as important for biologists to navigate the complex topography of living systems as was the mapping of the earth’s landscapes for explorers hundreds of years ago. While much has been learned about cellular identities and functions through gene expression analysis, one of the main drivers of cellular differentiation – alternative mRNA splicing – has been under-appreciated due to the lack of methods to comprehensively characterize transcriptomes. With the increasing application of PacBio’s Iso-Seq method for full-length transcript sequencing, this is now changing. Below are just a few recent examples of publications that highlight a shift to higher-quality, isoform-resolved cell atlases.
ENCODE project sees major gains with full-length RNA sequencing
The ENCODE project has long been a principal resource for RNA data in the form of short-read RNA-seq. Now, in a preprint entitled The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity, researchers representing twenty-five institutions describe how the fourth phase of ENCODE (ENCODE4) adds matching Iso-Seq data in a set of 81 unique human and mouse samples, representing “the first large-scale, cross-species survey of transcript structure diversity using full-length cDNA sequencing on long-read platforms.” The results are dramatic, with the authors reporting:
- “a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains.”
- “more than one predominant transcript across samples for 73.0% of genes, which is in contrast with prior reports,” and that
- “for a substantial number of genes, transcript structure diversity and major transcript usage for the same gene differs between tissues, samples, and developmental timepoints.”
This incredibly rich resource is just the beginning of a new era in cell atlas research, with the authors noting that it provides “a foundation for further analyses of alternative transcript usage.” As one example, in the finding they described as “the most surprising to us,” the researchers observed “substantial differences in splicing diversity for orthologous genes between human and mouse,” with well over half of all orthologous mouse and human genes showing “substantial differences in mechanism of diversification in matching tissues.” They note that this calls for caution in the widely used practice of inferring human gene functions from mouse models, both “in genomics and the wider biology community,” and “for both basic and preclinical purposes.”
Delivering a more complete and nuanced understanding of cancer
In cancer research, new isoform-resolved cell atlases from single-cell Iso-Seq data provide exciting opportunities for understanding tumor biology and open potential avenues for immuno-oncology research. A recent preprint illustrating this is Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing for personalized oncology by researchers from ETH Zurich in Switzerland, who found that “cancer cells expressed at least twice as many unique isoforms than other cell types,” and that “long-read sequencing provides a more complete picture of cancer-specific changes.” They also show that “short-read scRNA-seq data fails to distinguish between gene and fusion expression, potentially leading to wrong biological conclusions,” with an example in which an apparent elevation of gene expression in matched short-read RNA-seq data turned out to be a misclassification: the signal instead came from a novel gene fusion transcript.
Another, similar example in cancer research is a new preprint entitled An isoform-resolution transcriptomic atlas of colorectal cancer from long-read single-cell sequencing by researchers from Singapore and Saudi Arabia. Noting that short-read RNA-seq methods “only enable gene-level expression quantification but neglect alterations in transcript structures, which arise from alternative end processing or splicing, and are frequently observed in cancer,” they build the first isoform-resolved colorectal cancer cell atlas, finding hundreds of dysregulated transcripts in tumor cells. Going even further, they predict the resulting tumor-specific peptides “and identified a panel of recurring neoepitopes that may aid the development of neoantigen-based cancer vaccines.” This is extremely exciting, as tumor-specific splice variants may represent promising targets for immunotherapies and cancer vaccines, in some cases showing greater specificity and immunogenicity than more traditional SNV-based targets [1,2].
Aiding in the fight against malaria
Isoform-resolved cell atlases are transforming our understanding in many other areas of biological research as well. As a final example, in A single cell atlas of sexual development in Plasmodium falciparum, researchers from the Wellcome Sanger Institute and collaborators in Germany, Mali, and France study the transmission of malaria parasites from human to human, which can only occur when parasites successfully reproduce sexually in the mosquito vector. Using “both short Illumina reads and full-length PacBio Iso–Seq to carry out single-cell RNA sequencing [for over 37,000 cells] along sexual commitment, development, and differentiation,” they report that Iso-Seq revealed stage-specific exon usage, isoforms, and novel splicing events in many genes in the lab strains, including “differential exon usage in 6 genes between committed and asexual trophozoites, in 24 genes between males and females, in 39 genes between females and asexuals, and in 17 genes between males and asexuals,” and also in “a Malian individual naturally infected with multiple P. falciparum strains.” Thus, the work “is a key addition to the Malaria Cell Atlas,” with “enormous potential in improving our understanding of malaria parasite biology and transmission in natural infections.” Check out the cool interactive viewer at malariacellatlas.org!
Join the movement toward better biology
Let us be your companion as you navigate the vast oceans of cellular intricacies, with improved resolution of transcript isoforms revealing the true complexity of the transcriptome for the first time. With throughput increases provided by the Revio system, as well as MAS-Seq for both single-cell (through our scMAS-Seq product) and bulk Iso-Seq applications (coming soon; check out a representative dataset here), the transcriptome of every sample can now be charted as a comprehensive, high-definition, isoform-resolved map.