A new paper from scientists at Stanford University and Yale University describes the use of Single Molecule, Real-Time (SMRT®) Sequencing to generate transcriptomes for three individuals. The work is believed to be the first personal transcriptome analysis using long-read sequencing.
The paper, entitled “Defining a personal, allele-specific, and single-molecule long-read transcriptome,” was published in PNAS by Hagen Tilgner, Fabian Grubert, Donald Sharon, and Michael Snyder. Last year, the same authors published a study using SMRT Sequencing to analyze transcriptomes across tissue samples from human organs. In the PNAS publication, they compare metrics from the new data set to those from the previous study.
For the current project, the team sequenced lymphoblastoid transcriptomes from three family members with PacBio® long reads; they also generated short reads on an Illumina® platform for comparison. They focused on long-read sequencing because of the known limitations of short reads for transcriptome analysis: “Recent work has shown that reconstruction and quantification of transcript isoforms from short-read sequencing is insufficiently accurate,” the authors write. They aimed to learn more about how long-read sequencing could be used to understand allelic variants.
The scientists found that for genes up to 3 kb in length, “reads representing all splice sites of a transcript are evident” as long as the genes have sufficient expression. Importantly, they successfully “determined single-nucleotide variants (SNVs) in a de novo manner and connected them to RNA haplotypes, including HLA haplotypes, thereby assigning single full-length RNA molecules to their transcribed allele, and demonstrated Mendelian inheritance of RNA molecules.”
Tilgner et al. note that because each PacBio read comes from a single RNA molecule, it can be associated with the variants of a specific allele. The same gene can be processed into different forms of RNA depending on whether it is transcribed from the maternal or paternal chromosome, and short-read technologies have struggled to differentiate these allele-specific variants. This capability “is a significant step toward connecting multiple variables along the RNA molecule, such as SNVs, RNA editing and splice sites, transcription start sites (TSSes), and polyA sites,” they write. “This technique allows the assessment of biased allelic expression and isoform expression.”
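To make the idea concrete, here is a minimal Python sketch of how a single full-length read might be assigned to its allele and how isoform counts could then be tallied per allele. It is an illustration only, not the authors' pipeline. It assumes the heterozygous SNVs have already been phased into maternal and paternal bases, and every name in it (`assign_allele`, `allele_isoform_counts`, the read dictionaries, the positions and isoform labels) is hypothetical.

```python
from collections import Counter


def assign_allele(read_bases, het_snvs):
    """Assign one read to 'maternal', 'paternal', or 'unassigned'.

    read_bases: dict mapping position -> base observed in the read
    het_snvs:   dict mapping position -> (maternal_base, paternal_base)
                (assumed to be phased already; hypothetical input format)
    """
    votes = Counter()
    for pos, (maternal_base, paternal_base) in het_snvs.items():
        base = read_bases.get(pos)  # None if the read does not cover this SNV
        if base == maternal_base:
            votes["maternal"] += 1
        elif base == paternal_base:
            votes["paternal"] += 1
    # No informative SNV, or conflicting evidence: leave the read unassigned.
    if votes["maternal"] == votes["paternal"]:
        return "unassigned"
    return "maternal" if votes["maternal"] > votes["paternal"] else "paternal"


def allele_isoform_counts(reads, het_snvs):
    """Count reads per (allele, isoform) pair."""
    counts = Counter()
    for read in reads:
        allele = assign_allele(read["bases"], het_snvs)
        counts[(allele, read["isoform"])] += 1
    return counts


# Toy data: two heterozygous SNVs in one gene and five long reads.
het_snvs = {101: ("A", "G"), 250: ("C", "T")}
reads = [
    {"isoform": "iso1", "bases": {101: "A", 250: "C"}},
    {"isoform": "iso2", "bases": {101: "G", 250: "T"}},
    {"isoform": "iso1", "bases": {101: "A"}},
    {"isoform": "iso2", "bases": {}},
    {"isoform": "iso1", "bases": {101: "A", 250: "T"}},
]
counts = allele_isoform_counts(reads, het_snvs)
for (allele, isoform), n in sorted(counts.items()):
    print(allele, isoform, n)
```

Because one long read spans both the SNVs and the splice junctions of the same molecule, the allele and the isoform are observed together. With short reads, the two would usually have to be inferred separately and then reconciled.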
Long-read sequencing also allowed the scientists to find previously unidentified isoforms, which they used to supplement an existing annotation, effectively building the first known personalized annotation.