April 13, 2017  |  General

New QC pipeline for Iso-Seq data increases confidence in transcript results

A preprint from scientists at the University of Florida, the Centro de Investigación Príncipe Felipe, and other institutes describes a new analysis tool to help boost the quality of transcriptome studies. “SQANTI: extensive characterization of long read transcript sequences for quality control in full-length transcriptome identification and quantification” comes from lead author Manuel Tardaguila, senior author Ana Conesa, and collaborators.

The automated pipeline for Structural and Quality Annotation of Novel Transcript Isoforms (SQANTI) was developed as a quality-assessment tool for transcripts discovered with SMRT Sequencing. SQANTI “calculates up to 35 different descriptors of transcript quality and creates a wide range of summary graphs to aid in the interpretation of the sequencing output,” the authors report.

Development of this new pipeline was spurred by the realization that different transcript analysis tools yielded different results, even for the same data set. “As an example, sequencing the mouse neural transcriptome with PacBio long reads, we obtained ~ 80,000, 12,000 and 16,000 different transcripts when applying Tapis, IDP or the ToFU pipeline, respectively,” the scientists write. “Implementing a comprehensive, quality aware analysis of PacBio reads is fundamental at a time when long read transcriptome sequencing is becoming more popular and important conclusions on transcriptome diversity will be drawn from these data.”

SQANTI consists of tools that classify transcripts by comparison to a reference annotation, evaluate them using more than 30 metrics, and generate graphs to report the results. The team tested it using neural tissue from mice, performing extensive RT-PCR validation to measure transcript expression. PacBio sequencing of the tissue identified many novel transcripts, but “an important fraction of the novel sequences are presumably bioinformatics or retrotranscription artifacts that can be removed by using SQANTI descriptors,” the scientists report.

They also evaluated results against data from short-read sequencing. “A comparison of Iso-Seq over the classical RNA-seq approaches solely based on short-reads demonstrates that the PacBio transcriptome not only succeeds in capturing the most robustly expressed fraction of transcripts, but also avoids quantification errors caused by unaccounted 3’ end variability in the reference,” Tardaguila et al. write. “SQANTI allows the user to maximize the analytical outcome of long read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.”

