Using SMRT Sequencing, Scientists Uncover Unexpected Transcript Diversity in Fungi
Wednesday, August 12, 2015
A new PLoS One publication from scientists at the Joint Genome Institute, University of Minnesota, and other organizations demonstrates that fungal genomes may contain far more transcript diversity than previously thought.
In “Widespread Polycistronic Transcripts in Fungi Revealed by Single-Molecule mRNA Sequencing,” lead author Sean Gordon, senior author Zhong Wang, and collaborators used long-read isoform sequencing to characterize four fungal species. In addition to widespread alternative splicing, they found evidence of polycistronic transcription units that could be important engineering targets for genetic manipulation of fungi.
The scientists turned to SMRT® Sequencing to escape the limitations of short-read transcriptome sequencing. “The largest challenges of short-read assembly include resolving hundreds of distinct isoforms derived from the same loci, and overlapping transcripts on the same strand for transcripts that span different loci,” they write. “Reduced sensitivity of short-read assembly to identify multiple isoforms from the same locus and long multi-locus transcripts clouds our ability to accurately define transcriptional units.”
In this work, the team developed a new bioinformatics pipeline called ToFU (for “Transcript isOforms: Full-length and Unassembled”) that uses only PacBio® long reads to produce a de novo transcriptome. They tested the method on four Agaricomycetes species (Plicaturopsis crispa, Phanerochaete chrysosporium, Trametes versicolor, and Gloeophyllum trabeum), chosen for their established use of alternative splicing and their industrial importance in wood decomposition. They generated full-length transcripts that averaged about 1,600 bases; the longest transcript was nearly 5,600 bases. (Note: The ToFU pipeline is being actively maintained and is available as a standalone package on GitHub or as part of the Iso-Seq™ protocol in the SMRT Analysis software.)
The authors note that fungal species were previously thought to have fewer isoforms, with an estimated 7% of genes being alternatively spliced, compared to more than 40% in Arabidopsis and 95% in humans. This conventional wisdom, however, did not match the findings generated by PacBio long reads and ToFU, which reported more than 9,000 transcribed loci in P. crispa, with 56% of those loci producing at least two different isoforms. “In total, 25.2% of all transcribed loci are alternatively spliced and 28.7% loci have alternative poly-adenylation sites,” they report. “This estimation of splicing rate is likely underestimated, as rare isoforms may skip detection, and we only sampled two conditions.”
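For readers who want to see how a figure like "the share of loci producing at least two isoforms" can be tallied, here is a minimal sketch. It is not the ToFU code: the function name, the isoform-to-locus mapping, and the toy identifiers are all assumptions made up for illustration, and the numbers are not from the paper.

```python
from collections import defaultdict


def multi_isoform_fraction(isoform_to_locus):
    """Return the fraction of loci that have at least two distinct isoforms.

    `isoform_to_locus` is a hypothetical mapping of isoform ID -> locus ID,
    such as one might derive after clustering full-length reads by locus.
    """
    isoforms_per_locus = defaultdict(set)
    for isoform_id, locus_id in isoform_to_locus.items():
        isoforms_per_locus[locus_id].add(isoform_id)

    if not isoforms_per_locus:
        return 0.0

    multi = sum(1 for isoforms in isoforms_per_locus.values() if len(isoforms) >= 2)
    return multi / len(isoforms_per_locus)


# Toy, invented data: four loci, two of which have two isoforms each.
toy_isoforms = {
    "iso1": "locusA",
    "iso2": "locusA",
    "iso3": "locusB",
    "iso4": "locusC",
    "iso5": "locusC",
    "iso6": "locusD",
}
toy_fraction = multi_isoform_fraction(toy_isoforms)
```

In a real analysis, what counts as a distinct isoform (splice pattern only, or also alternative poly-adenylation sites) changes the result, which is one reason the different percentages quoted above are not directly comparable.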
The authors assessed the accuracy of their de novo transcriptome analysis pipeline by aligning transcripts to an annotated draft sequence of the P. crispa genome, determining that 99.79% of bases matched the reference. They also tried to reconstruct the same isoforms from short-read data with several different assemblers and noted how difficult complex isoforms are to assemble that way. “Overall, a single assembler was only able to reconstruct a small percentage of 22,956 ToFU isoforms, and only 2.8% of isoforms by all three methods,” Gordon et al. write, noting that 70% of transcripts found by ToFU were not reconstructed in full by any of the short-read assemblers they tried. These results “highlight the limitations of current state-of-the-art short-read assembly methods for isoform discovery, and suggest that long-read RNA sequencing is essential for accurate isoform resolution,” the scientists add.
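The comparison described above comes down to set arithmetic: which reference isoforms each assembler recovered in full, which were recovered by all of them, and which by none. The sketch below shows that bookkeeping under stated assumptions. The function name, the assembler labels, and the isoform IDs are hypothetical, and deciding whether an assembled transcript fully matches a reference isoform is assumed to have been done upstream.

```python
def recovery_summary(reference_isoforms, assembled_by):
    """Summarize how many reference isoforms each assembler fully recovered.

    `reference_isoforms` is an iterable of isoform IDs (for example, the
    long-read set). `assembled_by` maps a hypothetical assembler name to the
    IDs of reference isoforms it reconstructed in full.
    """
    reference = set(reference_isoforms)
    if not reference:
        raise ValueError("reference_isoforms must not be empty")
    total = len(reference)

    # Restrict each assembler's hits to isoforms that are in the reference.
    hits = {name: reference & set(found) for name, found in assembled_by.items()}

    by_all = set.intersection(*hits.values()) if hits else set()
    by_any = set.union(*hits.values()) if hits else set()

    return {
        "per_assembler": {name: len(found) / total for name, found in hits.items()},
        "by_all": len(by_all) / total,
        "by_none": (total - len(by_any)) / total,
    }


# Toy, invented data: four reference isoforms and three assemblers.
toy_reference = {"t1", "t2", "t3", "t4"}
toy_assemblies = {
    "assembler_a": {"t1", "t2"},
    "assembler_b": {"t1", "t3"},
    "assembler_c": {"t1"},
}
toy_summary = recovery_summary(toy_reference, toy_assemblies)
```

With real data, the "by_all" and "by_none" entries correspond to the kinds of figures the authors quote, although their matching criteria are defined in the paper and are not reproduced here.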
An important component of this project was the discovery of polycistronic transcription units, which are rare in eukaryotes. The team validated several polycistronic transcripts using RT-PCR and amplicon sequencing. Their findings suggest that up to 8% of transcribed genes in these fungi undergo polycistronic transcription.
For more, watch lead author Sean Gordon’s presentation from PAG 2014.