Iso-Seq Study Reveals More Complexity than Expected in Maize Transcriptome
Friday, June 24, 2016
In a new publication from Cold Spring Harbor Laboratory, scientists produced what the authors call “the single largest collection of [full-length] cDNAs available in maize” and significantly improved the maize genome annotation. The effort relied on the Iso-Seq method with SMRT Sequencing, which allows scientists to generate ultra-long reads covering full transcripts.
The paper, “Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing,” comes from lead author Bo Wang and senior author Doreen Ware, who is also affiliated with the USDA Agricultural Research Service. It offers the first published results from using the Iso-Seq method on a maize plant and includes a number of advances, such as barcoding to permit cost-effective pooling of tissue samples.
The scientists embarked on the project to see what advantages long-read sequencing offers for transcriptome analysis in a complex plant. “Although data from short-read sequencing have accumulated over recent years, they do not provide full-length (FL) sequence for each RNA, limiting their utility for defining alternatively spliced forms,” Wang et al. write. “In some cases, short-read sequencing generates low-quality transcripts, leading to incorrect annotations.” Long-read sequencing, on the other hand, makes it possible to capture high-quality, full-length transcripts.
The team used PacBio sequencing for six tissue types from the maize line B73, size-selecting libraries with the SageELF system to increase average read length. Results were impressive, with more than 110,000 transcript isoform sequences associated with nearly 27,000 genes, representing 70% of previously annotated maize genes as well as novel isoforms and even some novel genes. Long noncoding RNAs were another highlight: they found nearly 900 novel lncRNAs, many of them significantly longer than previously identified lncRNAs. “Our analysis indicates that the new transcriptome data have enormous potential to improve the current maize annotation,” the authors write. “The 111,151 unique transcripts characterized here almost double the number of transcripts documented in the RefGen_v3 annotation.”
They also analyzed alternative splicing, finding more than twice as many isoforms per gene as exist in the previous maize annotation and contributing thousands of novel isoforms to the public resource. To learn more, they studied methylation patterns associated with alternative splicing and discovered that CHG methylation appears to suppress splicing while CG methylation apparently increases the rate of splicing.
In addition to demonstrating that SMRT Sequencing data could correct mis-annotated gene models, the team showed that long reads are even more important than expected for transcriptome studies. The average transcript length in this project, almost 3 kb, is much longer than that in the previous maize annotation. “These findings show that the prevalence of long transcripts, from both coding and non-coding genes, is higher than previously thought,” Wang et al. write. “Just as the availability of short-read technologies over the last decade heralded an era of tremendous gains in small RNA research, it is reasonable to expect that long-read technologies will prompt a new focus on heretofore poorly understood characteristics of exceptionally long RNAs.”
For more, check out this case study [PDF] of Doreen Ware and the maize project.