Sequel System Data Release: Iso-Seq Results for Hummingbird and Zebra Finch Brain Tissue
Thursday, August 31, 2017
If you’re interested in avian vocal learning or want to explore a PacBio Iso-Seq data set generated with the Sequel System, we have good news. We’ve just released data from Iso-Seq interrogations of brain tissue from two avian models of vocal learning, Anna’s hummingbird (Calypte anna) and zebra finch (Taeniopygia guttata), sequenced in collaboration with the Erich Jarvis and Olivier Fedrigo labs at the Rockefeller University.
If you’re not familiar with the Iso-Seq method, it’s the long-read sequencing answer to short-read RNA-seq studies. By using SMRT Sequencing for a transcriptome project, scientists can generate full-length isoform data, clearly capturing alternative splicing events to see the real diversity of transcripts. Unlike RNA-seq approaches, the Iso-Seq method takes advantage of long-read data to fully span transcript isoforms from the 5’ end to their poly-A tails, eliminating the need for error-prone transcript reconstruction and inference processes. With the Sequel System, Iso-Seq projects are low cost and time efficient. Currently we recommend only 1-2 SMRT Cells per tissue type for genome annotation.
For this data set, we used the Iso-Seq method to characterize the brain transcriptomes of both birds, starting from total RNA. The two species’ brain samples were barcoded, pooled, and sequenced using 4 SMRT Cells on the Sequel System. An average of ~460,000 reads was generated per SMRT Cell; total sequencing yields ranged from 6.1 to 7.7 Gb per SMRT Cell. More than 15,000 isoforms were identified in each species, including thousands that had not been previously annotated in each bird and 400 to 500 new genes.
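To put those run figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above. The function name `summarize_run` is hypothetical, and 1 Gb is assumed to mean 10^9 bases. Because the read count is an average across cells while the yields are the per-cell minimum and maximum, the implied read lengths are loose estimates of raw read length, not measured transcript lengths.

```python
# Figures quoted in the post (all approximate).
SMRT_CELLS = 4
MEAN_READS_PER_CELL = 460_000       # "~460,000 reads" per SMRT Cell
YIELD_GB_PER_CELL = (6.1, 7.7)      # lowest and highest per-cell yield, in Gb
BASES_PER_GB = 1e9                  # assumption: 1 Gb = 10^9 bases


def summarize_run(cells, reads_per_cell, yield_range_gb):
    """Return rough totals for a pooled run from per-cell figures."""
    low_gb, high_gb = yield_range_gb
    return {
        # Total reads across all cells, from the per-cell average.
        "total_reads": cells * reads_per_cell,
        # Bounds on total yield if every cell hit the minimum or the maximum.
        "total_yield_gb_bounds": (
            round(cells * low_gb, 1),
            round(cells * high_gb, 1),
        ),
        # Yield divided by read count: a loose estimate of mean read length.
        "implied_mean_read_length_bp": (
            round(low_gb * BASES_PER_GB / reads_per_cell),
            round(high_gb * BASES_PER_GB / reads_per_cell),
        ),
    }


if __name__ == "__main__":
    summary = summarize_run(SMRT_CELLS, MEAN_READS_PER_CELL, YIELD_GB_PER_CELL)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Under these assumptions the four cells together come to roughly 1.8 million reads and somewhere between about 24 and 31 Gb of sequence.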
The data set contains both the raw pooled sequences and the processed, demultiplexed sequence files, separated by species and excluding any raw sequences not containing barcodes. Our initial analysis of these data appears in this poster (Vierra et al.), which is being presented this week at the Genome 10K and Genome Science Conference at the Earlham Institute. It demonstrates how improved loading on the Sequel System simplifies library prep and how both command-line and new SMRT Link tools can be used for analysis. It also illustrates how full-length transcript data can help identify additional exons and UTRs.
Enjoy the data!
If you use the data or our analyses in your own publication before we complete our study, please cite:
Michelle N. Vierra, Sarah B. Kingan, Elizabeth Tseng, Tyson Clark, Ting Hon, William J. Rowell, Jacquelyn Mountcastle, Olivier Fedrigo, Erich D. Jarvis, Jonas Korlach. From RNA to Full-Length Transcripts: The PacBio Iso-Seq Method for Transcriptome Analysis and Genome Annotation. Genome10K and Genome Science Conference Abstracts 2017.