In higher eukaryotes, alternative splicing (AS) and alternative polyadenylation (APA) events produce multiple transcript isoforms for the majority of genes, which significantly increases the protein-coding potential of a genome (Pan et al., 2008; Anvar et al., 2018). Different transcript isoforms may encode proteins with different functions or affect mRNA stability and translational capacity. AS and APA events can therefore dramatically increase the complexity and flexibility of the entire transcriptome and proteome (Yang et al., 2016; Feng et al., 2015; Li et al., 2017a; Wang et al., 2017a). Several public databases of AS events and transcripts in animals are available, such as ASTD and MAASE (Zheng et al., 2005), whereas no database containing full-length transcripts and AS events in plants exists to date. Next-generation sequencing (NGS) technology is limited in its ability to identify AS and APA events because of its short reads and low accuracy. In recent years, isoform sequencing (Iso-Seq) on the PacBio single-molecule real-time (SMRT) sequencing platform has made it possible to generate full-length sequences and to obtain accurate information about AS and transcriptional start sites (Li et al., 2017a). In this study, we collected plant Iso-Seq data generated on the PacBio platform and deposited in the NCBI database up to the end of 2017, and used unified pipelines to process all the full-length transcripts from the different species. Based on these data, we constructed the Plant ISOform sequencing database (PISO, http://cbi.hzau.edu.cn/piso/).
Journal: Plant Biotechnology Journal