Tremendous flexibility is maintained in the human proteome via alternative splicing, and cancer genomes often subvert this flexibility to promote survival. Identification and annotation of cancer-specific mRNA isoforms is critical…
The Iso-Seq method enables the sequencing of transcript isoforms from the 5’ end to their poly-A tails, eliminating the need for transcript reconstruction and inference. This webinar provides a comprehensive…
In this webinar we present Single Molecule, Real-Time (SMRT) Sequencing and the Iso-Seq method, which allow you to generate full-length cDNA sequences — no assembly required — to characterize transcript…
In this ASHG 2020 PacBio Workshop, Hagen Tilgner of Cornell University shares how he has used long-read single-cell RNA sequencing to identify novel isoform expression in brain tissues.
Watch this short tutorial to learn how to get started with full-length RNA sequencing, a SMRT Sequencing application that uses highly accurate long reads (HiFi reads).
Insights into transcriptional characteristics and homoeolog expression bias of embryo and de-embryonated kernels in developing grain through RNA-Seq and Iso-Seq.
Bread wheat (Triticum aestivum L.) is an allohexaploid, and the transcriptional characteristics of the wheat embryo and endosperm during grain development remain unclear. To analyze the transcriptome, we performed isoform sequencing (Iso-Seq) on wheat grain and RNA sequencing (RNA-Seq) on the embryo and de-embryonated kernels. The differences in regulation between the embryo and de-embryonated kernels were greater than the differences between the two time points within each tissue. At 14 days post-anthesis (DPA), 2264 and 4790 tissue-specific genes were found in the embryo and de-embryonated kernels, respectively; at 25 DPA, the corresponding numbers were 5166 and 3784. Genes expressed in the embryo were more likely to be related to nucleic acid and enzyme regulation, whereas genes in de-embryonated kernels were enriched in substance metabolism and enzyme activity functions. Moreover, 4351, 4641, 4516, and 4453 genes with all three (A, B, and D) homoeoloci were detected in the four tissues, respectively. Expression characteristics suggested that the D genome may be the largest contributor to the transcriptome of developing grain. Among these genes, 48, 66, and 38 silenced genes were identified in the A, B, and D genomes, respectively. Gene ontology analysis showed that silenced genes in different genomes tended toward different functions. Our study provides tissue-specific gene pools for the embryo and de-embryonated kernels and a large-scale model of homoeolog expression bias, offering new insights into the molecular physiology of wheat.
The landscape of SNCA transcripts across synucleinopathies: New insights from long reads sequencing analysis
Dysregulation of alpha-synuclein expression has been implicated in the pathogenesis of synucleinopathies, in particular Parkinson’s disease (PD) and dementia with Lewy bodies (DLB). Previous studies have shown that alternatively spliced isoforms of the SNCA gene are differentially expressed in different parts of the brain in PD and DLB patients. In addition, SNCA isoforms with skipped exons can have a functional impact on the protein domains. The large intronic region of the SNCA gene has also been shown to harbor structural variants that affect transcriptional levels. Here we present the first study to use long-read sequencing with targeted capture of both the gDNA and the cDNA of the SNCA gene in brain tissues from PD, DLB, and control samples, using the PacBio Sequel System. The targeted full-length cDNA (Iso-Seq) data confirmed complex usage of known alternative start sites and variable 3′ UTR lengths, and revealed novel 5′ starts and 3′ ends not previously described. The targeted gDNA data allowed phasing of up to 81% of the ~114 kb SNCA region, with the longest phased block exceeding 54 kb. We demonstrate that long gDNA and cDNA reads can reveal long-range information not previously accessible with traditional sequencing methods. This approach could aid the study of disease risk genes such as SNCA, providing new insights into the genetic etiologies of complex human diseases such as synucleinopathies, including perturbations to the landscape of gene transcripts.
TIN2 is an important regulator of telomere length, and mutations in TINF2, the gene encoding TIN2, cause short-telomere syndromes. While the genetics underscore the importance of TIN2, the mechanism through which TIN2 regulates telomere length remains unclear. Here, we tested the effects of human TIN2 on telomerase activity. We identified a new isoform in human cells, TIN2M, that is expressed at levels similar to those of previously studied TIN2 isoforms. All three TIN2 isoforms localized to telomeres and maintained telomere integrity in vivo, and localization was not disrupted by telomere syndrome mutations. Using direct telomerase activity assays, we discovered that TIN2 stimulated telomerase processivity in vitro. All of the TIN2 isoforms stimulated telomerase to similar extents. Mutations in the TPP1 TEL patch abrogated this stimulation, suggesting that TIN2 functions with TPP1/POT1 to stimulate telomerase processivity. We conclude from our data and previously published work that TIN2/TPP1/POT1 is a functional shelterin subcomplex. Copyright © 2019 Pike et al.
Alternative splicing of pre-mRNAs is a crucial mechanism for maintaining protein diversity in eukaryotes without requiring a considerable increase in the number of genes. Owing to rapid advances in high-throughput sequencing technologies and computational algorithms, alternative splicing events are expected to be studied more intensively to address a wide range of biological questions. The occurrence of alternative splicing means that every exon can be classified as either constitutively or alternatively spliced, depending on whether it is included in virtually all mature mRNAs. From an evolutionary point of view, alternatively spliced exons are therefore expected to have distinctive biological characteristics compared with constitutively spliced exons. In this paper, we first outline the representative types of alternative splicing events and exon classification, and then review the sequence and evolutionary features of alternatively spliced exons. The main purpose is to facilitate understanding of the biological implications of alternative splicing in eukaryotes. This knowledge is also helpful for establishing computational approaches to predict the splicing pattern of exons.
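As a minimal illustration of the exon classification described above, the following Python sketch labels each exon of a gene as constitutive if it appears in at least a given fraction of that gene's isoforms and as alternative otherwise. The function name `classify_exons`, the `min_fraction` threshold, and the exon coordinates are hypothetical; a threshold below 1.0 is one assumed way to model "virtually all" mature mRNAs, and real analyses work from annotated or sequenced transcript models.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical exon representation: (start, end) genomic coordinates.
Exon = Tuple[int, int]


def classify_exons(isoforms: List[List[Exon]], min_fraction: float = 1.0) -> Dict[Exon, str]:
    """Label each exon 'constitutive' if it occurs in at least `min_fraction`
    of the isoforms of one gene, otherwise 'alternative'."""
    n = len(isoforms)
    # Count each exon at most once per isoform.
    counts = Counter(exon for iso in isoforms for exon in set(iso))
    return {
        exon: "constitutive" if count / n >= min_fraction else "alternative"
        for exon, count in sorted(counts.items())
    }


# Three hypothetical isoforms of one gene; the middle exon is skipped in one of them.
isoforms = [
    [(100, 200), (300, 350), (500, 600)],
    [(100, 200), (500, 600)],
    [(100, 200), (300, 350), (500, 600)],
]

for exon, label in classify_exons(isoforms).items():
    print(exon, label)
```

With the strict default threshold, the skipped middle exon is labeled alternative; lowering `min_fraction` relaxes the definition toward "virtually all" isoforms.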
SMRT sequencing analysis reveals the full-length transcripts and alternative splicing patterns in Ananas comosus var. bracteatus.
Ananas comosus var. bracteatus is an herbaceous perennial monocot cultivated as an ornamental plant for its chimeric leaves. Because of its genomic complexity, and because no genomic information is available in the public GenBank database, the complete structure of its mRNA transcripts is unclear and there are few molecular mechanism studies of Ananas comosus var. bracteatus. Three size-fractionated full-length cDNA libraries (1-2 kb, 2-3 kb, and 3-6 kb) were constructed and sequenced in five single-molecule real-time (SMRT) cells (2 cells, 2 cells, and 1 cell, respectively). In total, 19,838 transcripts were identified for alternative splicing (AS) analysis. Among them, 19,185 (96.7%) transcripts were functionally annotated. A total of 9,921 genes were identified by mapping the non-redundant isoforms to the reference genome. A total of 10,649 AS events were identified, the majority of which were intron retention events. The alternatively spliced genes had functions in basic plant metabolic processes such as carbon metabolism, amino acid biosynthesis, and glycolysis. Fourteen genes related to chlorophyll biosynthesis were identified as having AS events. The distribution of splicing sites and the percentages of conventional and non-canonical AS sites varied greatly among genes in pathways related to the albino leaf phenotype (ko00860, ko00195, ko00196, and ko00710). In addition, 8,316 genes carried at least one poly(A) site, yielding 21,873 poly(A) sites in total. These findings indicate that the quality of the gene structure and functional information of the genome was greatly improved, which may facilitate further genetic study of Ananas comosus var. bracteatus.
Tissue specific alpha-2-Macroglobulin (A2M) splice isoform diversity in Hilsa shad, Tenualosa ilisha (Hamilton, 1822).
The present study reports, for the first time, twelve A2M isoforms in Tenualosa ilisha, identified through SMRT sequencing. Hilsa shad (T. ilisha), an anadromous fish, faces environmental stresses and is thus prone to diseases. Here, expression profiles of different A2M isoforms in four tissues of T. ilisha were studied to assess the tissue-specific diversity of A2M. Large-scale, high-quality full-length transcripts (>99% accuracy) were obtained from liver, ovary, testis, and gill transcriptomes by Iso-Seq sequencing on the PacBio RS II. A total of 12 isoforms with complete putative proteins were detected in three tissues (7 isoforms in liver, 4 in ovary, and 1 in testis). The complete structure of the A2M mRNA was predicted from these isoforms: a 4680 bp sequence with 35 exons, encoding 1508 amino acids. Using Homo sapiens A2M as a reference, six functional domains (A2M_N, A2M_N2, A2M, Thiol-ester_cl, Complement, and Receptor domain), along with a bait region, were predicted in the A2M consensus protein. A total of 35 splice sites were identified in the T. ilisha A2M consensus transcript, with GT-AG splice sites occurring at the highest frequency (55.7%), in comparison with Homo sapiens. The liver showed the longest isoform (X1), containing all domains, while the smallest (X10), containing only the Receptor domain, was found in the ovary. By comparison with species showing resistant and susceptible/unknown responses, the study predicted five putative markers (I-212, I-269, A-472, S-567, and Y-906) for resistance to epizootic ulcerative syndrome (EUS) in the A2M protein, located in the MG2 domains (A2M_N and A2M_N2). These markers classified fishes into two groups: resistant and susceptible. The markers predicted in T. ilisha placed it in the EUS-susceptible category. The putative markers reported in the A2M protein may serve as molecular markers for diagnosing EUS resistance or susceptibility in fishes and may be candidates for inclusion in a marker panel for pilot studies. Further studies are required to confirm the role of particular A2M isoforms and of the identified markers in immune protection against EUS.
Genome-wide Transcript Structure Resolution Reveals Abundant Alternate Isoform Usage from Murine Gammaherpesvirus 68.
The gammaherpesviruses, including Epstein-Barr virus (EBV), Kaposi’s sarcoma-associated herpesvirus (KSHV), and murine gammaherpesvirus 68 (MHV68, MuHV-4, γHV68), are etiologic agents of a wide range of lymphomas and non-hematological malignancies. These viruses possess large, gene-dense dsDNA genomes that feature more than 80 bidirectionally positioned open reading frames (ORFs). The abundance of overlapping transcripts and extensive splicing throughout these genomes have until now prevented high-throughput resolution of transcript structures. Here, we integrate the capabilities of long-read sequencing with the accuracy of short-read platforms to globally resolve MHV68 transcript structures using the transcript resolution through integration of multi-platform data (TRIMD) pipeline. This approach reveals highly complex features, including: (1) pervasive overlapping transcript structures; (2) transcripts containing intra-gene or trans-gene splices that yield chimeric ORFs; (3) antisense and intergenic transcripts containing ORFs; and (4) noncoding transcripts. This work sheds light on the underappreciated complexity of gammaherpesvirus transcription and provides an extensively revised annotation of the MHV68 transcriptome. Copyright © 2019 The Authors. Published by Elsevier Inc. All rights reserved.
TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts.
Long-read, single-molecule sequencing platforms hold great potential for isoform discovery and characterization of multi-exon transcripts. However, their high error rates make it difficult to distinguish novel transcript isoforms from sequencing artifacts. We therefore developed the package TranscriptClean to correct mismatches, microindels, and noncanonical splice junctions in mapped transcripts using the reference genome while preserving known variants. Our method corrects nearly all mismatches and indels present in a publicly available human PacBio Iso-Seq dataset and rescues 39% of noncanonical splice junctions. All Python and R scripts used in this paper are available at https://github.com/dewyman/TranscriptClean.
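TranscriptClean's own correction logic is not reproduced here. As a hypothetical illustration of what a "noncanonical splice junction" means, the following Python sketch reads an intron's terminal dinucleotides from a reference sequence and checks them against an assumed set of canonical motifs (GT-AG plus the rarer GC-AG and AT-AC, a common convention; other tools may define the set differently). Minus-strand introns are reverse-complemented first. The function names and toy sequences are illustrative only.

```python
from typing import Tuple

# Assumed canonical donor/acceptor motifs (plus-strand intron sequence).
CANONICAL_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_motif(genome: str, intron_start: int, intron_end: int, strand: str = "+") -> Tuple[str, str]:
    """Return the (donor, acceptor) dinucleotides of the intron
    genome[intron_start:intron_end] (0-based, end-exclusive), in transcript orientation."""
    intron = genome[intron_start:intron_end].upper()
    if strand == "-":
        intron = reverse_complement(intron)
    return intron[:2], intron[-2:]


def is_canonical(genome: str, intron_start: int, intron_end: int, strand: str = "+") -> bool:
    """True if the intron's terminal dinucleotides match an assumed canonical motif."""
    return junction_motif(genome, intron_start, intron_end, strand) in CANONICAL_MOTIFS


# Toy reference: exon AAAA, intron GTCCCCCAG (positions 4-12), exon TTTT.
print(is_canonical("AAAAGTCCCCCAGTTTT", 4, 13))       # GT...AG
print(is_canonical("AAAACTCCCCCAGTTTT", 4, 13))       # CT...AG, noncanonical
print(is_canonical("AAAACTGGGGGACTTTT", 4, 13, "-"))  # GT...AG on the minus strand
```

A correction tool would then decide what to do with junctions flagged as noncanonical; this sketch only performs the classification step.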
Identification of putative genes for polyphenol biosynthesis in olive fruits and leaves using full-length transcriptome sequencing.
Olive (Olea europaea) is a rich source of valuable bioactive polyphenols, which have attracted widespread interest. In this study, we combined targeted metabolomics, a PacBio Iso-Seq transcriptome, and an Illumina RNA-seq transcriptome to investigate the association between polyphenols and gene expression in developing olive fruits and leaves. A total of 12 main polyphenols were measured, and 122 transcripts of 17 gene families, 101 transcripts of 9 gene families, and 106 transcripts of 6 gene families encoding enzymes involved in flavonoid, oleuropein, and hydroxytyrosol biosynthesis, respectively, were identified. Additionally, 232 alternative splicing events in 18 genes related to polyphenol synthesis were analyzed. This is the first time that third-generation full-length transcriptome sequencing has been used to study the gene expression patterns of olive fruits and leaves. Combining the transcriptome with the targeted metabolome can help us better understand the polyphenol biosynthesis pathways in olive. Copyright © 2019 Elsevier Ltd. All rights reserved.
Expression of mutant Ataxin-1 with an abnormally expanded polyglutamine domain is necessary for the onset and progression of spinocerebellar ataxia type 1 (SCA1). Understanding how Ataxin-1 expression is regulated in the human brain could inspire novel molecular therapies for this fatal, dominantly inherited neurodegenerative disease. Previous studies have shown that the ATXN1 3’UTR plays a key role in regulating the cellular pool of Ataxin-1 via diverse post-transcriptional mechanisms. Here we show that elements within the ATXN1 5’UTR also participate in regulating Ataxin-1 expression. PCR and PacBio sequencing analysis of cDNA obtained from control and SCA1 human brain samples revealed three major, alternatively spliced ATXN1 5’UTR variants. In cell-based assays, fusing these variants upstream of an EGFP reporter construct had significant and differential effects on total EGFP protein output, uncovering a rheostat-like function of the ATXN1 5’UTR. We identified ribosomal scanning of upstream AUG codons and increased transcript instability as potential mechanisms of regulation. Importantly, transcript-based analyses revealed significant differences in the expression pattern of ATXN1 5’UTR variants between control and SCA1 cerebellum. Together, these data shed light on a previously unknown role for the ATXN1 5’UTR in the regulation of Ataxin-1 and provide new opportunities for the development of SCA1 therapeutics. Copyright © 2019. Published by Elsevier Inc.