Menu
June 1, 2021  |  

Resolving the ‘dark matter’ in genomes.

Second-generation sequencing has brought about tremendous insights into the genetic underpinnings of biology. However, there are many functionally important and medically relevant regions of genomes that are currently difficult or impossible to sequence, resulting in incomplete and fragmented views of genomes. Two main causes are (i) limitations to read DNA of extreme sequence content (GC-rich or AT-rich regions, low complexity sequence contexts) and (ii) insufficient read lengths which leave various forms of structural variation unresolved and result in mapping ambiguities.


April 21, 2020  |  

The landscape of SNCA transcripts across synucleinopathies: New insights from long reads sequencing analysis

Dysregulation of alpha-synuclein expression has been implicated in the pathogenesis of synucleinopathies, in particular Parkinsontextquoterights Disease (PD) and Dementia with Lewy bodies (DLB). Previous studies have shown that the alternatively spliced isoforms of the SNCA gene are differentially expressed in different parts of the brain for PD and DLB patients. Similarly, SNCA isoforms with skipped exons can have a functional impact on the protein domains. The large intronic region of the SNCA gene was also shown to harbor structural variants that affect transcriptional levels. Here we apply the first study of using long read sequencing with targeted capture of both the gDNA and cDNA of the SNCA gene in brain tissues of PD, DLB, and control samples using the PacBio Sequel system. The targeted full-length cDNA (Iso-Seq) data confirmed complex usage of known alternative start sites and variable 3textquoteright UTR lengths, as well as novel 5textquoteright starts and 3textquoteright ends not previously described. The targeted gDNA data allowed phasing of up to 81% of the ~114kb SNCA region, with the longest phased block excedding 54 kb. We demonstrate that long gDNA and cDNA reads have the potential to reveal long-range information not previously accessible using traditional sequencing methods. This approach has a potential impact in studying disease risk genes such as SNCA, providing new insights into the genetic etiologies, including perturbations to the landscape the gene transcripts, of human complex diseases such as synucleinopathies.


April 21, 2020  |  

Reconstruction of the full-length transcriptome atlas using PacBio Iso-Seq provides insight into the alternative splicing in Gossypium australe.

Gossypium australe F. Mueller (2n?=?2x?=?26, G2 genome) possesses valuable characteristics. For example, the delayed gland morphogenesis trait causes cottonseed protein and oil to be edible while retaining resistance to biotic stress. However, the lack of gene sequences and their alternative splicing (AS) in G. australe remain unclear, hindering to explore species-specific biological morphogenesis.Here, we report the first sequencing of the full-length transcriptome of the Australian wild cotton species, G. australe, using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) from the pooled cDNA of ten tissues to identify transcript loci and splice isoforms. We reconstructed the G. australe full-length transcriptome and identified 25,246 genes, 86 pre-miRNAs and 1468 lncRNAs. Most genes (12,832, 50.83%) exhibited two or more isoforms, suggesting a high degree of transcriptome complexity in G. australe. A total of 31,448 AS events in five major types were found among the 9944 gene loci. Among these five major types, intron retention was the most frequent, accounting for 68.85% of AS events. 29,718 polyadenylation sites were detected from 14,536 genes, 7900 of which have alternative polyadenylation sites (APA). In addition, based on our AS events annotations, RNA-Seq short reads from germinating seeds showed that differential expression of these events occurred during seed germination. Ten AS events that were randomly selected were further confirmed by RT-PCR amplification in leaf and germinating seeds.The reconstructed gene sequences and their AS in G. australe would provide information for exploring beneficial characteristics in G. australe.


September 22, 2019  |  

neoantigenR: An annotation based pipeline for tumor neoantigen identification from sequencing data

Studies indicate that more than 90% of human genes are alternatively spliced, suggesting the complexity of the transcriptome assembly and analysis. The splicing process is often disrupted, resulting in both functional and non-functional end-products (Sveen et al. 2016) in many cancers. Harnessing the immune system to fight against malignant cancers carrying aberrantly mutated or spliced products is becoming a promising approach to cancer therapy. Advances in immune checkpoint blockade have elicited adaptive immune responses with promising clinical responses to treatments against human malignancies (Tumor Neoantigens in Personalized Cancer Immunotherapy 2017). Emerging data suggest that recognition of patient-specific mutation-associated cancer antigens (i.e. from alternative splicing isoforms) may allow scientists to dissect the immune response in the activity of clinical immunotherapies (Schumacher and Schreiber 2015). The advent of high-throughput sequencing technology has provided a comprehensive view of both splicing aberrations and somatic mutations across a range of human malignancies, allowing for a deeper understanding of the interplay of various disease mechanisms. Meanwhile, studies show that the number of transcript isoforms reported to date may be limited by the short-read sequencing due to the inherit limitation of transcriptome reconstruction algorithms, whereas long-read sequencing is able to significantly improve the detection of alternative splicing variants since there is no need to assemble full-length transcripts from short reads. The analysis of these high-throughput long-read sequencing data may permit a systematic view of tumor specific peptide epitopes (also known as neoantigens) that could serve as targets for immunotherapy (Tumor Neoantigens in Personalized Cancer Immunotherapy 2017). Currently, there is no software pipeline available that can efficiently produce mutation-associated cancer antigens from raw high-throughput sequencing data on patient tumor DNA (The Problem with Neoantigen Prediction 2017). In addressing this issue, we introduce a R package that allows the discoveries of peptide epitope candidates, which are the tumor-specific peptide fragments containing potential functional neoantigens. These peptide epitopes consist of structure variants including insertion, deletions, alternative sequences, and peptides from nonsynonymous mutations. Analysis of these precursor candidates with widely used tools such as netMHC allows for the accurate in-silico prediction of neoantigens. The pipeline named neoantigeR is currently hosted in https://github.com/ICBI/neoantigeR.


September 22, 2019  |  

Global identification of alternative splicing via comparative analysis of SMRT- and Illumina-based RNA-seq in strawberry.

Alternative splicing (AS) is a key post-transcriptional regulatory mechanism, yet little information is known about its roles in fruit crops. Here, AS was globally analyzed in the wild strawberry Fragaria vesca genome with RNA-seq data derived from different stages of fruit development. The AS landscape was characterized and compared between the single-molecule, real-time (SMRT) and Illumina RNA-seq platform. While SMRT has a lower sequencing depth, it identifies more genes undergoing AS (57.67% of detected multiexon genes) when it is compared with Illumina (33.48%), illustrating the efficacy of SMRT in AS identification. We investigated different modes of AS in the context of fruit development; the percentage of intron retention (IR) is markedly reduced whereas that of alternative acceptor sites (AA) is significantly increased post-fertilization when compared with pre-fertilization. When all the identified transcripts were combined, a total of 66.43% detected multiexon genes in strawberry undergo AS, some of which lead to a gain or loss of conserved domains in the gene products. The work demonstrates that SMRT sequencing is highly powerful in AS discovery and provides a rich data resource for later functional studies of different isoforms. Further, shifting AS modes may contribute to rapid changes of gene expression during fruit set.© 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.


September 22, 2019  |  

Widespread polycistronic transcripts in fungi revealed by single-molecule mRNA sequencing.

Genes in prokaryotic genomes are often arranged into clusters and co-transcribed into polycistronic RNAs. Isolated examples of polycistronic RNAs were also reported in some higher eukaryotes but their presence was generally considered rare. Here we developed a long-read sequencing strategy to identify polycistronic transcripts in several mushroom forming fungal species including Plicaturopsis crispa, Phanerochaete chrysosporium, Trametes versicolor, and Gloeophyllum trabeum. We found genome-wide prevalence of polycistronic transcription in these Agaricomycetes, involving up to 8% of the transcribed genes. Unlike polycistronic mRNAs in prokaryotes, these co-transcribed genes are also independently transcribed. We show that polycistronic transcription may interfere with expression of the downstream tandem gene. Further comparative genomic analysis indicates that polycistronic transcription is conserved among a wide range of mushroom forming fungi. In summary, our study revealed, for the first time, the genome prevalence of polycistronic transcription in a phylogenetic range of higher fungi. Furthermore, we systematically show that our long-read sequencing approach and combined bioinformatics pipeline is a generic powerful tool for precise characterization of complex transcriptomes that enables identification of mRNA isoforms not recovered via short-read assembly.


September 22, 2019  |  

Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing.

The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at transcriptional and post-transcriptional level. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing.In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, both in proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtained evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells.Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provides a basis to reveal largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.


September 22, 2019  |  

Assessing the gene content of the megagenome: sugar pine (Pinus lambertiana).

Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third generation, long reads generated through PacBio Iso-Seq has been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here contribute to the otherwise scarce comparisons of 2nd and 3rd generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data was also used to address some of the questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers. Copyright © 2016 Author et al.


September 22, 2019  |  

A comparative transcriptional landscape of maize and sorghum obtained by single-molecule sequencing.

Maize and sorghum are both important crops with similar overall plant architectures, but they have key differences, especially in regard to their inflorescences. To better understand these two organisms at the molecular level, we compared expression profiles of both protein-coding and noncoding transcripts in 11 matched tissues using single-molecule, long-read, deep RNA sequencing. This comparative analysis revealed large numbers of novel isoforms in both species. Evolutionarily young genes were likely to be generated in reproductive tissues and usually had fewer isoforms than old genes. We also observed similarities and differences in alternative splicing patterns and activities, both among tissues and between species. The maize subgenomes exhibited no bias in isoform generation; however, genes in the B genome were more highly expressed in pollen tissue, whereas genes in the A genome were more highly expressed in endosperm. We also identified a number of splicing events conserved between maize and sorghum. In addition, we generated comprehensive and high-resolution maps of poly(A) sites, revealing similarities and differences in mRNA cleavage between the two species. Overall, our results reveal considerable splicing and expression diversity between sorghum and maize, well beyond what was reported in previous studies, likely reflecting the differences in architecture between these two species.© 2018 Wang et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019  |  

Targeted combinatorial alternative splicing generates brain region-specific repertoires of neurexins.

Molecular diversity of surface receptors has been hypothesized to provide a mechanism for selective synaptic connectivity. Neurexins are highly diversified receptors that drive the morphological and functional differentiation of synapses. Using a single cDNA sequencing approach, we detected 1,364 unique neurexin-a and 37 neurexin-ß mRNAs produced by alternative splicing of neurexin pre-mRNAs. This molecular diversity results from near-exhaustive combinatorial use of alternative splice insertions in Nrxn1a and Nrxn2a. By contrast, Nrxn3a exhibits several highly stereotyped exon selections that incorporate novel elements for posttranscriptional regulation of a subset of transcripts. Complexity of Nrxn1a repertoires correlates with the cellular complexity of neuronal tissues, and a specific subset of isoforms is enriched in a purified cell type. Our analysis defines the molecular diversity of a critical synaptic receptor and provides evidence that neurexin diversity is linked to cellular diversity in the nervous system. Copyright © 2014 Elsevier Inc. All rights reserved.


September 22, 2019  |  

Novel exons and splice variants in the human antibody heavy chain identified by single cell and single molecule sequencing.

Antibody heavy chains contain a variable and a constant region. The constant region of the antibody heavy chain is encoded by multiple groups of exons which define the isotype and therefore many functional characteristics of the antibody. We performed both single B cell RNAseq and long read single molecule sequencing of antibody heavy chain transcripts and were able to identify novel exons for IGHA1 and IGHA2 as well as novel isoforms for IGHM antibody heavy chain.


September 22, 2019  |  

Alternative isoform analysis of Ttc8 expression in the rat pineal gland using a multi-platform sequencing approach reveals neural regulation.

Alternative isoform regulation (AIR) vastly increases transcriptome diversity and plays an important role in numerous biological processes and pathologies. However, the detection and analysis of isoform-level differential regulation is difficult, particularly in the face of complex and incompletely-annotated transcriptomes. Here we have used Illumina short-read/high-throughput RNA-Seq to identify 55 genes that exhibit neurally-regulated AIR in the pineal gland, and then used two other complementary experimental platforms to further study and characterize the Ttc8 gene, which is involved in Bardet-Biedl syndrome and non-syndromic retinitis pigmentosa. Use of the JunctionSeq analysis tool led to the detection of several novel exons and splice junctions in this gene, including two novel alternative transcription start sites which were found to display disproportionately strong neurally-regulated differential expression in several independent experiments. These high-throughput sequencing results were validated and augmented via targeted qPCR and long-read Pacific Biosciences SMRT sequencing. We confirmed the existence of numerous novel splice junctions and the selective upregulation of the two novel start sites. In addition, we identified more than 20 novel isoforms of the Ttc8 gene that are co-expressed in this tissue. By using information from multiple independent platforms we not only greatly reduce the risk of errors, biases, and artifacts influencing our results, we also are able to characterize the regulation and splicing of the Ttc8 gene more deeply and more precisely than would be possible via any single platform. The hybrid method outlined here represents a powerful strategy in the study of the transcriptome.


September 22, 2019  |  

Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells.

Full-length RNA sequencing (RNA-Seq) has been applied to bulk tissue, cell lines and sorted cells to characterize transcriptomes, but applying this technology to single cells has proven to be difficult, with less than ten single-cell transcriptomes having been analyzed thus far. Although single splicing events have been described for =200 single cells with statistical confidence, full-length mRNA analyses for hundreds of cells have not been reported. Single-cell short-read 3′ sequencing enables the identification of cellular subtypes, but full-length mRNA isoforms for these cell types cannot be profiled. We developed a method that starts with bulk tissue and identifies single-cell types and their full-length RNA isoforms without fluorescence-activated cell sorting. Using single-cell isoform RNA-Seq (ScISOr-Seq), we identified RNA isoforms in neurons, astrocytes, microglia, and cell subtypes such as Purkinje and Granule cells, and cell-type-specific combination patterns of distant splice sites. We used ScISOr-Seq to improve genome annotation in mouse Gencode version 10 by determining the cell-type-specific expression of 18,173 known and 16,872 novel isoforms.


September 22, 2019  |  

Cartography of neurexin alternative splicing mapped by single-molecule long-read mRNA sequencing.

Neurexins are evolutionarily conserved presynaptic cell-adhesion molecules that are essential for normal synapse formation and synaptic transmission. Indirect evidence has indicated that extensive alternative splicing of neurexin mRNAs may produce hundreds if not thousands of neurexin isoforms, but no direct evidence for such diversity has been available. Here we use unbiased long-read sequencing of full-length neurexin (Nrxn)1a, Nrxn1ß, Nrxn2ß, Nrxn3a, and Nrxn3ß mRNAs to systematically assess how many sites of alternative splicing are used in neurexins with a significant frequency, and whether alternative splicing events at these sites are independent of each other. In sequencing more than 25,000 full-length mRNAs, we identified a novel, abundantly used alternatively spliced exon of Nrxn1a and Nrxn3a (referred to as alternatively spliced sequence 6) that encodes a 9-residue insertion in the flexible hinge region between the fifth LNS (laminin-a, neurexin, sex hormone-binding globulin) domain and the third EGF-like sequence. In addition, we observed several larger-scale events of alternative splicing that deleted multiple domains and were much less frequent than the canonical six sites of alternative splicing in neurexins. All of the six canonical events of alternative splicing appear to be independent of each other, suggesting that neurexins may exhibit an even larger isoform diversity than previously envisioned and comprise thousands of variants. Our data are consistent with the notion that a-neurexins represent extracellular protein-interaction scaffolds in which different LNS and EGF domains mediate distinct interactions that affect diverse functions and are independently regulated by independent events of alternative splicing.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.