September 22, 2019

Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing.

The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at the transcriptional and post-transcriptional levels. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing. In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, both in proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtain evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells. Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provide a basis to reveal largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.


September 22, 2019

Elevated expression of a minor isoform of ANK3 is a risk factor for bipolar disorder.

Ankyrin-3 (ANK3) is one of the few genes that have been consistently identified as associated with bipolar disorder by multiple genome-wide association studies. However, the exact molecular basis of the association remains unknown. A rare loss-of-function splice-site SNP (rs41283526*G) in a minor isoform of ANK3 (incorporating exon ENSE00001786716) was recently identified as protective against bipolar disorder and schizophrenia. This suggests that an elevated expression of this isoform may be involved in the etiology of the disorders. In this study, we used novel approaches and data sets to test this hypothesis. First, we strengthen the statistical evidence supporting the allelic association by replicating the protective effect of the minor allele of rs41283526 in three additional large independent samples (meta-analysis p-values: 6.8E-05 for bipolar disorder and 8.2E-04 for schizophrenia). Second, we confirm the hypothesis that both bipolar and schizophrenia patients have a significantly higher expression of this isoform than controls (p-values: 3.3E-05 for schizophrenia and 9.8E-04 for bipolar type I). Third, we determine the transcription start site for this minor isoform by Pacific Biosciences sequencing of full-length cDNA and show that it is primarily expressed in the corpus callosum. Finally, we combine genotype and expression data from a large Norwegian sample of psychiatric patients and controls, and show that the risk alleles in ANK3 identified by bipolar disorder GWAS are located near the transcription start site of this isoform and are significantly associated with its elevated expression. Together, these results point to the likely molecular mechanism underlying ANK3's association with bipolar disorder.


September 22, 2019

Single-molecule DNA sequencing of acute myeloid leukemia and myelodysplastic syndromes with multiple TP53 alterations.

Although the frequency of TP53 mutations in hematologic malignancies is low, these mutations have a high clinical relevance and are usually associated with poor prognosis. Somatic TP53 mutations have been detected in up to 73.3% of cases of acute myeloid leukemia (AML) with complex karyotype and 18.9% of AML with other unfavorable cytogenetic risk factors. AML with TP53 mutations, and/or chromosomal aneuploidy, has been defined as a distinct AML subtype. In low-risk myelodysplastic syndromes (MDS), TP53 mutations occur at an early disease stage and predict disease progression. TP53 mutation diagnosis is now part of the revised European LeukemiaNet (ELN) guidelines.


September 22, 2019

Next generation multilocus sequence typing (NGMLST) and the analytical software program MLSTEZ enable efficient, cost-effective, high-throughput, multilocus sequencing typing.

Multilocus sequence typing (MLST) has become the preferred method for genotyping many biological species, and it is especially useful for analyzing haploid eukaryotes. MLST is rigorous, reproducible, and informative, and MLST genotyping has been shown to identify major phylogenetic clades, molecular groups, or subpopulations of a species, as well as individual strains or clones. MLST molecular types often correlate with important phenotypes. Conventional MLST involves the extraction of genomic DNA and the amplification by PCR of several conserved, unlinked gene sequences from a sample of isolates of the taxon under investigation. In some cases, as few as three loci are sufficient to yield definitive results. The amplicons are sequenced, aligned, and compared by phylogenetic methods to distinguish statistically significant differences among individuals and clades. Although MLST is simpler, faster, and less expensive than whole genome sequencing, it is more costly and time-consuming than less reliable genotyping methods (e.g. amplified fragment length polymorphisms). Here, we describe a new MLST method that uses next-generation sequencing, a multiplexing protocol, and appropriate analytical software to provide accurate, rapid, and economical MLST genotyping of 96 or more isolates in a single assay. We demonstrate this methodology by genotyping isolates of the well-characterized, human pathogenic yeast Cryptococcus neoformans. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.


September 22, 2019

High-resolution comparative analysis of great ape genomes.

Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single- to mega-base pair-sized variants. We identified ~17,000 fixed human-specific structural variants identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.


September 22, 2019

Extensive allele-specific translational regulation in hybrid mice.

Translational regulation is mediated through the interaction between diffusible trans-factors and cis-elements residing within mRNA transcripts. In contrast to extensively studied transcriptional regulation, cis-regulation on translation remains underexplored. Using deep sequencing-based transcriptome and polysome profiling, we globally profiled allele-specific translational efficiency for the first time in an F1 hybrid mouse. Out of 7,156 genes with reliable quantification of both alleles, we found 1,008 (14.1%) exhibiting significant allelic divergence in translational efficiency. Systematic analysis of sequence features of the genes with biased allelic translation revealed that local RNA secondary structure surrounding the start codon and proximal out-of-frame upstream AUGs could affect translational efficiency. Finally, we observed that the cis-effect was quantitatively comparable between transcriptional and translational regulation. Such effects in the two regulatory processes were more frequently compensatory, suggesting that the regulation at the two levels could be coordinated in maintaining robustness of protein expression. © 2015 The Authors. Published under the terms of the CC BY 4.0 license.
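As a rough illustration of the quantity discussed above, the sketch below computes an allele-specific translational efficiency (TE) ratio. It assumes the common definition of TE as ribosome-associated (polysome) read counts divided by mRNA read counts, which the abstract itself does not spell out; the function name, the pseudocount, and the example counts are hypothetical. It also re-derives the reported 14.1% from the two gene counts given in the abstract.

```python
import math


def allelic_te_log2_ratio(polysome_a, mrna_a, polysome_b, mrna_b, pseudocount=0.5):
    """Return log2(TE_a / TE_b) for the two alleles of one gene.

    TE is taken here as polysome reads / mRNA reads (an assumed definition).
    The pseudocount (hypothetical default) avoids division by zero.
    """
    te_a = (polysome_a + pseudocount) / (mrna_a + pseudocount)
    te_b = (polysome_b + pseudocount) / (mrna_b + pseudocount)
    return math.log2(te_a / te_b)


# Made-up counts: allele A is translated twice as efficiently as allele B.
example = allelic_te_log2_ratio(200, 100, 100, 100, pseudocount=0)

# Fraction of genes with divergent allelic TE, from the counts in the abstract.
percent_divergent = round(100 * 1008 / 7156, 1)
```

A positive value means allele A is translated more efficiently per transcript than allele B. Calling a gene significantly divergent would additionally require a statistical test across replicates, which this sketch does not attempt.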


September 22, 2019

Genome-wide characterization of human L1 antisense promoter-driven transcripts.

Long INterspersed Element-1 (LINE-1 or L1) is the only autonomously active, transposable element in the human genome. L1 sequences comprise approximately 17% of the human genome, but only the evolutionarily recent, human-specific subfamily is retrotransposition competent. The L1 promoter has a bidirectional orientation containing a sense promoter that drives the transcription of two proteins required for retrotransposition and an antisense promoter. The L1 antisense promoter can drive transcription of chimeric transcripts: 5′ L1 antisense sequences spliced to the exons of neighboring genes. The impact of L1 antisense promoter activity on cellular transcriptomes is poorly understood. To investigate this, we analyzed GenBank ESTs for messenger RNAs that initiate in the L1 antisense promoter. We identified 988 putative L1 antisense chimeric transcripts, 911 of which have not been previously reported. These appear to be alternative genic transcripts, sense-oriented with respect to gene and initiating near, but typically downstream of, the gene transcriptional start site. In multiple cell lines, L1 antisense promoters display enrichment for YY1 transcription factor and histone modifications associated with active promoters. Global run-on sequencing data support the activity of the L1 antisense promoter. We independently detected 124 L1 antisense chimeric transcripts using long read Pacific Biosciences RNA-seq data. Furthermore, we validated four chimeric transcripts by quantitative RT-PCR and Sanger sequencing and demonstrated that they are readily detectable in many normal human tissues. We present a comprehensive characterization of human L1 antisense promoter-driven transcripts and provide substantial evidence that they are transcribed in a variety of human cell types. Our findings reveal a new wide-reaching aspect of L1 biology by identifying antisense transcripts affecting as many as 4% of all human genes.


September 22, 2019

A single-molecule long-read survey of the human transcriptome.

Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5′ to 3′ end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5′ ends. For longer RNA molecules more 5′ nucleotides are missing, but complete intron structures are often preserved. In total, we identify ~14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.


September 22, 2019

Evaluation of tools for long read RNA-seq splice-aware alignment.

High-throughput sequencing has transformed the study of gene expression levels through RNA-seq, a technique that is now routinely used in various fields, such as genetic research or diagnostics. The advent of third generation sequencing technologies providing significantly longer reads opens up new possibilities. However, the high error rates common to these technologies set new bioinformatics challenges for the gapped alignment of reads to their genomic origin. In this study, we have explored how currently available RNA-seq splice-aware alignment tools cope with increased read lengths and error rates. All tested tools were initially developed for short NGS reads, but some have claimed support for long Pacific Biosciences (PacBio) or even Oxford Nanopore Technologies (ONT) MinION reads. The tools were tested on synthetic and real datasets from two technologies (PacBio and ONT MinION). Alignment quality and resource usage were compared across different aligners. The effect of error correction of long reads was explored, both using self-correction and correction with an external short reads dataset. A tool was developed for evaluating RNA-seq alignment results. This tool can be used to compare the alignment of simulated reads to their genomic origin, or to compare the alignment of real reads to a set of annotated transcripts. Our tests show that while some RNA-seq aligners were unable to cope with long error-prone reads, others produced overall good results. We further show that alignment accuracy can be improved using error-corrected reads. Availability: https://github.com/kkrizanovic/RNAseqEval, https://figshare.com/projects/RNAseq_benchmark/24391. Contact: mile.sikic@fer.hr. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com


September 22, 2019

Next generation sequencing technology: Advances and applications.

Impressive progress has been made in the field of Next Generation Sequencing (NGS). Through advancements in the fields of molecular biology and technical engineering, parallelization of the sequencing reaction has profoundly increased the total number of produced sequence reads per run. Current sequencing platforms allow for an unprecedented view into complex mixtures of RNA and DNA samples. NGS is currently evolving into a molecular microscope finding its way into virtually every field of biomedical research. In this chapter we review the technical background of the different commercially available NGS platforms with respect to template generation and the sequencing reaction and take a small step towards what the upcoming NGS technologies will bring. We close with an overview of different implementations of NGS into biomedical research. This article is part of a Special Issue entitled: From Genome to Function. Copyright © 2014 Elsevier B.V. All rights reserved.


September 22, 2019

Transcriptional diversity during lineage commitment of human blood progenitors.

Blood cells derive from hematopoietic stem cells through stepwise fating events. To characterize gene expression programs driving lineage choice, we sequenced RNA from eight primary human hematopoietic progenitor populations representing the major myeloid commitment stages and the main lymphoid stage. We identified extensive cell type-specific expression changes: 6711 genes and 10,724 transcripts, enriched in non-protein-coding elements at early stages of differentiation. In addition, we found 7881 novel splice junctions and 2301 differentially used alternative splicing events, enriched in genes involved in regulatory processes. We demonstrated experimentally cell-specific isoform usage, identifying nuclear factor I/B (NFIB) as a regulator of the maturation of megakaryocytes, the platelet precursors. Our data highlight the complexity of fating events in closely related progenitor populations, the understanding of which is essential for the advancement of transplantation and regenerative medicine. Copyright © 2014, American Association for the Advancement of Science.


September 22, 2019

Characterization of the human ESC transcriptome by hybrid sequencing.

Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.


September 22, 2019

Bypassing the Restriction System To Improve Transformation of Staphylococcus epidermidis.

Staphylococcus epidermidis is the leading cause of infections on indwelling medical devices worldwide. Intrinsic antibiotic resistance and vigorous biofilm production have rendered these infections difficult to treat and, in some cases, require the removal of the offending medical prosthesis. With the exception of two widely passaged isolates, RP62A and 1457, the pathogenesis of infections caused by clinical S. epidermidis strains is poorly understood due to the strong genetic barrier that precludes the efficient transformation of foreign DNA into clinical isolates. The difficulty in transforming clinical S. epidermidis isolates is primarily due to the type I and IV restriction-modification systems, which act as genetic barriers. Here, we show that efficient plasmid transformation of clinical S. epidermidis isolates from clonal complexes 2, 10, and 89 can be realized by employing a plasmid artificial modification (PAM) in Escherichia coli DC10B containing a Δdcm mutation. This transformative technique should facilitate our ability to genetically modify clinical isolates of S. epidermidis and hence improve our understanding of their pathogenesis in human infections. IMPORTANCE: Staphylococcus epidermidis is a source of considerable morbidity worldwide. The underlying mechanisms contributing to the commensal and pathogenic lifestyles of S. epidermidis are poorly understood. Genetic manipulations of clinically relevant strains of S. epidermidis are largely prohibited due to the presence of a strong restriction barrier. With the introduction of the tools presented here, genetic manipulation of clinically relevant S. epidermidis isolates has now become possible, thus improving our understanding of S. epidermidis as a pathogen. Copyright © 2017 American Society for Microbiology.


September 22, 2019

Identification of differentially expressed splice variants by the proteogenomic pipeline Splicify.

Proteogenomics, i.e. comprehensive integration of genomics and proteomics data, is a powerful approach for identifying novel protein biomarkers. This is especially the case for proteins that differ structurally between disease and control conditions. As tumor development is associated with aberrant splicing, we focus on this rich source of cancer-specific biomarkers. To this end, we developed a proteogenomic pipeline, Splicify, which is able to detect differentially expressed protein isoforms. Splicify is based on integrating RNA massive parallel sequencing data and tandem mass spectrometry proteomics data to identify protein isoforms resulting from differential splicing between two conditions. Proof of concept was obtained by applying Splicify to RNA sequencing and mass spectrometry data obtained from colorectal cancer cell line SW480, before and after siRNA-mediated down-modulation of the splicing factors SF3B1 and SRSF1. These analyses revealed 2172 and 149 differentially expressed isoforms, respectively, with peptide confirmation upon knock-down of SF3B1 and SRSF1 compared to their controls. Splice variants identified included RAC1, OSBPL3, MKI67 and SYK. One additional sample was analyzed by PacBio Iso-Seq full-length transcript sequencing after SF3B1 down-modulation. This analysis verified the alternative splicing identified by Splicify and in addition identified novel splicing events that were not represented in the human reference genome annotation. Therefore, Splicify offers a validated proteogenomic data analysis pipeline for identification of disease-specific protein biomarkers resulting from mRNA alternative splicing. Splicify is publicly available on GitHub (https://github.com/NKI-TGO/SPLICIFY) and suitable to address basic research questions using pre-clinical model systems as well as translational research questions using patient-derived samples, e.g. to identify clinically relevant biomarkers.
Copyright © 2017, The American Society for Biochemistry and Molecular Biology.


September 22, 2019

No assembly required: Full-length MHC class I allele discovery by PacBio circular consensus sequencing.

Single-molecule real-time (SMRT) sequencing technology with the Pacific Biosciences (PacBio) RS II platform offers the potential to obtain full-length coding regions (~1100-bp) from MHC class I cDNAs. Despite the relatively high error rate associated with SMRT technology, high quality sequences can be obtained by circular consensus sequencing (CCS) due to the random nature of the error profile. In the present study we first validated the ability of SMRT-CCS to accurately identify class I transcripts in Mauritian-origin cynomolgus macaques (Macaca fascicularis) that have been characterized previously by cloning and Sanger-based sequencing as well as pyrosequencing approaches. We then applied this SMRT-CCS method to characterize 60 novel full-length class I transcript sequences expressed by a cohort of cynomolgus macaques from China. The SMRT-CCS method described here provides a straightforward protocol for characterization of unfragmented single-molecule cDNA transcripts that will potentially revolutionize MHC class I allele discovery in nonhuman primates and other species. Published by Elsevier Inc.
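The reason random errors can be averaged out by repeated passes over the same molecule can be shown with a toy model. The sketch below is not the PacBio CCS algorithm: it assumes substitution errors only, passes that are already aligned and of equal length, and a simple per-position majority vote. The function names and the simulated error rate are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def simulate_pass(template, error_rate, rng):
    """Copy the template, replacing each base with a different one at error_rate."""
    read = []
    for base in template:
        if rng.random() < error_rate:
            read.append(rng.choice([b for b in BASES if b != base]))
        else:
            read.append(base)
    return "".join(read)


def majority_consensus(passes):
    """Take the most frequent base at each position across equal-length passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Hand-made example: each pass carries one error, each at a different position.
passes = ["ACGTAC", "ACGAAC", "TCGTAC"]
consensus = majority_consensus(passes)

# Simulated example with a fixed seed; the error rate is illustrative only.
rng = random.Random(0)
template = "".join(rng.choice(BASES) for _ in range(1100))
noisy_passes = [simulate_pass(template, 0.15, rng) for _ in range(15)]
mismatches = sum(a != b for a, b in zip(majority_consensus(noisy_passes), template))
```

Because the errors in different passes fall at independent positions, the correct base usually wins the vote at each site, and `mismatches` shrinks as the number of passes grows. Real SMRT reads are dominated by insertions and deletions, so production consensus callers align the passes first rather than voting column by column.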

