Menu
September 22, 2019

Universal alternative splicing of noncoding exons.

The human transcriptome is so large, diverse, and dynamic that, even after a decade of investigation by RNA sequencing (RNA-seq), we have yet to resolve its true dimensions. RNA-seq suffers from an expression-dependent bias that impedes characterization of low-abundance transcripts. We performed targeted single-molecule and short-read RNA-seq to survey the transcriptional landscape of a single human chromosome (Hsa21) at unprecedented resolution. Our analysis reaches the lower limits of the transcriptome, identifying a fundamental distinction between protein-coding and noncoding gene content: almost every noncoding exon undergoes alternative splicing, producing a seemingly limitless variety of isoforms. Analysis of syntenic regions of the mouse genome shows that few noncoding exons are shared between human and mouse, yet human splicing profiles are recapitulated on Hsa21 in mouse cells, indicative of regulation by a deeply conserved splicing code. We propose that noncoding exons are functionally modular, with alternative splicing generating an enormous repertoire of potential regulatory RNAs and a rich transcriptional reservoir for gene evolution. Crown Copyright © 2017. Published by Elsevier Inc. All rights reserved.


September 22, 2019

First insights into the nature and evolution of antisense transcription in nematodes.

The development of multicellular organisms is coordinated by various gene regulatory mechanisms that ensure correct spatio-temporal patterns of gene expression. Recently, the role of antisense transcription in gene regulation has moved into focus of research. To characterize genome-wide patterns of antisense transcription and to study their evolutionary conservation, we sequenced a strand-specific RNA-seq library of the nematode Pristionchus pacificus.We identified 1112 antisense configurations of which the largest group represents 465 antisense transcripts (ASTs) that are fully embedded in introns of their host genes. We find that most ASTs show homology to protein-coding genes and are overrepresented in proteomic data. Together with the finding, that expression levels of ASTs and host genes are uncorrelated, this indicates that most ASTs in P. pacificus do not represent non-coding RNAs and do not exhibit regulatory functions on their host genes. We studied the evolution of antisense gene pairs across 20 nematode genomes, showing that the majority of pairs is lineage-specific and even the highly conserved vps-4, ddx-27, and sel-2 loci show abundant structural changes including duplications, deletions, intron gains and loss of antisense transcription. In contrast, host genes in general, are remarkably conserved and encode exceptionally long introns leading to unusually large blocks of conserved synteny.Our study has shown that in P. pacificus antisense transcription as such does not define non-coding RNAs but is rather a feature of highly conserved genes with long introns. We hypothesize that the presence of regulatory elements imposes evolutionary constraint on the intron length, but simultaneously, their large size makes them a likely target for translocation of genomic elements including protein-coding genes that eventually end up as ASTs.


September 22, 2019

Novel exons and splice variants in the human antibody heavy chain identified by single cell and single molecule sequencing.

Antibody heavy chains contain a variable and a constant region. The constant region of the antibody heavy chain is encoded by multiple groups of exons which define the isotype and therefore many functional characteristics of the antibody. We performed both single B cell RNAseq and long read single molecule sequencing of antibody heavy chain transcripts and were able to identify novel exons for IGHA1 and IGHA2 as well as novel isoforms for IGHM antibody heavy chain.


September 22, 2019

Long-read sequencing of chicken transcripts and identification of new transcript isoforms.

The chicken has long served as an important model organism in many fields, and continues to aid our understanding of animal development. Functional genomics studies aimed at probing the mechanisms that regulate development require high-quality genomes and transcript annotations. The quality of these resources has improved dramatically over the last several years, but many isoforms and genes have yet to be identified. We hope to contribute to the process of improving these resources with the data presented here: a set of long cDNA sequencing reads, and a curated set of new genes and transcript isoforms not currently represented in the most up-to-date genome annotation currently available to the community of researchers who rely on the chicken genome.


September 22, 2019

A genomic case study of mixed fibrolamellar hepatocellular carcinoma.

Mixed fibrolamellar hepatocellular carcinoma (mFL-HCC) is a rare liver tumor defined by the presence of both pure FL-HCC and conventional HCC components, represents up to 25% of cases of FL-HCC, and has been associated with worse prognosis. Recent genomic characterization of pure FL-HCC identified a highly recurrent transcript fusion (DNAJB1:PRKACA) not found in conventional HCC.We performed exome and transcriptome sequencing of a case of mFL-HCC. A novel BAC-capture approach was developed to identify a 400 kb deletion as the underlying genomic mechanism for a DNAJB1:PRKACA fusion in this case. A sensitive Nanostring Elements assay was used to screen for this transcript fusion in a second case of mFL-HCC, 112 additional HCC samples and 44 adjacent non-tumor liver samples.We report the first comprehensive genomic analysis of a case of mFL-HCC. No common HCC-associated mutations were identified. The very low mutation rate of this case, large number of mostly single-copy, long-range copy number variants, and high expression of ERBB2 were more consistent with previous reports of pure FL-HCC than conventional HCC. In particular, the DNAJB1:PRKACA fusion transcript specifically associated with pure FL-HCC was detected at very high expression levels. Subsequent analysis revealed the presence of this fusion in all primary and metastatic samples, including those with mixed or conventional HCC pathology. A second case of mFL-HCC confirmed our finding that the fusion was detectable in conventional components. An expanded screen identified a third case of fusion-positive HCC, which upon review, also had both conventional and fibrolamellar features. This screen confirmed the absence of the fusion in all conventional HCC and adjacent non-tumor liver samples.These results indicate that mFL-HCC is similar to pure FL-HCC at the genomic level and the DNAJB1:PRKACA fusion can be used as a diagnostic tool for both pure and mFL-HCC.© The Author 2016. Published by Oxford University Press on behalf of the European Society for Medical Oncology.


September 22, 2019

High-resolution characterization of the human microbiome.

The human microbiome plays an important and increasingly recognized role in human health. Studies of the microbiome typically use targeted sequencing of the 16S rRNA gene, whole metagenome shotgun sequencing, or other meta-omic technologies to characterize the microbiome’s composition, activity, and dynamics. Processing, analyzing, and interpreting these data involve numerous computational tools that aim to filter, cluster, annotate, and quantify the obtained data and ultimately provide an accurate and interpretable profile of the microbiome’s taxonomy, functional capacity, and behavior. These tools, however, are often limited in resolution and accuracy and may fail to capture many biologically and clinically relevant microbiome features, such as strain-level variation or nuanced functional response to perturbation. Over the past few years, extensive efforts have been invested toward addressing these challenges and developing novel computational methods for accurate and high-resolution characterization of microbiome data. These methods aim to quantify strain-level composition and variation, detect and characterize rare microbiome species, link specific genes to individual taxa, and more accurately characterize the functional capacity and dynamics of the microbiome. These methods and the ability to produce detailed and precise microbiome information are clearly essential for informing microbiome-based personalized therapies. In this review, we survey these methods, highlighting the challenges each method sets out to address and briefly describing methodological approaches. Copyright © 2016 Elsevier Inc. All rights reserved.


September 22, 2019

Alternative isoform analysis of Ttc8 expression in the rat pineal gland using a multi-platform sequencing approach reveals neural regulation.

Alternative isoform regulation (AIR) vastly increases transcriptome diversity and plays an important role in numerous biological processes and pathologies. However, the detection and analysis of isoform-level differential regulation is difficult, particularly in the face of complex and incompletely-annotated transcriptomes. Here we have used Illumina short-read/high-throughput RNA-Seq to identify 55 genes that exhibit neurally-regulated AIR in the pineal gland, and then used two other complementary experimental platforms to further study and characterize the Ttc8 gene, which is involved in Bardet-Biedl syndrome and non-syndromic retinitis pigmentosa. Use of the JunctionSeq analysis tool led to the detection of several novel exons and splice junctions in this gene, including two novel alternative transcription start sites which were found to display disproportionately strong neurally-regulated differential expression in several independent experiments. These high-throughput sequencing results were validated and augmented via targeted qPCR and long-read Pacific Biosciences SMRT sequencing. We confirmed the existence of numerous novel splice junctions and the selective upregulation of the two novel start sites. In addition, we identified more than 20 novel isoforms of the Ttc8 gene that are co-expressed in this tissue. By using information from multiple independent platforms we not only greatly reduce the risk of errors, biases, and artifacts influencing our results, we also are able to characterize the regulation and splicing of the Ttc8 gene more deeply and more precisely than would be possible via any single platform. The hybrid method outlined here represents a powerful strategy in the study of the transcriptome.


September 22, 2019

Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells.

Full-length RNA sequencing (RNA-Seq) has been applied to bulk tissue, cell lines and sorted cells to characterize transcriptomes, but applying this technology to single cells has proven to be difficult, with less than ten single-cell transcriptomes having been analyzed thus far. Although single splicing events have been described for =200 single cells with statistical confidence, full-length mRNA analyses for hundreds of cells have not been reported. Single-cell short-read 3′ sequencing enables the identification of cellular subtypes, but full-length mRNA isoforms for these cell types cannot be profiled. We developed a method that starts with bulk tissue and identifies single-cell types and their full-length RNA isoforms without fluorescence-activated cell sorting. Using single-cell isoform RNA-Seq (ScISOr-Seq), we identified RNA isoforms in neurons, astrocytes, microglia, and cell subtypes such as Purkinje and Granule cells, and cell-type-specific combination patterns of distant splice sites. We used ScISOr-Seq to improve genome annotation in mouse Gencode version 10 by determining the cell-type-specific expression of 18,173 known and 16,872 novel isoforms.


September 22, 2019

A quantitative SMRT cell sequencing method for ribosomal amplicons.

Advances in sequencing technologies continue to provide unprecedented opportunities to characterize microbial communities. For example, the Pacific Biosciences Single Molecule Real-Time (SMRT) platform has emerged as a unique approach harnessing DNA polymerase activity to sequence template molecules, enabling long reads at low costs. With the aim to simultaneously classify and enumerate in situ microbial populations, we developed a quantitative SMRT (qSMRT) approach that involves the addition of exogenous standards to quantify ribosomal amplicons derived from environmental samples. The V7-9 regions of 18S SSU rDNA were targeted and quantified from protistan community samples collected in the Ross Sea during the Austral summer of 2011. We used three standards of different length and optimized conditions to obtain accurate quantitative retrieval across the range of expected amplicon sizes, a necessary criterion for analyzing taxonomically diverse 18S rDNA molecules from natural environments. The ability to concurrently identify and quantify microorganisms in their natural environment makes qSMRT a powerful, rapid and cost-effective approach for defining ecosystem diversity and function. Copyright © 2017 Elsevier B.V. All rights reserved.


September 22, 2019

Shannon: an information-optimal de novo RNA-Seq assembler

De novo assembly of short RNA-Seq reads into transcripts is challenging due to sequence similarities in transcriptomes arising from gene duplications and alternative splicing of transcripts. We present Shannon, an RNA-Seq assembler with an optimality guarantee derived from principles of information theory: Shannon reconstructs nearly all information-theoretically reconstructable transcripts. Shannon is based on a theory we develop for de novo RNA-Seq assembly that reveals differing abundances among transcripts to be the key, rather than the barrier, to effective assembly. The assembly problem is formulated as a sparsest-flow problem on a transcript graph, and the heart of Shannon is a novel iterative flow-decomposition algorithm. This algorithm provably solves the information-theoretically reconstructable instances in linear-time even though the general sparsest-flow problem is NP-hard. Shannon also incorporates several additional new algorithmic advances: a new error-correction algorithm based on successive cancelation, a multi-bridging algorithm that carefully utilizes read information in the k-mer de Bruijn graph, and an approximate graph partitioning algorithm to split the transcriptome de Bruijn graph into smaller components. In tests on large RNA-Seq datasets, Shannon obtains significant increases in sensitivity along with improvements in specificity in comparison to state-of-the-art assemblers.


September 22, 2019

Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study.

High-throughput RNA sequencing (RNA-seq) greatly expands the potential for genomics discoveries, but the wide variety of platforms, protocols and performance capabilitites has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We carried out replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (poly-A-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS and Roche 454). The results show high intraplatform (Spearman rank R > 0.86) and inter-platform (R > 0.83) concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. For intact RNA, gene expression profiles from rRNA-depletion and poly-A enrichment are similar. In addition, rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.


September 22, 2019

A comprehensive approach to expression of L1 loci.

L1 elements represent the only currently active, autonomous retrotransposon in the human genome, and they make major contributions to human genetic instability. The vast majority of the 500 000 L1 elements in the genome are defective, and only a relatively few can contribute to the retrotransposition process. However, there is currently no comprehensive approach to identify the specific loci that are actively transcribed separate from the excess of L1-related sequences that are co-transcribed within genes. We have developed RNA-Seq procedures, as well as a 1200 bp 5? RACE product coupled with PACBio sequencing that can identify the specific L1 loci that contribute most of the L1-related RNA reads. At least 99% of L1-related sequences found in RNA do not arise from the L1 promoter, instead representing pieces of L1 incorporated in other cellular RNAs. In any given cell type a relatively few active L1 loci contribute to the ‘authentic’ L1 transcripts that arise from the L1 promoter, with significantly different loci seen expressed in different tissues.© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.


September 22, 2019

LSCplus: a fast solution for improving long read accuracy by short read alignment.

The single molecule, real time (SMRT) sequencing technology of Pacific Biosciences enables the acquisition of transcripts from end to end due to its ability to produce extraordinarily long reads (>10 kb). This new method of transcriptome sequencing has been applied to several projects on humans and model organisms. However, the raw data from SMRT sequencing are of relatively low quality, with a random error rate of approximately 15 %, for which error correction using next-generation sequencing (NGS) short reads is typically necessary. Few tools have been designed that apply a hybrid sequencing approach that combines NGS and SMRT data, and the most popular existing tool for error correction, LSC, has computing resource requirements that are too intensive for most laboratory and research groups. These shortcomings severely limit the application of SMRT long reads for transcriptome analysis.Here, we report an improved tool (LSCplus) for error correction with the LSC program as a reference. LSCplus overcomes the disadvantage of LSC’s time consumption and improves quality. Only 1/3-1/4 of the time and 1/20-1/25 of the error correction time is required using LSCplus compared with that required for using LSC.LSCplus is freely available at http://www.herbbol.org:8001/lscplus/ . Sample calculations are provided illustrating the precision and efficiency of this method regarding error correction and isoform detection.


September 22, 2019

Direct chromosome-length haplotyping by single-cell sequencing.

Haplotypes are fundamental to fully characterize the diploid genome of an individual, yet methods to directly chart the unique genetic makeup of each parental chromosome are lacking. Here we introduce single-cell DNA template strand sequencing (Strand-seq) as a novel approach to phasing diploid genomes along the entire length of all chromosomes. We demonstrate this by building a complete haplotype for a HapMap individual (NA12878) at high accuracy (concordance 99.3%), without using generational information or statistical inference. By use of this approach, we mapped all meiotic recombination events in a family trio with high resolution (median range ~14 kb) and phased larger structural variants like deletions, indels, and balanced rearrangements like inversions. Lastly, the single-cell resolution of Strand-seq allowed us to observe loss of heterozygosity regions in a small number of cells, a significant advantage for studies of heterogeneous cell populations, such as cancer cells. We conclude that Strand-seq is a unique and powerful approach to completely phase individual genomes and map inheritance patterns in families, while preserving haplotype differences between single cells.© 2016 Porubský et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Metagenomic approaches to assess bacteriophages in various environmental niches.

Bacteriophages are ubiquitous and numerous parasites of bacteria and play a critical evolutionary role in virtually every ecosystem, yet our understanding of the extent of the diversity and role of phages remains inadequate for many ecological niches, particularly in cases in which the host is unculturable. During the past 15 years, the emergence of the field of viral metagenomics has drastically enhanced our ability to analyse the so-called viral ‘dark matter’ of the biosphere. Here, we review the evolution of viral metagenomic methodologies, as well as providing an overview of some of the most significant applications and findings in this field of research.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.