Menu
September 22, 2019

Cartography of neurexin alternative splicing mapped by single-molecule long-read mRNA sequencing.

Neurexins are evolutionarily conserved presynaptic cell-adhesion molecules that are essential for normal synapse formation and synaptic transmission. Indirect evidence has indicated that extensive alternative splicing of neurexin mRNAs may produce hundreds if not thousands of neurexin isoforms, but no direct evidence for such diversity has been available. Here we use unbiased long-read sequencing of full-length neurexin (Nrxn)1a, Nrxn1ß, Nrxn2ß, Nrxn3a, and Nrxn3ß mRNAs to systematically assess how many sites of alternative splicing are used in neurexins with a significant frequency, and whether alternative splicing events at these sites are independent of each other. In sequencing more than 25,000 full-length mRNAs, we identified a novel, abundantly used alternatively spliced exon of Nrxn1a and Nrxn3a (referred to as alternatively spliced sequence 6) that encodes a 9-residue insertion in the flexible hinge region between the fifth LNS (laminin-a, neurexin, sex hormone-binding globulin) domain and the third EGF-like sequence. In addition, we observed several larger-scale events of alternative splicing that deleted multiple domains and were much less frequent than the canonical six sites of alternative splicing in neurexins. All of the six canonical events of alternative splicing appear to be independent of each other, suggesting that neurexins may exhibit an even larger isoform diversity than previously envisioned and comprise thousands of variants. Our data are consistent with the notion that a-neurexins represent extracellular protein-interaction scaffolds in which different LNS and EGF domains mediate distinct interactions that affect diverse functions and are independently regulated by independent events of alternative splicing.


September 22, 2019

Shannon: an information-optimal de novo RNA-Seq assembler

De novo assembly of short RNA-Seq reads into transcripts is challenging due to sequence similarities in transcriptomes arising from gene duplications and alternative splicing of transcripts. We present Shannon, an RNA-Seq assembler with an optimality guarantee derived from principles of information theory: Shannon reconstructs nearly all information-theoretically reconstructable transcripts. Shannon is based on a theory we develop for de novo RNA-Seq assembly that reveals differing abundances among transcripts to be the key, rather than the barrier, to effective assembly. The assembly problem is formulated as a sparsest-flow problem on a transcript graph, and the heart of Shannon is a novel iterative flow-decomposition algorithm. This algorithm provably solves the information-theoretically reconstructable instances in linear-time even though the general sparsest-flow problem is NP-hard. Shannon also incorporates several additional new algorithmic advances: a new error-correction algorithm based on successive cancelation, a multi-bridging algorithm that carefully utilizes read information in the k-mer de Bruijn graph, and an approximate graph partitioning algorithm to split the transcriptome de Bruijn graph into smaller components. In tests on large RNA-Seq datasets, Shannon obtains significant increases in sensitivity along with improvements in specificity in comparison to state-of-the-art assemblers.


September 22, 2019

Transcriptome-wide survey of pseudorabies virus using next- and third-generation sequencing platforms.

Pseudorabies virus (PRV) is an alphaherpesvirus of swine. PRV has a large double-stranded DNA genome and, as the latest investigations have revealed, a very complex transcriptome. Here, we present a large RNA-Seq dataset, derived from both short- and long-read sequencing. The dataset contains 1.3 million 100?bp paired-end reads that were obtained from the Illumina random-primed libraries, as well as 10 million 50?bp single-end reads generated by the Illumina polyA-seq. The Pacific Biosciences RSII non-amplified method yielded 57,021 reads of inserts (ROIs) aligned to the viral genome, the amplified method resulted in 158,396 PRV-specific ROIs, while we obtained 12,555 ROIs using the Sequel platform. The Oxford Nanopore’s MinION device generated 44,006 reads using their regular cDNA-sequencing method, whereas 29,832 and 120,394 reads were produced by using the direct RNA-sequencing and the Cap-selection protocols, respectively. The raw reads were aligned to the PRV reference genome (KJ717942.1). Our provided dataset can be used to compare different sequencing approaches, library preparation methods, as well as for validation and testing bioinformatic pipelines.


September 22, 2019

Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform.

Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome has been first sequenced in 1989 and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV infected human lung fibroblast cells sequenced by the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HMCV strain Towne) reads. The described data include raw and processed sequencing files, and combined with other datasets, they can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.


September 22, 2019

Characterization of the dynamic transcriptome of a herpesvirus with long-read Single Molecule Real-Time Sequencing.

Herpesvirus gene expression is co-ordinately regulated and sequentially ordered during productive infection. The viral genes can be classified into three distinct kinetic groups: immediate-early, early, and late classes. In this study, a massively parallel sequencing technique that is based on PacBio Single Molecule Real-time sequencing platform, was used for quantifying the poly(A) fraction of the lytic transcriptome of pseudorabies virus (PRV) throughout a 12-hour interval of productive infection on PK-15 cells. Other approaches, including microarray, real-time RT-PCR and Illumina sequencing are capable of detecting only the aggregate transcriptional activity of particular genomic regions, but not individual herpesvirus transcripts. However, SMRT sequencing allows for a distinction between transcript isoforms, including length- and splice variants, as well as between overlapping polycistronic RNA molecules. The non-amplified Isoform Sequencing (Iso-Seq) method was used to analyse the kinetic properties of the lytic PRV transcripts and to then classify them accordingly. Additionally, the present study demonstrates the general utility of long-read sequencing for the time-course analysis of global gene expression in practically any organism.


September 22, 2019

Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing.

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.


September 22, 2019

Long-read sequencing of human cytomegalovirus transcriptome reveals RNA isoforms carrying distinct coding potentials.

The human cytomegalovirus (HCMV) is a ubiquitous, human pathogenic herpesvirus. The complete viral genome is transcriptionally active during infection; however, a large part of its transcriptome has yet to be annotated. In this work, we applied the amplified isoform sequencing technique from Pacific Biosciences to characterize the lytic transcriptome of HCMV strain Towne varS. We developed a pipeline for transcript annotation using long-read sequencing data. We identified 248 transcriptional start sites, 116 transcriptional termination sites and 80 splicing events. Using this information, we have annotated 291 previously undescribed or only partially annotated transcript isoforms, including eight novel antisense transcripts and their isoforms, as well as a novel transcript (RS2) in the short repeat region, partially antisense to RS1. Similarly to other organisms, we discovered a high transcriptional diversity in HCMV, with many transcripts only slightly differing from one another. Comparing our transcriptome profiling results to an earlier ribosome footprint analysis, we have concluded that the majority of the transcripts contain multiple translationally active ORFs, and also that most isoforms contain unique combinations of ORFs. Based on these results, we propose that one important function of this transcriptional diversity may be to provide a regulatory mechanism at the level of translation.


September 22, 2019

Multiplatform next-generation sequencing identifies novel RNA molecules and transcript isoforms of the endogenous retrovirus isolated from cultured cells.

In this study, we applied short- and long-read RNA sequencing techniques, as well as PCR analysis to investigate the transcriptome of the porcine endogenous retrovirus (PERV) expressed from cultured porcine kidney cell line PK-15. This analysis has revealed six novel transcripts and eight transcript isoforms, including five length and three splice variants. We were able to establish whether a deletion in a transcript is the result of the splicing of mRNAs or of genomic deletion in one of the PERV clones. Additionally, we re-annotated the formerly identified RNA molecules. Our analysis revealed a higher complexity of PERV transcriptome than it was earlier believed.© FEMS 2018. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


September 22, 2019

Fluorescently-tagged human eIF3 for single-molecule spectroscopy.

Human translation initiation relies on the combined activities of numerous ribosome-associated eukaryotic initiation factors (eIFs). The largest factor, eIF3, is an ~800 kDa multiprotein complex that orchestrates a network of interactions with the small 40S ribosomal subunit, other eIFs, and mRNA, while participating in nearly every step of initiation. How these interactions take place during the time course of translation initiation remains unclear. Here, we describe a method for the expression and affinity purification of a fluorescently-tagged eIF3 from human cells. The tagged eIF3 dodecamer is structurally intact, functions in cell-based assays, and interacts with the HCV IRES mRNA and the 40S-IRES complex in vitro. By tracking the binding of single eIF3 molecules to the HCV IRES RNA with a zero-mode waveguides-based instrument, we show that eIF3 samples both wild-type IRES and an IRES that lacks the eIF3-binding region, and that the high-affinity eIF3-IRES interaction is largely determined by slow dissociation kinetics. The application of single-molecule methods to more complex systems involving eIF3 may unveil dynamics underlying mRNA selection and ribosome loading during human translation initiation.© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.


September 22, 2019

2′-O-methylation in mRNA disrupts tRNA decoding during translation elongation.

Chemical modifications of mRNA may regulate many aspects of mRNA processing and protein synthesis. Recently, 2′-O-methylation of nucleotides was identified as a frequent modification in translated regions of human mRNA, showing enrichment in codons for certain amino acids. Here, using single-molecule, bulk kinetics and structural methods, we show that 2′-O-methylation within coding regions of mRNA disrupts key steps in codon reading during cognate tRNA selection. Our results suggest that 2′-O-methylation sterically perturbs interactions of ribosomal-monitoring bases (G530, A1492 and A1493) with cognate codon-anticodon helices, thereby inhibiting downstream GTP hydrolysis by elongation factor Tu (EF-Tu) and A-site tRNA accommodation, leading to excessive rejection of cognate aminoacylated tRNAs in initial selection and proofreading. Our current and prior findings highlight how chemical modifications of mRNA tune the dynamics of protein synthesis at different steps of translation elongation.


September 22, 2019

Reproducible integration of multiple sequencing datasets to form high-confidence SNP, indel, and reference calls for five human genome reference materials

Benchmark small variant calls from the Genome in a Bottle Consortium (GIAB) for the CEPH/HapMap genome NA12878 (HG001) have been used extensively for developing, optimizing, and demonstrating performance of sequencing and bioinformatics methods. Here, we develop a reproducible, cloud-based pipeline to integrate multiple sequencing datasets and form benchmark calls, enabling application to arbitrary human genomes. We use these reproducible methods to form high-confidence calls with respect to GRCh37 and GRCh38 for HG001 and 4 additional broadly-consented genomes from the Personal Genome Project that are available as NIST Reference Materials. These new genomes’ broad, open consent with few restrictions on availability of samples and data is enabling a uniquely diverse array of applications. Our new methods produce 17% more high-confidence SNPs, 176% more indels, and 12% larger regions than our previously published calls. To demonstrate that these calls can be used for accurate benchmarking, we compare other high-quality callsets to ours (e.g., Illumina Platinum Genomes), and we demonstrate that the majority of discordant calls are errors in the other callsets, We also highlight challenges in interpreting performance metrics when benchmarking against imperfect high-confidence calls. We show that benchmarking tools from the Global Alliance for Genomics and Health can be used with our calls to stratify performance metrics by variant type and genome context and elucidate strengths and weaknesses of a method.


September 22, 2019

Sequence analysis of IncA/C and IncI1 plasmids isolated from multidrug-resistant Salmonella Newport using Single-Molecule Real-Time Sequencing.

Multidrug-resistant (MDR) plasmids play an important role in disseminating antimicrobial resistance genes. To elucidate the antimicrobial resistance gene compositions in A/C incompatibility complex (IncA/C) plasmids carried by animal-derived MDR Salmonella Newport, and to investigate the spread mechanism of IncA/C plasmids, this study characterizes the complete nucleotide sequences of IncA/C plasmids by comparative analysis. Complete nucleotide sequencing of plasmids and chromosomes of six MDR Salmonella Newport strains was performed using PacBio RSII. Open reading frames were assigned using prokaryotic genome annotation pipeline (PGAP). To understand genomic diversity and evolutionary relationships among Salmonella Newport IncA/C plasmids, we included three complete IncA/C plasmid sequences with similar backbones from Salmonella Newport and Escherichia coli: pSN254, pAM04528, and peH4H, and additional 200 draft chromosomes. With the exception of canine isolate CVM22462, which contained an additional IncI1 plasmid, each of the six MDR Salmonella Newport strains contained only the IncA/C plasmid. These IncA/C plasmids (including references) ranged in size from 80.1 (pCVM21538) to 176.5?kb (pSN254) and carried various resistance genes. Resistance genes floR, tetA, tetR, strA, strB, sul, and mer were identified in all IncA/C plasmids. Additionally, blaCMY-2 and sugE were present in all IncA/C plasmids, excepting pCVM21538. Plasmid pCVM22462 was capable of being transferred by conjugation. The IncI1 plasmid pCVM22462b in CVM22462 carried blaCMY-2 and sugE. Our data showed that MDR Salmonella Newport strains carrying similar IncA/C plasmids clustered together in the phylogenetic tree using chromosome sequences and the IncA/C plasmids from animal-derived Salmonella Newport contained diverse resistance genes. In the current study, we analyzed genomic diversities and phylogenetic relationships among MDR Salmonella Newport using complete plasmids and chromosome sequences and provided possible spread mechanism of IncA/C plasmids in Salmonella Newport Lineage II.


September 22, 2019

genomeview – an extensible python-based genomics visualization engine

Visual inspection and analysis are integral to quality control, hypothesis generation, methods development and validation of genomic data. The richness and complexity of genomic data necessitates customized visualizations highlighting specific features of interest while hiding the often vast tide of irrelevant attributes. However, the majority of genome-visualization occurs either in general-purpose tools such as IGV or the UCSC Genome Browser — which offer many options to adjust visualization parameters, but very little in the way of extensibility — or narrowly-focused tools aiming to solve a single visualization problem. Here, we present genomeview, a python-based visualization engine which is easy to extend and simple to integrate into existing analysis pipelines.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.