September 22, 2019

Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis.

RNA sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut. Although recent efforts have attempted to assess the latest available tools, they have not evaluated analysis workflows comprehensively enough to unleash the full power of RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Going beyond the scope of expression analysis, our work also assesses RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome.

RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging widely used tools for accurate RNA-seq analysis.


September 22, 2019

Distinguishing highly similar gene isoforms with a clustering-based bioinformatics analysis of PacBio single-molecule long reads.

Gene isoforms are commonly found in both prokaryotes and eukaryotes. Since each isoform may perform a specific function in response to changing environmental conditions, studying the dynamics of gene isoforms is important for understanding biological processes and disease conditions. However, genome-wide identification of gene isoforms is technically challenging due to the high degree of sequence identity among isoforms. The traditional targeted sequencing approach, involving Sanger sequencing of plasmid-cloned PCR products, has low throughput and is tedious and time-consuming. Next-generation sequencing technologies such as Illumina and 454 achieve high throughput, but their short read lengths are a critical barrier to accurate assembly of highly similar gene isoforms and may result in ambiguities and false joining during sequence assembly. More recently, third-generation sequencers, represented by the PacBio platform, offer sufficient throughput and long reads covering the full length of typical genes, and thus have the potential to reliably profile gene isoforms. However, PacBio long reads are error-prone and cannot be effectively analyzed by traditional assembly programs.

We present a clustering-based analysis pipeline integrated with PacBio sequencing data for profiling highly similar gene isoforms. This approach was first evaluated against de novo assembly of 454 reads using a benchmark admixture containing 10 known, cloned msg genes encoding the major surface glycoprotein of Pneumocystis jirovecii. All 10 msg isoforms were successfully reconstructed with the expected length (~1.5 kb) and correct sequence by the new approach, whereas 454 reads could not be correctly assembled using various assembly programs. When applied to an additional benchmark admixture containing 22 known P. jirovecii msg isoforms, this approach accurately reconstructed all but 4 of these isoforms in full length (~3 kb); these 4 isoforms were present at low concentrations in the admixture. Finally, when applied to the original clinical sample from which the 22 known msg isoforms were cloned, this approach successfully identified not only all known isoforms accurately (~3 kb each) but also 48 novel isoforms.

PacBio sequencing integrated with the clustering-based analysis pipeline achieves high-throughput and high-resolution discrimination of highly similar sequences, and can serve as a new approach for genome-wide characterization of gene isoforms and other highly repetitive sequences.
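To make the idea of clustering-based isoform separation more concrete, here is a minimal Python sketch that greedily groups reads by k-mer Jaccard similarity. It is not the authors' pipeline. The function names, the k-mer size and the similarity threshold are hypothetical, and real PacBio reads would additionally need error-tolerant alignment and per-cluster consensus calling.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(reads: list[str], k: int = 5, threshold: float = 0.5) -> list[list[int]]:
    """Put each read into the first cluster whose seed read is similar enough.

    Hypothetical parameters: k and threshold are illustrative, not tuned values.
    """
    seeds: list[set[str]] = []      # k-mer set of each cluster's first (seed) read
    clusters: list[list[int]] = []  # indices of the reads assigned to each cluster
    for idx, read in enumerate(reads):
        read_kmers = kmers(read, k)
        for c, seed in enumerate(seeds):
            if jaccard(read_kmers, seed) >= threshold:
                clusters[c].append(idx)
                break
        else:
            # No existing cluster is similar enough, so this read seeds a new one.
            seeds.append(read_kmers)
            clusters.append([idx])
    return clusters


# Toy reads: the first two differ by one base, the third is unrelated.
print(greedy_cluster(["ACGTACGTTGCA", "ACGTACGTTGCT", "TTTTGGGGCCCC"]))
```

Each read is compared only with cluster seeds, so the cost grows with the number of reads times the number of clusters. Raising the threshold separates near-identical isoforms more aggressively, but it risks fragmenting clusters built from error-prone reads.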


September 22, 2019

Long-term microbiota and virome in a Zürich patient after fecal transplantation against Clostridium difficile infection.

Fecal microbiota transplantation (FMT) is an emerging therapeutic option for Clostridium difficile infections that are refractory to conventional treatment. FMT introduces fecal microbes into the patient’s intestine that prevent the recurrence of C. difficile, leading to rapid expansion of bacteria characteristic of a healthy microbiota. However, the long-term effects of FMT remain largely unknown. The C. difficile patient described in this paper showed protracted microbiota adaptation from 6 to 42 months post-FMT. Ultimately, the bacterial communities came to resemble those of the donor, suggesting sustainable stool engraftment. Since little is known about the consequences of transmitted viruses during C. difficile infection, we also interrogated virome changes. Our approach allowed identification of about 10 phage types per sample that represented larger viral communities, and phages were found to be equally abundant in the cured patient and the donor. The healthy microbiota appears to be characterized by low phage abundance. Although viruses were likely transferred, the patient established a virome distinct from the donor's. Surprisingly, the patient had sequences of algal giant viruses (chloroviruses) that have not previously been reported in the human gut. Chloroviruses have not been associated with intestinal disease, but their presence in the oropharynx may influence cognitive abilities. The findings suggest that the virome is an important indicator of health or disease. A better understanding of the role of viruses in the gut ecosystem may uncover novel microbiota-modulating therapeutic strategies. © 2016 New York Academy of Sciences.


September 22, 2019

A high-quality annotated transcriptome of swine peripheral blood.

High-throughput gene expression profiling assays of peripheral blood are widely used in biomedicine, as well as in animal genetics and physiology research. Accurate, comprehensive, and precise interpretation of such assays relies on well-characterized reference genomes and/or transcriptomes. However, neither the reference genome nor the peripheral blood transcriptome of the pig has been sufficiently assembled and annotated to support such profiling assays in this emerging biomedical model organism. We aimed to assemble published and novel RNA-seq data to provide a comprehensive, well-annotated blood transcriptome for pigs by integrating a de novo assembly with a genome-guided assembly.

A de novo and a genome-guided transcriptome of porcine whole peripheral blood were assembled from ~162 million pairs of paired-end and ~183 million single-end trimmed and normalized Illumina RNA-seq reads (~6 billion initial reads from 146 RNA-seq libraries) from five independent studies, using the Trinity and Cufflinks software, respectively. We then removed putative transcripts (PTs) of low confidence from both assemblies and merged the remaining PTs into an integrated transcriptome consisting of 132,928 PTs, with 126,225 (~95%) PTs from the de novo assembly and more than 91% of PTs spliced. In the integrated transcriptome, ~90% and 63% of PTs had significant sequence similarity to sequences in the NCBI NT and NR databases, respectively; 68,754 (~52%) PTs were annotated with 15,965 unique gene ontology (GO) terms; and 7618 PTs annotated with Enzyme Commission codes were assigned to 134 pathways curated by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Full exon-intron junctions of 17,528 PTs were validated by PacBio IsoSeq full-length cDNA reads from 3 other porcine tissues, NCBI pig RefSeq mRNAs and transcripts from the Ensembl Sscrofa10.2 annotation. Completeness of the 5′ termini of 37,569 PTs was validated by public cap analysis of gene expression (CAGE) data. By comparison with the Ensembl transcripts, we found that (1) the deduced precursors of 54,402 PTs shared at least one intron or exon with those of 18,437 Ensembl transcripts; (2) 12,262 PTs had both longer 5′ and 3′ termini than their maximally overlapping Ensembl transcripts; and (3) 41,838 spliced PTs were entirely missing from the Sscrofa10.2 annotation. Similar results were obtained when the PTs were compared with the pig NCBI RefSeq mRNA collection.

We built, validated and annotated a comprehensive porcine blood transcriptome with significant improvement over the annotation of Ensembl Sscrofa10.2 and the pig NCBI RefSeq mRNAs, laying a foundation for blood-based high-throughput transcriptomic assays in pigs and for advancing annotation of the pig genome.
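The approximate proportions quoted for the integrated transcriptome can be re-derived from the reported counts. This short check uses only numbers stated in the abstract. The helper name `pct` is an illustrative choice.

```python
total_pts = 132_928         # PTs in the integrated transcriptome
de_novo_pts = 126_225       # PTs contributed by the de novo (Trinity) assembly
go_annotated_pts = 68_754   # PTs annotated with GO terms


def pct(part: int, whole: int) -> float:
    """Percentage of whole represented by part, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(de_novo_pts, total_pts))       # de novo share, reported as ~95%
print(pct(go_annotated_pts, total_pts))  # GO-annotated share, reported as ~52%
```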


September 22, 2019

Isoform sequencing provides a more comprehensive view of the Panax ginseng transcriptome.

Korean ginseng (Panax ginseng C.A. Meyer) has been widely used for medicinal purposes and contains potent plant secondary metabolites, including ginsenosides. To obtain transcriptomic data offering a more comprehensive view of functional genomics in P. ginseng, we generated genome-wide transcriptome data from four different P. ginseng tissues using PacBio isoform sequencing (Iso-Seq) technology. A total of 135,317 assembled transcripts were generated, with an average length of 3.2 kb and high assembly completeness. Of these unigenes, 67.5% were predicted to contain complete full-length (FL) open reading frames (ORFs) and exhibited a high gene annotation rate. Furthermore, we identified unique full-length genes involved in triterpenoid saponin synthesis and in plant hormonal signaling pathways, including auxin and cytokinin. Functional genomic studies of P. ginseng seedlings confirmed the rapid upregulation of negative feedback loops by auxin and cytokinin signaling cues. The conserved evolutionary mechanisms in the canonical auxin and cytokinin signaling pathways of P. ginseng are more complex than those in Arabidopsis thaliana. Our analysis also revealed a more detailed view of transcriptome-wide alternative isoforms for 88 genes. Finally, transposable elements (TEs) were identified, suggesting transcriptional activity of TEs in P. ginseng. In conclusion, our results suggest that long-read, full-length or partial-unigene data with high-quality assemblies are invaluable transcriptomic reference resources for P. ginseng and can be used for comparative analyses in closely related medicinal plants.


September 22, 2019

A workflow for studying specialized metabolism in nonmodel eukaryotic organisms

Eukaryotes contain a diverse tapestry of specialized metabolites, many of which are of significant pharmaceutical and industrial importance to humans. Nevertheless, exploration of the specialized metabolic pathways underlying specific chemical traits in nonmodel eukaryotic organisms has been technically challenging and has historically lagged behind that of bacterial systems. Recent advances in genomics, metabolomics, phylogenomics, and synthetic biology now enable a new workflow for interrogating unknown specialized metabolic systems in nonmodel eukaryotic hosts with greater efficiency and mechanistic depth. This chapter delineates such a workflow by providing a collection of state-of-the-art approaches and tools, ranging from multiomics-guided candidate gene identification to in vitro and in vivo functional and structural characterization of specialized metabolic enzymes. As already demonstrated by several recent studies, this new workflow opens a gateway into the largely untapped world of natural product biochemistry in eukaryotes. © 2016 Elsevier Inc. All rights reserved.


September 22, 2019

Single-Molecule Long-Read Sequencing of Zanthoxylum bungeanum Maxim. Transcriptome: Identification of Aroma-Related Genes

Zanthoxylum bungeanum Maxim. is an economically important tree species that is resistant to drought and infertility, and has potential medicinal and edible value. However, comprehensive genomic data are not yet available for this species, limiting its potential utility for medicinal use, breeding programs, and cultivation. Transcriptome sequencing provides an effective approach to remedying this shortcoming. Herein, single-molecule long-read sequencing and next-generation sequencing approaches were used in parallel to obtain transcript isoform structure and gene functional information in Z. bungeanum. In total, 282,101 reads of inserts (ROIs) were identified, including 134,074 full-length non-chimeric reads, among which 65,711 open reading frames (ORFs), 50,135 simple sequence repeats (SSRs), and 1492 long non-coding RNAs (lncRNAs) were detected. Functional annotation revealed metabolic pathways related to aroma components and color characteristics in Z. bungeanum. Unexpectedly, 30 transcripts were annotated as genes involved in regulating the pathogenesis of breast and colorectal cancers. This work provides a comprehensive transcriptome resource for Z. bungeanum and lays a foundation for the further investigation and utilization of Zanthoxylum resources.


September 22, 2019

Extensive alternative splicing of KIR transcripts.

The killer-cell Ig-like receptors (KIR) form a multigene entity involved in modulating immune responses through interactions with MHC class I molecules. The complexity of the KIR cluster is reflected by, for instance, abundant allelic polymorphism, gene copy number variation, and stochastic expression profiles. The current transcriptome study involving human and macaque families demonstrates that KIR family members are also subject to differential levels of alternative splicing, which appears to be gene dependent. Alternative splicing may result in the partial or complete skipping of exons, or the partial inclusion of introns, as documented at the transcript level. This post-transcriptional process can generate multiple isoforms from a single KIR gene, diversifying the characteristics of the encoded proteins. For example, alternative splicing could modify ligand interactions, cellular localization, signaling properties, and the number of extracellular domains of the receptor. In humans, we observed abundant splicing of KIR2DL4 and, to a lesser extent, of the lineage III KIR genes. All experimentally documented splice events are substantiated by in silico splicing strength predictions. To a similar extent, alternative splicing is observed in rhesus macaques, a species that shares a close evolutionary relationship with humans. The splicing profiles of Mamu-KIR1D and Mamu-KIR2DL04 displayed great diversity, whereas Mamu-KIR3DL20 (lineage V) is consistently spliced to generate a homolog of human KIR2DL5 (lineage I). The latter case represents an example of convergent evolution. Although just a single KIR splice event is shared between humans and macaques, the splicing mechanisms are similar, and the predicted consequences are comparable. In conclusion, alternative splicing adds another layer of complexity to the KIR gene system in primates and results in a wide structural and functional variety of KIR receptors and their isoforms, which may play a role in health and disease.


September 22, 2019

The habu genome reveals accelerated evolution of venom protein genes.

Evolution of novel traits is a challenging subject in biological research. Several snake lineages developed elaborate venom systems to deliver complex protein mixtures for prey capture. To understand the mechanisms involved in snake venom evolution, here we decoded the ~1.4-Gb genome of a habu, Protobothrops flavoviridis. We identified 60 snake venom protein genes (SV) and 224 non-venom paralogs (NV), belonging to 18 gene families. Molecular phylogeny reveals early divergence of SV and NV genes, suggesting that one of the four copies generated through two rounds of whole-genome duplication was modified for use as a toxin. Among them, both SV and NV genes in four major components were extensively duplicated after their diversification, but accelerated evolution is evident exclusively in the SV genes. Both venom-related SV and NV genes are significantly enriched in microchromosomes. The present study thus provides a genetic background for the evolution of snake venom composition.


September 22, 2019

Gut microbiota, nitric oxide, and microglia as prerequisites for neurodegenerative disorders.

Regulating fluctuating endogenous nitric oxide (NO) levels is necessary for proper physiological function. Aberrant NO pathways are implicated in a number of neurological disorders, including Alzheimer’s disease (AD) and Parkinson’s disease. The mechanism of NO in oxidative and nitrosative stress with pathological consequences involves reactions with reactive oxygen species (e.g., superoxide) to form the highly reactive peroxynitrite, hydrogen peroxide, hypochlorite ions and hydroxyl radical. NO levels are typically regulated by endogenous nitric oxide synthases (NOS), and inflammatory iNOS is implicated in the pathogenesis of neurodegenerative diseases, in which elevated NO mediates axonal degeneration and activates cyclooxygenases to provoke neuroinflammation. NO also downregulates the secretion of brain-derived neurotrophic factor, which is essential for neuronal survival, development and differentiation, synaptogenesis, and learning and memory. The gut-brain axis denotes communication between the enteric nervous system (ENS) of the GI tract and the central nervous system (CNS) of the brain; the modes of communication include the vagus nerve, passive diffusion and carriage by oxyhemoglobin. Amyloid precursor protein, which forms amyloid beta plaques in AD, is normally expressed in the ENS by gut bacteria, but when amyloid beta accumulates, it compromises CNS functions. Escherichia coli and Salmonella enterica are among the many bacterial strains that express and secrete amyloid proteins and contribute to AD pathogenesis. The gut microbiota is essential for regulating microglia maturation and activation, and activated microglia secrete significant amounts of iNOS. Pharmacological interventions and lifestyle modifications to rectify aberrant NO signaling in AD include NOS inhibitors, NMDA receptor antagonists, potassium channel modulators, probiotics, diet, and exercise.


September 22, 2019

Lipoprotein lipase reaches the capillary lumen in chickens despite an apparent absence of GPIHBP1.

In mammals, GPIHBP1 is absolutely essential for transporting lipoprotein lipase (LPL) to the lumen of capillaries, where it hydrolyzes the triglycerides in triglyceride-rich lipoproteins. In all lower vertebrate species (e.g., birds, amphibians, reptiles, fish), a gene for LPL can be found easily, but a gene for GPIHBP1 has never been found. The obvious question is whether the LPL in lower vertebrates is able to reach the capillary lumen. Using purified antibodies against chicken LPL, we showed that LPL is present on capillary endothelial cells of chicken heart and adipose tissue, colocalizing with von Willebrand factor. When the antibodies against chicken LPL were injected intravenously into chickens, they bound to LPL on the luminal surface of capillaries in heart and adipose tissue. LPL was released rapidly from chicken hearts with an infusion of heparin, consistent with LPL being located inside blood vessels. Remarkably, chicken LPL bound in a specific fashion to mammalian GPIHBP1. However, we could not identify a gene for GPIHBP1 in the chicken genome, nor could we identify a transcript for GPIHBP1 in a large chicken RNA-seq data set. We conclude that LPL reaches the capillary lumen in chickens – as it does in mammals – despite an apparent absence of GPIHBP1.


September 22, 2019

Cataloguing over-expressed genes in Epstein Barr Virus immortalized lymphoblastoid cell lines through consensus analysis of PacBio transcriptomes corroborates hypomethylation of chromosome 1

The ability of Epstein-Barr virus (EBV) to transform resting B cells into immortalized lymphoblastoid cell lines (LCLs) provides a continuous source of peripheral blood lymphocytes that are used to model conditions in which these lymphocytes play a key role. Here, the PacBio-generated transcriptomes of three LCLs from a parent-daughter trio (SRAid:SRP036136), provided by a previous study [1], were analyzed using a k-mer-based version of YeATS (KEATS). The set of over-expressed genes in these cell lines was determined by comparison with the PacBio transcriptomes of twenty tissues provided by another study (hOPTRS) [2]. MIR155 long non-coding RNA (MIR155HG), Fc fragment of IgE receptor II (FCER2), T-cell leukemia/lymphoma 1A (TCL1A), and germinal center associated signaling and motility (GCSAM) were the genes with the highest expression counts in the three LCLs and no expression in hOPTRS. Other over-expressed genes, with low expression in hOPTRS, were membrane spanning 4-domains A1 (MS4A1) and ribosomal protein S2 pseudogene 55 (RPS2P55). While some of these genes are known to be over-expressed in LCLs, this study provides a comprehensive catalogue of such genes. A recent report described a patient with EBV-positive large B-cell lymphoma that was “unusually lacking various B-cell markers” but over-expressed CD30 [3], a gene ranked 79th among the uniquely expressed genes here. Hypomethylation of chromosome 1, observed in EBV-immortalized LCLs [4, 5], is also corroborated here by mapping the genes to chromosomes. Extending previous work identifying un-annotated genes [6], we identified 80 genes that are expressed in the three LCLs, not expressed in hOPTRS, and missing from the GENCODE, RefSeq and RefSeqGene databases. KEATS introduces a method of determining expression counts based on a partitioning of the known annotated genes, runs in a few hours on a personal workstation, and provides detailed reports that enable proper debugging.
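The abstract describes KEATS only as k-mer based and does not give its internal algorithm. As a generic illustration of k-mer-based expression counting, and not a reconstruction of KEATS, the Python sketch below assigns each read to the annotated gene with which it shares the most k-mers. The gene names, sequences, k value and first-best tie-breaking rule are all hypothetical.

```python
from collections import Counter


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_expression(reads: list[str], genes: dict[str, str], k: int = 4) -> Counter:
    """Count reads per gene by maximum shared k-mers.

    Reads sharing no k-mer with any gene are skipped. On ties, the gene that
    comes first in the dict wins (a hypothetical rule, chosen for simplicity).
    """
    gene_kmers = {name: kmer_set(seq, k) for name, seq in genes.items()}
    counts: Counter = Counter()
    for read in reads:
        read_kmers = kmer_set(read, k)
        best_gene, best_shared = None, 0
        for name, gk in gene_kmers.items():
            shared = len(read_kmers & gk)
            if shared > best_shared:
                best_gene, best_shared = name, shared
        if best_gene is not None:
            counts[best_gene] += 1
    return counts


genes = {"geneA": "ACGTACGTAAAC", "geneB": "TTGGCCAATTGG"}  # hypothetical toy annotation
reads = ["ACGTACGT", "GGCCAATT", "CCCCCCCC"]                 # the last read matches neither gene
print(count_expression(reads, genes))
```

A read that matches none of the annotated genes is left uncounted here. Such reads are the kind of signal from which un-annotated, LCL-specific genes could be flagged.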


September 22, 2019

Comprehensive genomic analysis of malignant pleural mesothelioma identifies recurrent mutations, gene fusions and splicing alterations.

We analyzed transcriptomes (n = 211), whole exomes (n = 99) and targeted exomes (n = 103) from 216 malignant pleural mesothelioma (MPM) tumors. Using RNA-seq data, we identified four distinct molecular subtypes: sarcomatoid, epithelioid, biphasic-epithelioid (biphasic-E) and biphasic-sarcomatoid (biphasic-S). Through exome analysis, we found BAP1, NF2, TP53, SETD2, DDX3X, ULK2, RYR2, CFAP45, SETDB1 and DDX51 to be significantly mutated (q-score ≥ 0.8) in MPMs. We identified recurrent mutations in several genes, including SF3B1 (~2%; 4/216) and TRAF7 (~2%; 5/216). SF3B1-mutant samples showed a splicing profile distinct from that of wild-type tumors. TRAF7 alterations occurred primarily in the WD40 domain and were, except in one case, mutually exclusive with NF2 alterations. We found recurrent gene fusions and splice alterations to be frequent mechanisms for inactivation of NF2, BAP1 and SETD2. Through integrated analyses, we identified alterations in the Hippo, mTOR, histone methylation, RNA helicase and p53 signaling pathways in MPMs.
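The "~2%" figures for SF3B1 and TRAF7 are rounded from the sample counts given in parentheses. This small Python check recomputes them from those counts. The helper name is illustrative.

```python
COHORT_SIZE = 216  # MPM tumors in the study

# Mutated-sample counts reported in the abstract
recurrent_mutations = {"SF3B1": 4, "TRAF7": 5}


def mutation_frequency(mutated: int, cohort: int = COHORT_SIZE) -> float:
    """Percentage of tumors carrying a mutation in a given gene."""
    return 100 * mutated / cohort


for gene, n in recurrent_mutations.items():
    print(f"{gene}: {n}/{COHORT_SIZE} = {mutation_frequency(n):.2f}%")
```

The two values come out at 1.85% and 2.31%, and both round to the ~2% quoted in the text.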


September 22, 2019

Composition and pathogenic potential of a microbial bioremediation product used for crude oil degradation.

A microbial bioremediation product (MBP) used for large-scale oil degradation was investigated for its microbial constituents and possible pathogenicity. Aerobic growth on various media yielded >10⁸ colonies mL⁻¹. Full-length 16S rDNA sequencing and fatty acid profiling of morphologically distinct colonies revealed ≥13 distinct genera. Full-length 16S rDNA library sequencing, by either Sanger or long-read PacBio technology, suggested that up to 21% of the MBP was composed of Arcobacter. Other high-abundance microbial constituents (>6%) included the genera Proteus, Enterococcus, Dysgonomonas and several genera in the order Bacteroidales. The MBP was most susceptible to ciprofloxacin, doxycycline, gentamicin, and meropenem. MBP exposure of human HT29 and A549 cells caused significant cytotoxicity, as well as bacterial growth and adherence. An acellular MBP filtrate was also cytotoxic to HT29 cells, but not to A549 cells. Both MBP and filtrate exposures elevated the neutrophil chemoattractant IL-8. In endotracheal murine exposures, bacterial pulmonary clearance was complete after one week. Elevation of the pro-inflammatory cytokines IL-1β, IL-6, and TNF-α, and of the chemokines KC and MCP-1, occurred between 2 h and 48 h post-exposure, followed by restoration to baseline levels at 96 h. Cytokine/chemokine signalling was accompanied by elevated blood neutrophils and monocytes at 4 h and 48 h, respectively. Peripheral acute phase response markers were maximal at 24 h. All indicators examined returned to baseline values by 168 h. In contrast to HT29 but similar to A549 observations, MBP filtrate did not induce significant murine effects with the indicators examined. The results demonstrate the potentially complex nature of MBPs and transient immunological effects during exposure. Products containing microbes should be scrutinized for pathogenic components and subjected to characterisation and quality validation prior to commercial release.


September 22, 2019

Identification of microbial profile of Koji using Single Molecule, Real-Time Sequencing technology.

Koji is a traditional Japanese fermentation starter that has been used for centuries. Many fermented foods are made from koji, such as sake, miso, and soy sauce. This study used single-molecule real-time (SMRT) sequencing technology to investigate the bacterial and fungal microbiota of 3 Japanese koji samples. After SMRT analysis, a total of 39121 high-quality sequences were generated, including 14354 bacterial and 24767 fungal sequence reads. The high-quality gene sequences were assigned to 5 bacterial and 2 fungal phyla, dominated by Proteobacteria and Ascomycota, respectively. At the genus level, Ochrobactrum and Wickerhamomyces were the most abundant bacterial and fungal genera, respectively. The predominant bacterial and fungal species were Ochrobactrum lupini and Wickerhamomyces anomalus, respectively. Our study profiled the microbiota composition of 3 Japanese koji samples with species-level precision. The results may be useful for further development of traditional fermented products, especially the optimization of koji preparation. This study also demonstrates that SMRT sequencing is a robust tool for analyzing the microbial composition of food samples. © 2017 Institute of Food Technologists®.
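The reported read counts are internally consistent: the bacterial and fungal reads sum exactly to the 39121 high-quality sequences. This Python check confirms that and shows each kingdom's share of the reads. The shares are derived here and are not stated in the abstract.

```python
bacterial_reads = 14_354
fungal_reads = 24_767
reported_total = 39_121

total = bacterial_reads + fungal_reads
# The two read classes should account for every high-quality sequence.
assert total == reported_total

for label, n in (("bacterial", bacterial_reads), ("fungal", fungal_reads)):
    print(f"{label}: {100 * n / total:.1f}% of high-quality reads")
```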

