September 22, 2019

Defining cell identity with single cell omics.

Cells are a fundamental unit of life, and the ability to study the phenotypes and behaviors of individual cells is crucial to understanding the workings of complex biological systems. Cell phenotypes (epigenomic, transcriptomic, proteomic, and metabolomic) exhibit dramatic heterogeneity between and within the different cell types and states underlying cellular functional diversity. Cell genotypes can also display heterogeneity throughout an organism, in the form of somatic genetic variation, most notably in the emergence and evolution of tumors. Recent technical advances in single-cell isolation and the development of omics approaches sensitive enough to reveal these aspects of cell identity have enabled a revolution in the study of multicellular systems. In this review, we discuss the technologies available to resolve the genomes, epigenomes, transcriptomes, proteomes, and metabolomes of single cells from a wide variety of living systems. © 2018 The Authors. Proteomics Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.


September 22, 2019

Identification of differentially expressed splice variants by the proteogenomic pipeline Splicify.

Proteogenomics, i.e. comprehensive integration of genomics and proteomics data, is a powerful approach for identifying novel protein biomarkers. This is especially the case for proteins that differ structurally between disease and control conditions. As tumor development is associated with aberrant splicing, we focus on this rich source of cancer-specific biomarkers. To this end, we developed a proteogenomic pipeline, Splicify, which is able to detect differentially expressed protein isoforms. Splicify is based on integrating RNA massive parallel sequencing data and tandem mass spectrometry proteomics data to identify protein isoforms resulting from differential splicing between two conditions. Proof of concept was obtained by applying Splicify to RNA sequencing and mass spectrometry data obtained from colorectal cancer cell line SW480, before and after siRNA-mediated down-modulation of the splicing factors SF3B1 and SRSF1. These analyses revealed 2172 and 149 differentially expressed isoforms, respectively, with peptide confirmation upon knock-down of SF3B1 and SRSF1 compared to their controls. Splice variants identified included RAC1, OSBPL3, MKI67 and SYK. One additional sample was analyzed by PacBio Iso-Seq full-length transcript sequencing after SF3B1 down-modulation. This analysis verified the alternative splicing identified by Splicify and in addition identified novel splicing events that were not represented in the human reference genome annotation. Therefore, Splicify offers a validated proteogenomic data analysis pipeline for identification of disease-specific protein biomarkers resulting from mRNA alternative splicing. Splicify is publicly available on GitHub (https://github.com/NKI-TGO/SPLICIFY) and suitable to address basic research questions using pre-clinical model systems as well as translational research questions using patient-derived samples, e.g. to identify clinically relevant biomarkers.
Copyright © 2017, The American Society for Biochemistry and Molecular Biology.


September 22, 2019

Assessing the gene content of the megagenome: sugar pine (Pinus lambertiana).

Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third-generation long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of 2nd and 3rd generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address some of the questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers. Copyright © 2016 Author et al.


September 22, 2019

A protein-truncating HSD17B13 variant and protection from chronic liver disease.

Elucidation of the genetic factors underlying chronic liver disease may reveal new therapeutic targets. We used exome sequence data and electronic health records from 46,544 participants in the DiscovEHR human genetics study to identify genetic variants associated with serum levels of alanine aminotransferase (ALT) and aspartate aminotransferase (AST). Variants that were replicated in three additional cohorts (12,527 persons) were evaluated for association with clinical diagnoses of chronic liver disease in DiscovEHR study participants and two independent cohorts (total of 37,173 persons) and with histopathological severity of liver disease in 2391 human liver samples. A splice variant (rs72613567:TA) in HSD17B13, encoding the hepatic lipid droplet protein hydroxysteroid 17-beta dehydrogenase 13, was associated with reduced levels of ALT (P=4.2×10^-12) and AST (P=6.2×10^-10). Among DiscovEHR study participants, this variant was associated with a reduced risk of alcoholic liver disease (by 42% [95% confidence interval (CI), 20 to 58] among heterozygotes and by 53% [95% CI, 3 to 77] among homozygotes), nonalcoholic liver disease (by 17% [95% CI, 8 to 25] among heterozygotes and by 30% [95% CI, 13 to 43] among homozygotes), alcoholic cirrhosis (by 42% [95% CI, 14 to 61] among heterozygotes and by 73% [95% CI, 15 to 91] among homozygotes), and nonalcoholic cirrhosis (by 26% [95% CI, 7 to 40] among heterozygotes and by 49% [95% CI, 15 to 69] among homozygotes). Associations were confirmed in two independent cohorts. The rs72613567:TA variant was associated with a reduced risk of nonalcoholic steatohepatitis, but not steatosis, in human liver samples. The rs72613567:TA variant mitigated liver injury associated with the risk-increasing PNPLA3 p.I148M allele and resulted in an unstable and truncated protein with reduced enzymatic activity. A loss-of-function variant in HSD17B13 was associated with a reduced risk of chronic liver disease and of progression from steatosis to steatohepatitis. (Funded by Regeneron Pharmaceuticals and others.)


September 22, 2019

Full-length transcriptome sequencing and modular organization analysis of naringin/neoeriocitrin related gene expression pattern in Drynaria roosii.

Drynaria roosii (Nakaike) is a traditional Chinese medicinal fern, known as ‘GuSuiBu’. The effective components, naringin and neoeriocitrin, share a highly similar chemical structure and medicinal function. Our HPLC-tandem mass spectrometry (MS/MS) results showed that the accumulation of naringin/neoeriocitrin depended on specific tissues or ages. However, little was known about the expression patterns of naringin/neoeriocitrin-related genes involved in their regulatory pathways. Due to a lack of basic genetic information, we applied a combination of single molecule real-time (SMRT) sequencing and second-generation sequencing (SGS) to generate the complete and full-length transcriptome of D. roosii. According to the SGS data, the differentially expressed gene (DEG)-based heat map analysis revealed that naringin/neoeriocitrin-related gene expression exhibited obvious tissue- and time-specific transcriptomic differences. Using the systems biology method of modular organization analysis, we clustered 16,472 DEGs into 17 gene modules and studied the relationships between modules and tissue/time point samples, as well as modules and naringin/neoeriocitrin contents. We found that naringin/neoeriocitrin-related DEGs were distributed in nine distinct modules, and DEGs in these modules showed significantly different patterns of transcript abundance linked to specific tissues or ages. Moreover, weighted gene co-expression network analysis (WGCNA) results further identified that PAL, 4CL and C4H, and C3H and HCT acted as the major hub genes involved in naringin and neoeriocitrin synthesis, respectively, and exhibited high co-expression with MYB- and basic helix-loop-helix (bHLH)-regulated genes. In this work, modular organization and co-expression networks elucidated the tissue and time specificity of the gene expression pattern, as well as hub genes associated with naringin/neoeriocitrin synthesis in D. roosii. Simultaneously, the comprehensive transcriptome data set provided important genetic information for further research on D. roosii.


September 22, 2019

Shift in fungal communities and associated enzyme activities along an age gradient of managed Pinus sylvestris stands.

Forestry reshapes ecosystems with respect to tree age structure, soil properties and vegetation composition. These changes are likely to be paralleled by shifts in microbial community composition with potential feedbacks on ecosystem functioning. Here, we assessed fungal communities across a chronosequence of managed Pinus sylvestris stands and investigated correlations between taxonomic composition and extracellular enzyme activities. Not surprisingly, clear-cutting had a negative effect on ectomycorrhizal fungal abundance and diversity. In contrast, clear-cutting favoured proliferation of saprotrophic fungi correlated with enzymes involved in holocellulose decomposition. During stand development, the re-establishing ectomycorrhizal fungal community shifted in composition from dominance by Atheliaceae in younger stands to Cortinarius and Russula species in older stands. Late successional ectomycorrhizal taxa correlated with enzymes involved in mobilisation of nutrients from organic matter, indicating intensified nutrient limitation. Our results suggest that maintenance of functional diversity in the ectomycorrhizal fungal community may sustain long-term forest production by retaining a capacity for symbiosis-driven recycling of organic nutrient pools.


September 22, 2019

Somatic APP gene recombination in Alzheimer’s disease and normal neurons.

The diversity and complexity of the human brain are widely assumed to be encoded within a constant genome. Somatic gene recombination, which changes germline DNA sequences to increase molecular diversity, could theoretically alter this code but has not been documented in the brain, to our knowledge. Here we describe recombination of the Alzheimer’s disease-related gene APP, which encodes amyloid precursor protein, in human neurons, occurring mosaically as thousands of variant ‘genomic cDNAs’ (gencDNAs). gencDNAs lacked introns and ranged from full-length cDNA copies of expressed, brain-specific RNA splice variants to myriad smaller forms that contained intra-exonic junctions, insertions, deletions, and/or single nucleotide variations. DNA in situ hybridization identified gencDNAs within single neurons that were distinct from wild-type loci and absent from non-neuronal cells. Mechanistic studies supported neuronal ‘retro-insertion’ of RNA to produce gencDNAs; this process involved transcription, DNA breaks, reverse transcriptase activity, and age. Neurons from individuals with sporadic Alzheimer’s disease showed increased gencDNA diversity, including eleven mutations known to be associated with familial Alzheimer’s disease that were absent from healthy neurons. Neuronal gene recombination may allow ‘recording’ of neural activity for selective ‘playback’ of preferred gene variants whose expression bypasses splicing; this has implications for cellular diversity, learning and memory, plasticity, and diseases of the human brain.


September 22, 2019

Multiscale patterns and drivers of arbuscular mycorrhizal fungal communities in the roots and root-associated soil of a wild perennial herb.

Arbuscular mycorrhizal (AM) fungi form diverse communities and are known to influence above-ground community dynamics and biodiversity. However, the multiscale patterns and drivers of AM fungal composition and diversity are still poorly understood. We sequenced DNA markers from roots and root-associated soil from Plantago lanceolata plants collected across multiple spatial scales to allow comparison of AM fungal communities among neighbouring plants, plant subpopulations, nearby plant populations, and regions. We also measured soil nutrients, temperature, humidity, and community composition of neighbouring plants and nonAM root-associated fungi. AM fungal communities were already highly dissimilar among neighbouring plants (c. 30 cm apart), albeit with a high variation in the degree of similarity at this small spatial scale. AM fungal communities were increasingly, and more consistently, dissimilar at larger spatial scales. Spatial structure and environmental drivers explained a similar percentage of the variation, from 7% to 25%. A large fraction of the variation remained unexplained, which may be a result of unmeasured environmental variables, species interactions and stochastic processes. We conclude that AM fungal communities are highly variable among nearby plants. AM fungi may therefore play a major role in maintaining small-scale variation in community dynamics and biodiversity. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.


September 22, 2019

Next generation sequencing data of a defined microbial mock community.

Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for the benchmarking of new genome sequence analysis methods, including assembly and binning tools. Moreover, the validation of new sequencing library protocols and platforms to assess critical components such as sequencing errors and biases relies on such datasets. We here report the next generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes, cover a range of GC contents, genome sizes and repeat content, and encompass a diverse abundance profile. Short-read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can aid in improving our current sequence data analysis toolkit and spur interest in the development of new tools.


September 22, 2019

Single-cell mRNA isoform diversity in the mouse brain.

Alternative mRNA isoform usage is an important source of protein diversity in mammalian cells. This phenomenon has been extensively studied in bulk tissues; however, it remains unclear how this diversity is reflected in single cells. Here we use long-read sequencing technology combined with unique molecular identifiers (UMIs) to reveal patterns of alternative full-length isoform expression in single cells from the mouse brain. We found a surprising amount of isoform diversity, even after applying a conservative definition of what constitutes an isoform. Genes tend to have one or a few isoforms highly expressed and a larger number of isoforms expressed at a low level. However, for many genes, nearly every sequenced mRNA molecule was unique, and many events affected coding regions, suggesting previously unknown protein diversity in single cells. Exon junctions in coding regions were less prone to splicing errors than those in non-coding regions, indicating purifying selection on splice donor and acceptor efficiency. Our findings indicate that mRNA isoform diversity is an important source of biological variability also in single cells.
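As a rough illustration of how UMIs allow molecules rather than reads to be counted, the sketch below collapses reads that share a cell barcode, UMI, gene and isoform into one molecule before tallying isoforms per cell. It is a minimal sketch, not the authors' pipeline: the tuple layout, the function name `count_isoform_molecules` and the toy reads are all assumptions made for the example, and real data would also need UMI error correction.

```python
from collections import defaultdict

def count_isoform_molecules(reads):
    """Count unique molecules per (cell, gene, isoform).

    `reads` is an iterable of (cell_barcode, umi, gene, isoform_id) tuples.
    Reads with identical cell barcode, UMI, gene and isoform are treated
    as PCR duplicates of one original mRNA molecule (hypothetical layout).
    """
    # Collapse duplicates: one entry per distinct molecule.
    molecules = set(reads)

    counts = defaultdict(int)
    for cell, _umi, gene, isoform in molecules:
        counts[(cell, gene, isoform)] += 1
    return dict(counts)

# Toy input, invented for illustration only.
reads = [
    ("cellA", "AACG", "Gene1", "iso1"),
    ("cellA", "AACG", "Gene1", "iso1"),  # duplicate read of the same molecule
    ("cellA", "TTGC", "Gene1", "iso1"),
    ("cellA", "GGAT", "Gene1", "iso2"),
    ("cellB", "AACG", "Gene1", "iso1"),
]
counts = count_isoform_molecules(reads)
```

With counts of this shape, the pattern described in the abstract (one or a few dominant isoforms per gene plus a tail of low-abundance isoforms) could be examined by ranking isoforms within each gene.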


September 22, 2019

Transcriptome analysis of distinct cold tolerance strategies in the rubber tree (Hevea brasiliensis)

Natural rubber is an indispensable commodity used in approximately 40,000 products and is fundamental to the tire industry. Among the species that produce latex, the rubber tree [Hevea brasiliensis (Willd. ex Adr. de Juss.) Muell-Arg.], a species native to the Amazon rainforest, is the major producer of latex used worldwide. The Amazon Basin presents optimal conditions for rubber tree growth, but the occurrence of South American leaf blight, which is caused by the fungus Microcyclus ulei (P. Henn) v. Arx, limits rubber tree production. Currently, rubber tree plantations are located in escape regions that exhibit suboptimal conditions such as high winds and cold temperatures. Rubber tree breeding programs aim to identify clones that are adapted to these stress conditions. However, rubber tree breeding is time-consuming, taking more than 20 years to develop a new variety. It is also expensive and requires large field areas. Thus, genetic studies could optimize field evaluations, thereby reducing the time and area required for these experiments. Transcriptome sequencing using next-generation sequencing (RNA-seq) is a powerful tool for identifying a full set of transcripts and for evaluating gene expression in model and non-model species. In this study, we constructed a comprehensive transcriptome to evaluate the cold response strategies of the RRIM600 (cold-resistant) and GT1 (cold-tolerant) genotypes. Furthermore, we identified putative microsatellite (SSR) and single-nucleotide polymorphism (SNP) markers. Alternative splicing, which is an important mechanism for plant adaptation under abiotic stress, was further identified, providing an important database for further studies of cold tolerance.


September 22, 2019

The first whole transcriptomic exploration of pre-oviposited early chicken embryos using single and bulked embryonic RNA-sequencing.

The chicken is a valuable model organism, especially in evolutionary and embryology research because its embryonic development occurs in the egg. However, despite its scientific importance, no transcriptome data have been generated for deciphering the early developmental stages of the chicken because of practical and technical constraints in accessing pre-oviposited embryos. Here, we determine the entire transcriptome of pre-oviposited avian embryos, including oocyte, zygote, and intrauterine embryos from Eyal-Giladi and Kochav stage I (EGK.I) to EGK.X collected using a noninvasive approach for the first time. We also compare RNA-sequencing data obtained using a bulked embryo sequencing and single embryo/cell sequencing technique. The raw sequencing data were preprocessed with two genome builds, Galgal4 and Galgal5, and the expression of 17,108 and 26,102 genes was quantified in the respective builds. There were some differences between the two techniques, as well as between the two genome builds, and these were affected by the emergence of long intergenic noncoding RNA annotations. The first transcriptome datasets of pre-oviposited early chicken embryos based on bulked and single embryo sequencing techniques will serve as a valuable resource for investigating early avian embryogenesis, for comparative studies among vertebrates, and for novel gene annotation in the chicken genome.


September 22, 2019

Circular RNA architecture and differentiation during leaf bud to young leaf development in tea (Camellia sinensis).

Circular RNA (circRNA) discovery, expression patterns and experimental validation in developing tea leaves indicate its correlation with circRNA-parental genes and potential roles in the ceRNA interaction network. Circular RNAs (circRNAs) have recently emerged as a novel class of abundant endogenous stable RNAs produced by circularization with regulatory potential. However, identification of circRNAs in plants, especially in non-model plants with large genomes, is challenging. In this study, we undertook a systematic identification of circRNAs from different stage tissues of tea plant (Camellia sinensis) leaf development using rRNA-depleted circular RNA-seq. By combining two state-of-the-art detecting tools, we characterized 3174 circRNAs, of which 342 were shared by each approach, and thus considered high-confidence circRNAs. A few predicted circRNAs were randomly chosen, and 20 out of 24 were experimentally confirmed by PCR and Sanger sequencing. As in other plants, tissue-specific expression was also observed for many C. sinensis circRNAs. In addition, we found that circRNA abundances were positively correlated with the mRNA transcript abundances of their parental genes. qRT-PCR validated the differential expression patterns of circRNAs between leaf bud and young leaf, which also indicated the low expression abundance of circRNAs compared to the standard mRNAs from the parental genes. We predicted the circRNA-microRNA interaction networks, and 54 of the differentially expressed circRNAs were found to have potential tea plant miRNA binding sites. The gene sets encoding circRNAs were significantly enriched in chloroplast-related GO terms and photosynthesis/metabolite biosynthesis-related KEGG pathways, suggesting candidate roles of circRNAs in photosynthetic machinery and metabolite biosynthesis during leaf development.
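The idea of keeping only circRNAs reported by both detection tools can be sketched as a set intersection over back-splice junction coordinates. This is a minimal sketch under stated assumptions: the abstract does not name the two tools or their output formats, so the tuple key (chromosome, start, end, strand), the function name `split_by_support` and the toy calls are hypothetical.

```python
def split_by_support(calls_a, calls_b):
    """Split circRNA calls into those shared by two tools and those unique to one.

    Each call is a (chrom, start, end, strand) tuple describing a
    back-splice junction; exact coordinate matching is assumed.
    """
    set_a, set_b = set(calls_a), set(calls_b)
    shared = set_a & set_b        # candidates treated as high-confidence
    single_tool = set_a ^ set_b   # reported by exactly one tool
    return shared, single_tool

# Toy calls from two hypothetical tools, invented for illustration.
tool_a_calls = [("chr1", 100, 900, "+"), ("chr1", 2000, 2600, "-"), ("chr2", 50, 400, "+")]
tool_b_calls = [("chr1", 100, 900, "+"), ("chr2", 50, 400, "+"), ("chr3", 10, 300, "-")]

shared, single_tool = split_by_support(tool_a_calls, tool_b_calls)
total_characterized = len(shared) + len(single_tool)
```

In practice, tools may report junction coordinates with different conventions (0-based versus 1-based, for example), so coordinates would need to be harmonized before an exact-match intersection is meaningful.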


September 22, 2019

Quantitative metaproteomics highlight the metabolic contributions of uncultured phylotypes in a thermophilic anaerobic digester.

In this study, we used multiple meta-omic approaches to characterize the microbial community and the active metabolic pathways of a stable industrial biogas reactor with food waste as the dominant feedstock, operating at thermophilic temperatures (60°C) and elevated levels of free ammonia (367 mg/liter NH3-N). The microbial community was strongly dominated (76% of all 16S rRNA amplicon sequences) by populations closely related to the proteolytic bacterium Coprothermobacter proteolyticus. Multiple Coprothermobacter-affiliated strains were detected, introducing an additional level of complexity seldom explored in biogas studies. Genome reconstructions provided metabolic insight into the microbes that performed biomass deconstruction and fermentation, including the deeply branching phyla Dictyoglomi and Planctomycetes and the candidate phylum “Atribacteria”. These biomass degraders were complemented by a synergistic network of microorganisms that convert key fermentation intermediates (fatty acids) via syntrophic interactions with hydrogenotrophic methanogens to ultimately produce methane. Interpretation of the proteomics data also suggested activity of a Methanosaeta phylotype acclimatized to high ammonia levels. In particular, we report multiple novel phylotypes proposed as syntrophic acetate oxidizers, which also express enzymes needed for both the Wood-Ljungdahl pathway and β-oxidation of fatty acids to acetyl coenzyme A. Such an arrangement differs from known syntrophic oxidizing bacteria and presents an interesting hypothesis for future studies. Collectively, these findings provide increased insight into active metabolic roles of uncultured phylotypes and present new synergistic relationships, both of which may contribute to the stability of the biogas reactor. Biogas production through anaerobic digestion of organic waste provides an attractive source of renewable energy and a sustainable waste management strategy. A comprehensive understanding of the microbial community that drives anaerobic digesters is essential to ensure stable and efficient energy production. Here, we characterize the intricate microbial networks and metabolic pathways in a thermophilic biogas reactor. We discuss the impact of frequently encountered microbial populations as well as the metabolism of newly discovered phylotypes that seem to play distinct roles within key microbial stages of anaerobic digestion in this stable high-temperature system. In particular, we draft a metabolic scenario whereby multiple uncultured syntrophic acetate-oxidizing bacteria are capable of syntrophically oxidizing acetate as well as longer-chain fatty acids (via the β-oxidation and Wood-Ljungdahl pathways) to hydrogen and carbon dioxide, which methanogens subsequently convert to methane. Copyright © 2016 American Society for Microbiology.


September 22, 2019

Caught in the middle with multiple displacement amplification: the myth of pooling for avoiding multiple displacement amplification bias in a metagenome.

Shotgun metagenomics has become an important tool for investigating the ecology of microorganisms. Underlying these investigations is the assumption that metagenome sequence data accurately estimates the census of microbial populations. Multiple displacement amplification (MDA) of microbial community DNA is often used in cases where it is difficult to obtain enough DNA for sequencing; however, MDA can result in amplification biases that may impact subsequent estimates of population census from metagenome data. Some have posited that pooling replicate MDA reactions negates these biases and restores the accuracy of population analyses. This assumption has not been empirically tested. Using mock viral communities, we examined the influence of pooling on population-scale analyses. In pooled and single reaction MDA treatments, sequence coverage of viral populations was highly variable and coverage patterns across viral genomes were nearly identical, indicating that initial priming biases were reproducible and that pooling did not alleviate biases. In contrast, control unamplified sequence libraries showed relatively even coverage across phage genomes. MDA should be avoided for metagenomic investigations that require quantitative estimates of microbial taxa and gene functional groups. While MDA is an indispensable technique in applications such as single-cell genomics, amplification biases cannot be overcome by combining replicate MDA reactions. Alternative library preparation techniques should be utilized for quantitative microbial ecology studies utilizing metagenomic sequencing approaches.
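One way to put numbers on "highly variable" coverage and "nearly identical" coverage patterns is to compute the coefficient of variation of per-window depth within a library and the Pearson correlation of depth profiles between libraries. The sketch below is illustrative only: the abstract does not state which statistics the authors used, and the function names and depth values are invented for the example.

```python
from math import sqrt
from statistics import mean, pstdev

def coverage_cv(depths):
    """Coefficient of variation of per-window read depth (0 means perfectly even)."""
    m = mean(depths)
    if m == 0:
        raise ValueError("mean depth is zero; CV is undefined")
    return pstdev(depths) / m

def pearson(xs, ys):
    """Pearson correlation between two equal-length depth profiles."""
    if len(xs) != len(ys):
        raise ValueError("profiles must have the same number of windows")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-window depths along one phage genome.
unamplified = [98, 102, 100, 101, 99, 100]
single_mda = [5, 400, 20, 150, 2, 23]
pooled_mda = [6, 390, 25, 140, 3, 36]

cv_unamplified = coverage_cv(unamplified)
cv_single = coverage_cv(single_mda)
cv_pooled = coverage_cv(pooled_mda)
r_single_vs_pooled = pearson(single_mda, pooled_mda)
```

Under this toy setup, a high correlation between the single and pooled profiles together with similarly high CVs would correspond to the situation the abstract describes, where pooling reproduces the bias instead of averaging it away.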

