Genome editing has proven to be highly potent in the generation of functional gene knockouts in dividing cells. In the CNS, however, efficient technologies to repair sequences are yet to materialize. Reprogramming on the mRNA level is an attractive alternative as it provides means to perform in situ editing of coding sequences without nuclease dependency. Furthermore, de novo sequences can be inserted without the requirement of homologous recombination. Such reprogramming would enable efficient editing in quiescent cells (e.g., neurons) with an attractive safety profile for translational therapies. In this study, we applied a novel molecular-barcoded screening assay to investigate RNA trans-splicing in mammalian neurons. Through three alternative screening systems in cell culture and in vivo, we demonstrate that factors determining trans-splicing are reproducible regardless of the screening system. With this screening, we have located the most permissive trans-splicing sequences targeting an intron in the Synapsin I gene. Using viral vectors, we were able to splice full-length fluorophores into the mRNA while retaining very low off-target expression. Furthermore, this approach also showed evidence of functionality in the mouse striatum. However, in its current form, the trans-splicing events are stochastic and the overall activity is lower than would be required for therapies targeting loss-of-function mutations. Nevertheless, the herein described barcode-based screening assay provides a unique possibility to screen and map large libraries in single animals or cell assays with very high precision. © 2018 Davidsson et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Adeno-associated virus type 2 wild-type and vector-mediated genomic integration profiles of human diploid fibroblasts analyzed by third-generation PacBio DNA sequencing.
Genome-wide analysis of adeno-associated virus (AAV) type 2 integration in HeLa cells has shown that wild-type AAV integrates at numerous genomic sites, including AAVS1 on chromosome 19q13.42. Multiple GAGY/C repeats, resembling consensus AAV Rep-binding sites, are preferred, whereas rep-deficient AAV vectors (rAAV) regularly show a random integration profile. This is the first study to analyze wild-type AAV integration in diploid human fibroblasts. Applying high-throughput third-generation PacBio-based DNA sequencing, integration profiles of wild-type AAV and rAAV are compared side by side. Bioinformatic analysis reveals that both wild-type AAV and rAAV prefer open chromatin regions. Although genomic features of AAV integration largely reproduce previous findings, the pattern of integration hot spots differs from that previously described in HeLa cells. DNase-Seq data for human fibroblasts and for HeLa cells reveal variant chromatin accessibility at preferred AAV integration hot spots that correlates with variant hot spot preferences. DNase-Seq patterns of these sites in human tissues, including liver, muscle, heart, brain, skin, and embryonic stem cells, further underline variant chromatin accessibility. In summary, AAV integration is dependent on cell-type-specific, variant chromatin accessibility leading to random integration profiles for rAAV, whereas wild-type AAV integration sites cluster near GAGY/C repeats. Adeno-associated virus type 2 (AAV) is assumed to establish latency by chromosomal integration of its DNA. This is the first genome-wide analysis of wild-type AAV2 integration in diploid human cells and the first to compare wild-type to recombinant AAV vector integration side by side under identical experimental conditions. Major determinants of wild-type AAV integration represent open chromatin regions with accessible consensus AAV Rep-binding sites.
The variant chromatin accessibility of different human tissues or cell types will have impact on vector targeting to be considered during gene therapy. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Bioengineered AAV capsids with combined high human liver transduction in vivo and unique humoral seroreactivity.
Existing recombinant adeno-associated virus (rAAV) serotypes for delivering in vivo gene therapy treatments for human liver diseases have not yielded combined high-level human hepatocyte transduction and favorable humoral neutralization properties in diverse patient groups. Yet, these combined properties are important for therapeutic efficacy. To bioengineer capsids that exhibit both unique seroreactivity profiles and functionally transduce human hepatocytes at therapeutically relevant levels, we performed multiplexed sequential directed evolution screens using diverse capsid libraries in both primary human hepatocytes in vivo and with pooled human sera from thousands of patients. AAV libraries were subjected to five rounds of in vivo selection in xenografted mice with human livers to isolate an enriched human-hepatotropic library that was then used as input for a sequential on-bead screen against pooled human immunoglobulins. Evolved variants were vectorized and validated against existing hepatotropic serotypes. Two of the evolved AAV serotypes, NP40 and NP59, exhibited dramatically improved functional human hepatocyte transduction in vivo in xenografted mice with human livers, along with favorable human seroreactivity profiles, compared with existing serotypes. These novel capsids represent enhanced vector delivery systems for future human liver gene therapy applications. Copyright © 2017. Published by Elsevier Inc.
It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third-generation sequencing data. Monitoring the speed at which a single polymerase inserts a base in the read strand enables one to infer whether a modification is present at that specific site on the template strand. These sites can be challenging to detect in the absence of high coverage and reliable reference genomes. Here we provide a new method for detecting epigenetic motifs in bacteria on datasets with low coverage, with incomplete references, and with mixed samples (i.e. metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, native genome-wide distributions of interpulse durations (IPDs) for kmers are compared with corresponding whole genome amplified (WGA, modification free) IPD distributions using log likelihood ratios. Finally, kmers are ranked and greedily selected by iteratively correcting for sequences within a particular kmer’s neighborhood. Our method can detect multiple types of modifications, even at very low coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with “neighbor” modified motifs exist within the sample. Lastly, we show that these motifs can provide an alternative source of information by which to cluster metagenomics contigs and that iterative refinement on these clustered contigs can further improve both sensitivity and specificity of motif detection. https://github.com/alibashir/EMMCKmer.
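The kmer comparison step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes, purely for illustration, that log-transformed IPDs for each kmer are roughly Gaussian, and it scores each kmer by the log likelihood ratio of its native IPDs under a native-fitted model versus a WGA-fitted (modification-free) model. The function names (gaussian_fit, kmer_llr, rank_kmers) and the input layout (a dict mapping each kmer to a list of IPD values) are hypothetical, and the greedy neighborhood correction is omitted.

```python
import math


def gaussian_fit(values):
    """Return the maximum-likelihood mean and variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    # Floor the variance so the log-likelihood stays finite (illustrative choice).
    return mean, max(var, 1e-6)


def gaussian_loglik(values, mean, var):
    """Log-likelihood of the values under a Gaussian with the given mean and variance."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        for v in values
    )


def kmer_llr(native_ipds, wga_ipds):
    """Log likelihood ratio of native IPDs: native-fitted model vs. WGA-fitted model."""
    native = [math.log(v) for v in native_ipds]  # assumption: model log-IPDs
    wga = [math.log(v) for v in wga_ipds]
    native_model = gaussian_fit(native)
    wga_model = gaussian_fit(wga)
    return gaussian_loglik(native, *native_model) - gaussian_loglik(native, *wga_model)


def rank_kmers(native_by_kmer, wga_by_kmer):
    """Rank kmers present in both datasets by decreasing log likelihood ratio."""
    scores = {
        kmer: kmer_llr(native_by_kmer[kmer], wga_by_kmer[kmer])
        for kmer in native_by_kmer
        if kmer in wga_by_kmer
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (invented): GATC shows slowed polymerase kinetics in the native sample.
native = {"GATC": [2.0, 2.2, 1.9, 2.1], "AAAA": [1.0, 1.1, 0.9, 1.0]}
wga = {"GATC": [1.0, 1.1, 0.9, 1.0], "AAAA": [1.0, 1.1, 0.9, 1.0]}
ranked = rank_kmers(native, wga)
```

In this toy input the kmer whose native IPDs differ from the WGA control ranks first, while a kmer with identical native and WGA distributions scores zero. A real analysis would need per-position IPD handling within each kmer and the iterative neighborhood correction that the abstract describes.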
Long-read sequencing uncovers transcript features missed by short-read methods.
Cultivated bacteria such as actinomycetes are a highly useful source of biomedically important natural products. However, such ‘talented’ producers represent only a minute fraction of the entire, mostly uncultivated, prokaryotic diversity. The uncultured majority is generally perceived as a large, untapped resource of new drug candidates, but so far it is unknown whether taxa containing talented bacteria indeed exist. Here we report the single-cell- and metagenomics-based discovery of such producers. Two phylotypes of the candidate genus ‘Entotheonella’ with genomes of greater than 9 megabases and multiple, distinct biosynthetic gene clusters co-inhabit the chemically and microbially rich marine sponge Theonella swinhoei. Almost all bioactive polyketides and peptides known from this animal were attributed to a single phylotype. ‘Entotheonella’ spp. are widely distributed in sponges and belong to an environmental taxon proposed here as candidate phylum ‘Tectomicrobia’. The pronounced bioactivities and chemical uniqueness of ‘Entotheonella’ compounds provide significant opportunities for ecological studies and drug discovery.
Transcriptional adaptations during long-term persistence of Staphylococcus aureus in the airways of a cystic fibrosis patient.
The lungs of cystic fibrosis (CF) patients are often colonized and/or infected by Staphylococcus aureus for years, mostly by one predominant clone. For long-term survival in this environment, S. aureus needs to adapt during its interactions with host factors, antibiotics, and other pathogens. Here, we study long-term transcriptional as well as genomic adaptations of an isogenic pair of S. aureus isolates from a single patient using RNA sequencing (RNA-Seq) and whole genome sequencing (WGS). Mimicking in vivo conditions, we cultivated the S. aureus isolates using artificial sputum medium before harvesting RNA for subsequent analysis. We confirmed our RNA-Seq data using quantitative real-time (qRT)-PCR and additionally investigated intermediate isolates from the same patient representing in total 13.2 years of persistence in the CF airways. Comparative RNA-Seq analysis of the first and the last (“late”) isolate revealed significant differences in the late isolate after 13.2 years of persistence. Of the 2545 genes expressed in both isolates that were cultivated aerobically, 256 genes were up- and 161 were down-regulated with a minimum 2-fold change (2f). Focusing on 25 highly (≥8f) up- (n=9) or down- (n=16) regulated genes, we identified several genes encoding virulence factors involved in immune evasion, bacterial spread or secretion (e.g. spa, sak, and esxA). Moreover, these genes displayed similar expression trends under aerobic, microaerophilic and anaerobic conditions. Further qRT-PCR experiments on highly up- or down-regulated genes within intermediate S. aureus isolates revealed different gene expression patterns over the years. Sequencing analysis of the differentially expressed genes and their upstream regions in the late S. aureus isolate revealed only few genomic alterations. Comparative transcriptomic analysis revealed adaptive changes affecting mainly genes involved in host-pathogen interaction.
Although the underlying mechanisms were not known, our results suggest adaptive processes beyond genomic mutations triggered by local factors rather than by activation of global regulators. Copyright © 2014 The Authors. Published by Elsevier GmbH. All rights reserved.
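The fold-change thresholds quoted in this abstract (a minimum 2-fold change, and 8-fold for the highly regulated subset) amount to a simple ratio test between the late and the early isolate. The sketch below only illustrates that arithmetic; it is not the authors' pipeline. The function name classify_fold_change, the pseudocount, and the example expression values are assumptions made for the example.

```python
def classify_fold_change(early, late, min_fold=2.0, pseudocount=1.0):
    """Classify a gene as 'up', 'down' or 'unchanged' in the late isolate.

    early and late are expression values (e.g. normalized read counts).
    The pseudocount avoids division by zero and is an illustrative
    assumption, not a detail taken from the abstract.
    """
    ratio = (late + pseudocount) / (early + pseudocount)
    if ratio >= min_fold:
        return "up"
    if ratio <= 1.0 / min_fold:
        return "down"
    return "unchanged"


# Invented expression values for three hypothetical genes.
two_fold_call = classify_fold_change(10, 100)                  # ratio 101/11
eight_fold_call = classify_fold_change(100, 10, min_fold=8.0)  # ratio 11/101
stable_call = classify_fold_change(50, 60)                     # ratio 61/51
```

Raising min_fold from 2 to 8 narrows the selection to the strongly regulated genes, which mirrors how the abstract moves from 417 regulated genes to the 25 highly regulated ones.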
Biogas reactors operating with protein-rich substrates have high methane potential and industrial value; however, they are highly susceptible to process failure because of the accumulation of ammonia. High ammonia levels cause a decline in acetate-utilizing methanogens and instead promote the conversion of acetate via a two-step mechanism involving syntrophic acetate oxidation (SAO) to H2 and CO2, followed by hydrogenotrophic methanogenesis. Despite the key role of syntrophic acetate-oxidizing bacteria (SAOB), only a few culturable representatives have been characterized. Here we show that the microbiome of a commercial, ammonia-tolerant biogas reactor harbors a deeply branched, uncultured phylotype (unFirm_1) accounting for approximately 5% of the 16S rRNA gene inventory and sharing 88% 16S rRNA gene identity with its closest characterized relative. Reconstructed genome and quantitative metaproteomic analyses imply unFirm_1’s metabolic dominance and SAO capabilities, whereby the key enzymes required for acetate oxidation are among the most highly detected in the reactor microbiome. While culturable SAOB were identified in genomic analyses of the reactor, their limited proteomic representation suggests that unFirm_1 plays an important role in channeling acetate toward methane. Notably, unFirm_1-like populations were found in other high-ammonia biogas installations, conjecturing a broader importance for this novel clade of SAOB in anaerobic fermentations. IMPORTANCE The microbial production of methane or “biogas” is an attractive renewable energy technology that can recycle organic waste into biofuel. Biogas reactors operating with protein-rich substrates such as household municipal or agricultural wastes have significant industrial and societal value; however, they are highly unstable and frequently collapse due to the accumulation of ammonia. 
We report the discovery of a novel uncultured phylotype (unFirm_1) that is highly detectable in metaproteomic data generated from an ammonia-tolerant commercial reactor. Importantly, unFirm_1 is proposed to perform a key metabolic step in biogas microbiomes, whereby it syntrophically oxidizes acetate to hydrogen and carbon dioxide, which methanogens then convert to methane. Only very few culturable syntrophic acetate-oxidizing bacteria have been described, and all were detected at low in situ levels compared to unFirm_1. Broader comparisons produced the hypothesis that unFirm_1 is a key mediator toward the successful long-term stable operation of biogas production using protein-rich substrates.
Current methods for genome-wide analysis of gene expression require fragmentation of original transcripts into small fragments for short-read sequencing. In bacteria, the resulting fragmented information hides operon complexity. Additionally, in vivo processing of transcripts confounds the accurate identification of the 5′ and 3′ ends of operons. Here we develop a methodology called SMRT-Cappable-seq that combines the isolation of un-fragmented primary transcripts with single-molecule long read sequencing. Applied to E. coli, this technology results in an accurate definition of the transcriptome with 34% of known operons from RegulonDB being extended by at least one gene. Furthermore, 40% of transcription termination sites have read-through that alters the gene content of the operons. As a result, most of the bacterial genes are present in multiple operon variants reminiscent of eukaryotic splicing. By providing such granularity in the operon structure, this study represents an important resource for the study of prokaryotic gene network and regulation.
Personal transcriptomes in which all of an individual’s genetic variants (e.g., single nucleotide variants) and transcript isoforms (transcription start sites, splice sites, and polyA sites) are defined and quantified for full-length transcripts are expected to be important for understanding individual biology and disease, but have not been described previously. To obtain such transcriptomes, we sequenced the lymphoblastoid transcriptomes of three family members (GM12878 and the parents GM12891 and GM12892) by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and made the following observations. First, we found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes ≤3 kb and often for genes longer than that. Second, we added and quantified previously unidentified splicing isoforms to an existing annotation, thus creating the first personalized annotation to our knowledge. Third, we determined SNVs in a de novo manner and connected them to RNA haplotypes, including HLA haplotypes, thereby assigning single full-length RNA molecules to their transcribed allele, and demonstrated Mendelian inheritance of RNA molecules. Fourth, we show how RNA molecules can be linked to personal variants on a one-by-one basis, which allows us to assess differential allelic expression (DAE) and differential allelic isoforms (DAI) from the phased full-length isoform reads. The DAI method is largely independent of the distance between exon and SNV, in contrast to fragmentation-based methods. Overall, in addition to improving eukaryotic transcriptome annotation, these results describe, to our knowledge, the first large-scale and full-length personal transcriptome.
Alternative polyadenylation (APA), a phenomenon in which RNA molecules with different 3′ ends originate from distinct polyadenylation sites of a single gene, is emerging as a mechanism widely used to regulate gene expression. In the present review, we first summarize various methods prevalently adopted in APA studies, focusing mainly on the next-generation sequencing (NGS)-based techniques specially designed for APA identification, the related bioinformatics methods, and the strategies for APA study in single cells. Then we summarize the main findings and advances so far based on these methods, including the preferences of alternative polyA (pA) sites, the biological processes involved, and the corresponding consequences. We especially categorize the APA changes discovered so far and discuss their potential functions under given conditions, along with the possible underlying molecular mechanisms. With more in-depth studies on extensive samples, more signatures and functions of APA will be revealed, and its diverse roles will gradually come into view. Copyright © 2017 The Authors. Production and hosting by Elsevier B.V. All rights reserved.
CLK-dependent exon recognition and conjoined gene formation revealed with a novel small molecule inhibitor.
CDC-like kinase phosphorylation of serine/arginine-rich proteins is central to RNA splicing reactions. Yet, the genomic network of CDC-like kinase-dependent RNA processing events remains poorly defined. Here, we explore the connectivity of genomic CDC-like kinase splicing functions by applying graduated, short-exposure, pharmacological CDC-like kinase inhibition using a novel small molecule (T3) with very high potency, selectivity, and cell-based stability. Using RNA-Seq, we define CDC-like kinase-responsive alternative splicing events, the large majority of which monotonically increase or decrease with increasing CDC-like kinase inhibition. We show that distinct RNA-binding motifs are associated with T3 response in skipped exons. Unexpectedly, we observe dose-dependent conjoined gene transcription, which is associated with motif enrichment in the last and second exons of upstream and downstream partners, respectively. siRNA knockdown of CLK2-associated genes significantly increases conjoined gene formation. Collectively, our results reveal an unexpected role for CDC-like kinase in conjoined gene formation, via regulation of 3′-end processing and associated splicing factors. The phosphorylation of serine/arginine-rich proteins by CDC-like kinase is a central regulatory mechanism for RNA splicing reactions. Here, the authors synthesize a novel small molecule CLK inhibitor, map CLK-responsive alternative splicing events, and discover an effect on conjoined gene transcription.
The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete, partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should help not only to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes. © 2017 You et al.; Published by Cold Spring Harbor Laboratory Press.
Differential expression analysis of olfactory genes based on a combination of sequencing platforms and behavioral investigations in Aphidius gifuensis.
Aphidius gifuensis Ashmead is a dominant endoparasitoid of aphids, such as Myzus persicae and Sitobion avenae, and plays an important role in controlling aphids in various habitats, including tobacco plants and wheat in China. A. gifuensis has been successfully applied for the biological control of aphids, especially M. persicae, in greenhouses and fields in China. The corresponding parasites, as well as its mate-searching behaviors, are subjects of considerable interest. Previous A. gifuensis transcriptome studies have relied on short-read next-generation sequencing (NGS), and the vast majority of the resulting isotigs do not represent full-length cDNA. Here, we employed a combination of NGS and single-molecule real-time (SMRT) sequencing of virgin females (VFs), mated females (MFs), virgin males (VMs), and mated males (MMs) to comprehensively study the A. gifuensis transcriptome. Behavioral responses to the aphid alarm pheromone (E-β-farnesene, EBF) as well as to A. gifuensis of the opposite sex were also studied. VMs were found to be attracted by female wasps and MFs were repelled by male wasps, whereas MMs and VFs did not respond to the opposite sex. In addition, VFs, MFs, and MMs were attracted by EBF, while VMs did not respond. According to these results, we performed a personalized differential gene expression analysis of olfactory gene sets (66 odorant receptors, 25 ionotropic receptors, 16 odorant-binding proteins, and 12 chemosensory proteins) in virgin and mated A. gifuensis of both sexes, and identified 13 candidate genes whose expression levels were highly consistent with behavioral test results, suggesting potential functions for these genes in pheromone perception.
Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis.
RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Surpassing the expression analysis scope, our work also includes assessment of RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome. RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.