N6-methyladenosine (m6A) is a widespread RNA modification that influences nearly every aspect of the messenger RNA lifecycle. Our understanding of m6A has been facilitated by the development of global m6A mapping methods, which use antibodies to immunoprecipitate methylated RNA. However, these methods have several limitations, including high input RNA requirements and cross-reactivity to other RNA modifications. Here, we present DART-seq (deamination adjacent to RNA modification targets), an antibody-free method for detecting m6A sites. In DART-seq, the cytidine deaminase APOBEC1 is fused to the m6A-binding YTH domain. APOBEC1-YTH expression in cells induces C-to-U deamination at sites adjacent to m6A residues, which are detected using standard RNA-seq. DART-seq identifies thousands of m6A sites in cells from as little as 10 ng of total RNA and can detect m6A accumulation in cells over time. Additionally, we use long-read DART-seq to gain insights into m6A distribution along the length of individual transcripts.
A High-Quality Grapevine Downy Mildew Genome Assembly Reveals Rapidly Evolving and Lineage-Specific Putative Host Adaptation Genes.
Downy mildews are obligate biotrophic oomycete pathogens that cause devastating plant diseases on economically important crops. Plasmopara viticola is the causal agent of grapevine downy mildew, a major disease in vineyards worldwide. We sequenced the genome of Pl. viticola with PacBio long reads and obtained a new 92.94 Mb assembly with high contiguity (359 scaffolds, with an N50 of 706.5 kb) due to a better resolution of repeat regions. This assembly presented a high level of gene completeness, recovering 1,592 genes encoding secreted proteins involved in plant-pathogen interactions. Plasmopara viticola had a two-speed genome architecture, with secreted protein-encoding genes preferentially located in gene-sparse, repeat-rich regions and evolving rapidly, as indicated by pairwise dN/dS values. We also used short reads to assemble the genome of Plasmopara muralis, a closely related species infecting grape ivy (Parthenocissus tricuspidata). The lineage-specific proteins identified by comparative genomics analysis included a large proportion of RxLR cytoplasmic effectors and, more generally, genes with high dN/dS values. We identified 270 candidate genes under positive selection, including several genes encoding transporters and components of the RNA machinery potentially involved in host specialization. Finally, the Pl. viticola genome assembly generated here will allow the development of robust population genomics approaches for investigating the mechanisms involved in adaptation to biotic and abiotic selective pressures in this species. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
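The contiguity figure quoted in the abstract above is an N50: the scaffold length L such that scaffolds of length at least L together contain at least half of the assembled bases. The following is a minimal Python sketch of that calculation. The function name `n50` and the scaffold lengths are hypothetical and chosen only for illustration; they are not taken from the Pl. viticola assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (in kb), not real assembly data.
scaffold_lengths = [900, 700, 500, 300, 100]
print(n50(scaffold_lengths))  # 900 + 700 = 1600 >= 1250, so this prints 700
```

A higher N50 for a similar total assembly size generally indicates fewer, longer scaffolds, which is the sense in which the abstract describes the assembly as highly contiguous.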
In the past several years, single-molecule sequencing platforms, such as those by Pacific Biosciences and Oxford Nanopore Technologies, have become available to researchers and are currently being tested for clinical applications. They offer exceptionally long reads that permit direct sequencing through regions of the genome inaccessible or difficult to analyze by short-read platforms. This includes disease-causing long repetitive elements, extreme GC content regions, and complex gene loci. Similarly, these platforms enable structural variation characterization at previously unparalleled resolution and direct detection of epigenetic marks in native DNA. Here, we review how these technologies are opening up new clinical avenues that are being applied to pathogenic microorganisms and viruses, constitutional disorders, pharmacogenomics, cancer, and more. Copyright © 2018 Elsevier Ltd. All rights reserved.
Complete genome sequence of Pseudomonas frederiksbergensis ERDD5:01 revealed genetic bases for survivability at high altitude ecosystem and bioprospection potential.
Pseudomonas frederiksbergensis ERDD5:01 is a psychrotrophic bacterium isolated from a glacial stream flowing from the East Rathong glacier in the Sikkim Himalaya. The strain survived high-altitude stress conditions such as freezing, frequent freeze-thaw cycles, and UV-C radiation. The complete genome, comprising a circular chromosome of 5,746,824 bp and a plasmid of 371,027 bp, was sequenced to understand the genetic basis of its survival strategy. Multiple copies of cold-associated genes encoding cold-active chaperones and functions in general stress response, osmotic stress, oxidative stress, membrane/cell wall alteration, carbon storage/starvation, and DNA repair supported its survivability under extreme cold and radiation, corroborating the bacterial physiological findings. A comparison of molecular cold adaptation with the genomes of 15 mesophilic Pseudomonas species provided functional insight into the strategies of cold adaptation. The genomic data also revealed the presence of industrially important enzymes. Copyright © 2018 Elsevier Inc. All rights reserved.
Comparison of the mitochondrial genomes and steady state transcriptomes of two strains of the trypanosomatid parasite, Leishmania tarentolae.
U-insertion/deletion RNA editing is a post-transcriptional mitochondrial RNA modification phenomenon required for viability of trypanosomatid parasites. Small guide RNAs encoded mainly by the thousands of catenated minicircles contain the information for this editing. We analyzed by NGS technology the mitochondrial genomes and transcriptomes of two strains, the old lab UC strain and the recently isolated LEM125 strain. PacBio sequencing provided complete minicircle sequences which avoided the assembly problem of short reads caused by the conserved regions. Minicircles were identified by a characteristic size, the presence of three short conserved sequences, a region of inherently bent DNA and the presence of single gRNA genes at a fairly defined location. The LEM125 strain contained over 114 minicircles encoding different gRNAs and the UC strain only ~24 minicircles. Some LEM125 minicircles contained no identifiable gRNAs. Approximate copy numbers of the different minicircle classes in the network were determined by the number of PacBio CCS reads that assembled to each class. Mitochondrial RNA libraries from both strains were mapped against the minicircle and maxicircle sequences. Small RNA reads mapped to the putative gRNA genes but also to multiple regions outside the genes on both strands and large RNA reads mapped in many cases over almost the entire minicircle on both strands. These data suggest that minicircle transcription is complete and bidirectional, with 3′ processing yielding the mature gRNAs. Steady state RNAs in varying abundances are derived from all maxicircle genes, including portions of the repetitive divergent region. The relative extents of editing in both strains correlated with the presence of a cascade of cognate gRNAs. These data should provide the foundation for a deeper understanding of this dynamic genetic system as well as the evolutionary variation of editing in different strains.
Transcriptomic studies have demonstrated that the vast majority of the genomes of mammals and other complex organisms is expressed in highly dynamic and cell-specific patterns to produce large numbers of intergenic, antisense and intronic long non-protein-coding RNAs (lncRNAs). Despite well characterized examples, their scaling with developmental complexity, and many demonstrations of their association with cellular processes, development and diseases, lncRNAs are still to be widely accepted as major players in gene regulation. This may reflect an underappreciation of the extent and precision of the epigenetic control of differentiation and development, where lncRNAs appear to have a central role, likely as organizational and guide molecules: most lncRNAs are nuclear-localized and chromatin-associated, with some involved in the formation of specialized subcellular domains. I suggest that a reassessment of the conceptual framework of genetic information and gene expression in the 4-dimensional ontogeny of spatially organized multicellular organisms is required. Together with this and further studies on their biology, the key challenges now are to determine the structure-function relationships of lncRNAs, which may be aided by emerging evidence of their modular structure, the role of RNA editing and modification in enabling epigenetic plasticity, and the role of RNA signaling in transgenerational inheritance of experience.
Analysis of RNA base modification and structural rearrangement by single-molecule real-time detection of reverse transcription.
Zero-mode waveguides (ZMWs) are photonic nanostructures that create highly confined optical observation volumes, thereby allowing single-molecule-resolved biophysical studies at relatively high concentrations of fluorescent molecules. This principle has been successfully applied in single-molecule, real-time (SMRT®) DNA sequencing for the detection of DNA sequences and DNA base modifications. In contrast, RNA sequencing methods cannot provide sequence and RNA base modifications concurrently as they rely on complementary DNA (cDNA) synthesis by reverse transcription followed by sequencing of cDNA. Thus, information on RNA modifications is lost during the process of cDNA synthesis. Here we describe an application of SMRT technology to follow the activity of reverse transcriptase enzymes synthesizing cDNA on thousands of single RNA templates simultaneously in real time with single nucleotide turnover resolution using arrays of ZMWs. This method thereby obtains information from the RNA template directly. The analysis of the kinetics of the reverse transcriptase can be used to identify RNA base modifications, shown by example for N6-methyladenine (m6A) in oligonucleotides and in a specific mRNA extracted from total cellular mRNA. Furthermore, the real-time reverse transcriptase dynamics informs about RNA secondary structure and its rearrangements, as demonstrated on a ribosomal RNA and an mRNA template. Our results highlight the feasibility of studying RNA modifications and RNA structural rearrangements in ZMWs in real time. In addition, they suggest that technology can be developed for direct RNA sequencing provided that the reverse transcriptase is optimized to resolve homonucleotide stretches in RNA.
Long-read sequencing (LRS) techniques are very recent advancements, but they have already been used for transcriptome research in all of the three subfamilies of herpesviruses. These techniques have multiplied the number of known transcripts in each of the examined viruses. Meanwhile, they have revealed a so far hidden complexity of the herpesvirus transcriptome with the discovery of a large number of novel RNA molecules, including coding and non-coding RNAs, as well as transcript isoforms, and polycistronic RNAs. Additionally, LRS techniques have uncovered an intricate meshwork of transcriptional overlaps between adjacent and distally located genes. Here, we review the contribution of LRS to herpesvirus transcriptomics and present the complexity revealed by this technology, while also discussing the functional significance of this phenomenon.
The diversity and complexity of the human brain are widely assumed to be encoded within a constant genome. Somatic gene recombination, which changes germline DNA sequences to increase molecular diversity, could theoretically alter this code but has not been documented in the brain, to our knowledge. Here we describe recombination of the Alzheimer’s disease-related gene APP, which encodes amyloid precursor protein, in human neurons, occurring mosaically as thousands of variant ‘genomic cDNAs’ (gencDNAs). gencDNAs lacked introns and ranged from full-length cDNA copies of expressed, brain-specific RNA splice variants to myriad smaller forms that contained intra-exonic junctions, insertions, deletions, and/or single nucleotide variations. DNA in situ hybridization identified gencDNAs within single neurons that were distinct from wild-type loci and absent from non-neuronal cells. Mechanistic studies supported neuronal ‘retro-insertion’ of RNA to produce gencDNAs; this process involved transcription, DNA breaks, reverse transcriptase activity, and age. Neurons from individuals with sporadic Alzheimer’s disease showed increased gencDNA diversity, including eleven mutations known to be associated with familial Alzheimer’s disease that were absent from healthy neurons. Neuronal gene recombination may allow ‘recording’ of neural activity for selective ‘playback’ of preferred gene variants whose expression bypasses splicing; this has implications for cellular diversity, learning and memory, plasticity, and diseases of the human brain.
Ribonucleic acid (RNA) is capable of hosting a variety of chemically diverse modifications, in both naturally-occurring post-transcriptional modifications and artificial chemical modifications used to expand the functionality of RNA. However, few studies have addressed how base modifications affect RNA polymerase and reverse transcriptase activity and fidelity. Here, we describe the fidelity of RNA synthesis and reverse transcription of modified ribonucleotides using an assay based on Pacific Biosciences Single Molecule Real-Time sequencing. Several modified bases, including methylated (m6A, m5C and m5U), hydroxymethylated (hm5U) and isomeric bases (pseudouridine), were examined. By comparing each modified base to the equivalent unmodified RNA base, we can determine how the modification affected cumulative RNA polymerase and reverse transcriptase fidelity. 5-hydroxymethyluridine and N6-methyladenosine both increased the combined error rate of T7 RNA polymerase and reverse transcriptases, while pseudouridine specifically increased the error rate of RNA synthesis by T7 RNA polymerase. In addition, we examined the frequency, mutational spectrum and sequence context of reverse transcription errors on DNA templates from an analysis of second strand DNA synthesis.
As researchers open up to the reality of RNA modification, an expanded epitranscriptomics toolbox takes shape.
DNA methylation on N6-adenine (6mA) has recently been found to be a potential epigenetic mark in several unicellular and multicellular eukaryotes. However, its distribution patterns and potential functions in land plants, which are primary producers for most ecosystems, remain largely unknown. Here we report global profiling of 6mA sites at single-nucleotide resolution in the genome of Arabidopsis thaliana at different developmental stages using single-molecule real-time sequencing. 6mA sites are widely distributed across the Arabidopsis genome and enriched over the pericentromeric heterochromatin regions. 6mA occurs more frequently in gene bodies than in intergenic regions. Analysis of 6mA methylomes and RNA sequencing data demonstrates that 6mA frequency positively correlates with the gene expression level and the transition from vegetative to reproductive growth in Arabidopsis. Our results uncover 6mA as a DNA mark associated with actively expressed genes in Arabidopsis, suggesting that 6mA serves as a hitherto unknown epigenetic mark in land plants. Copyright © 2018 Elsevier Inc. All rights reserved.
Genomic characterization reveals significant divergence within Chlorella sorokiniana (Chlorellales, Trebouxiophyceae)
Selection of highly productive algal strains is crucial for establishing economically viable biomass and bioproduct cultivation systems. Characterization of algal genomes, including understanding strain-specific differences in genome content and architecture, is a critical step in this process. Using genomic analyses, we demonstrate significant differences between three strains of Chlorella sorokiniana (strain 1228, UTEX 1230, and DOE1412). We found that unique, strain-specific genes comprise a substantial proportion of each genome, and genomic regions with >80% local nucleotide identity constitute <15% of each genome among the strains, indicating substantial strain-specific evolution. Furthermore, cataloging of meiosis and other sex-related genes in C. sorokiniana strains suggests strategic breeding could be utilized to improve biomass and bioproduct yields if a sexual cycle can be characterized. Finally, preliminary investigation of epigenetic machinery suggests the presence of potentially unique transcriptional regulation in each strain. Our data demonstrate that these three C. sorokiniana strains represent significantly different genomic content. Based on these findings, we propose individualized assessment of each strain for potential performance in cultivation systems.
Reconstitution of eukaryotic chromosomes and manipulation of DNA N6-methyladenine alters chromatin and gene expression
DNA N6-adenine methylation (6mA) has recently been reported in diverse eukaryotes, spanning unicellular organisms to metazoans. Yet the functional significance of 6mA remains elusive due to its low abundance, difficulty of manipulation within native DNA, and lack of understanding of eukaryotic 6mA writers. Here, we report a novel DNA 6mA methyltransferase in ciliates, termed MTA1. The enzyme contains an MT-A70 domain but is phylogenetically distinct from all known RNA and DNA methyltransferases. Disruption of MTA1 in vivo leads to the genome-wide loss of 6mA in asexually growing cells and abolishment of the consensus ApT dimethylated motif. Genes exhibit subtle changes in chromatin organization or RNA expression upon loss of 6mA, depending on their starting methylation level. Mutants fail to complete the sexual cycle, which normally coincides with a peak of MTA1 expression. Thus, MTA1 functions in a developmental stage-specific manner. We determine the impact of 6mA on chromatin organization in vitro by reconstructing complete, full-length ciliate chromosomes harboring 6mA in native or ectopic positions. Using these synthetic chromosomes, we show that 6mA directly disfavors nucleosomes in vitro in a local, quantitative manner, independent of DNA sequence. Furthermore, the chromatin remodeler ACF can overcome this effect. Our study identifies a novel MT-A70 protein necessary for eukaryotic 6mA methylation and defines the impact of 6mA on chromatin organization using epigenetically defined synthetic chromosomes.
Although several resurrection plant genomes have been sequenced, the lack of suitable dehydration-sensitive outgroups has limited genomic insights into the origin of desiccation tolerance. Here, we utilized a comparative system of closely related desiccation-tolerant (Lindernia brevidens) and -sensitive (Lindernia subracemosa) species to identify gene- and pathway-level changes associated with the evolution of desiccation tolerance. The two high-quality Lindernia genomes we assembled are largely collinear, and over 90% of genes are conserved. L. brevidens and L. subracemosa have evidence of an ancient, shared whole-genome duplication event, and retained genes have neofunctionalized, with desiccation-specific expression in L. brevidens. Tandem gene duplicates also are enriched in desiccation-associated functions, including a dramatic expansion of early light-induced proteins from 4 to 26 copies in L. brevidens. A comparative differential gene coexpression analysis between L. brevidens and L. subracemosa supports extensive network rewiring across early dehydration, desiccation, and rehydration time courses. Many LATE EMBRYOGENESIS ABUNDANT genes show significantly higher expression in L. brevidens compared with their orthologs in L. subracemosa. Coexpression modules uniquely upregulated during desiccation in L. brevidens are enriched with seed-specific and abscisic acid-associated cis-regulatory elements. These modules contain a wide array of seed-associated genes that have no expression in the desiccation-sensitive L. subracemosa. Together, these findings suggest that desiccation tolerance evolved through a combination of gene duplications and network-level rewiring of existing seed desiccation pathways. © 2018 American Society of Plant Biologists. All rights reserved.