Prokaryotic DNA contains three types of methylation: N6-methyladenine, N4-methylcytosine and 5-methylcytosine. The lack of tools to analyse the frequency and distribution of methylated residues in bacterial genomes has prevented a full understanding of their functions. Now, advances in DNA sequencing technology, including single-molecule, real-time sequencing and nanopore-based sequencing, have provided new opportunities for systematic detection of all three forms of methylated DNA at a genome-wide scale and offer unprecedented opportunities for achieving a more complete understanding of bacterial epigenomes. Indeed, as the number of mapped bacterial methylomes approaches 2,000, increasing evidence supports roles for methylation in regulation of gene expression, virulence and pathogen-host interactions.
Genome-wide analysis of DNA methylation patterns using single molecule real-time DNA sequencing has boosted the number of publicly available methylomes. However, there is a lack of tools coupling methylation patterns and the corresponding methyltransferase genes. Here we demonstrate a high-throughput method for coupling methyltransferases with their respective motifs, using automated cloning and analysing the methyltransferases in vectors carrying a strain-specific cassette containing all potential target sites. To validate the method, we analyse the genomes of the thermophile Moorella thermoacetica and the mesophile Acetobacterium woodii, two acetogenic bacteria having substantially modified genomes with 12 methylation motifs and a total of 23 methyltransferase genes. Using our method, we characterize the 23 methyltransferases, assign motifs to the respective enzymes and verify activity for 11 of the 12 motifs.
The DNA base modification N6-methyladenine (m6A) is involved in many pathways related to the survival of bacteria and their interactions with hosts. Nanopore sequencing offers a new, portable method to detect base modifications. Here, we show that a neural network can improve m6A detection at trained sequence contexts compared to previously published methods using deviations between measured and expected current values as each adenine travels through a pore. The model, implemented as the mCaller software package, can be extended to detect known or confirm suspected methyltransferase target motifs based on predictions of methylation at untrained contexts. We use PacBio, Oxford Nanopore, methylated DNA immunoprecipitation sequencing (MeDIP-seq), and whole-genome bisulfite sequencing data to generate and orthogonally validate methylomes for eight microbial reference species. These well-characterized microbial references can serve as controls in the development and evaluation of future methods for the identification of base modifications from single-molecule sequencing data.
Metaepigenomic analysis reveals the unexplored diversity of DNA methylation in an environmental prokaryotic community.
DNA methylation plays important roles in prokaryotes, and their genomic landscapes-prokaryotic epigenomes-have recently begun to be disclosed. However, our knowledge of prokaryotic methylation systems is focused on those of culturable microbes, which are rare in nature. Here, we used single-molecule real-time and circular consensus sequencing techniques to reveal the ‘metaepigenomes’ of a microbial community in the largest lake in Japan, Lake Biwa. We reconstructed 19 draft genomes from diverse bacterial and archaeal groups, most of which are yet to be cultured. The analysis of DNA chemical modifications in those genomes revealed 22 methylated motifs, nine of which were novel. We identified methyltransferase genes likely responsible for methylation of the novel motifs, and confirmed the catalytic specificities of four of them via transformation experiments using synthetic genes. Our study highlights metaepigenomics as a powerful approach for identification of the vast unexplored variety of prokaryotic DNA methylation systems in nature.
Many bacterial genomes exclusively display an N4-methyl cytosine base (m4C), whose physiological significance is not yet clear. Helicobacter pylori is a carcinogenic bacterium and the leading cause of gastric cancer in humans. Helicobacter pylori strain 26695 harbors a single m4C cytosine methyltransferase, M2.HpyAII which recognizes 5′ TCTTC 3′ sequence and methylates the first cytosine residue. To understand the role of m4C modification, M2.hpyAII deletion strain was constructed. Deletion strain displayed lower adherence to host AGS cells and reduced potential to induce inflammation and apoptosis. M2.hpyAII gene deletion strain exhibited reduced capacity for natural transformation, which was rescued in the complemented strain carrying an active copy of M2.hpyAII gene in the genome. Genome-wide gene expression and proteomic analysis were carried out to discern the possible reasons behind the altered phenotype of the M2.hpyAII gene deletion strain. Upon the loss of m4C modification a total of 102 genes belonging to virulence, ribosome assembly and cellular components were differentially expressed. The present study adds a functional role for the presence of m4C modification in H. pylori and provides the first evidence that m4C signal acts as a global epigenetic regulator in H. pylori.
The chemical diversity of physiological DNA modifications has expanded with the identification of phosphorothioate (PT) modification in which the nonbridging oxygen in the sugar-phosphate backbone of DNA is replaced by sulfur. Together with DndFGH as cognate restriction enzymes, DNA PT modification, which is catalyzed by the DndABCDE proteins, functions as a bacterial restriction-modification (R-M) system that protects cells against invading foreign DNA. However, the occurrence of dnd systems across a large number of bacterial genomes and their functions other than R-M are poorly understood. Here, a genomic survey revealed the prevalence of bacterial dnd systems: 1,349 bacterial dnd systems were observed to occur sporadically across diverse phylogenetic groups, and nearly half of these occur in the form of a solitary dndBCDE gene cluster that lacks the dndFGH restriction counterparts. A phylogenetic analysis of 734 complete PT R-M pairs revealed the coevolution of M and R components, despite the observation that several PT R-M pairs appeared to be assembled from M and R parts acquired from distantly related organisms. Concurrent epigenomic analysis, transcriptome analysis, and metabolome characterization showed that a solitary PT modification contributed to the overall cellular redox state, the loss of which perturbed the cellular redox balance and induced Pseudomonas fluorescens to reconfigure its metabolism to fend off oxidative stress. An in vitro transcriptional assay revealed altered transcriptional efficiency in the presence of PT DNA modification, implicating its function in epigenetic regulation. These data suggest the versatility of PT in addition to its involvement in R-M protection.
A survey of Type III restriction-modification systems reveals numerous, novel epigenetic regulators controlling phase-variable regulons; phasevarions.
Many bacteria utilize simple DNA sequence repeats as a mechanism to randomly switch genes on and off. This process is called phase variation. Several phase-variable N6-adenine DNA-methyltransferases from Type III restriction-modification systems have been reported in bacterial pathogens. Random switching of DNA methyltransferases changes the global DNA methylation pattern, leading to changes in gene expression. These epigenetic regulatory systems are called phasevarions – phase-variable regulons. The extent of these phase-variable genes in the bacterial kingdom is unknown. Here, we interrogated a database of restriction-modification systems, REBASE, by searching for all simple DNA sequence repeats in mod genes that encode Type III N6-adenine DNA-methyltransferases. We report that 17.4% of Type III mod genes (662/3805) contain simple sequence repeats. Of these, only one-fifth have been previously identified. The newly discovered examples are widely distributed and include many examples in opportunistic pathogens as well as in environmental species. In many cases, multiple phasevarions exist in one genome, with examples of up to 4 independent phasevarions in some species. We found several new types of phase-variable mod genes, including the first example of a phase-variable methyltransferase in pathogenic Escherichia coli. Phasevarions are a common epigenetic regulation contingency strategy used by both pathogenic and non-pathogenic bacteria.
Comparative genomic and methylome analysis of non-virulent D74 and virulent Nagasaki Haemophilus parasuis isolates.
Haemophilus parasuis is a respiratory pathogen of swine and the etiological agent of Glässer’s disease. H. parasuis isolates can exhibit different virulence capabilities ranging from lethal systemic disease to subclinical carriage. To identify genomic differences between phenotypically distinct strains, we obtained the closed whole-genome sequence annotation and genome-wide methylation patterns for the highly virulent Nagasaki strain and for the non-virulent D74 strain. Evaluation of the virulence-associated genes contained within the genomes of D74 and Nagasaki led to the discovery of a large number of toxin-antitoxin (TA) systems within both genomes. Five predicted hemolysins were identified as unique to Nagasaki and seven putative contact-dependent growth inhibition toxin proteins were identified only in strain D74. Assessment of all potential vtaA genes revealed thirteen present in the Nagasaki genome and three in the D74 genome. Subsequent evaluation of the predicted protein structure revealed that none of the D74 VtaA proteins contain a collagen triple helix repeat domain. Additionally, the predicted protein sequence for two D74 VtaA proteins is substantially longer than any predicted Nagasaki VtaA proteins. Fifteen methylation sequence motifs were identified in D74 and fourteen methylation sequence motifs were identified in Nagasaki using SMRT sequencing analysis. Only one of the methylation sequence motifs was observed in both strains indicative of the diversity between D74 and Nagasaki. Subsequent analysis also revealed diversity in the restriction-modification systems harbored by D74 and Nagasaki. The collective information reported in this study will aid in the development of vaccines and intervention strategies to decrease the prevalence and disease burden caused by H. parasuis.
Streptococcus suis contains multiple phase-variable methyltransferases that show a discrete lineage distribution.
Streptococcus suis is a major pathogen of swine, responsible for a number of chronic and acute infections, and is also emerging as a major zoonotic pathogen, particularly in South-East Asia. Our study of a diverse population of S. suis shows that this organism contains both Type I and Type III phase-variable methyltransferases. In all previous examples, phase-variation of methyltransferases results in genome wide methylation differences, and results in differential regulation of multiple genes, a system known as the phasevarion (phase-variable regulon). We hypothesized that each variant in the Type I and Type III systems encoded a methyltransferase with a unique specificity, and could therefore control a distinct phasevarion, either by recombination-driven shuffling between different specificities (Type I) or by biphasic on-off switching via simple sequence repeats (Type III). Here, we present the identification of the target specificities for each Type III allelic variant from S. suis using single-molecule, real-time methylome analysis. We demonstrate phase-variation is occurring in both Type I and Type III methyltransferases, and show a distinct association between methyltransferase type and presence, and population clades. In addition, we show that the phase-variable Type I methyltransferase was likely acquired at the origin of a highly virulent zoonotic sub-population.
We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.