March 18, 2020  |  Plant + animal biology

Prokaryotic Methylation Detection on the Sequel II System

Since the first PacBio instrument was released in 2011, methylation detection has been one of the advantages of SMRT Sequencing. The kinetics of nucleotide incorporation change as the DNA polymerase moves across a methylated position on the DNA template strand, producing distinctive perturbation patterns (Figure 1) that can be recognized by methylation-calling software.

Figure 1. The arrows indicate the methylated positions on a 199 bp circular template. Bars indicate the ratio of the average interpulse duration (IPD) on the methylated template to that of the control template. Each methylation type produces a unique fingerprint.
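
The ratio plotted in Figure 1 can be illustrated with a minimal Python sketch. It assumes IPD observations are already grouped by template position; the function name, the input layout, and the numbers are hypothetical and chosen only for illustration. Production methylation callers apply statistical models on top of this ratio, which the sketch does not attempt to reproduce.

```python
from statistics import mean


def ipd_ratios(native_ipds, control_ipds):
    """Per-position ratio of mean IPD (methylated template / control template).

    Both arguments map a template position to a list of observed IPDs.
    This input layout is an assumption made for the example.
    """
    ratios = {}
    for pos, native in native_ipds.items():
        control = control_ipds.get(pos)
        if not native or not control:
            continue  # skip positions lacking observations in either sample
        ratios[pos] = mean(native) / mean(control)
    return ratios


# Made-up values: the polymerase slows at position 11 but not at position 10.
native = {10: [0.5, 0.5, 0.5], 11: [2.0, 3.0, 2.5]}
control = {10: [0.5, 0.5, 0.5], 11: [0.5, 0.5, 0.5]}
print(ipd_ratios(native, control))  # {10: 1.0, 11: 5.0}
```

A ratio near 1 means the polymerase kinetics match the control; a ratio well above 1 marks a position where incorporation was slowed.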

With the advent of a simple method for detecting methylation in prokaryotes, researchers have demonstrated that, in addition to functioning as a defense against phages, bacterial restriction-modification (R-M) systems can also drive important traits such as antibiotic resistance, immune evasion, virulence, and persistence in hosts.
Recent internal validation work has confirmed that detection of m6A and m4C in prokaryotic DNA, and of the R-M system target motifs in which they reside, continues to perform robustly on the Sequel II System. Detection of m5C continues to require significantly higher coverage and is therefore not supported through the SMRT Analysis ‘Base Modification Analysis’ workflow.
Figure 2. Detection of methylation in E. coli K. ‘Type’ compares the IPD fingerprint of the reported motif to empirical models of m4C and m6A sequencing perturbations. ‘% Detected’ reports what fraction of motifs present in the assembly are above the specified Modification QV threshold. ‘Mean QV’ is a measure of confidence that the flagged base within the reported motif is methylated.

Our initial validation was done on E. coli K, sequenced as part of a 48-plex sequencing run on the Sequel II System (Figure 2). All three known m6A motifs were successfully detected. In addition, at this high coverage the known target of the Dcm m5C methylase, CCWGG, was weakly detected. However, because m5C calling is not supported, it was erroneously tagged as m6A.
An important takeaway is that to obtain the cleanest motif-finding result, the ‘Minimum Qmod Score’, available as an advanced parameter in the ‘Base Modification Analysis’ application in SMRT Analysis, had to be increased manually. As shown by the red arrow in Figure 2, this value should be set such that it excludes most baseline noise while fully including the cloud of methylation signal. In this example, the ideal setting is Qmod = 200. While the optimal value of Qmod changes with sequencing coverage, we have found a value of 100 produces a good result in most cases when sequencing 48 microbes per SMRT Cell 8M.
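
To make the effect of the threshold concrete, the following Python sketch recomputes the ‘% Detected’ and ‘Mean QV’ columns of Figure 2 for one motif at a chosen minimum Qmod score. It is an illustration under stated assumptions, not the SMRT Analysis implementation: the function names, the single-strand treatment, the choice to average QV over detected sites only, and the toy sequence and QV values are all hypothetical.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_sites(sequence, motif, mod_offset):
    """Yield the 0-based position of the flagged base for each motif occurrence."""
    # A lookahead is used so that overlapping occurrences are all reported.
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in motif) + ")")
    for match in pattern.finditer(sequence):
        yield match.start() + mod_offset


def summarize_motif(sequence, motif, mod_offset, qv_by_position, min_qmod):
    """Return (% detected, mean QV of detected sites) on a single strand."""
    sites = list(motif_sites(sequence, motif, mod_offset))
    detected = [qv_by_position.get(p, 0) for p in sites
                if qv_by_position.get(p, 0) >= min_qmod]
    pct = 100.0 * len(detected) / len(sites) if sites else 0.0
    mean_qv = sum(detected) / len(detected) if detected else 0.0
    return pct, mean_qv


# Toy data (made up): three GATC sites, flagged base is the A at offset 1.
genome = "TTGATCAAGATCCCGATCGG"
qv = {3: 250, 9: 180, 15: 40, 6: 35}  # position -> modification QV

for threshold in (30, 100):
    print(threshold, summarize_motif(genome, "GATC", 1, qv, threshold))
```

In this toy data, raising the threshold from 30 to 100 drops the site with QV 40 from the detected set, so ‘% Detected’ falls while ‘Mean QV’ rises. The same trade-off is what the threshold choice in Figure 2 balances.
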
To better assess performance across the full range of methylation patterns seen in microbes, we then analyzed data from four more challenging microbes. These more difficult examples confirm that the Sequel II System can detect both m6A and m4C at the same level of performance seen with our previous sequencing systems. The known R-M systems in Neisseria meningitidis FAM18 (Table 1), Treponema denticola A (Table 2), and Methanocorpusculum labreanum Z (Table 3) were largely recovered at high confidence. The few exceptions are likely due to competition between multiple methyltransferases that target overlapping motifs.
The most difficult test case was H. pylori J99, which carries 24 distinct R-M systems, targeting m6A, m4C, and m5C. We called 21 of the 24 motifs exactly. In one instance our motif caller was confounded by overlapping motifs, but the correct answer could easily be discerned by visual examination. The remaining two missed motifs involve m5C, which continues to be unsupported.
Table 1. m6A motifs of N. meningitidis. N. meningitidis also has six m5C motifs (CCTTC, GCGCGC, TCTGG, CCAGA, CCGG, RCCGGY), which were not detected. The low % detected for ACACC is likely the result of competition between methyltransferases for overlapping m5C sites (CCGG, RCCGGY).

Table 2. The motifs of all 9 R-M systems active in T. denticola were detected without error.

Table 3. R-M system recognition motifs of M. labreanum. The low percent detected for ACCNNNNNNRTGA / TCAYNNNNNNGGT is most likely due to competition between m6A modification and m4C modification of the overlapping GTAC motif.

Table 4. H. pylori J99 contains 24 active methyltransferases. The two motifs marked with an asterisk are split because our pattern-finding software was confounded by the partially overlapping CATG motif. GWCAYH (H = ‘not G’) + GWCACG (the missing G!) = GWCAYN (correct call; Y = C/T) – CATG (distinct R-M system target, called correctly).
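
The motif arithmetic in the Table 4 caption can be checked by expanding each IUPAC motif into the concrete sequences it matches. The Python sketch below does this; the helper name `expand` is hypothetical, and the check covers only the set identity stated in the caption, not how the pattern-finding software works.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(motif):
    """Return the set of concrete sequences matched by an IUPAC motif."""
    return {"".join(bases) for bases in product(*(IUPAC[b] for b in motif))}


# The two split motifs reported by the caller, taken together.
called = expand("GWCAYH") | expand("GWCACG")

# The full motif GWCAYN, minus any instance that contains CATG.
expected = {seq for seq in expand("GWCAYN") if "CATG" not in seq}

print(called == expected)  # True
print(sorted(expand("GWCAYN") - called))  # ['GACATG', 'GTCATG']
```

The only GWCAYN instances missing from the two split motifs are GACATG and GTCATG, which are exactly the ones that contain the CATG target of the separate R-M system.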

We hope these results will give all our customers who study prokaryotic methylation the confidence to move forward with planning bacterial whole-genome sequencing experiments on the Sequel II System, taking full advantage of the higher multiplexing capacity and reduced per-sample cost.
Learn more about bacterial whole-genome sequencing and prokaryotic epigenetics on the Sequel II System.
