PacBio HiFi sequencing has been shown to provide the most accurate and complete characterization of human genomes, detecting single-nucleotide variants, indels, and structural variants with high precision and recall in a single sequencing experiment. Here we show that the same experiment also characterizes the epigenome, providing CpG methylation without additional sample preparation or sequencing.
PacBio sequencing observes a polymerase in real time as it incorporates fluorescently labeled nucleotides to synthesize a DNA strand. The labels identify the nucleotide sequence. The kinetic signatures, pulse width (time of incorporation) and inter-pulse duration (time between adjacent incorporations), correlate with chemical modifications to the canonical DNA bases. Some modifications, like N6-methyladenine, have strong kinetic signatures that allow accurate detection from single observations. Others, like the 5-methylcytosine (5mC) modification present in human genomes, have a lower signal-to-noise ratio, which has necessitated high sequence coverage, specialized sample preparation, or averaging over genomic regions to achieve high accuracy. PacBio HiFi sequencing, which provides multiple serial observations of the same molecule, opens up new methodological approaches. We implemented a multilayer convolutional neural network that combines kinetic measurements from the multiple passes and assumes symmetric methylation of CpG sites, which is typical in mammals. We trained the model on fully unmethylated (whole-genome amplification) and fully methylated (M.SssI-treated) HiFi reads. In that context, the network distinguishes methylated from unmethylated CpG in single molecules with an accuracy of around 80% at typical HiFi read lengths (15 kb).
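As a rough illustration of how repeated passes over one molecule can raise the signal-to-noise ratio of the kinetics, the sketch below averages pulse width and inter-pulse duration across passes in a window around a CpG. It is not the model described here: the function name `cpg_kinetic_features`, the `flank` parameter, the input layout (one list of per-base values per pass), and the use of a plain mean are all assumptions made for the example. A network such as the one described could take a window like this as input.

```python
from statistics import mean


def cpg_kinetic_features(pw, ipd, cpg_pos, flank=5):
    """Average per-base kinetics over passes in a window around one CpG.

    pw, ipd  -- hypothetical inputs: one list per pass, each holding the
                pulse width / inter-pulse duration at every read position
    cpg_pos  -- 0-based position of the C in the CpG
    flank    -- bases kept on each side of the CpG dinucleotide (assumed)
    """
    start, end = cpg_pos - flank, cpg_pos + 2 + flank
    read_length = len(pw[0])
    if start < 0 or end > read_length:
        raise ValueError("window extends past the end of the read")

    # Averaging over passes is the assumed way of combining observations.
    mean_pw = [mean(p[i] for p in pw) for i in range(start, end)]
    mean_ipd = [mean(p[i] for p in ipd) for i in range(start, end)]
    return mean_pw, mean_ipd
```

With `flank=5` the window covers 12 positions: the two CpG bases plus five on each side.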
We further applied the approach to three Genome in a Bottle (GIAB) reference samples, for which CpG methylation is available from other approaches, including bisulfite sequencing, nanopore sequencing, and methylation microarrays. The PacBio HiFi CpG methylation calls correlate highly with the orthogonal approaches. By simultaneously providing accurate SNV calling and phasing, the HiFi reads reveal haplotype-specific methylation, including parental imprinting. This complete view of the genome and epigenome in a single experiment provides the opportunity to understand new aspects of evolution, disease, and diversity.
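To make the idea of haplotype-specific methylation concrete, the following sketch groups single-molecule CpG calls by site and by the haplotype assigned to each read, then reports the fraction of reads called methylated. The input tuple layout, the function name `haplotype_methylation`, and the 0.5 probability threshold are assumptions for illustration only, not details of the analysis reported here.

```python
from collections import defaultdict


def haplotype_methylation(calls, threshold=0.5):
    """Fraction of reads called methylated, per (site, haplotype).

    calls     -- iterable of (site, haplotype, probability) tuples, one per
                 read covering a CpG site (hypothetical input layout)
    threshold -- probability at or above which a read counts as methylated
                 (assumed cutoff)
    """
    counts = defaultdict(lambda: [0, 0])  # (site, hap) -> [methylated, total]
    for site, hap, prob in calls:
        entry = counts[(site, hap)]
        entry[0] += prob >= threshold
        entry[1] += 1
    return {key: meth / total for key, (meth, total) in counts.items()}
```

A site where one haplotype is close to fully methylated and the other close to unmethylated is the pattern expected at an imprinted locus.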