1Pacific Biosciences, Menlo Park, United States
Both the genome and epigenome contribute to inherited disease. While genome sequencing has been applied at large scale, epigenome sequencing remains more difficult and expensive, and is used less frequently. Here we extend PacBio HiFi sequencing to simultaneously generate accurate genomes and epigenomes with a single library preparation and sequencing experiment.
In PacBio sequencing, the nucleotide incorporation rate is sensitive to epigenetic modifications such as 5-methylcytosine (5mC), but the signal is spread over multiple positions and is challenging to detect in a single sequencing pass. HiFi sequencing observes the same molecule across multiple serial passes, opening new approaches to detect 5mC. We implemented a multilayer convolutional neural network to combine kinetics from multiple passes and assign a probability of methylation to each CpG. We trained the model on fully unmethylated (whole-genome amplification) and fully methylated (M.SssI-treated) reads.
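The idea of turning multi-pass kinetics into a per-CpG probability can be illustrated with a deliberately simplified sketch. It is not the model described above: it averages kinetics across passes before a single convolutional layer, whereas the actual network is multilayer and its way of combining passes is not specified here. All function names, the window size, and the weights are hypothetical placeholders, and the weights are untrained.

```python
import math


def mean_kinetics(passes):
    """Average a per-position kinetic value across the passes of one molecule."""
    n = len(passes)
    return [sum(p[i] for p in passes) / n for i in range(len(passes[0]))]


def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation of a signal with a kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def methylation_probability(passes, kernels, out_weights, bias):
    """Score one CpG from a window of kinetics observed over several passes.

    Assumption: passes are pre-aligned windows of equal length centred on
    the CpG; kernels, out_weights and bias stand in for trained parameters.
    """
    features = mean_kinetics(passes)
    # One convolutional layer, ReLU, then global max pooling per kernel.
    pooled = [max(relu(conv1d(features, kernel))) for kernel in kernels]
    logit = sum(w * p for w, p in zip(out_weights, pooled)) + bias
    return sigmoid(logit)


# Made-up kinetic values for three passes over a 9-position window.
example_passes = [
    [1.0, 1.1, 0.9, 1.0, 2.4, 1.8, 1.0, 0.9, 1.1],
    [0.9, 1.0, 1.0, 1.2, 2.1, 1.6, 1.1, 1.0, 1.0],
    [1.1, 0.9, 1.0, 1.1, 2.6, 1.9, 0.9, 1.0, 0.9],
]
example_kernels = [[0.2, 0.5, 0.2], [-0.3, 0.6, -0.3]]  # untrained placeholders
probability = methylation_probability(example_passes, example_kernels, [1.0, 0.5], -1.0)
print(f"illustrative methylation score: {probability:.3f}")
```

In a trained model the parameters would be fit so that windows from M.SssI-treated reads score near 1 and windows from whole-genome-amplified reads score near 0; the printed value here carries no biological meaning.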
HiFi methylation calling accuracy for individual CpG sites in single reads (i.e., 1X coverage) is around 85%. At 30X coverage for Genome in a Bottle samples, HiFi CpG methylation shows >95% correlation with bisulfite sequencing, or >99% at the level of CpG islands. HiFi methylation calling also recapitulates biologically relevant hypomethylation in the undifferentiated haploid CHM13 cell line, including unique patterns across chromosomes. In diploid samples, including rare disease samples, HiFi reads can be phased by sequence to reveal parental imprinting for genes like GNAS and DIRAS3.
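One common way to move from per-read calls to the site-level comparison described above is to pool the reads covering each CpG into a methylated fraction and then correlate those fractions with a reference such as bisulfite sequencing. The sketch below assumes this approach; it is not necessarily the procedure used for the reported figures. The function names, the 0.5 threshold, and all data values are hypothetical. Using (site, haplotype) as the grouping key gives the per-haplotype fractions one would inspect at an imprinted locus.

```python
import math
from collections import defaultdict


def methylated_fraction(calls, threshold=0.5):
    """Fraction of reads called methylated for each key.

    calls: iterable of (key, probability) pairs, where key identifies a CpG
    site, or a (site, haplotype) pair for phased reads.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [methylated reads, total reads]
    for key, probability in calls:
        counts[key][0] += probability >= threshold
        counts[key][1] += 1
    return {key: meth / total for key, (meth, total) in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up per-read probabilities at three CpG sites (not real data).
read_calls = [
    ("chr1:100", 0.9), ("chr1:100", 0.8), ("chr1:100", 0.7),
    ("chr1:200", 0.1), ("chr1:200", 0.3), ("chr1:200", 0.6),
    ("chr1:300", 0.2), ("chr1:300", 0.1), ("chr1:300", 0.05),
]
# Made-up reference fractions standing in for bisulfite sequencing.
reference = {"chr1:100": 0.95, "chr1:200": 0.40, "chr1:300": 0.05}

site_fractions = methylated_fraction(read_calls)
sites = sorted(reference)
r = pearson([site_fractions[s] for s in sites], [reference[s] for s in sites])
print(f"site-level correlation on toy data: {r:.3f}")

# Phased reads: key on (site, haplotype) to compare the two parental copies.
phased_calls = [
    (("chr20:1000", 1), 0.9), (("chr20:1000", 1), 0.8),
    (("chr20:1000", 2), 0.1), (("chr20:1000", 2), 0.2),
]
print(methylated_fraction(phased_calls))
```

Pooling reads is also why site-level agreement can exceed single-read accuracy: individual call errors tend to average out as coverage increases.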
HiFi sequencing provides an approach to generate more accurate genomes than short-read sequencing while simultaneously providing the epigenome, offering the opportunity to understand new aspects of evolution, disease, and diversity.