September 30, 2022

The HiFi difference – not being CLR

This week Illumina gave an update on its synthetic long-read approach, previously named “Infinity reads” and now rebranded as Complete Long Reads (CLR). The choice of acronym is rather ironic: CLR (Continuous Long Reads) was the first PacBio data type, consisting of error-prone single-pass long reads, and was replaced about three years ago by highly accurate long HiFi reads.

Illumina’s decision to adopt the CLR designation nevertheless seems apt, because the numerous sequencing and phasing errors noted before are still present after this latest renaming. Illumina’s new CLR webpage shows the very same IGV screenshots that we commented on in our earlier posts “True long reads vs. synthetic long reads” and “Getting the right answer”, where we highlighted that Illumina Infinity/CLR reads contain numerous errors and will therefore give researchers incorrect answers. This indicates that, roughly nine months after the original announcement, no progress has been made on the very genes chosen to showcase the technology. Presumably other regions of the genome have similar or worse issues. However, no whole-genome datasets have been made available to the scientific community, so an independent evaluation and comparison against the gold standard, PacBio HiFi sequencing, is not possible.

As a side note, the variant calling comparison shown in Illumina’s Figure 1 is misleading, and the accompanying statements are incorrect, because they rely on outdated information (as evidenced by footnote 2, which points to 2020 numbers). With current variant callers, the latest F1 variant calling performance for PacBio HiFi sequencing is in fact 99.89%, i.e. higher than what is claimed for Illumina Infinity/CLR reads. So a more accurate Figure 1 would look like this:


CLR vs. PacBio HiFi


In summary, despite the renaming and the inappropriate comparisons, nothing has changed. True, accurate, long HiFi reads remain unparalleled in giving researchers the most comprehensive, accurate, phased variant calling information (along with simultaneous 5mC methylation information, which is not possible with Infinity/CLR due to amplification), while synthetic long-read attempts fall far short of PacBio’s HiFi performance.

