May 21, 2020  |  Sequencing 101

Sequencing 101: understanding accuracy in DNA sequencing

 

Last updated: 01 May 2026

For scientists who use DNA sequencing in their research but are not experts in the underlying technology, it can be difficult to determine the accuracy of sequencing results, and even harder to compare accuracy across sequencing platforms. Furthermore, accuracy differs not only between technologies but also across genomic regions, as some stretches of the genome are inherently more difficult to read.

It is critically important to understand accuracy in DNA sequencing to distinguish important biological information from sequencing errors.

How to get HiFi reads

Looking for a quick refresher on how HiFi sequencing works and why it delivers such high accuracy? This short video walks through the basics, from circular consensus sequencing to the generation of highly accurate HiFi reads.

 


 

What are the types of sequencing accuracy?

HiFi reads are generated by combining multiple consecutive observations of a DNA molecule (subreads), driving the accuracy of individual HiFi reads over 99%.

There are two key types of accuracy in DNA sequencing technologies: read accuracy and consensus accuracy. Read accuracy is the accuracy of individual measurements (reads) from a DNA sequencing technology, and it reflects the technology's inherent error rate. Typical read accuracy ranges from ~90% for traditional long reads to >99% for short reads and HiFi reads.
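Accuracy percentages are often quoted as Phred quality scores (Q20, Q30, and so on), where Q = -10 × log10(error rate). The short Python sketch below converts between the two and estimates how many errors to expect in a read. The function names and the 10 kb read length are illustrative choices, not part of any PacBio tool.

```python
import math

def accuracy_to_phred(accuracy: float) -> float:
    """Convert an accuracy fraction (e.g. 0.99) to a Phred quality score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)

def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score back to an accuracy fraction."""
    return 1.0 - 10.0 ** (-q / 10.0)

def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of erroneous bases in a read of the given length."""
    return read_length * (1.0 - accuracy)

# Illustrative accuracies spanning the range discussed in the text.
for accuracy in (0.90, 0.99, 0.999):
    q = accuracy_to_phred(accuracy)
    errors = expected_errors(10_000, accuracy)
    print(f"accuracy {accuracy:.1%}: Q{q:.0f}, about {errors:.0f} errors per 10 kb read")
```

Seen this way, the gap between 90% and 99.9% is not a few percentage points but a hundredfold difference in the number of errors per read.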

Consensus accuracy, on the other hand, is determined by combining information from multiple reads in a data set, which averages out random errors in individual reads. Deeper coverage, meaning more reads from which to build a consensus, generally increases the accuracy of results. However, there are still limitations to calling consensus from multiple reads. Consensus calculation is a complicated and computationally expensive process, and it cannot overcome systematic errors. If a sequencing platform consistently makes the same mistake, generating more sequencing coverage will not erase it.
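To make the difference between random and systematic errors concrete, here is a minimal sketch of a majority-vote consensus at a single position. It is a deliberately simplified model, not how production consensus callers work: it assumes errors are independent between reads and that every erroneous read shows the same wrong base. The 10% and 60% error rates are hypothetical values chosen for illustration.

```python
from math import comb

def majority_error(per_read_error: float, depth: int) -> float:
    """Probability that a strict majority of `depth` reads is wrong at one site.

    Simplifying assumptions: errors are independent between reads, and all
    erroneous reads agree on the same wrong base (a pessimistic case).
    """
    needed = depth // 2 + 1  # smallest number of wrong reads that wins the vote
    return sum(
        comb(depth, k) * per_read_error**k * (1.0 - per_read_error) ** (depth - k)
        for k in range(needed, depth + 1)
    )

# Random errors (hypothetical 10% per read): consensus error shrinks with depth.
# Systematic error (hypothetical site miscalled in 60% of reads): it grows.
for depth in (1, 5, 15, 31):
    random_case = majority_error(0.10, depth)
    systematic_case = majority_error(0.60, depth)
    print(f"depth {depth:>2}: random {random_case:.2e}, systematic {systematic_case:.2f}")
```

When most reads at a site carry the same mistake, adding coverage makes the consensus more confidently wrong, which is why starting from reads without systematic bias matters.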

To sidestep this problem, it is common to “polish” long reads that have systematic errors with high-accuracy short reads. However, because of their read length, short reads cannot always map to the long reads unambiguously, limiting their ability to improve accuracy. In general, consensus is improved — and vastly simplified — by starting with highly accurate reads like HiFi with no systematic biases.

 

HiFi sequencing of STRC gene
HiFi reads provide the accuracy needed to call single nucleotide variants, while improving mappability and enabling phasing with no systematic bias. STRC gene alignments from Genome in a Bottle (GIAB), HG002_NA24385_son. (IGV settings)

 

How does accuracy impact the utility of sequencing data?

Coverage uniformity
It is commonly known that certain genomic regions are more difficult for sequencers to get through than others. Centromeres and telomeres are notoriously tough because of the highly repetitive sequence they contain. Regions that are AT-rich or GC-rich are similarly difficult because they respond poorly to the amplification protocols required by some platforms. Palindromic sequences or hairpin structures are difficult to denature, making such regions challenging for sequencing tools that include a denaturation step.


 

Many scientists avoid these problems by opting for a single-molecule sequencing method that does not require amplification or denaturation, like HiFi sequencing technology. Because HiFi sequencing can process even difficult regions, performing uniformly regardless of sequence context, it generates accurate results even in regions that would flummox other platforms. Selecting a platform without systematic bias, like the Revio or Vega system, is important to producing the most accurate sequence data.

Mappability
The accuracy of a genome assembly goes beyond the accuracy of each individual base. Even perfect reads can contribute to poor accuracy if they are not ordered and oriented correctly in the assembly. This question of where to place the read is called mappability.

Reads containing only a piece of a large structural element, or consisting of highly repetitive sequences, can be very difficult to align, mapping ambiguously to many different locations in a reference. This is where short reads really struggle; because of their size, there is a greater chance that they will not contain enough unique sequence data to anchor them properly in a genome. Since HiFi reads stretch across many kilobases of DNA, they almost always contain unique flanking sequences that can be used to map them accurately in an assembly.

Repetitive elements — HiFi reads
HiFi reads span repetitive elements, increasing mappability.
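The effect of read length on mappability can be shown with a toy example. The sketch below builds a made-up reference containing two copies of a 12-base repeat and asks what fraction of error-free reads of a given length occur exactly once in it. The sequence and the function name are invented for illustration; real aligners also have to cope with sequencing errors and much larger genomes.

```python
from collections import Counter

def unique_fraction(reference: str, read_len: int) -> float:
    """Fraction of error-free reads of length `read_len` that occur exactly once."""
    starts = range(len(reference) - read_len + 1)
    counts = Counter(reference[i:i + read_len] for i in starts)
    unique = sum(1 for i in starts if counts[reference[i:i + read_len]] == 1)
    return unique / len(starts)

# Toy reference: unique flanking sequence around two copies of a 12-base repeat.
repeat = "ACGTACGTACGT"
reference = "TTGACCA" + repeat + "GGCATTC" + repeat + "CAAGTGA"

for read_len in (4, 8, 12, 20):
    fraction = unique_fraction(reference, read_len)
    print(f"read length {read_len:>2}: {fraction:.0%} of reads map uniquely")
```

Reads shorter than or equal to the repeat can fall entirely inside it and match both copies, while reads longer than the repeat always carry some unique flanking sequence to anchor them.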

Phasing
When exploring diploid or polyploid genomes, phasing means separating the different copies of each chromosome (e.g., maternal and paternal for diploid), known as haplotypes. With sufficient accuracy, the identity of nucleotides at each position in the genome can be compared with a reference sequence to identify SNVs, with a heterozygous locus indicating a difference in sequence between a homologous chromosome pair. This is where the inherently low accuracy of traditional error-prone long reads becomes a limitation: with a high error rate, it is difficult to decide whether a disagreement between a reference and a data set is a variant or a sequencing error.

Phasing
Phasing involves separating maternally and paternally inherited copies of each chromosome.
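A rough back-of-the-envelope calculation shows why read accuracy matters for telling variants from errors. The sketch below estimates the chance that a mismatch seen in a single read is a real heterozygous SNV rather than a sequencing error. It assumes about one heterozygous SNV per 1,000 bases (a commonly quoted ballpark for human genomes, used here only as an assumption) and treats errors and variants as independent and rare. Real variant callers combine evidence from many reads, so this is an illustration rather than a calling method.

```python
def mismatch_is_variant_prob(read_accuracy: float, het_rate: float = 0.001) -> float:
    """Rough chance that a mismatch in ONE read is a true heterozygous SNV.

    Assumptions (illustrative only): `het_rate` is the per-base rate of
    heterozygous SNVs, errors and variants are independent and rare, and
    every sequencing error looks like a substitution.
    """
    error_rate = 1.0 - read_accuracy
    return het_rate / (het_rate + error_rate)

for accuracy in (0.90, 0.99, 0.999, 0.9999):
    prob = mismatch_is_variant_prob(accuracy)
    print(f"read accuracy {accuracy:.2%}: a mismatch is a real SNV about {prob:.1%} of the time")
```

Under these assumptions, almost every mismatch in a ~90% accurate read is noise, whereas at very high read accuracy a single read already carries meaningful evidence for a variant.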

 

Another approach to obtaining phase information is to also sequence the parents of the individual whose genome you need phased. However, for many wild species where the parents aren't available, a highly accurate long-read sequencing approach, like HiFi sequencing, would be simpler. Phasing can also be inferred computationally, for example by using population haplotype frequency information.

Overall, phased genomes or variant calls are higher quality than haplotype-collapsed versions as they provide allelic information, which can be important for studying human diseases, crop improvement, evolution, and more. HiFi reads, with accuracy high enough to detect SNVs and read lengths to detect these SNVs over many kilobases, generate larger phased haplotype blocks.
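The link between read length and phase block size can be sketched with a simple model. Two neighboring heterozygous SNVs can be phased together only if a single read covers both, so a block breaks wherever the gap between adjacent SNVs is at least one read length. The sketch assumes reads of a fixed length start at every position and are accurate enough to call each SNV. The SNV positions and read lengths are made up for illustration.

```python
def phase_blocks(het_positions: list[int], read_len: int) -> list[list[int]]:
    """Group heterozygous SNV positions into phase blocks.

    Simplified model: reads of a fixed length start at every position, so two
    adjacent SNVs are linked whenever the gap between them is < read_len.
    """
    blocks: list[list[int]] = []
    for pos in sorted(het_positions):
        if blocks and pos - blocks[-1][-1] < read_len:
            blocks[-1].append(pos)  # a single read can span both SNVs
        else:
            blocks.append([pos])    # no read links this SNV to the previous one
    return blocks

# Hypothetical heterozygous SNV positions along one chromosome region.
positions = [100, 250, 2_000, 9_500, 10_200, 31_000]

for read_len in (150, 15_000):
    blocks = phase_blocks(positions, read_len)
    print(f"read length {read_len:>6}: {len(blocks)} phase block(s): {blocks}")
```

In this toy case the short reads leave every SNV in its own block, while the long reads join all but the most distant SNV into a single haplotype block.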

As scientists analyze more and more genomic data, the role of sequence accuracy will likely only become more important. HiFi reads offer the benefits of high accuracy equivalent to short-read sequencing data, but with the length necessary for complex genome assemblies and phasing of variants across large swaths of the genome.

 

How SPRQ-Nx chemistry improves HiFi sequencing accuracy

Accuracy in sequencing isn’t static. It continues to improve with advances in chemistry and computational methods. The latest example is SPRQ-Nx chemistry, which builds on the strengths of HiFi sequencing to deliver improved performance and significantly reduce cost per genome through SMRT Cell reuse.

One of the most notable advances, arriving alongside the addition of 5hmC methylation detection, is improved 5mC DNA methylation detection. With SPRQ-Nx, researchers can achieve substantially higher concordance with whole genome bisulfite sequencing (WGBS) when measuring aggregate methylation, as well as improved single-molecule accuracy in controlled datasets. Updated models increase 5mC accuracy in synthetic test molecules from 82.3% to 88.9% and improve correlation to WGBS from 0.842 to 0.911 at 60X coverage, enabling more reliable interpretation of epigenetic patterns. By improving signal resolution, SPRQ-Nx helps distinguish true biological variation from noise, an essential step for accurate downstream analysis.

At the same time, SPRQ-Nx brings advances in on-board analysis that further enhance read accuracy. Tools like DeepConsensus, a transformer-based model developed with Google, now incorporate improvements enabled by Google’s AlphaEvolve agent to refine consensus sequences beyond what traditional methods can achieve. These updates improve how the model handles insertion and deletion errors through optimized alignment strategies, increasing the proportion of reads reaching Q30 accuracy from 47.9% to 53.2% and improving quality calibration near key thresholds. This leads to higher-quality reads, particularly in challenging genomic regions.

These gains in accuracy translate directly into real-world impact and reinforce why sequencing accuracy matters. Higher-confidence reads enable more reliable discoveries in areas like variant detection in clinical research, resolving complex variants in rare disease, and supporting more precise insights in biopharma development. As sequencing becomes more accurate, researchers across many fields can generate clearer, more trustworthy insights from their data and move more confidently from sequence to discovery.

 


