Sequencing 101: Understanding Accuracy in DNA Sequencing

Thursday, May 21, 2020

For scientists who use DNA sequencing in their research but are not experts in the underlying technology, it can be difficult to determine the accuracy of sequencing results, and even harder to compare accuracy across sequencing platforms. Furthermore, accuracy differs not only between technologies but also across genomic regions, as some stretches of the genome are inherently more difficult to read.

It is critically important to understand accuracy in DNA sequencing to distinguish important biological information from sequencing errors.

 

What are the Types of Sequencing Accuracy?

HiFi reads are generated by combining multiple consecutive observations of a DNA molecule (subreads), driving the accuracy of individual HiFi reads over 99%.

There are two key types of accuracy in DNA sequencing technologies: read accuracy and consensus accuracy. Read accuracy is the inherent error rate of individual measurements (reads) from a DNA sequencing technology. Typical read accuracy ranges from ~90% for traditional long reads to >99% for short reads and HiFi reads.
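To make these percentages concrete, here is a minimal Python sketch that converts a read accuracy into the expected number of erroneous bases in a single read. The function name and the 15,000-base read length are illustrative assumptions, not values taken from any PacBio tool.

```python
def expected_errors(read_length, read_accuracy):
    """Expected number of erroneous bases in a read of the given length.

    read_accuracy is a fraction between 0 and 1 (for example 0.99 for 99%).
    """
    if not 0.0 <= read_accuracy <= 1.0:
        raise ValueError("read_accuracy must be between 0 and 1")
    return read_length * (1.0 - read_accuracy)


# Hypothetical 15,000-base read at the two accuracy levels mentioned above.
errors_at_90 = expected_errors(15_000, 0.90)  # roughly 1,500 errors
errors_at_99 = expected_errors(15_000, 0.99)  # roughly 150 errors
```

At the same read length, moving from ~90% to 99% accuracy cuts the expected number of errors per read by a factor of ten.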

Consensus accuracy, on the other hand, is determined by combining information from multiple reads in a data set, which averages out random errors in individual reads. Deeper coverage, meaning more reads from which to build a consensus, generally increases the accuracy of results. However, there are still limitations to calling consensus from multiple reads. Consensus calculation is a complicated and computationally expensive process, and it cannot overcome systematic errors. If a sequencing platform consistently makes the same mistake, that mistake will not be erased by generating more sequencing coverage.
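The difference between random and systematic errors can be sketched with a simple column-wise majority vote. This is a toy illustration that assumes the reads are already aligned and equal in length. Real consensus callers use alignment and statistical models, and the sequences below are made up for the example.

```python
from collections import Counter


def majority_consensus(reads):
    """Column-wise majority vote over equal-length, pre-aligned reads."""
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be pre-aligned to the same length")
    consensus = []
    for column in zip(*reads):
        # Pick the most frequently observed base in this column.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


truth = "ACGTACGT"

# Random errors: three of the five reads are wrong, each at a different position.
random_error_reads = ["ACGTACGT", "TCGTACGT", "ACGAACGT", "ACGTACCT", "ACGTACGT"]

# Systematic error: every read miscalls the fourth base the same way (T -> A).
systematic_error_reads = ["ACGAACGT"] * 5

random_consensus = majority_consensus(random_error_reads)
systematic_consensus = majority_consensus(systematic_error_reads)
```

With random errors the vote recovers the true sequence. With the systematic error every read agrees on the wrong base, so the consensus is wrong no matter how many reads are added.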

To sidestep this problem, it is common to “polish” long reads that have systematic errors with high-accuracy short reads. However, because of their read length, short reads cannot always map to the long reads unambiguously, limiting their ability to improve accuracy. In general, consensus is improved – and vastly simplified – by starting with highly accurate reads with no systematic biases.

 

HiFi reads provide the accuracy needed to call single nucleotide variants, while improving mappability and enabling phasing with no systematic bias. STRC gene alignments from Genome in a Bottle (GIAB), HG002_NA24385_son.

 

How Does Accuracy Impact the Utility of Sequencing Data?

Coverage Uniformity

It is commonly known that certain genomic regions are more difficult for sequencers to get through than others. Centromeres and telomeres are notoriously tough because of the highly repetitive sequence they contain. Regions that are AT-rich or GC-rich are similarly difficult because they respond poorly to the amplification protocols required by some platforms. Palindromic sequences or hairpin structures are difficult to denature, making such regions challenging for sequencing tools that include a denaturation step.

HiFi reads generated with SMRT Sequencing technology provide uniform coverage regardless of GC content.

Many scientists avoid these problems by opting for a single-molecule sequencing method that does not require amplification or denaturation, such as PacBio’s SMRT Sequencing technology. Because SMRT Sequencing can process even difficult regions, performing uniformly regardless of sequence context, it generates accurate results even in regions that would flummox other platforms. Selecting a platform without systematic bias, like the Sequel II System, is important to producing the most accurate sequence data.
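A first step in checking coverage uniformity against sequence context is to measure GC content along the genome. The sketch below computes the GC fraction in fixed, non-overlapping windows. The function names and the tiny example sequence are assumptions for illustration. In practice these values would be compared with per-window read depth from an alignment.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window):
    """GC fraction for each full, non-overlapping window along seq."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


# Made-up sequence with a GC-rich, an AT-rich, and a balanced window.
gc_profile = windowed_gc("GGCCATATGCAT", 4)  # [1.0, 0.0, 0.5]
```

Windows at the extremes of this profile are the ones where amplification-based methods tend to lose coverage.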

 

Mappability

The accuracy of a genome assembly goes beyond the accuracy of each individual base. Even perfect reads can contribute to poor accuracy if they are not ordered and oriented correctly in the assembly. This question of where to place the read is called mappability.

Reads containing only a piece of a large structural element, or consisting of highly repetitive sequences, can be very difficult to align, mapping ambiguously to many different locations in a reference. This is where short reads really struggle; because of their size, there is a greater chance that they will not contain enough unique sequence data to anchor them properly in a genome. Since HiFi reads stretch across many kilobases of DNA, they almost always contain unique flanking sequences that can be used to map them accurately in an assembly.
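The effect of read length on mappability can be shown with a toy exact-match search. This is a deliberate simplification: real aligners tolerate mismatches and gaps, and the reference and reads below are invented for the example.

```python
def find_placements(reference, read):
    """Return every start position where read matches reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue one base further on so overlapping matches are found too.
        start = reference.find(read, start + 1)
    return positions


# Made-up reference: unique flank + "AC" repeat + unique flank.
reference = "GGTC" + "ACACACAC" + "TTGA"

short_read = "ACAC"          # lies entirely inside the repeat
long_read = "TCACACACACTT"   # spans the repeat and both flanks

short_hits = find_placements(reference, short_read)  # [4, 6, 8]
long_hits = find_placements(reference, long_read)    # [2]
```

The short read fits in three places, so its true origin is ambiguous. The longer read carries unique flanking sequence on both sides and has exactly one placement.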

HiFi reads span repetitive regions, increasing mappability.

 

Phasing

When exploring diploid or polyploid genomes, phasing means separating the different copies of each chromosome (e.g. maternal and paternal for diploid), known as haplotypes. With sufficient accuracy, the identity of nucleotides at each position in the genome can be compared with a reference sequence to identify SNVs, with a heterozygous locus indicating a difference in sequence between a homologous chromosome pair. This is where the inherently low accuracy of traditional error-prone long reads becomes a limitation: with a high error rate, it becomes impossible to decide whether a disagreement between a reference and a data set is a variant or a sequencing error.
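One rough way to see why the error rate matters is to ask how often sequencing errors alone would produce several mismatching reads at the same position. The sketch below uses a binomial tail and assumes errors are independent between reads. It counts any mismatch rather than a specific alternate base, so it overstates the risk somewhat. The depth of 4 and threshold of 2 are arbitrary illustrative values.

```python
from math import comb


def prob_errors_mimic_variant(depth, min_mismatches, error_rate):
    """Probability that at least min_mismatches of depth reads disagree with
    the reference at one position through sequencing error alone."""
    return sum(
        comb(depth, k) * error_rate**k * (1.0 - error_rate) ** (depth - k)
        for k in range(min_mismatches, depth + 1)
    )


# Four reads over a position, asking for at least two mismatches.
noisy = prob_errors_mimic_variant(4, 2, 0.10)      # ~90% accurate reads
accurate = prob_errors_mimic_variant(4, 2, 0.001)  # ~99.9% accurate reads
```

Under these assumptions, about 5% of positions would look variant-like by chance with 90% accurate reads, compared with fewer than one in 100,000 at 99.9% accuracy.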

 

Phasing involves separating maternally and paternally inherited copies of each chromosome.

 

Another approach to obtain phase information is to also sequence the parents of the individual whose genome you need phased. However, in many wild species where the parents aren’t available, a highly accurate long-read sequencing approach, like HiFi sequencing, would be simpler. There are also computational methods (learn about Nighthawk) or the use of population haplotype frequency information to infer phasing.

Overall, phased genomes or variant calls are higher quality than haplotype-collapsed versions as they provide allelic information, which can be important for studying human diseases, crop improvement, evolution, and more. HiFi reads, with accuracy high enough to detect SNVs and read lengths long enough to link these SNVs over many kilobases, generate larger phased haplotype blocks.
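The link between read length and phase block size can be sketched as follows: two neighboring heterozygous sites fall in the same block only if at least one read covers both. This is a simplified model that assumes error-free allele calls. The function name, coordinates, and read spans are hypothetical.

```python
def phase_blocks(het_positions, read_spans):
    """Group heterozygous sites into blocks linked by reads.

    read_spans holds half-open (start, end) intervals on the reference.
    Adjacent sites join the same block when one read covers both of them.
    """
    sites = sorted(het_positions)
    if not sites:
        return []
    blocks = [[sites[0]]]
    for left, right in zip(sites, sites[1:]):
        linked = any(start <= left and right < end for start, end in read_spans)
        if linked:
            blocks[-1].append(right)
        else:
            blocks.append([right])
    return blocks


het_sites = [1000, 2500, 9000]

short_reads = [(950, 1100), (2450, 2600), (8900, 9050)]  # about 150 bases each
long_reads = [(0, 12000)]                                # one 12 kb read

short_blocks = phase_blocks(het_sites, short_reads)  # [[1000], [2500], [9000]]
long_blocks = phase_blocks(het_sites, long_reads)    # [[1000, 2500, 9000]]
```

Short reads leave each site in its own block, while a single long read joins all three sites into one haplotype block.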

 

As scientists analyze more and more genomic data, the role of sequence accuracy will likely only become more important. HiFi reads offer the benefits of high accuracy equivalent to short-read sequencing data, but with the length necessary for complex genome assemblies and phasing of variants across large swaths of the genome.

 


Interested in finding out more about using HiFi reads? Get in touch with a PacBio scientist to scope out your project.

 

Explore other posts in the Sequencing 101 series:

The Evolution of DNA Sequencing Tools

Introduction to PacBio Sequencing and the Sequel II System

Why Are Long Reads Important for Studying Viral Genomes?

Looking Beyond the Single Reference Genome to a Pangenome for Every Species
