Pacific Biosciences

Sequencing 101: Ploidy, Haplotypes, and Phasing – How to Get More from Your Sequencing Data

Thursday, December 10, 2020

The ploidy, or number of copies of each chromosome in a genome, affects not only the size but also the complexity of the genome.

Geneticists often point out that a human does not have “a” genome but rather two genomes, one inherited from the mother and another from the father. The number of complete sets of chromosomes in each cell (and thus the number of haplotypes) is referred to as ploidy. Humans and most other animals are diploid (2N), having two sets. Many plants have higher ploidy; for example, the hexaploid (6N) California redwood has six copies of each chromosome.

Additional chromosome copies not only increase the total amount of DNA in a genome but also increase its complexity, because they raise the number of alleles, or alternate forms of genes, that an individual can carry. Although most of the sequence is identical between homologous chromosomes, it’s the differences that provide the breadth of biological variation within a species.
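One way to quantify how ploidy adds complexity: at a single locus with n possible alleles, an individual of ploidy p can carry any multiset of p alleles, so the number of distinct unordered genotypes is the multiset coefficient C(n + p − 1, p). The helper below, `genotype_count`, is a hypothetical illustration rather than anything from a PacBio tool.

```python
from math import comb


def genotype_count(n_alleles: int, ploidy: int) -> int:
    """Number of distinct unordered genotypes at one locus.

    Each of the `ploidy` chromosome copies carries one of `n_alleles`
    alleles. Ignoring which copy carries which allele, a genotype is a
    multiset of size `ploidy`, counted by C(n + p - 1, p).
    """
    if n_alleles < 1 or ploidy < 1:
        raise ValueError("n_alleles and ploidy must be positive")
    return comb(n_alleles + ploidy - 1, ploidy)


if __name__ == "__main__":
    for label, ploidy in [("diploid (2N)", 2), ("tetraploid (4N)", 4), ("hexaploid (6N)", 6)]:
        # Compare a biallelic SNV (2 alleles) with a locus that has 4 alleles.
        print(label, genotype_count(2, ploidy), genotype_count(4, ploidy))
```

For a simple biallelic SNV, a diploid has 3 possible genotypes (0, 1, or 2 copies of the alternate allele), while a hexaploid has 7; with four alleles the counts grow to 10 and 84.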


Phasing Haplotypes to Get a Complete Picture of Genetic Variation

Whether sequencing a giant polyploid genome or a diploid one, the goal remains the same: to get a complete and accurate representation of each copy of the genome or region of interest. This is often approached by assembling a haploid (single-copy) genome and then identifying variants, the locations where alleles differ. Many well-studied organisms, like humans, have standard haploid references against which other individuals are compared.
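A toy sketch of variant identification against a haploid reference, assuming pre-aligned, equal-length sequences with SNVs only (real variant callers handle alignment, indels, and sequencing errors). It also shows why a list of variants is not the whole story: two genomes that carry the same alternate alleles in different arrangements produce identical unphased genotypes. The function names are hypothetical.

```python
from typing import Dict, Tuple

Calls = Dict[int, Tuple[str, str]]  # position -> (allele on copy 1, allele on copy 2)


def call_snvs(reference: str, hap1: str, hap2: str) -> Calls:
    """Report every position where either chromosome copy differs from the reference."""
    if not (len(reference) == len(hap1) == len(hap2)):
        raise ValueError("toy example requires equal-length, pre-aligned sequences")
    calls: Calls = {}
    for pos, (ref, a, b) in enumerate(zip(reference, hap1, hap2)):
        if a != ref or b != ref:
            calls[pos] = (a, b)
    return calls


def unphased_genotypes(calls: Calls) -> Dict[int, Tuple[str, ...]]:
    """Drop the copy-1/copy-2 assignment, keeping only which alleles are present."""
    return {pos: tuple(sorted(alleles)) for pos, alleles in calls.items()}


if __name__ == "__main__":
    ref = "ACGTACGT"
    cis = call_snvs(ref, "ATGTACGA", "ACGTACGT")    # both variants on copy 1
    trans = call_snvs(ref, "ATGTACGT", "ACGTACGA")  # one variant on each copy
    print(unphased_genotypes(cis) == unphased_genotypes(trans))
```

Both toy genomes have variants at positions 1 and 7, so their unphased genotypes match even though the sequences of the individual chromosome copies differ.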

But identifying variants does not provide the complete sequence of each genome copy. That requires phasing: determining which variants are on the same copy of a chromosome (in “cis”) and which are on different copies (in “trans”). One approach to phasing uses mother-father-child trios: a variant in the child’s genome that is present in only one parent must have been inherited from that parent, so variants inherited from the same parent lie on the same chromosome copy. A second approach is population inference, which deduces that variants frequently observed together in the same individuals are likely in phase. Both trio and population phasing are imperfect, as they require additional data and can phase only some variants; for example, a trio cannot resolve a site at which the child and both parents are all heterozygous.
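The trio logic above can be sketched for a single biallelic site. This is a minimal illustration under stated assumptions (genotypes given as ALT-allele counts, no genotyping errors), not the algorithm used by any particular trio-phasing tool; `trio_phase_site` is a hypothetical name.

```python
from typing import Optional


def trio_phase_site(child: int, mother: int, father: int) -> Optional[str]:
    """Return which parent transmitted the child's ALT allele at one biallelic site.

    Genotypes are ALT-allele counts: 0 = homozygous REF, 1 = heterozygous,
    2 = homozygous ALT. Returns "maternal", "paternal", or None when the trio
    cannot phase the site (child homozygous, both parents heterozygous, or a
    Mendelian inconsistency).
    """
    for g in (child, mother, father):
        if g not in (0, 1, 2):
            raise ValueError("genotypes must be ALT-allele counts 0, 1, or 2")
    if child != 1:
        return None  # homozygous sites need no phasing
    # A homozygous parent has only one allele to pass on; a heterozygous one is uninformative.
    transmits_alt = {0: False, 2: True}
    from_mother = transmits_alt.get(mother)
    from_father = transmits_alt.get(father)
    if from_mother is None and from_father is None:
        return None  # child and both parents heterozygous: ambiguous
    if from_mother is not None and from_father is not None and from_mother == from_father:
        return None  # e.g. both parents 0/0 but child 0/1: Mendelian error
    if from_mother is not None:
        return "maternal" if from_mother else "paternal"
    return "paternal" if from_father else "maternal"


if __name__ == "__main__":
    # Hypothetical sites: (child, mother, father)
    sites = {"var1": (1, 0, 1), "var2": (1, 2, 1), "var3": (1, 1, 1)}
    for name, genotypes in sites.items():
        print(name, trio_phase_site(*genotypes))
```

Here var1 is assigned to the paternal copy and var2 to the maternal copy, so they are in trans; var3, where all three individuals are heterozygous, stays unphased.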

Phasing involves separating maternally and paternally inherited copies of each chromosome into haplotypes to get a complete picture of genetic variation.


Recent advances in DNA sequencing technology and the tools used to assemble and phase genomes allow large blocks of the sequence to be phased directly from DNA sequencing reads of one individual. Highly accurate long reads, known as HiFi reads, are uniquely suited to phasing haplotypes as they provide the high accuracy needed to detect single nucleotide variants (SNVs) and the read length to connect these variants over a long range.
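To illustrate how long, accurate reads connect heterozygous SNVs, here is a deliberately simplified read-backed phasing sketch: adjacent heterozygous sites are linked by a majority vote of the reads that cover both, and a new phase block starts wherever no read spans the gap. Tools such as WhatsHap solve a more rigorous optimization problem; this majority-vote approach and the `phase_from_reads` name are hypothetical stand-ins for illustration only.

```python
from typing import Dict, List

Read = Dict[int, int]  # het-site index -> allele seen in the read (0 = REF, 1 = ALT)


def phase_from_reads(n_sites: int, reads: List[Read]) -> List[Dict[int, int]]:
    """Greedy read-backed phasing of heterozygous SNVs indexed 0..n_sites-1.

    Returns phase blocks; each block maps site index -> allele on haplotype 1
    (haplotype 2 carries the other allele). Phase between blocks is unknown.
    """
    if n_sites == 0:
        return []
    blocks: List[Dict[int, int]] = []
    current: Dict[int, int] = {0: 1}  # arbitrary: ALT of the first site goes on haplotype 1
    for i in range(n_sites - 1):
        spanning = [r for r in reads if i in r and i + 1 in r]
        same = sum(1 for r in spanning if r[i] == r[i + 1])  # votes for cis
        diff = len(spanning) - same                          # votes for trans
        if same == diff:
            # No read links site i to site i+1 (or the vote is tied): start a new block.
            blocks.append(current)
            current = {i + 1: 1}
        else:
            current[i + 1] = current[i] if same > diff else 1 - current[i]
    blocks.append(current)
    return blocks


if __name__ == "__main__":
    reads = [
        {0: 1, 1: 1, 2: 0},  # read from haplotype 1
        {0: 0, 1: 0, 2: 1},  # read from haplotype 2
        {1: 0, 2: 1},        # read from haplotype 2
        {1: 1, 2: 1},        # read with a sequencing error at site 2
        {3: 1},              # only covers site 3, so it cannot link it to site 2
    ]
    print(phase_from_reads(4, reads))
```

The erroneous read is outvoted, sites 0 to 2 form one block, and site 3 starts a second block because no read spans sites 2 and 3. Longer reads span more sites, which is why read length directly extends phase blocks.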

Using HiFi reads, either alone or in combination with other technologies like Hi-C and Strand-seq, scientists have been able to produce phased genome assemblies of the rose, a complex tetraploid; the California redwood; and humans, including one individual of Puerto Rican descent, one of Korean descent, and a cognitively healthy supercentenarian. The phased genomes have each provided novel insights into functionally important variants.

Phasing Genes to Identify Allelic Configuration of Variants

Phasing of breast cancer tumors revealed an allelic configuration that impacts treatment response. Vasan et al. (2019) Science.

Scientists analyzing variants in the PIK3CA oncogene found a compound mutation, a double mutation that appears to give breast cancer patients a markedly stronger response to the targeted PI3Kα inhibitor alpelisib. By sequencing and phasing the entire gene, the researchers showed that having both variants on the same allele (cis) led to a super-responder phenotype, whereas having them on separate alleles (trans) did not. This information will have clinical relevance for many cancer patients and could not have been determined without the ability to phase sequence data.

For recessive disease genes, it is also critical to know whether two variants seen in a gene are in trans (thus disrupting both copies of the gene) or in cis (thus leaving one copy intact). For example, in the case of a 9-year-old boy with multiple types of cancer, phasing of the MSH6 gene revealed that both the maternal and paternal alleles carried mutations, resulting in constitutional mismatch repair deficiency (CMMRD) syndrome.
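The cis/trans distinction for a recessive gene can be expressed as a small check on already-phased haplotypes. This sketch assumes each phased gene copy is represented as a set of variant IDs; the IDs and function names are hypothetical.

```python
from typing import Iterable, Sequence, Set


def allelic_configuration(haplotypes: Sequence[Set[str]], v1: str, v2: str) -> str:
    """Classify two heterozygous variants as "cis" or "trans" from phased gene copies."""
    carriers1 = {i for i, hap in enumerate(haplotypes) if v1 in hap}
    carriers2 = {i for i, hap in enumerate(haplotypes) if v2 in hap}
    if len(carriers1) != 1 or len(carriers2) != 1:
        raise ValueError("expects each variant on exactly one phased copy (heterozygous)")
    return "cis" if carriers1 == carriers2 else "trans"


def both_copies_disrupted(haplotypes: Sequence[Set[str]], damaging: Iterable[str]) -> bool:
    """True if every gene copy carries at least one damaging variant (recessive loss of function)."""
    damaging_set = set(damaging)
    return all(hap & damaging_set for hap in haplotypes)


if __name__ == "__main__":
    damaging = {"varA", "varB"}  # hypothetical loss-of-function variants
    trans_case = [{"varA"}, {"varB"}]
    cis_case = [{"varA", "varB"}, set()]
    for label, haps in [("trans_case", trans_case), ("cis_case", cis_case)]:
        print(label, allelic_configuration(haps, "varA", "varB"),
              both_copies_disrupted(haps, damaging))
```

Both cases would produce identical unphased genotype calls, yet only the trans arrangement leaves no intact copy of the gene.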

Haplotype Phasing to Explore the Genetic Origins of Species

The genetic distance between cultivated apple and wild progenitors identified a large portion of the Gala genome as hybrid in origin. Sun et al. (2020) Nature Genetics.

Researchers exploring apple domestication used haplotype-resolved assemblies of cultivated and wild species to better understand the genetic history of the crop. They sequenced and assembled full “haplomes” (haploid genomes) and found high levels of heterozygosity, with more than 20% of the Gala apple genome containing alleles derived from different wild progenitors, indicating that Gala is hybrid in origin. They also found that introgression of new genes and alleles was a critical component of the cultivar’s domestication. This information improves understanding of trait variability and will assist efforts to breed for desirable traits such as fruit weight and sweetness.

Allele Phasing to Resolve Variants Missed by Short Reads

Long reads enable detection and phasing of an allele missed by short-read sequencing. Botton et al. (2020) Genes.

Scientists studying the promoter of SLC6A4, a gene thought to play a role in susceptibility to psychiatric disorders, found long-read sequencing critical for interrogating a low-complexity repeat region. The length of a repeat in the gene’s promoter affects gene expression levels, so phasing the repeat length with variants in the gene’s coding region indicates whether a coding variant sits on a high- or low-expression allele. The authors found that short-read approaches missed the repeat region; long-read sequencing both characterized the repeat and unambiguously phased clinically significant variants, which may improve pharmacogenetic testing.
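Because a single long read can span both the promoter repeat and a coding variant, phasing the two reduces to tallying which repeat allele co-occurs with the coding variant on the same molecules. A minimal sketch, assuming each read has already been summarized as a (repeat allele, coding allele) pair; the "long"/"short" labels, the `min_reads` threshold, and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def phase_repeat_with_variant(reads: Iterable[Tuple[str, str]],
                              min_reads: int = 3) -> Optional[str]:
    """Return the repeat allele that carries the coding ALT allele.

    Each read is a (repeat_allele, coding_allele) pair observed on one molecule
    spanning both loci, e.g. ("long", "ALT"). Returns None if fewer than
    `min_reads` reads carry the ALT allele or the evidence is tied.
    """
    counts = Counter(repeat for repeat, coding in reads if coding == "ALT")
    if sum(counts.values()) < min_reads:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: cannot decide
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical read summaries; one ("short", "ALT") read represents an error.
    reads = [("long", "ALT")] * 5 + [("short", "REF")] * 6 + [("short", "ALT")]
    print(phase_repeat_with_variant(reads))
```

With these toy reads the coding ALT allele is phased to the "long" repeat allele. Short reads that cover only one of the two loci contribute nothing to such a tally, which is why the repeat length and the coding variant could not be linked without long reads.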


How Can You Obtain Phasing Information with HiFi Reads?

Now that you’ve seen how phasing can provide valuable insights, here is how to obtain phasing information:

  1. Sequence an individual with HiFi reads, which have the accuracy needed to resolve differences and the long read length to phase large haplotype blocks.
  2. Use a diploid-aware assembler like IPA, hifiasm, or HiCanu for genome assembly.
  3. Detect variants with an accurate variant caller like Google DeepVariant, and then phase haplotypes with WhatsHap.
  4. Combine HiFi data with additional technologies to extend haplotype phasing to the chromosome scale. HiFi data in combination with Hi-C or Strand-seq can phase entire genomes. If a family trio sample is available, short-read data from the parents can be used to separate HiFi reads into parental bins, either before genome assembly (HiCanu) or during genome assembly.
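The assembly and variant-phasing steps above might be scripted roughly as follows. This is a dry-run sketch, not an official PacBio workflow: it only builds command lines for hifiasm (step 2), DeepVariant, and WhatsHap (step 3). It assumes the HiFi reads have already been aligned to the reference, coordinate-sorted, and indexed, that `run_deepvariant` (from the DeepVariant container) and `whatshap` are on PATH, and that file names and the thread count are placeholders. Check each tool's documentation for the options your versions support.

```python
import shlex
from typing import List


def phasing_commands(reference: str, hifi_reads: str, aligned_bam: str,
                     out_prefix: str, threads: int = 16) -> List[List[str]]:
    """Build argv lists for a diploid-aware assembly and for variant calling plus phasing."""
    vcf = f"{out_prefix}.deepvariant.vcf.gz"
    phased_vcf = f"{out_prefix}.phased.vcf"
    return [
        # Step 2: diploid-aware assembly of the HiFi reads
        ["hifiasm", "-o", f"{out_prefix}.asm", "-t", str(threads), hifi_reads],
        # Step 3a: call small variants from HiFi alignments with the PacBio model
        ["run_deepvariant", "--model_type=PACBIO", f"--ref={reference}",
         f"--reads={aligned_bam}", f"--output_vcf={vcf}", f"--num_shards={threads}"],
        # Step 3b: read-backed phasing of the called variants using the same alignments
        ["whatshap", "phase", "-o", phased_vcf, f"--reference={reference}", vcf, aligned_bam],
    ]


if __name__ == "__main__":
    for argv in phasing_commands("ref.fa", "hifi.fq.gz", "hifi.sorted.bam", "sample1"):
        print(shlex.join(argv))  # dry run: print each command instead of executing it
```

To actually run the commands, each argv list could be passed to `subprocess.run(argv, check=True)`; keeping them as lists avoids shell-quoting problems with file paths.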



To learn more about how phasing could make a difference for your research, contact a PacBio scientist to get started with your next sequencing project.


Explore other posts in the Sequencing 101 series:

The Evolution of DNA Sequencing Tools

Introduction to PacBio Sequencing and the Sequel II System

Why Are Long Reads Important for Studying Viral Genomes?

What’s the Value of Sequencing Full-length RNA Transcripts?

Looking Beyond the Single Reference Genome to a Pangenome for Every Species

Understanding Accuracy in DNA Sequencing

From DNA to Discovery – The Steps of SMRT Sequencing
