diploid genome Archives

June 1, 2021

FALCON-Phase integrates PacBio and HiC data for de novo assembly, scaffolding and phasing of a diploid Puerto Rican genome (HG00733)

Haplotype-resolved genomes are important for understanding how combinations of variants impact phenotypes. Phased genomes aid the study of disease, quantitative traits, forensics, and organ donor matching. Phase is commonly resolved using familial data, population-based imputation, or by isolating and sequencing single haplotypes using fosmids, BACs, or haploid tissues. Because these methods can be prohibitively expensive, or samples may not be available, alternative approaches are required. De novo genome assembly with PacBio Single Molecule, Real-Time (SMRT) data produces highly contiguous, accurate assemblies. For non-inbred samples, including humans, resolving haplotypes separately results in higher base accuracy and more contiguous assembled sequences. Two primary methods exist for phased diploid genome assembly. The first, TrioCanu, requires Illumina data from the parents and PacBio data from the offspring. The child's long reads are partitioned into maternal and paternal bins using parent-specific sequences; the separate PacBio read bins are then assembled, generating two fully phased genomes. The second approach, FALCON-Unzip, does not require parental information and separates PacBio reads during genome assembly using heterozygous SNPs. The length of haplotype phase blocks in FALCON-Unzip is limited by the magnitude and distribution of heterozygosity, the length of sequence reads, and read coverage. Because of this, FALCON-Unzip contigs typically contain haplotype-switch errors between phase blocks, resulting in primary contigs of mixed parental origin. We developed FALCON-Phase, which integrates Hi-C data downstream of FALCON-Unzip to resolve phase switches along contigs. We applied the method to human (Puerto Rican, HG00733) and non-human genome assemblies and evaluated accuracy using samples with trio data. In a cattle genome, we observe >96% phasing accuracy when compared to both TrioCanu assemblies and parental SNPs.
For a high-quality PacBio assembly (>90-fold Sequel coverage) of a Puerto Rican individual, we scaffolded the FALCON-Phase contigs and re-phased them, creating a de novo scaffolded, phased diploid assembly with chromosome-scale contiguity.
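The trio-binning step described above (partitioning the child's long reads into maternal and paternal bins using parent-specific sequences) can be illustrated with a minimal sketch. This is not TrioCanu's implementation: the function names (`kmers`, `parent_specific_kmers`, `assign_read`, `bin_reads`) are hypothetical, parental data are modeled as plain read strings, reverse complements and sequencing errors are ignored, and the toy k-mer size is far smaller than anything used on real genomes. Each read is scored by how many maternal-only and paternal-only k-mers it contains, normalized by the size of each parent-specific set, and assigned to the higher-scoring haplotype.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from the parents' reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat


def assign_read(read, mat_only, pat_only, k):
    """Assign one child read to a haplotype bin by parent-specific k-mer content."""
    read_kmers = kmers(read, k)
    # Normalize by each parent's specific-k-mer count so that a parent
    # with more heterozygosity does not automatically win more reads.
    mat_score = len(read_kmers & mat_only) / max(len(mat_only), 1)
    pat_score = len(read_kmers & pat_only) / max(len(pat_only), 1)
    if mat_score > pat_score:
        return "maternal"
    if pat_score > mat_score:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


def bin_reads(child_reads, maternal_reads, paternal_reads, k=21):
    """Partition child reads into maternal, paternal, and unassigned bins."""
    mat_only, pat_only = parent_specific_kmers(maternal_reads, paternal_reads, k)
    bins = {"maternal": [], "paternal": [], "unassigned": []}
    for read in child_reads:
        bins[assign_read(read, mat_only, pat_only, k)].append(read)
    return bins


if __name__ == "__main__":
    bins = bin_reads(
        ["TAAAACCC", "AAAGGGGT", "TTTTAAAA"],  # toy child reads
        ["AAAACCCC"],                          # toy maternal data
        ["AAAAGGGG"],                          # toy paternal data
        k=4,
    )
    print(bins)
```

On the toy data, the first read carries only maternal-specific k-mers, the second only paternal-specific ones, and the third shares k-mers with neither parent and stays unassigned. Each bin would then be assembled independently; reads in homozygous regions, which carry no parent-specific k-mers, are the ones that end up unassigned in this simplified scheme.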

February 5, 2021

Asset Tag: diploid genome

ASHG Virtual Poster: De novo assembly of a diploid Asian genome

DNAnexus Webinar: Simplifying de novo assembly with PacBio tools available on DNAnexus: FALCON

i5K Webinar: High-quality de novo insect genome assemblies using PacBio sequencing

ASHG PacBio Workshop: A future of high-quality genomes, transcriptomes, and epigenomes

AGBT Conference: Personalized phased diploid genomes of the EN-TEx samples

Webinar: Understanding, curating, and analyzing your diploid genome assembly

Webinar: Addressing “NGS Dead Zones” with third generation PacBio sequencing

Webinar: A sketch of assembly recipes for PacBio data

Webinar: Assembling high-quality human reference genomes for global populations

User Group Meeting: The trials and tribulations of high quality human genome assembly

User Group Meeting: FALCON-Phase: Phased diploid assemblies through integration of PacBio and Hi-C data
