Despite the increasing availability of whole-genome sequencing, haplotype reconstruction of individual genomes, or haplotype assembly, remains an unsolved problem. Like de novo genome assembly, haplotype assembly is greatly simplified by more long-range information. The Targeted Locus Amplification (TLA) technology from Cergentis is unique in targeting a specific region of the genome with a single primer pair, yielding ~2 kb DNA circles composed of ~500 bp fragments. Fragments from the same circle come from the same haplotype, and their number decays exponentially with distance from the target region, with a span that reaches the multi-megabase range. Here, we apply TLA to the BRCA1 gene in NA12878 and sequence the resulting 2 kb circles on a PacBio RS II. The multiple fragments in each circle were iteratively mapped to hg19 and then haplotype-assembled using HAPCUT. We show that the 80 kb length of BRCA1 is represented by a single haplotype block, which was validated against Genome in a Bottle (GIAB) data. We then explored chromosome-scale haplotype assembly by combining these data with whole-genome shotgun PacBio long reads, and demonstrate haplotype blocks approaching the length of chromosome 17, on which BRCA1 lies. Finally, by performing TLA without the amplification step and size-selecting for reads >5 kb to maximize the number of fragments per read, we target whole-genome haplotype assembly across all chromosomes.
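To illustrate why fragments that share a circle carry phasing information, the following minimal Python sketch counts, for each pair of heterozygous sites, how often a circle shows matching versus mismatching alleles, then propagates phase greedily by majority vote. It is not HAPCUT and not the pipeline used here; the function names (`link_evidence`, `phase`), the `(circle_id, position, allele)` input tuples, and the toy coordinates are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def link_evidence(fragments):
    """Count cis/trans co-occurrences of alleles within each circle.

    fragments: iterable of (circle_id, snp_position, allele), allele in {0, 1}.
    Returns {(pos_a, pos_b): [cis_count, trans_count]} with pos_a < pos_b.
    """
    by_circle = defaultdict(dict)
    for circle_id, pos, allele in fragments:
        by_circle[circle_id][pos] = allele

    evidence = defaultdict(lambda: [0, 0])
    for calls in by_circle.values():
        # Every pair of sites seen in one circle lies on one haplotype.
        for a, b in combinations(sorted(calls), 2):
            index = 0 if calls[a] == calls[b] else 1
            evidence[(a, b)][index] += 1
    return dict(evidence)


def phase(fragments):
    """Greedy phasing relative to the first site (hypothetical toy method).

    Returns {position: 0 or 1}; 0 means allele 0 at that site shares a
    haplotype with allele 0 at the anchor site, 1 means it is flipped.
    """
    evidence = link_evidence(fragments)
    positions = sorted({p for pair in evidence for p in pair})
    if not positions:
        return {}

    phase_of = {positions[0]: 0}
    changed = True
    while changed:
        changed = False
        for pos in positions:
            if pos in phase_of:
                continue
            vote = 0
            for (a, b), (cis, trans) in evidence.items():
                if a == pos and b in phase_of:
                    other = b
                elif b == pos and a in phase_of:
                    other = a
                else:
                    continue
                # Cis links copy the neighbour's phase, trans links flip it.
                sign = 1 if phase_of[other] == 0 else -1
                vote += sign * (cis - trans)
            if vote != 0:
                phase_of[pos] = 0 if vote > 0 else 1
                changed = True
    return phase_of


# Toy input: five circles covering three heterozygous sites.
toy_fragments = [
    ("c1", 100, 0), ("c1", 200, 0),
    ("c2", 100, 1), ("c2", 200, 1),
    ("c3", 200, 0), ("c3", 300, 1),
    ("c4", 200, 1), ("c4", 300, 0),
    ("c5", 100, 0), ("c5", 300, 1),
]
toy_phase = phase(toy_fragments)
```

In this toy input, sites 100 and 200 are linked in cis and site 300 in trans, so all three fall into one block. A site with no link to an already-phased site, or with tied votes, is left unphased, which is where a block would break in this simplified scheme.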
June 1, 2021