Accurate long-read sequencing characterizes the full spectrum of genetic variation across the genome, but variant-calling software is still catching up to the sequencing technologies. We have generated deep Pacific Biosciences (PacBio) high-fidelity (HiFi), ultra-long Oxford Nanopore Technologies (ONT), Strand-seq, and Illumina whole-genome sequencing data to construct near-telomere-to-telomere (near-T2T), phased genome assemblies from primary material obtained from a four-generation, 28-member CEPH pedigree (1463). We are constructing a more comprehensive, validated catalog of more than 8 million single-nucleotide variants, indels, short tandem repeats, and structural variants, including a detailed assessment of inversion polymorphisms that associate with disease risk. The use of multiple orthogonal technologies, near-T2T phased genome assemblies, and a multi-generation family allows us to assess inheritance patterns and to create a “truth set” for all classes of human genetic variation upon which to test and benchmark new technologies.
November 21, 2023 | Educational video