Scientists Produce Valuable New Human Structural Variation Resource Using SMRT Sequencing
Thursday, January 24, 2019
In an effort to produce a comprehensive list of structural variants in the human genome, scientists from the University of Washington, the University of Chicago, Washington University, and Ohio State University analyzed long-read sequence data from 15 human genomes and have now released the results of their in-depth analysis.
The Cell publication, “Characterizing the Major Structural Variant Alleles of the Human Genome,” comes from lead authors Peter Audano and Arvis Sulovari, senior author Evan Eichler, and collaborators. The data generated by this work “provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity,” the authors report.
The samples in the analysis span remarkable genomic diversity. The team used the PacBio RS II and the Sequel System to produce high-coverage, long-read sequence data for 11 diploid genomes, primarily sourced from HapMap samples and spanning Yoruban, Gambian, Luhya, Han Chinese, Vietnamese, Puerto Rican, Colombian, Peruvian, Telugu, Northwestern European, and Finnish populations. They also used existing PacBio genome assemblies for two hydatidiform moles (CHM1 and CHM13) as well as the recently published Korean (AK1) and Chinese (HX1) genomes.
From this wealth of long-read data, the scientists then resolved and annotated nearly 100,000 common structural variants (defined as insertions, deletions, or inversions at least 50 bp long). Of those, more than 2,200 variants were shared by all genomes analyzed and another 13,000 were detected in most genomes — “indicating minor alleles or errors in the reference,” the team notes. Most of the variants were not reported in previous studies that relied on short-read technology. “Importantly, the breakpoints and content of these major alleles are now resolved at the single-base-pair level,” the scientists add, “providing the requisite sequence specificity on a GRCh38 coordinate system to begin to develop not only alternate haplotypes, but also to develop a more comprehensive graph-based assembly representation of the human genome.” The authors also noted that there are more structural variant alleles to discover, estimating that adding 35 more genomes (50 total) would increase the number of alleles by 39%.
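To make the categories above concrete, here is a minimal Python sketch of how a variant call could be sorted into those groups. It is an illustration only, not the authors' pipeline: the `Variant` class, the `classify` function, and the category labels are hypothetical names, and reading "most genomes" as "more than half of the 15 genomes" is an assumption made for this example.

```python
from dataclasses import dataclass

MIN_SV_LENGTH_BP = 50   # size threshold quoted in the article
TOTAL_GENOMES = 15      # genomes analyzed in the study
SV_TYPES = {"INS", "DEL", "INV"}  # insertions, deletions, inversions


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one variant call (illustrative only)."""
    kind: str        # "INS", "DEL", or "INV"
    length_bp: int   # variant length in base pairs
    carriers: int    # number of genomes in which the variant was seen


def is_structural_variant(v: Variant) -> bool:
    """Apply the article's definition: INS/DEL/INV of at least 50 bp."""
    return v.kind in SV_TYPES and v.length_bp >= MIN_SV_LENGTH_BP


def classify(v: Variant, total_genomes: int = TOTAL_GENOMES) -> str:
    """Label a variant by how widely it is shared across the genomes.

    Assumption: "most genomes" is read as more than half of them.
    """
    if not is_structural_variant(v):
        return "not_sv"
    if v.carriers == total_genomes:
        return "shared"      # present in every genome analyzed
    if v.carriers > total_genomes / 2:
        return "majority"    # present in most genomes
    return "minor"           # present in half or fewer


calls = [
    Variant("DEL", 120, 15),
    Variant("INS", 300, 9),
    Variant("INV", 5000, 3),
    Variant("DEL", 49, 15),
]
for call in calls:
    print(call, classify(call))
```

Under these assumptions, a variant carried by all 15 genomes lands in the "shared" group, which is the kind of call the authors interpret as a minor allele or an error in the reference.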
The scientists also make clear that this kind of study would not have been possible even a few years ago. “Recent advances in sequencing technology have now allowed us to systematically whole-genome shotgun (WGS) sequence large stretches (>10 kbp) of native DNA without the need to propagate clone inserts in E. coli,” they explain. “This is particularly advantageous for structural variation since the long reads provide the necessary context to anchor and sequence resolve most structural variants (SVs) irrespective of sequence composition.” In addition, the analysis determined that variants were more likely to be found in GC-rich or GC-poor sequences, which means they “were likely problematic to clone, sequence, and assemble using large-insert BAC clones” during the Human Genome Project, the scientists add.
The results of this impressive work now comprise the first database of structural variants in control individuals sequenced with long reads, making it a valuable resource for researchers seeking to discover pathogenic structural variants associated with particular diseases. “The sequences we now add to the human genome provide the necessary substrate to discover new disease associations, especially as they relate to repeat instability,” the authors conclude.
There are more structural variants waiting to be found in human genomes. If you’re interested in related research, use our project calculator to estimate the time and materials needed and to get suggested study designs.