Adaptive Selection of CNVs: UW Team Applies SMRT Sequencing to the Melanesian Genome
Friday, October 18, 2019
In a new Science publication, researchers from the University of Washington and other institutions report detailed analyses revealing the adaptive importance of copy number variants (CNVs) acquired from Denisovan and Neanderthal ancestors, the closest relatives of modern humans, in the modern-day Melanesian population. The team used PacBio long-read sequencing to study these complex stretches of DNA and the Iso-Seq method to generate full-length transcript data.
“Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes” comes from lead author PingHsun Hsieh (@phhBenson), senior author Evan Eichler, and collaborators. For the project, they focused on Melanesians, an Oceanian population known to carry more Denisovan and Neanderthal ancestry than other groups. This made the population an excellent foundation for studying the role of CNVs in adaptation and archaic introgression.
“Relatively little is known about the extent to which CNVs contribute to the genetic basis of local adaptation and, more importantly, whether CNVs introgressed from other hominins may have been targets of adaptive selection,” the authors write.
As part of this project, scientists focused on “two of the largest and most complex” CNVs found in the Melanesian genome — a 5 kb duplication and a 73.5 kb duplication, both on chromosome 16 — for a deeper investigation. “Both events are largely restricted to Melanesians and the Denisovan archaic genome F2 and are thought to be involved in a single >225-kb complex duplication (DUP16p12) introgressed from the Denisovan genome,” they report. “This region has been difficult to correctly sequence and assemble, and only recently has the sequence structure of the ancestral locus (>1.1 Mb) been correctly resolved.”
To better understand the original duplication, the team generated 75-fold whole-genome coverage of a Melanesian individual using SMRT Sequencing. This allowed them to narrow the insertion location to a 200 kb region enriched in segmental duplication, a feature that “predisposes the region to recurrent structural rearrangements associated with autism and developmental delay,” Hsieh et al. write.
By applying the Segmental Duplication Assembler, a methodology recently published in Nature Methods, they produced a 1.8 Mb contig that includes the correctly assembled Melanesian duplication. “Notably, the sequence-resolved assembly shows that the actual length of DUP16p12 duplication polymorphism is ~383 kb, which is longer than previously thought,” the authors report. “Sequence and phylogenetic analyses suggest that the variant originated from a series of complex structural changes involving duplication, deletion, and inversion events ~0.5 to 2.5 million years ago within the Denisovan ancestral lineage.” That duplication was inserted into the Denisovan genome within the last 200,000 to 500,000 years and subsequently introgressed into the ancestors of Melanesians between 60,000 and 170,000 years ago, the authors conclude.
The team performed Iso-Seq with hybridization capture probes targeting this region to produce full-length gene models and better characterize the functional effects of CNVs in the Melanesian genome. Based on their results, including a comparison to gene models from other humans and the chimpanzee, the scientists found that the 383 kb duplication is likely adaptive. “This helps to explain why this polymorphism has become nearly fixed within the Melanesian populations (>80%) despite its large size, which is typically regarded as selectively disadvantageous,” they note. “Notably, the Melanesian-specific gene NPIPB shows ~3% amino acid divergence and evidence of positive selection despite its recent origin.” The scientists predict that the proximity of this duplication to a genomic region associated with autism (chr16p11.2) will have an impact on the frequency of autism-associated rearrangements in the Melanesian population.
Based on these results and other data confirming Neanderthal-origin CNVs in the Melanesian genome, the scientists were able to “reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations,” they write. “This study highlights the substantial large-scale genetic variation that remains to be characterized in the human population and the need for development of additional reference genomes that better capture the diversity of our species and complete our understanding of human genes.”
We caught up with Hsieh at ASHG 2019, where he was presenting a poster on this research. He summarized the project by stating, “The high-quality, long-read sequencing data opens up an unprecedented venue to study variants in complex genomic regions. The ability to access these new variants helps us advance our understanding of the biology and evolution of our own species.”