Pre-AGBT Workshop: Reference Genomes and Diversity Representation
Tuesday, February 20, 2018
We can’t resist a good reference genome, so the pre-AGBT workshop entitled “Updating Reference Assemblies: New Technologies, New Sources of Diversity” was right up our alley. Hosted by the McDonnell Genome Institute, a member of the Genome Reference Consortium, the event offered conference attendees useful updates on efforts to expand the diversity of human reference genome sequences by incorporating samples from multiple continents of origin (the Americas, Africa, and Asia in addition to Europe).
NCBI’s Valerie Schneider spoke about opportunities and challenges in mining assemblies other than the current GRCh38 build. There are more human genome assemblies than ever, she said. These assemblies are providing new insight into where variants are most commonly found, and they are helping to focus efforts to represent additional diversity. She also covered recent improvements to the GRCh38 assembly and outlined the technical challenges that remain. Since GRCh38 was published, she reported, 65 new human genomes have been submitted to GenBank. Most of them are based on PacBio data, and Schneider described how these assemblies are being used to better understand alternate loci and genetic variants in GRCh38. Going forward, she said, assemblies from people of African descent are still needed and represent a major opportunity for improvement.
Tina Graves Lindsay from the McDonnell Genome Institute continued the diversity theme. She described her team’s strategy of combining 60-fold PacBio long-read coverage with scaffolding technologies to produce reference-grade assemblies. By sequencing genomes from underrepresented populations, including the Gambian and Yoruba assemblies she presented, her group has resolved conflicts in GRCh38.
Ed Green from the University of California, Santa Cruz, spoke about updating reference genomes with proximity ligation techniques such as Hi-C and Chicago. These approaches yield data analogous to mate-pair libraries, he said, and he presented data from 12 diploid human genomes. In one example, proximity ligation revealed errors in NA19240, a reference assembly just submitted to GenBank. Among other problems, sections of chromosome 4 had been incorrectly placed on chromosome 1.
We’d like to thank the workshop organizers for a great event!