The first full day of AGBT kicked off with a great talk from Evan Eichler of the University of Washington. Starting from the premise that characterizing genetic variation is key to understanding phenotypes, his presentation offered in-depth looks at human genome projects designed to fully represent data missed in existing assemblies and current whole genome sequencing studies. Eichler pointed out that short-read sequencing misses a lot of structural variation, particularly when it occurs near repeat-rich regions. He said that every genome sequenced with short-read technology is missing important variation, and that a big problem is our inability to quantify just how much is missing. Eichler told attendees that he uses SMRT® Sequencing because it allows direct observation of native DNA, offers long reads, and has very little GC bias.
He presented two sequencing projects focused on hydatidiform moles (CHM1 and CHM13), which have haploid human genomes. In one project, he reported detecting 26,015 structural variants, closing or shrinking 90 gaps in the human reference genome (many of which included GC-rich sequence), and adding a total of 1.1 Mb of novel sequence. He noted that one of the most important findings of the work was that 92 percent of insertions and 60 percent of deletions found in the genome were novel, including many in protein-coding regions, perhaps indicating how much has been missed in previous human genome population studies. An analysis of STRs in the SMRT Sequencing-generated assembly showed that they were 3x more abundant and 2.8x longer than STRs in the existing human reference genome, which also suggests that current knowledge is incomplete. (Much of this work was included in this Nature paper from Chaisson et al.)
In a separate project, Eichler’s team compared the information gleaned from SMRT Sequencing of two haploid human genome samples to information obtained through the 1000 Genomes Project. In the two haploid genomes, he said, they found almost as much structural variation as was found across the more than 2,500 diploid human genomes in the public dataset, which were sequenced using short-read methods. He added that once structural variation is fully catalogued, standard analysis methods can be used to go back and look for those elements in existing human genome data, and resolve about 50 percent of the structural variant genotypes. With a relatively small number of human genomes, he said, it may be possible to build a fairly comprehensive view of structural variation in the human genome.
We were also very interested in a presentation from NHGRI Director Eric Green about the recently announced Precision Medicine Initiative. The goals around using genomics to guide targeted treatments mesh nicely with NHGRI’s other efforts to generate a more comprehensive view of human genetic variation and to find the missing heritability in our genomes. Much of this was discussed at a planning session last year, which you can check out in this video. Notably, Evan Eichler also used the term Precision Medicine in his talk, noting that “if you believe in precision medicine, you should want to be comprehensive and precise.”
Later in the day, we particularly enjoyed talks from Sarah Tishkoff of the University of Pennsylvania and David Page of the Whitehead Institute. Tishkoff presented great data from genomic studies of people from remote locations in Africa, such as studies of genetic links to short stature among pygmy people. She urged attendees to support sequencing in ethnically diverse populations to generate many reference genomes that can be used to better understand variation in populations not well represented by existing reference genomes. Page’s talk focused on sequencing the X and Y chromosomes, which he called “the genome’s most challenging substrates.” For instance, the human Y chromosome features eight palindromes, the largest of which is nearly 3 Mb; in mouse, the major challenge with this chromosome is 180 copies of a 500 kb repeat unit. Page used BAC clones and an iterative sequencing approach known as SHIMS to characterize these complex regions, which are largely absent from current reference assemblies.
At noon today we’ll be hosting our workshop, “Toward Comprehensive Genomics — Past, Present and Future.” Check back soon for the live-stream video!