AGBT 2015: PacBio Workshop Review & Recording
Friday, February 27, 2015
Our AGBT workshop attracted more than 500 attendees thanks to the high-profile speakers who shared their perspectives on human genomic research. Because of the exclusivity of AGBT, we decided to live-stream our workshop to reach the broader scientific community. Thanks to the hundreds of people who tuned in to our live webcast from afar! Here are some highlights from the presentations; the recording of the workshop is at the bottom of this post.
Our CEO, Mike Hunkapiller, started the session with a reflection on the 15-year anniversary of the announcements of the first human genomes, noting that these projects required considerable effort and produced draft assemblies with contig N50s in the 20-24 kb range. Many technologies and methods have been introduced since then, but assembly quality has not improved dramatically and scientists are still missing critical genomic information. He noted that structural variations, in particular, have been underestimated, limiting our understanding of human genomes. He then unveiled a PacBio® diploid assembly of Craig Venter’s genome, chosen because it has been so well characterized over the years. Compared to the original iteration of the Venter assembly, the PacBio diploid assembly contains 3,004 primary contigs, with a contig N50 of 10.4 Mb and a longest contig of 34.6 Mb. The 4,761 associated contigs, representing potential structural variants, total 189 Mb with a mean length of 39.8 kb.
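For readers less familiar with the contig N50 metric quoted above, here is a minimal sketch of how it is commonly computed: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of all assembled bases. The function name and the toy lengths are hypothetical and chosen only for illustration; they are not taken from the Venter assembly or from any PacBio tool.

```python
def contig_n50(lengths):
    """Return the contig N50 for a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that pushes us to >= 50% of the total.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in kb), for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(contig_n50(toy_lengths))
```

Because the metric is weighted by bases and not by contig count, a handful of very long contigs can lift the N50 considerably, which is why it is a popular shorthand for assembly contiguity.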
In his first appearance at Marco Island, Venter (accompanied by his dog, Darwin) offered his vision for Human Longevity, Inc. (HLI), as well as for the J. Craig Venter Institute and Synthetic Genomics. At HLI, his team plans to sequence 1 million genomes in the coming years while also gathering extensive phenotypic information to make meaningful connections from the data. To support this effort, they will produce 30 reference genomes representing ethnogeographic diversity. Venter told attendees that the PacBio machine gives you a great reference genome.
Next up was Gene Myers from the Max-Planck Institute, who addressed the concept of a near-perfect human assembly. He believes this level of quality is within reach, made possible by the long reads, random error, and random sampling of SMRT® Sequencing. Myers and his team have been working hard to build new analysis tools for processing this data, including a lightning-fast aligner and a scrubbing algorithm. His tools are available through his Dazzler website.
Deanna Church from Personalis spoke about the importance of a complete, truly representative human reference genome. She said this data is necessary for calling and interpreting variants, noting that something as simple as a missing gene in the reference can confound other calls in a new reference-based assembly. She championed the new regions of alternative loci available with the GRCh38 human reference, saying this sequence is essential to ensure you’re not missing valuable information in genome interpretation. Church urged attendees to generate high-quality sequence assemblies and contribute them back to the databases to continue refinement of the reference genome.
Jeong-Sun Seo, CEO of Macrogen, Inc., spoke about his team’s efforts to sequence large numbers of Asian genomes and to generate a representative diploid human genome reference for the Asian population. His team used PacBio technology, PacBio’s latest diploid assembly methods, optical mapping from BioNano, and BAC sequencing to create the most comprehensive genomic reference possible for a Korean human genome sample. He showed examples of gaps that could be closed or extended within the GRCh38 reference thanks to SMRT Sequencing data, and highlighted work to identify structural variants, some of which are implicated in diseases that affect Asian populations more than other populations.
Finally, Dick McCombie from Cold Spring Harbor Laboratory presented work on a breast cancer cell line known to be riddled with rearrangements, amplifications, and other complex events. Working with collaborators at OICR, he is using long-read sequencing to generate a higher-resolution view of the structural variation occurring in this cell line. The project, which began last November and is still ongoing, has already led to promising results, such as detecting complex structural variants missed by short-read sequencers. A de novo assembly generated by DNAnexus in 22 hours produced an unprecedented contig N50 of 2.56 Mb. Download the raw data from the Schatz lab website.