At ISMB, Gene Myers’ Keynote Offers History, Future of Genome Assembly
Tuesday, July 22, 2014
At ISMB 2014 in Boston earlier this month, Gene Myers of the Max Planck Institute of Molecular Cell Biology and Genetics presented a keynote address entitled “DNA Assembly: Past, Present, and Future.” Myers received the prestigious Senior Scientist Accomplishment Award from the International Society for Computational Biology (ISCB) at the event.
The ISCB Senior Scientist Accomplishment Award honors respected leaders in computational biology and bioinformatics for their significant contributions to these fields through research, education, and service. Myers was honored as the 2014 winner for his outstanding contributions to the bioinformatics community, particularly his work on sequence comparison algorithms and whole-genome shotgun sequencing methods, and his recent endeavors in developing software and microscopic devices for bioimage informatics.
His talk chronicled the history of sequence assembly methods, highlighting the different technologies from Sanger sequencing to today and the various algorithmic approaches to the problem, with the ideas of string graphs and de Bruijn graphs woven throughout.
Myers believes the demand for lower-cost sequencing “after the genome” has hampered progress on the production of high-quality de novo genome reconstructions, resulting instead in ‘swiss cheese genomes’. He said that generating genomes consisting of lots of small contigs was never his vision for assembly.
The cost-over-quality movement caused him to lose interest as a mathematician, and he spent nearly a decade out of the “DNA sequencing scene” (see his blog post “On Perfect Assembly”) until the advent of long-read sequencers renewed his engagement in assembly methods. He writes: “What I perceived early in 2013 was that the relatively new PacBio ‘long read’ DNA sequencer was reaching sufficient maturity that it could produce data sets that might make this possible, or at least get us much, much closer to the goal of near perfect, reference quality reconstructions of novel genomes.” Myers noted that some in the industry had misunderstood the accuracy profile of the system, but he recognized the power of the Poisson sampling and random distribution of errors and decided last year to purchase a PacBio® RS II and “get back into the genome assembly game.”
Myers now has two PacBio RS II sequencers and, as he has discussed on his blog and in presentations this year at AGBT and ISMB, he is not concerned about the error rates associated with PacBio sequencing because the errors are truly random (“unlike any previous technology”), and therefore “the ideal of near perfect de novo assembly is again possible.”
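To see why randomly distributed errors matter, consider a deliberately simplified model: each read covering a position is wrong independently with the same probability, and the consensus is wrong only when more than half of the reads are wrong. The Python sketch below computes that probability. The function name `majority_error` and the 10% per-read error rate are illustrative assumptions, not figures from Myers' talk, and real consensus callers handle insertions and deletions in ways this model ignores.

```python
from math import comb


def majority_error(p, n):
    """Probability that more than half of n reads are wrong at one position.

    Simplified model (an assumption for illustration): each read errs
    independently with probability p, so the number of wrong reads is
    binomially distributed.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-read error rate, chosen only for illustration.
    per_read_error = 0.10
    for coverage in (1, 3, 11, 31):
        rate = majority_error(per_read_error, coverage)
        print(f"coverage {coverage:>2}: consensus error ~ {rate:.3e}")
```

Under these assumptions the consensus error shrinks rapidly as coverage grows, which is the intuition behind the claim. Errors that recur at the same positions would not average out this way, no matter how deep the coverage.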
He described his most recent algorithmic work on an assembler called the Dazzler (the Dresden AZZembLER) that can assemble 1-10 Gb genomes directly from a shotgun, long-read data set produced by PacBio RS II sequencers. Using Dazzler, he reported generating a de novo assembly of a human genome with an N50 of 5.5 Mb, an improvement of over 1 Mb compared to our HGAP assembly in February, with much reduced computational requirements and time. More information is available on his blog. In conclusion, he noted that long-read sequencers will enable de novo, reference-quality reconstructions, enhance comparative genomics and diversity studies, and give us an accurate picture of large-scale structural variation.
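For readers unfamiliar with the N50 statistic quoted above, it is the contig length at which contigs of that length or longer account for at least half of the assembled bases. A minimal Python sketch of the standard calculation follows. The function name `n50` and the toy contig lengths are made up for illustration and are not taken from the Dazzler or HGAP assemblies.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy contig lengths, invented for this example.
    toy_assembly = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_assembly))
```

A higher N50 means more of the genome sits in long contigs, which is why it serves as a rough measure of assembly contiguity. It says nothing about whether those contigs are correct.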
We are glad to see Myers back in the DNA sequencing scene, and very excited about the possibilities SMRT® Sequencing holds for genome assemblies!