The Festival of Genomics Review: A Celebration of Long Reads
Thursday, July 9, 2015
At the inaugural Festival of Genomics event in Boston, more than 1,500 people turned out to see what was billed as a conference unlike any other. The meeting was indeed unique, featuring a play (starring well-known scientists), a giant chess board, and a Genome Dome, in addition to the more familiar lineup of excellent speakers and workshops.
To help kick off the festival, genomic luminaries Craig Venter and James Lupski presented plenary talks on day 1 and set the stage for some exciting science to follow. Lupski’s talk was particularly impactful, as he described how his team at Baylor recently sequenced his own personal genome using 10-fold PacBio® long-read coverage to analyze copy number changes underlying his rare genomic disorder.
Plenary session at the Festival of Genomics in Boston, 2015
Naturally, our favorite part was the dedicated track on long-read sequencing, chaired by our very own CSO Jonas Korlach. This track turned out to be the most popular session of the festival, with standing-room-only attendance. The impressive speaker lineup included Chad Nusbaum of the Broad Institute, Mike Snyder from Stanford, Mark Gerstein of Yale University, Dick McCombie from Cold Spring Harbor Laboratory, Somasekar Seshagiri from Genentech, William LaRochelle from Roche, and Sergey Koren from NBACC. Each speaker detailed the unique value of long-read sequencing for a wide variety of applications, including human genome de novo assembly, structural variant sequencing, full-length transcript profiling, cancer genome assembly, pseudogene analysis, and more.
In a panel discussion with lots of audience participation, Korlach was joined by McCombie, Gerstein, and Snyder for what turned out to be a wide-ranging conversation about the utility of long-read sequencing, and the impact longer reads will have as they gain greater adoption.
The audience was particularly interested in learning what they might be missing with short-read data. Gerstein said that some of the most interesting parts of genomes are repeats, and that long reads help elucidate those regions, while Snyder noted that trinucleotide repeats are not reliably detected by short reads. The panel also pointed to pseudogenes as elements that are better resolved with long reads. Gerstein noted that if a type of sequence can’t be seen with short reads, increasing short-read coverage won’t reveal it. He encouraged attendees to integrate long-read data whenever possible, adding that even a modest amount of long-read coverage can be very valuable.
Snyder reported that the main source of whole genome sequencing errors is mismapping of short reads. “There’s no question that long reads will help,” he said. He also cited extreme GC regions and homopolymers as elements that can be challenging to represent accurately with short-read sequence data.
Genome variants were another popular topic. The panelists agreed that long reads are key to tracking structural variants in genomes, with Gerstein suggesting that structural variation could help explain some of the “missing heritability” in the human genome. He predicted that many SNPs will turn out to be markers of a structural variant in linkage disequilibrium, rather than being causal elements on their own. McCombie said that while we still don’t know how much medical relevance this variation will have, he finds the subject “intriguing” and clearly worth further study.
There was a good deal of discussion about whole genome sequencing for humans, with Snyder envisioning a future where the concept of reference genomes is outdated because everyone will be his or her own personal reference. Until that’s a reality, Gerstein pointed out the value of generating more reference genomes to better represent common structural variation across a number of distinct ethnic populations. McCombie noted that as we sequence more people, it’s important to think carefully about consent forms to maximize the value of all that data for many different uses in the future.
The conversation also turned to RNA, as Snyder highlighted the underuse of transcriptome data in clinical research. He argued that genomes plus transcriptomes (and eventually microbiomes and methylomes, as well) will be the best way to put together a comprehensive picture of human health and disease. He noted that in cancer studies, his team produces a transcriptome sequence along with a genome sequence for each project.
Below are a few takeaway points from other speakers in the long-read track:
- Nusbaum presented a single-contig assembly of M. tuberculosis, which has a genome with 66 percent GC content. “It’s a perfect assembly,” he said.
- Koren facetiously used a mathematical proof to support his theorem that “long reads solve everything” before presenting his MHAP algorithm and the human genome assemblies it produced. In his latest assemblies, the largest contigs were approaching the size of individual chromosome arms, which further proved his point.
- Seshagiri from Genentech told attendees he uses SMRT® Sequencing for genome, transcriptome, and epigenome characterization, which will be especially important in cancer research.
- McCombie said that long reads offer consistent results, while analyzing genomes “like we used to” misses tens of thousands of variants.
In addition to lining up the fantastic science at the festival, the organizers offered a unique opportunity to support Greenwood Genetic Center by participating in a ‘Race the Helix’ event onsite. Our PacBio runners joined with Sage Science to tackle the treadmill on the show floor, finishing second in distance covered and winning the coveted ‘Best Dressed’ award.
We look forward to the next Festival of Genomics, taking place Nov. 3 – 5 just up the road from us in San Mateo, CA.