AGBT Day 3 Highlights: Single Contigs, Dazzling Assemblers, Novel Isoforms & Honey Algorithms
Tuesday, February 18, 2014
Friday morning’s talks were exceptional, and included genomics heavy-hitters Dick McCombie and Gene Myers — both scientists who were truly influential in sequencing the human genome so many years ago. They have kept pushing boundaries, and their talks were fascinating.
Cold Spring Harbor Laboratory’s McCombie offered a presentation based on a late-breaking abstract showing the importance of de novo assembly using SMRT® Sequencing, as opposed to resequencing, which can miss structural differences. He showed data from genome sequences of two yeast species (S. cerevisiae and S. pombe), both of which were generated using P5-C3 chemistry with BluePippin™ size selection from Sage Science. For S. cerevisiae, 15 of 16 chromosomes assembled into single contigs, with the final chromosome represented in two contigs. For S. pombe, one chromosome and the mitochondrial genome came together into individual contigs, while the other two chromosomes were split into two contigs each. McCombie’s team also worked with the Arabidopsis data set released by PacBio and compared it to an Illumina® sequencing-based assembly of the same plant. Contig N50 increased from 65 Kb with the MiSeq® platform to 8.4 Mb with the PacBio® platform. Finally, he showed data from a rice genome sequenced for him by PacBio. (He told attendees he had to contract the project out since his own PacBio RS II was running at capacity.) The mean read length was 10 Kb and the longest read produced was more than 54 Kb, earning McCombie the award for longest read presented at the conference. See the video here.
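For readers less familiar with the contig N50 metric quoted above, here is a minimal Python sketch of how it is commonly computed: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly size. The function name `contig_n50` and the toy lengths are ours, chosen purely for illustration; they are not data or code from McCombie’s talk.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        # The first contig that brings the running sum to at least half
        # of the total assembly size defines the N50.
        if 2 * running >= total:
            return length


# Made-up contig lengths for illustration only (not data from the talk).
toy_contigs = [100, 200, 300, 400, 500]
print(contig_n50(toy_contigs))
```

A higher N50 means that half of the assembled bases sit in longer contigs, which is why the jump from 65 Kb to 8.4 Mb points to a far more contiguous assembly.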
Gene Myers, who recently joined the Max Planck Institute for Molecular Cell Biology and Genetics in Dresden, Germany, said that PacBio long reads had reinvigorated his excitement about genome assembly with the promise of being able to produce reference-quality genomes. Myers has developed a tool called Dazzler (the Dresden Azzembler; check out the blog here) that significantly accelerates the process of assembling PacBio sequence data. Dazzler works by scrubbing data prior to assembly in order to make the entire process more efficient; in a comparison on the human genome data set we just released, Myers reported a 36-fold speedup over BLASR. The tool can fully assemble an E. coli genome from PacBio reads on a regular laptop in just 10 minutes.
Later in the day, our CSO Jonas Korlach gave a talk showcasing the Iso-Seq™ method for full isoform characterization using SMRT Sequencing. He highlighted papers from the laboratories of Mike Snyder and Wing Wong, both at Stanford, who used PacBio long reads to fully analyze transcriptomes. Even in well-studied cell lines, Korlach noted, scientists were finding novel transcript isoforms and even novel genes thanks to information provided in these long reads. He also spoke about a metagenomics project looking at a mock human microbiome data set from NIAID, in which SMRT Sequencing was able to fully resolve more than half of the organisms in the community and assemble each of the rest into a few contigs. The project also resolved all plasmids and yielded methylome data for the microbiome. See the video here.
The evening session on genomic technologies development featured two more PacBio users. David Wheeler from Baylor’s Human Genome Sequencing Center presented sequence data for tumor/normal pairs; his group is generating 10x coverage for tumors and 5x for the matched normal tissue. He focused on structural rearrangements such as tandem duplications and said that many of these elements were driven by the movement of repeat regions around the genome. They could clearly be resolved using the PacBio technology along with two new algorithms from Adam English called Honey-tails and Honey-spots. See Wheeler’s video here. In the other presentation, Sean McGrath from the Genome Institute at Washington University used SMRT Sequencing for gene isoform identification and prediction. His data from a cancer cell line and from a hookworm showed that PacBio sequencing identified more genes than short-read technologies had and, in many cases, also preserved the 5’ and 3’ UTR information.