In September we were excited to have 100+ customers gather in Palo Alto, Calif., to discuss their use of Single Molecule, Real-Time (SMRT®) Sequencing and hear about what’s next for the PacBio® RS II. Many thanks to all the scientists who attended and shared their experiences. For anyone who couldn’t make it, we’ve included some highlights from each talk below (and links to full presentations when possible):
Chongyuan Luo from the Ecker lab at the Salk Institute for Biological Studies spoke about studying the genome and epigenome of several Arabidopsis thaliana strains using SMRT Sequencing. Luo noted that Arabidopsis is the only plant to have its complete genome sequenced, though structural variation has not yet been well characterized. PacBio sequence data detected 40 percent more SNPs than short-read technology, indicating that some regions may not have been covered well enough with short reads to find all SNPs.
We were delighted to have two speakers from the Joint Genome Institute, which has become a power user of the PacBio technology. Alex Copeland offered an overview of the institute’s microbial and fungal reference assembly pipeline, where de novo genome sequencing is especially important. He described their experience with a 10x increase in read length and total throughput in three years on the PacBio platform. He also discussed the evolution of their pipeline from Sanger sequencing to the Illumina® and PacBio platforms, going from a median of 49 contigs per microbial genome with Sanger, to 69 with Illumina sequencing, and to 10 or fewer with the PacBio system. Copeland noted that after producing 100x coverage of long reads (10 kb inserts), PacBio users can reliably assemble a genome into 10 contigs or fewer. He said that the team has shifted to a PacBio-only pipeline, and that they are finishing genomes on the platform for less than $2,000.
Fellow JGI scientist Matthew Blow spoke next on bacterial epigenomics, an important genome component that his team looks at with every microbe sequenced. Blow and his colleagues are studying methyltransferases, their link to restriction enzymes, related sequence motifs, and sites that remain unmodified. A recent analysis of global patterns in DNA modifications in bacteria revealed that of 198 analyzed genomes, 169 (>90%) had modified DNA bases, with the most common being N6-methyladenine (80%). Novel motifs constituted ~20%, and the average number of modified motifs per genome was 3, with a maximum of 12. Blow noted that JGI is seeking collaborators for additional projects to explore the biological functions of DNA modifications.
Bart Weimer from the UC Davis School of Veterinary Medicine spoke about the 100K Foodborne Pathogen Genomes project. He noted that sequencing is critical for pathogen identification both because microbial evolution can erase the markers currently used for tracking, and because 16S classification does not correlate with phylogenetic serotype clustering. The goal of the 100K project is to provide a useful, comprehensive database that will allow users to find clinically relevant information about new strains in outbreak situations. Weimer was enthusiastic about the additional information provided by PacBio sequence data, such as methylation and phage elements — both useful in tracking and identifying pathogens. “I get the sequence, I get the structural variation, I get the SNPs and most importantly I get the epigenetic information. Sequencing is almost a byproduct,” he exclaimed.
Lance Hepler from UC San Diego’s Center for AIDS Research used the PacBio RS to study intra-host diversity in HIV-1. He compared PacBio’s performance to that of the 454® sequencer, the platform he and his team previously used. Hepler noted that in general, there was strong agreement between the platforms; where results differed, he said that PacBio data had significantly better reproducibility and accuracy. “PacBio does not suffer from local coverage loss post-processing, whereas 454 has homopolymer problems,” he noted. Hepler said they are moving away from using 454 in favor of the PacBio system.
From Washington University in St. Louis, George Weinstock discussed his overall approach to human microbiome projects, including both targeted 16S sequencing with the PacBio platform and shotgun sequencing of the whole sample. In a pilot project, Weinstock’s team created a mock microbiome of 24 samples with a 300-fold range of concentration; PacBio sequencing was able to accurately identify the taxa for all 22 species where 16S amplification succeeded, yielding highly accurate full-length 16S consensus sequences. He also presented a proof-of-concept study wherein the PacBio system outperformed Sanger sequencing in using full-length 16S sequencing for high-throughput identification of bacteria in clinical isolates of hospital-acquired infections.
We had a couple of talks on characterizing complex genomic regions. Lisbeth Guethlein from Stanford University School of Medicine looked at highly repetitive and variable regions of the orangutan genome. Guethlein reported that “PacBio managed to accomplish in a week what I have been working on for a couple years” (with Sanger sequencing), and the results were concordant. “Long story short, I was a happy customer.” In a separate presentation, John Huddleston from the University of Washington discussed sequencing challenging regions in the human genome, noting that assembly accuracy needs to be quite high to resolve breakpoints and reconstruct duplication architectures. His team is working with BACs to validate the use of the PacBio platform as a faster, more cost-effective alternative to Sanger. In one study, his team found that PacBio results had 99.994% identity with Sanger results and showed uniform coverage across the clone.
In the afternoon, talks turned to the transcriptome. Vince Magrini from the Genome Institute at Washington University described a proof-of-principle RNA-seq study using SMRT Sequencing in a nematode to help elucidate transcriptional regulation and its effect on life cycle. Using PacBio data added more than 1,500 genes to what had been found in the reference sequence. In another talk, Alisha Holloway from the Gladstone Institutes presented data from transcript identification work in chicken. Because she uses chicken to model human heart development, she needs good annotations of RNA produced at various developmental stages to figure out where problems arise. Unlike short-read technologies, PacBio sequencing provided reads long enough to span entire transcripts and dramatically improved gene annotation. Finally, Kin Fai Au from Stanford University spoke about gene isoform identification and prediction in embryonic stem cells, commenting that long reads are essential to examining these long regions and resolving alternative splice isoforms.
Robert Sebra from the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai presented data on how to use BluePippin™ size selection from Sage Science to increase subread lengths of PacBio data. He noted that the BluePippin sizing step also cleans up DNA quality, compensating for any drop in yield. With size selection, Sebra said that his team could generate microbial assemblies from a single SMRT Cell; without the step, more sequencing was needed.
Two of the speakers came from PacBio: Senior VP Kevin Corcoran and CSO Jonas Korlach. Corcoran updated attendees on the latest developments for our sequencing platform, including upcoming advances such as polymerase photodamage protection, the P5-C3 chemistry offering 8,500-base average reads, three-hour movies, Quiver for diploid sequencing, and more. In the closing presentation, Korlach spoke about where the PacBio platform is heading, including use in large customer projects with many samples, higher-complexity metagenomic studies, and assemblies of larger genomes. He also mentioned upcoming technology improvements, such as library prep automation and new data analysis algorithms.