The PacBio workshop at ASHG 2015 featured talks from two leaders in human genomics, Rick Wilson of Washington University and Richard Gibbs of Baylor College of Medicine. Mike Hunkapiller, CEO of Pacific Biosciences, opened the workshop with a historical perspective on human genome sequencing, starting with the Human Genome Project. While technology and throughput have advanced and costs have fallen, the quality of genomes hasn’t kept pace with the decreases in cost, he noted. This is why Hunkapiller was particularly proud to share the news of the company’s launch of the Sequel System, which offers SMRT Sequencing and long reads at seven times the throughput of the PacBio RS II and roughly half the cost, making it feasible to use the system for de novo assembly of high-quality human genomes. He also stated that the platform has the capacity to scale over time to handle higher-density SMRT Cells, pointing toward a future where de novo human genomes will become both practical and routine.
Rick Wilson titled his talk “Of reference genomes and precious metals” and walked the audience through definitions and standards for the various quality levels of de novo assembled human genomes, e.g., platinum, gold, and silver. He noted that this was a good topic for the session because of the important role PacBio has played in the community’s work to create reference-grade genomes. For example, PacBio technology has enabled his team to sequence additional genomes (CHM1, CHM13) to a very high quality level. Although these sequences were essential for further refining the GRCh38 reference build, he stated that the current reference genome is still not optimal for some highly polymorphic and complex regions of the genome, and does not adequately represent diverse ancestries.
Wilson outlined his group’s definition of a ‘gold’ genome as a high-quality, highly contiguous representation of the genome with haplotype resolution of critical regions. It is built from PacBio reads for de novo assembly, a scaffold created using BioNano and/or Dovetail and aligned to the reference, and BACs to fill targeted regions and shore up gaps. The list of gold genomes in progress includes the Yoruban, Puerto Rican, Han Chinese, CEU, and Luhya. A ‘platinum’ genome is a contiguous, haplotype-resolved representation of the entire genome; two currently exist, for the CHM1 and CHM13 hydatidiform moles. While the standards for ‘silver’ are still to be determined, this category generally covers non-trio genomes produced with PacBio sequencing and BioNano mapping, without a BAC library.
Richard Gibbs talked about the transition to genomic medicine, which hasn’t been as simple as people would like because of issues such as the incomplete reference genome, the difficulty of characterizing some variation, and the lack of knowledge about the function of some genes. At Baylor, most of the human genome sequencing is done for children with Mendelian disorders. He said that among 7,000 samples processed using short-read exome sequencing, only about 25% of cases are solved. The relatively low diagnosis rate is likely due to structural variation and to regions not captured by short reads.
He discussed some ways to get at structural variation, including PacBio sequencing with the PBJelly and Parliament analysis routines, using as little as 10-fold PacBio coverage. With these methods his group is also closing gaps in the genomes of various species; in the sheep genome, for example, he noted that they have closed 70% of gaps with PacBio reads. He also mentioned the use of PBHoney to identify inconsistencies between reads and the reference, and said that long-range capture strategies combining NimbleGen and PacBio are ‘going beautifully so far.’
To close the workshop, Jonas Korlach, Chief Scientific Officer at PacBio, built on Hunkapiller’s comments by talking about the technology waves that have followed the initial human genome sequencing project, where we are today, and where we are going.
Today, we are in what Korlach calls the 4th wave, where more comprehensive whole-genome re-sequencing is occurring, and we are nearing the 5th, when we will actually be able to free ourselves from reference genomes and sequence everything de novo.
Korlach also touched on some of the new developments PacBio is working on, which include amplification-free target enrichment methods, using the Cas9 enzyme for targeting, and sequencing native DNA. Other progress will come through the ability to use PacBio sequencing to phase alleles and more comprehensively capture all sizes and types of variants into haplotigs (contiguous haplotype-sequence blocks). Barcoding samples for isoform (Iso-Seq) sequencing and allele-specific methylation analyses are also in the works.
Watch the recording of the entire workshop session.