AGBT 2017, Day 2: Diploid Genomes, Deep Learning, and the PacBio Workshop
Thursday, February 16, 2017
The second day of AGBT featured a number of great talks and posters, and also our user workshop called “Covering All the Bases with SMRT Sequencing.” We’d like to thank the hundreds of attendees who crowded into the room for this event!
The workshop kicked off with Nezih Cereb, CEO of Histogenetics, who spoke about using long-read PacBio sequencing for typing HLA class I and II genes, which are important for applications such as matching donor organs to transplant recipients. The company has been performing industrial-scale SMRT Sequencing since it first acquired its PacBio RS II instrument, and it recently increased capacity further by adding the Sequel System. Histogenetics types thousands of HLA samples each day with these instruments, and Cereb noted that SMRT Sequencing is essential because it can phase mutations in the HLA alleles. This layer of information cannot be accessed with short-read or Sanger technologies but is critical for understanding an individual’s immune function. Cereb told attendees that the Sequel System has performed so well that his company acquired three more of these sequencers to boost HLA typing throughput and allow new investigations into other complex regions, such as KIR. He concluded by saying that sequencing the full HLA genes is now the gold standard for typing samples.
Next up was Margaret Roy from Calico Life Sciences presenting results of a de novo genome sequence for the naked mole rat. The rodent has a remarkably long life span and resistance to cancer, both of which make it an appealing model for the Calico team. There were two existing assemblies for it, but both had been done with short-read sequencing and were highly fragmented. Roy and her team prepared SMRT Sequencing libraries with fragments of at least 25 kb and 45 kb and sequenced them on both the PacBio RS II and Sequel Systems. While the assembly is not yet complete, Roy told attendees that its metrics look good: the 2.5 Gb genome is represented in just 493 contigs, with the largest contig covering 71 Mb. The team is working to add scaffolding data from BioNano Genomics and will integrate additional data sets in the near future to achieve a high-quality final assembly for annotation. Roy said that the Sequel has been a welcome addition for the project, because lab members can load a tenth of the library onto a SMRT Cell and get five times the amount of data they would have with the PacBio RS II system. Once the project is complete, Roy said, she anticipates publishing the genome and releasing it publicly.
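For readers who want to put those assembly figures in perspective, here is a quick back-of-the-envelope sketch. It uses only the numbers quoted from the talk. The variable names are ours, and it assumes the 2.5 Gb figure is the total assembled length and that the library and data comparisons are both per SMRT Cell.

```python
# Figures quoted in the talk (variable names are illustrative).
GENOME_SIZE_BP = 2.5e9      # naked mole rat genome, about 2.5 Gb
CONTIG_COUNT = 493          # contigs in the current assembly
LARGEST_CONTIG_BP = 71e6    # largest contig, about 71 Mb

# Average contig length, assuming the contigs sum to the full 2.5 Gb.
mean_contig_mb = GENOME_SIZE_BP / CONTIG_COUNT / 1e6

# Share of the genome captured by the single largest contig.
largest_fraction_pct = LARGEST_CONTIG_BP / GENOME_SIZE_BP * 100

# Sequel vs. PacBio RS II: one tenth of the library, five times the data.
LIBRARY_DIVISOR = 10        # a tenth of the library loaded per SMRT Cell
DATA_MULTIPLE = 5           # five times the data per SMRT Cell
yield_per_library_unit = DATA_MULTIPLE * LIBRARY_DIVISOR

print(f"Mean contig length: {mean_contig_mb:.2f} Mb")          # 5.07 Mb
print(f"Largest contig: {largest_fraction_pct:.1f}% of genome")  # 2.8%
print(f"Data per unit of library: {yield_per_library_unit}x")    # 50x
```

Under those assumptions, the average contig is roughly 5 Mb long, and each unit of library goes about 50 times further on the Sequel System than on the PacBio RS II.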
The final workshop speaker was our CSO, Jonas Korlach, who offered a look at where the Sequel System stands today and at the improvements in the works. He showed a map of PacBio sequencer installations, noting that there are now about as many Sequel Systems in labs as PacBio RS II systems. He also reviewed some exciting applications of SMRT Sequencing, including shotgun metagenomics, human de novo assemblies, Iso-Seq analysis, and more. Looking ahead, Korlach said users can expect the Sequel System throughput to double this year and again next year, followed by a new SMRT Cell with eight times the number of zero-mode waveguides by the end of 2018. In total, this will enable a 30-fold increase in throughput, which should make it possible to complete a de novo human assembly for about $1,000. For structural variation coverage alone, the cost could be as little as $200 per person.
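The roughly 30-fold figure follows from compounding the three planned improvements. The short sketch below shows the arithmetic. It assumes the gains multiply independently and that throughput scales linearly with the number of zero-mode waveguides. Those are our simplifications, not statements from the talk.

```python
# Planned throughput gains described in the talk (names are illustrative).
planned_gains = {
    "doubling this year": 2,
    "doubling next year": 2,
    "new SMRT Cell with 8x the zero-mode waveguides": 8,
}

def cumulative_fold(factors):
    """Multiply the individual gains, assuming they compound independently."""
    total = 1
    for factor in factors:
        total *= factor
    return total

fold = cumulative_fold(planned_gains.values())
print(f"Combined throughput increase: {fold}-fold")  # 32-fold
```

Two doublings and an eightfold step compound to 32, which is consistent with the approximately 30-fold increase quoted above.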
In other conference talks during the day, Emma Teeling from University College Dublin made a compelling case for her unique study of bats. These organisms have not been well represented in genomics research, but she expressed hope that it would be possible in the not-too-distant future to achieve chromosome-level assemblies for bats using long-read sequencing and other advanced technologies. Separately, Mark DePristo from Google’s Verily Life Sciences unit presented results of a deep learning tool trained to spot variants from images of sequence reads. DeepVariant, which won an award for accurate SNP calling in the PrecisionFDA competition, has been used to call variants in PacBio data with excellent results; DePristo noted that it’s one of the few diploid variant callers available.
In one of the last talks of the day, Mike Schatz from Johns Hopkins University and Cold Spring Harbor Laboratory shared results of sequencing, assembling, and analyzing personalized, phased diploid genomes with Illumina, 10x, and PacBio data. The PacBio and 10x assemblies were the most contiguous, but Schatz pointed out that the 10x assembly had many unknown bases, whereas the PacBio assembly was made up of complete contigs. Those two platforms also led to more structural variant calls than the short-read data, but the 10x approach was not able to detect the range of variants that SMRT Sequencing could, missing long insertions and other events. Schatz reported a large and unexpected number of translocations identified with PacBio data, noting that follow-up studies confirmed they were real. He also said that SMRT Sequencing data had the best concordance, outperforming both Illumina and 10x results. His talk really got the audience excited about the power of using personalized diploid genomes to mine for structural variation and understand its effects on regulation.
It’s hard to believe there’s only one day left. We’re already wearing down but eager to see what else AGBT has in store for us!