At SMRT Leiden, Improvements in Characterizing Genomes, Transcriptomes, and Methylomes
Monday, June 19, 2017
Last month, we co-hosted the 2nd annual SMRT Leiden conference with Leiden University Medical Center. SMRT Leiden featured three days of excellent presentations, including one day focused on bioinformatics. If you missed it, we’ve prepared this quick recap to cover the highlights. In addition, several of the presentations are available to download, and you can check out tweets from day 1 and day 2.
The meeting kicked off with a clinical angle: Eric Schadt from the Icahn School of Medicine at Mount Sinai gave a keynote talk about capturing the clinically actionable genome. Noting that we are in an age of data explosion, Schadt presented ideas for how to take advantage of that to improve human health — and ultimately to model individual health trajectories for optimal decision-making in the clinic. At Mount Sinai, Schadt said genetic testing is becoming more comprehensive, citing examples like a pan-ethnic carrier screen and pregnancy-related testing that starts before conception and follows the infant after birth. SMRT Sequencing is important for these efforts because of its excellent accuracy and long reads, which enable phasing variants and resolving complex regions. By combining technologies, Schadt said his team improved carrier screening to deliver meaningful results to more than 60% of patients, compared to fewer than 7% with traditional testing. Schadt’s colleague Robert Sebra also gave a clinical talk, in which he said that the ideal approach will be whole genome sequencing with long reads to capture challenging genes, pseudogenes, and other important but complex elements. While that is not yet practical, he noted that previous efforts in the lab to sequence whole human genomes took a year and 1,000 SMRT Cells on the PacBio RS II; with the Sequel System, that now takes 50 SMRT Cells and can be completed in two weeks.
Two keynote presentations focused on genome evolution. Shinichi Morishita from the University of Tokyo spoke about bacterial metagenomics, for which PacBio sequencing improved the detection rate for mobile elements and methylation motifs. He also works on centromeres, for which he uses PacBio sequencing with the Hi-C method. Jason Underwood from the University of Washington presented the use of long reads to compare apes and humans in order to find elements specific to humans. His team is using SMRT Sequencing to generate high-quality primate genomes, such as the recent Susie3 assembly, and to annotate them. These projects have improved structural variation detection and increased discovery of human-specific events. Underwood said high-quality PacBio assemblies would be available in the next year or two for gibbon, bonobo, and rhesus macaque.
The Max Planck Institute’s Stefan Mundlos kicked off the afternoon with a keynote about using topologically associated domains, CRISPR, and other approaches to elucidate skeletal disease. Following that, several presentations focused on the use of SMRT Sequencing to resolve challenging regions in the human genome. Adam Ameur from Uppsala University is using PacBio sequencing in targeted and whole-genome methods to resolve repeats, low-frequency mutations, and more. As part of the Swedish 1000 Genomes Project, his team has sequenced two whole genomes with SMRT Sequencing so far, finding about 20,000 structural variants in each one, 80% of which were missed by short-read sequencing. From NUI Galway, Brian McStay presented on the genomic architecture of regions on human acrocentric chromosomes. These regions are difficult to sequence due to repetitive DNA, but he was able to target and sequence them successfully with NimbleGen capture and SMRT Sequencing. Our own Tyson Clark spoke about using amplification-free targeted enrichment for analyzing genomic regions associated with repeat expansion disorders.
A number of great talks focused on plants, animals, and microbes. Felix Bemm from MPI Tübingen focused on Arabidopsis, in which structural variation was being missed with short-read sequencers. By incorporating PacBio sequencing, his team was able to explore NLR complexity; they also produced 10 platinum-grade genomes for a deep dive into structural variants. The University of Rochester’s Amanda Larracuente is studying Y chromosome dynamics in Drosophila. By adding SMRT Sequencing data to their pipeline, her team improved coverage for elusive Y genes and now has as much as 40% of the Y chromosome in contigs. Wasp parasites captured our attention in a talk from Ken Kraaijeveld at VU Amsterdam. He studied asexually and sexually reproducing parasites to understand the differences in mutation accumulation in their genomes, finding that transposable elements may play a role in reduced recombination.
From the University of Oslo, Ave Tooming-Klunderud spoke about targeted sequence capture in a cod study. Focusing on a 300 kb region of hemoglobin genes, the team analyzed eight species and optimized the sample prep protocol with barcoding, which resulted in using just nine SMRT Cells. Richard Kuo from the University of Edinburgh presented data from using the Iso-Seq method to understand chicken transcriptomes; the approach improved detection of lncRNAs, transcripts that were missed in previous annotations, and splicing diversity. Finally, Thomas Otto from the Wellcome Trust Sanger Institute gave a keynote talk about long-read sequencing of parasite genomes, with a focus on Plasmodium falciparum. Otto noted that the first assembly for this genome cost $18 million (that was back in 2002), and today on the PacBio RS II System it only takes five SMRT Cells. Because the genome has only 19% GC content, SMRT Sequencing is more successful at calling intergenic regions that can’t be mapped using short-read data.
We really enjoyed two talks about immune-related genes. Marvyn Koning from our host, LUMC, spoke about B cells and the adaptive immune system. Sequencing has been difficult because of the high mutation rate across many locations, but Koning developed a method called ARTISAN PCR to anchor primers in one region that didn’t change. With PacBio sequencing, the approach yields much higher accuracy than short-read sequencing. Julie Karl from the University of Wisconsin-Madison talked about sequencing the complex MHC region in macaques. For this work, SMRT Sequencing has been essential to achieve the accuracy needed for a genomic region that’s even more complex than the human MHC locus.
We were treated to some proteogenomic talks as well. In a keynote presentation, Gloria Sheynkman from the Dana-Farber Cancer Institute spoke about approaches to understand the complexity of splice diversity and the proteins that result from it. One method is ORF-seq, which measures the isoforms in various functional groups and relies on SMRT Sequencing to characterize them. NKI’s Gosia Komor presented a proteogenomic analysis of alternative splicing for a colorectal cancer biomarker study. With the Iso-Seq method, the team is building up the reference set of isoforms to find those associated with cancer risk.
Finally, our own Lance Hepler offered a look at new applications for SMRT Sequencing, including new software for detection of minor variants and structural variants, and multiplexed whole genome sequencing for microbes. The new Juliet tool for characterizing minor variant frequency and pbsv for increased structural variant sensitivity will both be included in SMRT Link 5, due to be released this summer. Hepler also noted that with the multiplexing protocol, a single SMRT Cell on the Sequel System will be able to sequence up to 12 microbes with genomes of ~4.5 Mb; the protocol works for the PacBio RS II System as well.
We are thankful to all of the fantastic speakers who shared their research, to our gracious host Yahya Anvar and the entire LUMC, and to everyone who attended the event. We look forward to seeing you again next year in Leiden!