ASHG 2015: Highlights from Icahn Institute, UW, Stanford & CSHL Presentations
Tuesday, October 13, 2015
During the Wednesday afternoon sessions of last week’s ASHG conference, several speakers provided helpful insights about their use of SMRT Sequencing for a range of applications. Highlights included the following:
Yao Yang, a researcher at the Icahn School of Medicine at Mount Sinai, discussed the development of an assay to genotype the CYP2D6 gene to inform drug dosing in patients. CYP2D6 metabolizes 20-25% of all medications, including antidepressants, antipsychotics, and opiates. There are more than 100 known variants, which include gene deletions and duplications. Variants can have profound impacts on how patients metabolize drugs, with some individuals being ultra-rapid metabolizers and others being poor metabolizers. Developing a simple and reliable typing assay has been challenging because CYP2D6 has a highly homologous pseudogene. Yang and his collaborators developed a targeted full-length PCR protocol to amplify both the gene and pseudogene, as well as companion bioinformatics tools to remove random errors (ALEC) and predict the expected phenotype from genotype data (CYP VCF Translator). He shared results showing that the assay is highly reproducible and capable of recapitulating known genotypes in well-characterized samples like NA12878. Furthermore, they uncovered novel CYP2D6 alleles in NA16688 and ASIAN048, which had variants in an intron and an exon, respectively. Most impressively, they were able to resolve alleles in NA17084 and other samples for which results from other technologies were discrepant. They foresee using the same approach to develop targeted assays for other similarly challenging genes, and eventually combining these into a multiplexed panel of clinically important genes.
Stuart Cantsilieris from the Eichler lab at the University of Washington presented work demonstrating the utility of long-read sequencing for understanding the range of structural variant alleles present in the Complement Factor H gene cluster. The CFHR locus is a well-known hotspot for structural variation, but short-read data have, at best, provided only a rough map of the density of structural variants in the region, and cannot resolve haplotypes or define precise breakpoints. The team sequenced BAC clones with the PacBio RS II, not only to resolve human alleles, but also to chart the evolution of the region and map the bases under the most selective pressure by sequencing a range of non-human primates. Based on what they learned from the PacBio-sequenced alleles, they developed molecular inversion probes (MIPs) to enable rapid screening of CFHR genes in patient cohorts. Structural variation in the CFHR locus is linked to a number of diseases, including age-related macular degeneration (AMD) and lupus. Interestingly, the same variant can confer risk for one disease and protection from the other. With the MIP screening tool, they have collected variant information from a large patient cohort and hope to better understand how the revealed genotypes relate to patient phenotypes.
Hagen Tilgner from Stanford University explained how long-read technology enables transcriptome insights that were not previously possible. Using PacBio long reads to sequence a mixed human tissue sample, Tilgner and colleagues identified many novel protein-coding isoforms. Later, by sequencing a human trio sample, they were able to phase distant SNPs with PacBio reads, which was not possible with short-read technology. Finally, using long reads, they analyzed human brain samples and found many significant exon pairings, most of which were in coding regions. Tilgner emphasized that with long reads, a “phased [brain] proteome will now become possible,” potentially leading to novel biological discoveries.
Maria Nattestad from Cold Spring Harbor Laboratory described using a PacBio system to sequence the genome and the transcriptome of the SK-BR-3 breast cancer cell line. For the genome sequencing, the PacBio data had a mean read length of 9 kb and a maximum read length of 71 kb, with an average of 72X coverage. To understand the complex genomic rearrangements, Nattestad and her colleagues developed several tools to detect long-range structural variations. Using SMRT Sequencing, they were able to identify the extremely complex and variable translocation occurring between the Her-2 oncogene locus on chromosome 17 and chromosome 8. Importantly, the translocation between chromosomes 17 and 8 produced several fusion genes that were validated via PacBio Iso-Seq transcriptome sequencing. Using Iso-Seq analysis, they identified 17 fusion genes that were supported by both DNA and RNA evidence. Of the 17 fusion genes, 13 were previously reported and four were novel. “The genome informs the transcriptome,” she said, noting that PacBio long reads help identify complex genome translocations and gene fusions.