Jim Lupski is a professor at Baylor College of Medicine where he’s on the frontline of incorporating genomic research into everyday clinical practice. The story begins with Jim’s own genome,…
At AGBT 2017, Mike Schatz from Johns Hopkins University and Cold Spring Harbor Laboratory presented data from sequencing, assembling, and analyzing personalized, phased diploid genomes with either Illumina, 10x Genomics,…
The CF Canada-Sick Kids Program in individual CF therapy: A resource for the advancement of personalized medicine in CF.
Therapies targeting certain CFTR mutants have been approved, yet variations in clinical response highlight the need for in-vitro and genetic tools that predict patient-specific clinical outcomes. Toward this goal, the CF Canada-Sick Kids Program in Individual CF Therapy (CFIT) is generating a “first of its kind”, comprehensive resource containing patient-specific cell cultures and data from 100 CF individuals that will enable modeling of therapeutic responses. The CFIT program is generating: 1) nasal cells from drug naïve patients suitable for culture and the study of drug responses in vitro, 2) matched gene expression data obtained by sequencing the RNA from the primary nasal tissue, 3) whole genome sequencing of blood derived DNA from each of the 100 participants, 4) induced pluripotent stem cells (iPSCs) generated from each participant’s blood sample, 5) CRISPR-edited isogenic control iPSC lines and 6) prospective clinical data from patients treated with CF modulators. To date, we have recruited 57 of 100 individuals to CFIT, most of whom are homozygous for F508del (to assess in-vitro: in-vivo correlations with respect to ORKAMBI response) or heterozygous for F508del and a minimal function mutation. In addition, several donors are homozygous for rare nonsense and missense mutations. Nasal epithelial cell cultures and matched iPSC lines are available for many of these donors. This accessible resource will enable development of tools that predict individual outcomes to current and emerging modulators targeting F508del-CFTR and facilitate therapy discovery for rare CF causing mutations. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Genome analyses for the Tohoku Medical Megabank Project towards establishment of personalized healthcare.
Personalized healthcare (PHC) based on an individual’s genetic make-up is one of the most advanced, yet feasible, forms of medical care. The Tohoku Medical Megabank (TMM) Project aims to combine population genomics, medical genetics and prospective cohort studies to develop a critical infrastructure for the establishment of PHC. To date, a TMM CommCohort (adult general population) and a TMM BirThree Cohort (birth+three-generation families) have conducted recruitments and baseline surveys. Genome analyses as part of the TMM Project will aid in the development of a high-fidelity whole-genome Japanese reference panel, in designing custom single-nucleotide polymorphism (SNP) arrays specific to Japanese, and in estimation of the biological significance of genetic variations through linked investigations of the cohorts. Whole-genome sequencing from >3,500 unrelated Japanese and establishment of a Japanese reference genome sequence from long-read data have been done. We next aim to obtain genotype data for all TMM cohort participants (>150,000) using our custom SNP arrays. These data will help identify disease-associated genomic signatures in the Japanese population, while genomic data from TMM BirThree Cohort participants will be used to improve the reference genome panel. Follow-up of the cohort participants will allow us to test the genetic markers and, consequently, contribute to the realization of PHC.
Personal transcriptomes in which all of an individual’s genetic variants (e.g., single nucleotide variants) and transcript isoforms (transcription start sites, splice sites, and polyA sites) are defined and quantified for full-length transcripts are expected to be important for understanding individual biology and disease, but have not been described previously. To obtain such transcriptomes, we sequenced the lymphoblastoid transcriptomes of three family members (GM12878 and the parents GM12891 and GM12892) by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and made the following observations. First, we found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes ≤3 kb and often for genes longer than that. Second, we added and quantified previously unidentified splicing isoforms to an existing annotation, thus creating the first personalized annotation to our knowledge. Third, we determined SNVs in a de novo manner and connected them to RNA haplotypes, including HLA haplotypes, thereby assigning single full-length RNA molecules to their transcribed allele, and demonstrated Mendelian inheritance of RNA molecules. Fourth, we show how RNA molecules can be linked to personal variants on a one-by-one basis, which allows us to assess differential allelic expression (DAE) and differential allelic isoforms (DAI) from the phased full-length isoform reads. The DAI method is largely independent of the distance between exon and SNV, in contrast to fragmentation-based methods. Overall, in addition to improving eukaryotic transcriptome annotation, these results describe, to our knowledge, the first large-scale and full-length personal transcriptome.
Long expansions of short tandem repeats (STRs), i.e. DNA repeats of 2-6 nt, are associated with some genetic diseases. Cost-efficient high-throughput sequencing can quickly produce billions of short reads that would be useful for uncovering disease-associated STRs. However, enumerating STRs in short reads remains largely unexplored because of the difficulty in elucidating STRs much longer than 100 bp, the typical length of short reads. We propose ab initio procedures for sensing and locating long STRs promptly by using the frequency distribution of all STRs and paired-end read information. We validated the reproducibility of this method using biological replicates and used it to locate an STR associated with a brain disease (SCA31). Subsequently, we sequenced this STR site in 11 SCA31 samples using SMRT(TM) sequencing (Pacific Biosciences), determined 2.3-3.1 kb sequences at nucleotide resolution and revealed that (TGGAA)- and (TAAAATAGAA)-repeat expansions determined the instability of the repeat expansions associated with SCA31. Our method could also identify common STRs, (AAAG)- and (AAAAG)-repeat expansions, which are remarkably expanded at four positions in an SCA31 sample. This is the first proposed method for rapidly finding disease-associated long STRs in personal genomes using hybrid sequencing of short and long reads. Our TRhist software is available at http://email@example.com. Supplementary data are available at Bioinformatics online.
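The idea of building a frequency distribution of STRs from short reads can be illustrated with a minimal Python sketch. This is not the TRhist implementation; the function names (`canonical_unit`, `find_strs`, `str_histogram`), the three-copy minimum, and the choice to merge rotations of the same unit (so that TGGAA and AATGG count together) are all assumptions made for illustration. Only the 2-6 nt unit length comes from the abstract.

```python
import re
from collections import Counter

# A unit of 2-6 nt followed by at least two more copies of itself
# (three copies in total; this threshold is an illustrative assumption).
STR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{2,}")


def canonical_unit(unit):
    """Return the lexicographically smallest rotation of a repeat unit,
    so that the same repeat read in a different phase gets one label."""
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


def find_strs(read):
    """Yield (canonical unit, copy number, start offset) for each
    non-overlapping tandem repeat found in a read."""
    for match in STR_PATTERN.finditer(read):
        unit = match.group(1)
        # Skip units that are themselves repeats of a shorter unit
        # (e.g. "AA" or "ATAT"); a string is such a repeat exactly when
        # it occurs inside its own doubling with the ends trimmed.
        if unit in (unit + unit)[1:-1]:
            continue
        copies = len(match.group(0)) // len(unit)
        yield canonical_unit(unit), copies, match.start()


def str_histogram(reads):
    """Count how often each (unit, copy number) pair occurs across reads."""
    hist = Counter()
    for read in reads:
        for unit, copies, _ in find_strs(read):
            hist[(unit, copies)] += 1
    return hist


if __name__ == "__main__":
    reads = ["ACGT" + "TGGAA" * 4 + "CCGTA", "ATATATATAT"]
    for (unit, copies), count in sorted(str_histogram(reads).items()):
        print(unit, copies, count)
```

The sketch ignores reverse complements, sequencing errors inside a repeat, and paired-end information, all of which a real method for locating long expansions would need to handle.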
Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods. We demonstrate Parliament’s efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific Biosciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus. HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.
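As a rough illustration of what consensus SV calling across data types can look like, the Python sketch below groups calls from several sources by reciprocal overlap and keeps loci reported by at least two of them. It is a generic technique, not Parliament's actual algorithm; the `SVCall` type, the 50% overlap threshold, the two-source minimum, and the source labels in the usage example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    chrom: str
    start: int
    end: int
    sv_type: str   # e.g. "DEL", "DUP"
    source: str    # hypothetical label for the caller or data type


def reciprocal_overlap(a, b):
    """Fraction of each call covered by the other (the smaller of the two).
    Returns 0.0 for different chromosomes, different SV types, or no overlap
    (which also covers zero-length calls, so no division by zero occurs)."""
    if a.chrom != b.chrom or a.sv_type != b.sv_type:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def consensus_calls(calls, min_overlap=0.5, min_sources=2):
    """Greedily cluster calls against each cluster's first member, then
    report clusters supported by at least `min_sources` distinct sources."""
    clusters = []
    ordered = sorted(calls, key=lambda c: (c.chrom, c.sv_type, c.start, c.end))
    for call in ordered:
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], call) >= min_overlap:
                cluster.append(call)
                break
        else:
            clusters.append([call])

    consensus = []
    for cluster in clusters:
        sources = sorted({c.source for c in cluster})
        if len(sources) >= min_sources:
            consensus.append((
                cluster[0].chrom,
                min(c.start for c in cluster),
                max(c.end for c in cluster),
                cluster[0].sv_type,
                sources,
            ))
    return consensus


if __name__ == "__main__":
    calls = [
        SVCall("chr1", 1000, 2000, "DEL", "short_read"),
        SVCall("chr1", 1100, 2100, "DEL", "long_read"),
        SVCall("chr1", 5000, 5200, "DEL", "array"),
    ]
    print(consensus_calls(calls))
```

Comparing against only the first member of each cluster keeps the sketch short; a production merger would more likely use interval trees and breakpoint confidence intervals.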
Development and clinical application of an integrative genomic approach to personalized cancer therapy.
Personalized therapy provides the best outcome of cancer care and its implementation in the clinic has been greatly facilitated by recent convergence of enormous progress in basic cancer research, rapid advancement of new tumor profiling technologies, and an expanding compendium of targeted cancer therapeutics. We developed a personalized cancer therapy (PCT) program in a clinical setting, using an integrative genomics approach to fully characterize the complexity of each tumor. We carried out whole exome sequencing (WES) and single-nucleotide polymorphism (SNP) microarray genotyping on DNA from tumor and patient-matched normal specimens, as well as RNA sequencing (RNA-Seq) on available frozen specimens, to identify somatic (tumor-specific) mutations, copy number alterations (CNAs), gene expression changes, gene fusions, and also germline variants. To provide high sensitivity in known cancer mutation hotspots, Ion AmpliSeq Cancer Hotspot Panel v2 (CHPv2) was also employed. We integrated the resulting data with cancer knowledge bases and developed a specific workflow for each cancer type to improve interpretation of genomic data. We returned genomics findings to 46 patients and their physicians describing somatic alterations and predicting drug response, toxicity, and prognosis. A mean of 17.3 cancer-relevant somatic mutations per patient was identified, 13.3-fold, 6.9-fold, and 4.7-fold more than could have been detected using CHPv2, Oncomine Cancer Panel (OCP), and FoundationOne, respectively. Our approach delineated the underlying genetic drivers at the pathway level and provided meaningful predictions of therapeutic efficacy and toxicity. Actionable alterations were found in 91% of patients (mean 4.9 per patient, including somatic mutations, copy number alterations, gene expression alterations, and germline variants), a 7.5-fold, 2.0-fold, and 1.9-fold increase over what could have been uncovered by CHPv2, OCP, and FoundationOne, respectively.
The findings altered the course of treatment in four cases. These results show that a comprehensive, integrative genomic approach as outlined above significantly enhanced genomics-based PCT strategies.
Feasibility of real time next generation sequencing of cancer genes linked to drug response: results from a clinical trial.
The successes of targeted drugs with companion predictive biomarkers and the technological advances in gene sequencing have generated enthusiasm for evaluating personalized cancer medicine strategies using genomic profiling. We assessed the feasibility of incorporating real-time analysis of somatic mutations within exons of 19 genes into patient management. Blood, tumor biopsy and archived tumor samples were collected from 50 patients recruited from four cancer centers. Samples were analyzed using three technologies: targeted exon sequencing using Pacific Biosciences PacBio RS, multiplex somatic mutation genotyping using Sequenom MassARRAY and Sanger sequencing. An expert panel reviewed results prior to reporting to clinicians. A clinical laboratory verified actionable mutations. Fifty patients were recruited. Nineteen actionable mutations were identified in 16 (32%) patients. Across technologies, results were in agreement in 100% of biopsy specimens and 95% of archival specimens. Profiling results from paired archival/biopsy specimens were concordant in 30/34 (88%) patients. We demonstrated that the use of next generation sequencing for real-time genomic profiling in advanced cancer patients is feasible. Additionally, actionable mutations identified in this study were relatively stable between archival and biopsy samples, implying that cancer mutations that are good predictors of drug response may remain constant across clinical stages. Copyright © 2012 UICC.
In recent years, the increasing awareness that somatic mutations and other genetic aberrations drive human malignancies has led us within reach of personalized cancer medicine (PCM). The implementation of PCM is based on the following premises: genetic aberrations exist in human malignancies; a subset of these aberrations drive oncogenesis and tumor biology; these aberrations are actionable (defined as having the potential to affect management recommendations based on diagnostic, prognostic, and/or predictive implications); and there are highly specific anticancer agents available that effectively modulate these targets. This article highlights the technology underlying cancer genomics and examines the early results of genome sequencing and the challenges met in the discovery of new genetic aberrations. Finally, drawing from experiences gained in a feasibility study of somatic mutation genotyping and targeted exome sequencing led by Princess Margaret Hospital-University Health Network and the Ontario Institute for Cancer Research, the processes, challenges, and issues involved in the translation of cancer genomics to the clinic are discussed.