Structural variant detection with long read sequencing reveals driver and passenger mutations in a melanoma cell line
Past large-scale cancer genome sequencing efforts, including The Cancer Genome Atlas and the International Cancer Genome Consortium, have relied on short-read sequencing, which is well suited to detecting single nucleotide variants (SNVs) but far less reliable for detecting variants larger than 20 base pairs, including insertions, deletions, duplications, inversions and translocations. Recent same-sample comparisons of short- and long-read human reference genome data have revealed that short-read resequencing typically uncovers only ~4,000 structural variants (SVs, ≥50 bp) per genome and is biased towards deletions, whereas sequencing with PacBio long reads consistently finds ~20,000 SVs, evenly balanced between insertions and deletions. This discovery has important implications for cancer research, as SVs are both common and biologically important in many cancer subtypes, including colorectal, breast and ovarian cancer. Without confident and comprehensive detection of structural variants, we are unlikely to have a sufficiently complete picture of the genomic changes that affect cancer development, disease progression, treatment response, drug resistance, and relapse. To begin to address this unmet need, we have sequenced the COLO829 tumor and matched normal lymphoblastoid cell lines to 49- and 51-fold coverage, respectively, with PacBio SMRT Sequencing. Our goal is to develop a high-confidence structural variant call set that can be used to empirically evaluate cost-effective experimental designs for larger-scale studies and to develop structural variant calling software suitable for cancer genomics. Structural variant calling revealed over 21,000 deletions and 19,500 insertions larger than 20 bp, nearly four times the number of events detected with short-read sequencing. The vast majority of events are shared between the tumor and normal, with about 100 putative somatic deletions and 400 insertions, primarily in microsatellites.
A further 40 rearrangements were detected, nearly exclusively in the tumor. One rearrangement, t(5;X), is shared between the tumor and normal; it disrupts the mismatch repair gene MSH3 and is likely a driver mutation. Generating high-confidence call sets that cover the entire size spectrum of somatic variants from a range of cancer model systems is the first step in determining the best approach for addressing an ongoing blind spot in our current understanding of cancer genomes. Here, the application of PacBio sequencing to a melanoma cell line revealed thousands of previously overlooked variants, including a mutation likely involved in tumorigenesis.