Deletion of tumor-suppressor genes as well as other genomic rearrangements pervade cancer genomes across numerous types of solid tumors and hematologic malignancies. However, even for a specific rearrangement, such as the recurrent CDKN2A deletion, the breakpoints may vary between individuals. Characterizing the exact breakpoints for structural variants (SVs) is useful for designating patient-specific tumor biomarkers. We propose AmBre (Amplification of Breakpoints), a method to target SV breakpoints occurring in samples composed of heterogeneous tumor and germline DNA. Additionally, AmBre validates SVs called by whole-exome/genome sequencing and hybridization arrays. AmBre involves a PCR-based approach to amplify the DNA segment containing an SV’s breakpoint and then confirms breakpoints using sequencing by Pacific Biosciences RS. To amplify breakpoints with PCR, primers tiling specified target regions are carefully selected with a simulated annealing algorithm to minimize off-target amplification and maximize efficiency at capturing all possible breakpoints within the target regions. To confirm correct amplification and obtain breakpoints, PCR amplicons are combined without barcoding and simultaneously long-read sequenced using a single SMRT cell. Our algorithm efficiently separates reads based on breakpoints. Each read group supporting the same breakpoint corresponds to an amplicon, and a consensus amplicon sequence is called for each group. AmBre was used to discover CDKN2A deletion breakpoints in cancer cell lines: A549, CEM, Detroit562, MOLT4, MCF7, and T98G. Also, we successfully assayed RUNX1-RUNX1T1 reciprocal translocations by finding both breakpoints in the Kasumi-1 cell line. AmBre successfully targets SVs where DNA harboring the breakpoints is present in 1:1000 mixtures.
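The read-separation step described above can be illustrated with a minimal sketch. The abstract does not specify how AmBre groups reads, so the greedy tolerance-based grouping below, the `Read` record, and the `tolerance` parameter are all assumptions made for illustration only, not the published algorithm.

```python
from collections import namedtuple

# Hypothetical record: a read name plus the two reference coordinates
# that flank the breakpoint it spans (not AmBre's actual data model).
Read = namedtuple("Read", ["name", "left", "right"])


def group_reads_by_breakpoint(reads, tolerance=20):
    """Greedily group reads whose breakpoint coordinates agree within `tolerance` bp.

    Each group is anchored on its first read; a read joins the first group
    whose anchor is within `tolerance` on both sides, else starts a new group.
    """
    groups = []
    for read in sorted(reads, key=lambda r: (r.left, r.right)):
        for group in groups:
            anchor = group[0]
            if (abs(read.left - anchor.left) <= tolerance
                    and abs(read.right - anchor.right) <= tolerance):
                group.append(read)
                break
        else:
            # No existing group matched, so this read opens a new one.
            groups.append([read])
    return groups


reads = [
    Read("r1", 1000, 50000),
    Read("r2", 1005, 50010),
    Read("r3", 8000, 90000),
    Read("r4", 995, 49990),
]
for group in group_reads_by_breakpoint(reads):
    print([r.name for r in group])
```

A tolerance is used because long reads with high per-base error rates rarely agree on a breakpoint to the exact base; each resulting group would then be passed to a consensus caller.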
We describe the landscape of somatic genomic alterations of 66 chromophobe renal cell carcinomas (ChRCCs) on the basis of multidimensional and comprehensive characterization, including mtDNA and whole-genome sequencing. The results are consistent with ChRCC originating from the distal nephron, in contrast to other kidney cancers with more proximal origins. Combined mtDNA and gene expression analysis implicates changes in mitochondrial function as a component of the disease biology, while suggesting alternative roles for mtDNA mutations in cancers relying on oxidative phosphorylation. Genomic rearrangements lead to recurrent structural breakpoints within the TERT promoter region, which correlate with highly elevated TERT expression and manifestation of kataegis, representing a mechanism of TERT upregulation in cancer distinct from previously observed amplifications and point mutations. Copyright © 2014 Elsevier Inc. All rights reserved.
CGGBP1 is a repetitive DNA-binding transcription regulator with target sites at CpG-rich sequences such as CGG repeats, Alu-SINEs, and L1-LINEs. The role of CGGBP1 as a possible mediator of CpG methylation, however, remains unknown. At CpG-rich sequences, cytosine methylation is a major mechanism of transcriptional repression. Concordantly, gene-rich regions typically carry lower levels of CpG methylation than the repetitive elements. It is well known that at the interspersed repeats Alu-SINEs and L1-LINEs, high levels of CpG methylation constitute a transcriptional silencing and retrotransposon-inactivating mechanism. Here, we have studied genome-wide CpG methylation with or without CGGBP1 depletion. By high-throughput sequencing of bisulfite-treated genomic DNA, we have identified CGGBP1 to be a negative regulator of CpG methylation at repetitive DNA sequences. In addition, we have studied CpG methylation alterations on Alu and L1 retrotransposons in CGGBP1-depleted cells using a novel bisulfite-treatment and high-throughput sequencing approach. The results show that CGGBP1 is a possible bidirectional regulator of CpG methylation at Alus and acts as a repressor of methylation at L1 retrotransposons.
R331W Missense Mutation of Oncogene YAP1 Is a Germline Risk Allele for Lung Adenocarcinoma With Medical Actionability.
Adenocarcinoma is the predominant type of lung cancer in never-smoker patients. The risk alleles from genome-wide association studies have small odds ratios and unclear biologic roles. Here we have taken an approach featuring suitable medical actionability to identify alleles with low population frequency but high disease-causing potential. Whole-genome sequencing was performed for a family with an unusually high density of lung adenocarcinoma with available DNA from the affected mother, four affected daughters, and one nonaffected son. Candidate risk alleles were confirmed by matrix-assisted laser desorption ionization time of flight mass spectroscopy. Validation was conducted in an external cohort of 1,135 participants without cancer and 1,312 patients with lung adenocarcinoma. Family follow-ups were performed by genotyping the relatives of the original proband and the relatives of the identified risk-allele carriers. Low-dose computed tomography scans of the chest were evaluated for lung abnormalities. The YAP1 R331W missense mutation from the original family was identified and validated in the external controls and the cohort with lung adenocarcinoma. The YAP1 mutant-allele carrier frequency was 1.1% in patients with lung adenocarcinoma compared with 0.18% in controls (P = .0095), yielding an odds ratio (adjusted for age, sex, and smoking status) of 5.9. Among the relatives, YAP1-mutant carriers have overwhelmingly higher frequencies of developing lung adenocarcinoma or ground-glass opacity lung lesions than those who do not carry the mutation (10:0 v 1:7; P < .001). The YAP1 mutation was shown to increase the colony formation ability and invasion potential of lung cancer cells. These results implicated YAP1 R331W as an allele predisposing to lung adenocarcinoma with high familial penetrance. Low-dose computed tomography scans may be recommended to this subpopulation, which is at high risk for lung cancer, for personalized prevention and health management.
© 2015 by American Society of Clinical Oncology.
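As a worked check of the familial comparison reported above (10:0 v 1:7), the following sketch computes a two-sided Fisher's exact test from first principles. The abstract does not name the test used, and reading the ratios as affected:unaffected counts among carriers and non-carriers is an assumption; the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, row1)

    def prob(x):
        # Probability that x of the row-1 individuals fall in column 1.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative slack guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Assumed layout: carriers 10 affected / 0 unaffected,
# non-carriers 1 affected / 7 unaffected.
p_value = fisher_exact_two_sided(10, 0, 1, 7)
print(f"two-sided p = {p_value:.6f}")
```

Under these assumptions the observed table is the most extreme one possible given its margins, so the p-value equals its own probability, 11/43758 (about 0.00025), which is compatible with the reported P < .001.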
Characterizing and overriding the structural mechanism of the Quizartinib-resistant FLT3 “gatekeeper” F691L mutation with PLX3397.
Tyrosine kinase domain mutations are a common cause of acquired clinical resistance to tyrosine kinase inhibitors (TKI) used to treat cancer, including the FLT3 inhibitor quizartinib. Mutation of kinase “gatekeeper” residues, which control access to an allosteric pocket adjacent to the ATP-binding site, has been frequently implicated in TKI resistance. The molecular underpinnings of gatekeeper mutation-mediated resistance are incompletely understood. We report the first cocrystal structure of FLT3 with the TKI quizartinib, which demonstrates that quizartinib binding relies on essential edge-to-face aromatic interactions with the gatekeeper F691 residue, and F830 within the highly conserved Asp-Phe-Gly motif in the activation loop. This reliance makes quizartinib critically vulnerable to gatekeeper and activation loop substitutions while minimizing the impact of mutations elsewhere. Moreover, we identify PLX3397, a novel FLT3 inhibitor that retains activity against the F691L mutant due to a binding mode that depends less vitally on specific interactions with the gatekeeper position. We report the first cocrystal structure of FLT3 with a kinase inhibitor, elucidating the structural mechanism of resistance due to the gatekeeper F691L mutation. PLX3397 is a novel FLT3 inhibitor with in vitro activity against this mutation but is vulnerable to kinase domain mutations in the FLT3 activation loop. Cancer Discov; 5(6); 668-79. ©2015 AACR. This article is highlighted in the In This Issue feature, p. 565.
Detection of somatic mutations in human leukocyte antigen (HLA) genes using whole-exome sequencing (WES) is hampered by the high polymorphism of the HLA loci, which prevents alignment of sequencing reads to the human reference genome. We describe a computational pipeline that enables accurate inference of germline alleles of class I HLA-A, B and C genes and subsequent detection of mutations in these genes using the inferred alleles as a reference. Analysis of WES data from 7,930 pairs of tumor and healthy tissue from the same patient revealed 298 nonsilent HLA mutations in tumors from 266 patients. These 298 mutations are enriched for likely functional mutations, including putative loss-of-function events. Recurrence of mutations suggested that these ‘hotspot’ sites were positively selected. Cancers with recurrent somatic HLA mutations were associated with upregulation of signatures of cytolytic activity characteristic of tumor infiltration by effector lymphocytes, supporting immune evasion by altered HLA function as a contributory mechanism in cancer.
Highly sensitive, non-invasive detection of colorectal cancer mutations using single molecule, third generation sequencing.
Colorectal cancer (CRC) represents one of the most prevalent and lethal malignant neoplasms, and every individual of age 50 and above should undergo regular CRC screening. Currently, the most effective preventive screening procedure to detect adenomatous polyps, the precursors to CRC, is colonoscopy. Since every colorectal cancer starts as a polyp, detecting all polyps and removing them is crucial. By doing exactly that, colonoscopy reduces CRC incidence by 80%; however, it is an invasive procedure that might have unpleasant and, on rare occasions, dangerous side effects. Despite numerous efforts over the past two decades, a non-invasive screening method for the general population with detection rates for adenomas and CRC similar to those of colonoscopy has not yet been established. Recent advances in next generation sequencing technologies have yet to be successfully applied to this problem, because the detection of rare mutations has been hindered by the systematic biases due to sequencing context and the base calling quality of NGS. We present the first study that applies the high read accuracy and depth of single molecule, real time, circular consensus sequencing (SMRT-CCS) to the detection of mutations in stool DNA in order to provide a non-invasive, sensitive and accurate test for CRC. In stool DNA isolated from patients diagnosed with adenocarcinoma, we are able to detect mutations at frequencies below 0.5% with no false positives. This approach establishes a foundation for a non-invasive, highly sensitive assay to screen the population for CRC and the early stage adenomas that lead to CRC.
Single-molecule real-time (SMRT) sequencing generates much longer reads than other widely used next-generation (next-gen) sequencing methods, but its application to whole genome/exome analysis has been limited. Here, we describe the use of SMRT sequencing coupled with barcoding to simultaneously analyze one or a small number of genomic targets derived from multiple sources. In the budding yeast system, SMRT sequencing was used to analyze strand-exchange intermediates generated during mitotic recombination and to analyze genetic changes in a forward mutation assay. The general barcoding-SMRT approach was then extended to diffuse large B-cell lymphoma primary tumors and cell lines, where detected changes agreed with prior Illumina exome sequencing. A distinct advantage afforded by SMRT sequencing over other next-gen methods is that it immediately provides the linkage relationships between SNPs in the target segment sequenced. The strength of our approach for mutation/recombination studies (as well as linkage identification) derives from its inherent computational simplicity coupled with a lack of reliance on sophisticated statistical analyses. Copyright © 2015 Guo et al.
Detection and screening of chromosomal rearrangements in uterine leiomyomas by long-distance inverse PCR.
Genome instability is a hallmark of many tumors and recently, next-generation sequencing methods have enabled analyses of tumor genomes at an unprecedented level. Studying rearrangement-prone chromosomal regions (putative “breakpoint hotspots”) in detail, however, necessitates molecular assays that can detect de novo DNA fusions arising from these hotspots. Here we demonstrate the utility of a long-distance inverse PCR-based method for the detection and screening of de novo DNA rearrangements in uterine leiomyomas, one of the most common types of human neoplasm. This assay allows, in principle, any genomic region suspected of instability to be queried for DNA rearrangements originating there. No prior knowledge of the identity of the fusion partner chromosome is needed. We used this method to screen uterine leiomyomas for rearrangements at genomic locations known to be rearrangement-prone in this tumor type: upstream of HMGA2 and within RAD51B. We identified a novel DNA rearrangement upstream of HMGA2 that had gone undetected in an earlier whole-genome sequencing study. In more than 30 additional uterine leiomyoma samples, not analyzed by whole-genome sequencing previously, no rearrangements were observed within the 1,107 bp and 1,996 bp assayed in the RAD51B and HMGA2 rearrangement hotspots. Our findings show that long-distance inverse PCR is a robust, sensitive, and cost-effective method for the detection and screening of DNA rearrangements from solid tumors that should be useful for many diagnostic applications. © 2015 Wiley Periodicals, Inc.
Highly efficient CRISPR/Cas9-mediated cloning and functional characterization of gastric cancer-derived Epstein-Barr virus strains.
The Epstein-Barr virus (EBV) is etiologically linked to approximately 10% of gastric cancers, in which viral genomes are maintained as multicopy episomes. EBV-positive gastric cancer cells are incompetent for progeny virus production, making viral DNA cloning extremely difficult. Here we describe a highly efficient strategy for obtaining bacterial artificial chromosome (BAC) clones of EBV episomes by utilizing a CRISPR/Cas9-mediated strand break of the viral genome and subsequent homology-directed repair. EBV strains maintained in two gastric cancer cell lines (SNU719 and YCCEL1) were cloned, and their complete viral genome sequences were determined. Infectious viruses of gastric cancer cell-derived EBVs were reconstituted, and the viruses established stable latent infections in immortalized keratinocytes. While Ras oncoprotein overexpression caused massive vacuolar degeneration and cell death in control keratinocytes, EBV-infected keratinocytes survived in the presence of Ras expression. These results implicate EBV infection in predisposing epithelial cells to malignant transformation by inducing resistance to oncogene-induced cell death. Recent progress in DNA-sequencing technology has accelerated EBV whole-genome sequencing, and the repertoire of sequenced EBV genomes is increasing progressively. Accordingly, the presence of EBV variant strains that may be relevant to EBV-associated diseases has begun to attract interest. Clearly, the determination of additional disease-associated viral genome sequences will facilitate the identification of any disease-specific EBV variants. We found that CRISPR/Cas9-mediated cleavage of EBV episomal DNA enabled the cloning of disease-associated viral strains with unprecedented efficiency. As a proof of concept, two gastric cancer cell-derived EBV strains were cloned, and the infection of epithelial cells with reconstituted viruses provided important clues about the mechanism of EBV-mediated epithelial carcinogenesis.
This experimental system should contribute to establishing the relationship between viral genome variation and EBV-associated diseases. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
There is great potential for genome sequencing to enhance patient care through improved diagnostic sensitivity and more precise therapeutic targeting. To maximize this potential, genomics strategies that have been developed for genetic discovery – including DNA-sequencing technologies and analysis algorithms – need to be adapted to fit clinical needs. This will require the optimization of alignment algorithms, attention to quality-coverage metrics, tailored solutions for paralogous or low-complexity areas of the genome, and the adoption of consensus standards for variant calling and interpretation. Global sharing of this more accurate genotypic and phenotypic data will accelerate the determination of causality for novel genes or variants. Thus, a deeper understanding of disease will be realized that will allow its targeting with much greater therapeutic precision.
Complete genome sequence of Lactobacillus salivarius Ren, a probiotic strain with anti-tumor activity.
Lactobacillus salivarius Ren (LsR) (CGMCC No. 3606) is a probiotic strain that was isolated from the feces of a healthy centenarian living in Bama, Guangxi, China. Previous studies have shown that this strain decreases 4-nitroquinoline 1-oxide (4-NQO)-induced genotoxicity in vitro. It also suppresses 4-NQO-induced oral carcinogenesis and 1,2-dimethylhydrazine (DMH)-induced colorectal carcinogenesis, and therefore may be used as an adjuvant therapeutic agent for cancer. Here, we report the complete genome sequence of LsR that consists of a circular chromosome of 1,751,565 bp and two plasmids (pR1, 176,951 bp; pR2, 49,848 bp). Copyright © 2015 Elsevier B.V. All rights reserved.
Transposable elements are major players in genome evolution. Transposon insertion polymorphisms can translate into phenotypic differences in plants and animals and are linked to different diseases including human cancer, making their characterization highly relevant to the study of genome evolution and genetic diseases. Here we present Jitterbug, a novel tool that identifies transposable element insertion sites at single-nucleotide resolution based on the paired-end mapping and clipped-read signatures produced by NGS alignments. Jitterbug can be easily integrated into existing NGS analysis pipelines, using the standard BAM format produced by frequently applied alignment tools (e.g. bwa, bowtie2), with no need to realign reads to a set of consensus transposon sequences. Jitterbug is highly sensitive and able to recall transposon insertions with very high specificity, as demonstrated by benchmarks in the human and Arabidopsis genomes, and validation using long PacBio reads. In addition, Jitterbug estimates the zygosity of transposon insertions with high accuracy and can also identify somatic insertions. We demonstrate that Jitterbug can identify mosaic somatic transposon movement using sequenced tumor-normal sample pairs and allows for estimating the cancer cell fraction of clones containing a somatic TE insertion. We suggest that the independent methods we use to evaluate performance are a step towards creating a gold standard dataset for benchmarking structural variant prediction tools.
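To make the clipped-read signature concrete, the sketch below derives a candidate breakpoint coordinate from a single alignment's 1-based start position and CIGAR string, following the SAM conventions for which operations consume reference bases. It is not Jitterbug's implementation: the function name, the `min_clip` threshold, and the decision to ignore hard clips that precede a soft clip are assumptions made for brevity.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clipped_breakpoints(pos, cigar, min_clip=10):
    """Return candidate breakpoints implied by soft clips at the read ends.

    pos is the 1-based leftmost aligned reference position. A leading soft
    clip points at the first aligned base; a trailing soft clip points at
    the last aligned base.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    candidates = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        candidates.append(("left", pos))
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_len = sum(length for length, op in ops if op in REF_CONSUMING)
        candidates.append(("right", pos + ref_len - 1))
    return candidates


print(clipped_breakpoints(100, "60M40S"))
print(clipped_breakpoints(100, "40S60M"))
print(clipped_breakpoints(100, "100M"))
```

In a real pipeline, many reads clipped at the same coordinate, together with discordant read pairs whose mates map to transposon sequence, would be needed before calling an insertion; a single clipped read is only weak evidence.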
Bacterial symbionts of fungus-growing ants occupy a highly specialized ecological niche and face the constant existential threat of displacement by another strain of ant-adapted bacteria. As part of a systematic study of the small molecules underlying this fraternal competition, we discovered an analog of the antitumor agent rebeccamycin, a member of the increasingly important indolocarbazole family. While several gene clusters consistent with this molecule’s newly reported modification had previously been identified in metagenomic studies, the metabolite itself has been cryptic. The biosynthetic gene cluster for 9-methoxyrebeccamycin is encoded on a plasmid in a manner reminiscent of plasmid-derived peptide antimicrobials that commonly mediate antagonism among closely related Gram-negative bacteria.
Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent ‘third-generation’ sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates. We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single molecule sequencing data, paired read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly. © The Author 2014. Published by Oxford University Press. All rights reserved.