The expression of androgen receptor (AR) variants is a frequent, yet poorly understood, mechanism of clinical resistance to AR-targeted therapy in castration-resistant prostate cancer (CRPC). Among the multiple AR variants expressed in CRPC, AR-V7 is considered the most clinically relevant because it is broadly expressed in CRPC, its expression correlates with clinical resistance, and its knockdown inhibits growth in CRPC models. Therefore, efforts are under way to develop strategies for monitoring and inhibiting AR-V7 in CRPC. The aim of this study was to determine whether other AR variants are co-expressed with AR-V7 and promote resistance to AR-targeted therapies. To address this, we used RNA-seq to characterize AR expression in CRPC models. RNA-seq revealed frequent co-expression of AR-V9 and AR-V7 in multiple CRPC models and metastases. Furthermore, long-read single-molecule real-time (SMRT) sequencing of AR isoforms revealed that AR-V7 and AR-V9 share a common 3′ terminal cryptic exon. To test the consequences of this shared exon, we knocked down AR-V7 in prostate cancer cell lines and confirmed that AR-V9 mRNA and protein expression were also affected. In reporter assays with AR-responsive promoters, AR-V9 functioned as a constitutive activator of androgen/AR signaling. Similarly, lentiviral expression of AR-V9 in LNCaP cells induced androgen-independent cell proliferation. In conclusion, these data implicate co-expression of AR-V9 with AR-V7 as an important component of constitutive AR signaling and therapeutic resistance in CRPC.
Iso-Seq analysis and functional annotation of infratentorial ependymoma tumor tissue on the PacBio RS II platform.
Here, we sequenced and functionally annotated a long-read (1-2 kb) cDNA library from infratentorial ependymoma tumor tissue on the PacBio RS II using the Iso-Seq protocol with SMRT technology. A total of 577 MB of data was generated from the brain tissue of an ependymoma patient, yielding 119,313 high-quality reads that were assembled into 19,878 contigs using the Celera Assembler followed by the Quiver pipeline; these contigs matched 2,952 unique protein accessions in the nr protein database and mapped to 307 KEGG pathways. Additionally, we compared level-2 and level-3 GO terms with alternative splicing data obtained with the HTA 2.0 array. We identified four and twelve transcript cluster IDs at GO levels 2 and 3, respectively, whose alternative splicing index pointed mainly to major hallmark-of-cancer pathways. Of these transcript cluster IDs, only those of the genes PNMT, SNN and LAMB1 had Reads Per Kilobase of exon model per Million mapped reads (RPKM) values in both the gene-level expression (GE) and transcript-level expression (TE) tracks. Most importantly, these brain-specific genes, PNMT, SNN and LAMB1, appear to be involved in ependymoma.
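The RPKM values mentioned above follow a standard normalization: a feature's read count is divided by its exon-model length in kilobases and by the library size in millions of mapped reads. A minimal Python sketch of that formula follows; the counts and lengths in the usage line are hypothetical, and the exact counting rules of the software used in the study are not described here.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon model per Million mapped reads.

    RPKM = read_count / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (exon_length_bp * total_mapped_reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and library size must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp exon model
# in a library of 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

Normalizing by both length and depth is what lets gene-level and transcript-level values from different features be compared on one scale.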
Androgen receptor variant AR-V9 is co-expressed with AR-V7 in prostate cancer metastases and predicts abiraterone resistance.
Purpose: Androgen receptor (AR) variant AR-V7 is a ligand-independent transcription factor that promotes prostate cancer resistance to AR-targeted therapies. Accordingly, efforts are under way to develop strategies for monitoring and inhibiting AR-V7 in castration-resistant prostate cancer (CRPC). The purpose of this study was to determine whether other AR variants may be co-expressed with AR-V7 and promote resistance to AR-targeted therapies. Experimental Design: We used complementary short- and long-read sequencing of intact AR mRNA isoforms to characterize AR expression in CRPC models. Co-expression of AR-V7 and AR-V9 mRNA in CRPC metastases and circulating tumor cells was assessed by RNA-seq and RT-PCR, respectively. Expression of AR-V9 protein in CRPC models was evaluated with polyclonal antisera. Multivariate analysis was performed to test whether AR variant mRNA expression in metastatic tissues was associated with a 12-week progression-free survival endpoint in a prospective clinical trial of 78 patients with CRPC initiating therapy with the androgen synthesis inhibitor abiraterone acetate. Results: AR-V9 was frequently co-expressed with AR-V7. Both AR variant species were found to share a common 3′ terminal cryptic exon, which rendered AR-V9 susceptible to experimental manipulations previously thought to target AR-V7 uniquely. AR-V9 promoted ligand-independent growth of prostate cancer cells. High AR-V9 mRNA expression in CRPC metastases was predictive of primary resistance to abiraterone acetate (HR = 4.0, 95% CI = 1.31-12.2, P = 0.02). Conclusions: AR-V9 may be an important component of therapeutic resistance in CRPC. Copyright ©2017, American Association for Cancer Research.
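Hazard-ratio confidence intervals from survival models are usually symmetric on the log scale, so a reported interval can be used to recover the point estimate and its standard error. The sketch below assumes a Wald-type 95% interval (z = 1.96), which the abstract does not state; it only checks the internal consistency of the reported numbers and does not reproduce the multivariate analysis.

```python
import math


def log_se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(HR) implied by a Wald CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_center(lower: float, upper: float) -> float:
    """Point estimate implied by such a CI: the geometric mean of its bounds."""
    return math.sqrt(lower * upper)


# Reported in the abstract: HR = 4.0, 95% CI = 1.31-12.2.
lower, upper = 1.31, 12.2
print(round(ci_center(lower, upper), 2))       # 4.0
print(round(log_se_from_ci(lower, upper), 3))  # 0.569
```

The geometric mean of the bounds recovers the reported HR of 4.0, which is what a log-symmetric interval predicts; the wide interval corresponds to a log-scale standard error of roughly 0.57.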
Genomic instability is a hallmark of cancer and, as such, structural alterations and fusion genes are common events in the cancer landscape. RNA sequencing (RNA-Seq) is a powerful method for profiling cancers, but current methods for identifying fusion genes are optimised for short reads. JAFFA (https://github.com/Oshlack/JAFFA/wiki) is a sensitive fusion detection method that outperforms other methods with reads of 100 bp or greater. JAFFA compares a cancer transcriptome to the reference transcriptome rather than to the genome; the cancer transcriptome is inferred either directly from long reads or by de novo assembly of short reads.
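The transcriptome-level comparison described above can be illustrated with a toy rule: a long read or assembled contig whose alignments to the reference transcriptome split into two adjacent segments from different genes is a fusion candidate. The Python sketch below is a simplified illustration, not JAFFA's implementation; the `Hit` record, the coverage and gap thresholds, and the gene names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    """One alignment of a query (long read or contig) to a reference transcript."""
    gene: str
    q_start: int  # 0-based start on the query
    q_end: int    # exclusive end on the query


def fusion_candidate(hits: List[Hit], query_len: int,
                     min_cov: float = 0.9,
                     max_gap: int = 20) -> Optional[Tuple[str, str, int]]:
    """Return (5' gene, 3' gene, approximate breakpoint on the query) or None.

    Assumes the query is in sense orientation. Two hits to different genes
    that are adjacent along the query (gap or overlap of at most max_gap
    bases) and together cover at least min_cov of the query are reported.
    """
    ordered = sorted(hits, key=lambda h: h.q_start)
    for a, b in zip(ordered, ordered[1:]):
        if a.gene == b.gene:
            continue
        if abs(b.q_start - a.q_end) > max_gap:
            continue
        overlap = max(0, min(a.q_end, b.q_end) - b.q_start)
        covered = (a.q_end - a.q_start) + (b.q_end - b.q_start) - overlap
        if covered / query_len >= min_cov:
            return a.gene, b.gene, a.q_end
    return None


# Hypothetical 1,000-base read: bases 0-480 match GENE_A, 485-1000 match GENE_B.
print(fusion_candidate([Hit("GENE_A", 0, 480), Hit("GENE_B", 485, 1000)], 1000))
# ('GENE_A', 'GENE_B', 480)
```

Working against transcripts rather than the genome keeps each segment's gene identity explicit, which is why the check reduces to comparing gene labels of neighbouring hits.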
MCF-7 breast cancer cell line PacBio-generated transcriptome has ~300 novel transcribed regions, unannotated in both RefSeq and GENCODE, and absent from the liver, heart and brain transcriptomes
Illuminating the “dark” regions of the human genome remains an ongoing effort a decade and a half after the human genome was sequenced; RefSeq and GENCODE are two of the major annotation databases. Pacific Biosciences (PacBio) has provided open access to the transcriptome of MCF-7, a breast cancer cell line that has contributed to significant therapeutic advances in breast cancer research since the 1970s. PacBio sequencing generates much longer reads than second-generation sequencing technologies, at the cost of lower throughput, a higher error rate and a higher cost per base. Here, this transcriptome was analyzed using the YeATS pipeline, extended with newly introduced k-mer-based algorithms that reduce computation time to a few hours on a simple workstation. Of ~300 transcripts that have no match in either RefSeq or GENCODE, ~250 are absent from the transcriptomes of the heart, liver and brain, also provided by PacBio. In addition, ~200 transcripts are absent from a recent catalogue of unannotated long non-coding RNAs compiled from 6,503 samples (~43 terabases of sequence data), and only two are shared with the 2,556 novel transcripts reported by the experimental RACE-Seq workflow. About 100 transcripts have open reading frames (ORFs) longer than 100 amino acids and could potentially be protein-coding genes. ORF-based annotation also identified a few bacterial transcripts in the PacBio database that were mapped to the human genome, and one human transcript that has been annotated as bacterial in the NCBI database. The current work reiterates the under-utilization of transcriptomes for annotating genomes. It also provides new leads for breast cancer research in the form of transcripts expressed exclusively in MCF-7 and not in the other tissues examined, which, pending further investigation, could serve as breast cancer biomarkers.
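Two of the filters described above, screening transcripts against reference annotations with k-mers and flagging ORFs longer than 100 amino acids, can be sketched in a few lines of Python. This is an illustrative sketch, not the YeATS pipeline: the choice of k, scoring a transcript by the fraction of its k-mers found in a reference set, and the six-frame ATG-to-stop ORF definition are assumptions made here for clarity.

```python
from typing import Iterable, Set

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Iterable[str], k: int) -> Set[str]:
    """Union of k-mers over all reference transcripts (e.g. RefSeq/GENCODE sequences)."""
    index: Set[str] = set()
    for ref in references:
        index |= kmers(ref, k)
    return index


def shared_fraction(query: str, index: Set[str], k: int) -> float:
    """Fraction of the query's k-mers present in the index; low values suggest novelty."""
    q = kmers(query, k)
    return len(q & index) / len(q) if q else 0.0


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orf_aa(seq: str) -> int:
    """Longest ATG-to-stop ORF over all six frames, in amino acids (stop excluded).

    ORFs that run off the end of the sequence without a stop codon are ignored.
    """
    best = 0
    for strand in (seq.upper(), reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    best = max(best, (i - start) // 3)
                    start = None
    return best


# Hypothetical sequences for illustration.
index = build_index(["ACGTACGTAC"], k=4)
print(shared_fraction("ACGTACGT", index, k=4))  # 1.0
print(shared_fraction("TTTTTTTT", index, k=4))  # 0.0
print(longest_orf_aa("ATG" + "GCT" * 150 + "TAA"))  # 151
```

Set lookups make the k-mer screen linear in total sequence length, which is the kind of property that lets such filters finish in hours rather than requiring full alignment of every transcript.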
Alternative splice variants of AID are not stoichiometrically present at the protein level in chronic lymphocytic leukemia
Activation-induced deaminase (AID) is a DNA-mutating enzyme that mediates class-switch recombination as well as somatic hypermutation of antibody genes in B cells. Owing to its off-target activity, AID is implicated in lymphoma development through genome-wide DNA damage and the initiation of chromosomal translocations such as c-myc/IgH. Several alternative splice transcripts of AID have been reported in activated B cells as well as in malignant B cells such as those of chronic lymphocytic leukemia (CLL). As most commercially available antibodies fail to recognize alternative splice variants, their abundance in vivo, and hence their biological significance, has not been determined. In this study, we assessed the protein levels of AID splice isoforms by introducing an AID splice reporter construct into cell lines and into primary CLL cells from patients as well as from WT and TCL1(tg) C57BL/6 mice (where TCL1 is T-cell leukemia/lymphoma 1). The splice construct is fused at its 5′ end to a GFP tag, which is preserved in all splice isoforms and allows detection of the translated protein. In summary, we provide a thorough quantification of alternatively spliced AID transcripts and demonstrate that the corresponding protein abundances, especially those of the splice variants AID-ivs3 and AID-ΔE4, are not stoichiometrically equivalent. Our data suggest that enhanced proteasomal degradation of low-abundance proteins might underlie this discrepancy. © 2013 The Authors. European Journal of Immunology published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Identification by high-throughput imaging of the histone methyltransferase EHMT2 as an epigenetic regulator of VEGFA alternative splicing.
Recent evidence points to a role of chromatin in the regulation of alternative pre-mRNA splicing (AS). To identify novel chromatin regulators of AS, we screened an RNAi library of chromatin proteins using a cell-based high-throughput in vivo assay. We identified a set of chromatin proteins that regulate AS. Using simultaneous genome-wide expression and AS analysis, we demonstrate distinct and non-overlapping functions of these chromatin modifiers on transcription and AS. Detailed mechanistic characterization of one dual-function chromatin modifier, the H3K9 methyltransferase EHMT2 (G9a), identified VEGFA as a major chromatin-mediated AS target. Silencing of EHMT2, or its heterodimer partner EHMT1, affects AS by promoting exclusion of VEGFA exon 6a, but does not alter total VEGFA mRNA levels. The epigenetic regulatory mechanism of AS by EHMT2 involves an adaptor system consisting of the chromatin modulator HP1γ, which binds methylated H3K9 and recruits the splicing regulator SRSF1. The epigenetic regulation of VEGFA is physiologically relevant, since EHMT2 is transcriptionally induced in response to hypoxia and triggers concomitant changes in AS of VEGFA. These results characterize a novel epigenetic regulatory mechanism of AS and demonstrate separate roles of epigenetic modifiers in transcription and alternative splicing. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by US Government employees and is in the public domain in the US.
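A common way to quantify exon exclusion such as the VEGFA exon 6a change described above is the percent-spliced-in (PSI) value of the exon, computed from RNA-seq junction reads. The Python sketch below assumes a simple cassette-exon model with two inclusion junctions and one skipping junction, which simplifies the VEGFA locus; the read counts are hypothetical and this is not the analysis the authors performed.

```python
def psi(inclusion_reads: int, exclusion_reads: int,
        inclusion_junctions: int = 2, exclusion_junctions: int = 1) -> float:
    """Percent spliced in (as a fraction, 0-1) of a cassette exon from junction reads.

    Inclusion reads come from the two junctions flanking the exon, exclusion
    reads from the single skipping junction, so each count is normalized by
    its number of junctions before the ratio is formed.
    """
    inc = inclusion_reads / inclusion_junctions
    exc = exclusion_reads / exclusion_junctions
    if inc + exc == 0:
        raise ValueError("no junction reads support either isoform")
    return inc / (inc + exc)


# Hypothetical junction counts for control vs. silenced cells.
control = psi(inclusion_reads=80, exclusion_reads=10)    # 0.8
knockdown = psi(inclusion_reads=40, exclusion_reads=30)  # 0.4
print(round(knockdown - control, 2))  # -0.4: more exon skipping after silencing
```

Because PSI is a ratio of isoforms, it can change markedly while total mRNA stays constant, matching the observation that EHMT2 silencing shifts VEGFA splicing without altering overall VEGFA levels.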
Fusion of TTYH1 with the C19MC microRNA cluster drives expression of a brain-specific DNMT3B isoform in the embryonal brain tumor ETMR.
Embryonal tumors with multilayered rosettes (ETMRs) are rare, deadly pediatric brain tumors characterized by high-level amplification of the microRNA cluster C19MC. We performed integrated genetic and epigenetic analyses of 12 ETMR samples and identified, in all cases, C19MC fusions to TTYH1 driving expression of the microRNAs. ETMR tumors, cell lines and xenografts showed a specific DNA methylation pattern distinct from those of other tumors and normal tissues. We detected extreme overexpression of a previously uncharacterized isoform of DNMT3B originating at an alternative promoter that is active only in the first weeks of neural tube development. Transcriptional and immunohistochemical analyses suggest that C19MC-dependent DNMT3B deregulation is mediated by RBL2, a known repressor of DNMT3B. Transfection with individual C19MC microRNAs resulted in DNMT3B upregulation and RBL2 downregulation in cultured cells. Our data suggest a potential oncogenic re-engagement of an early developmental program in ETMR via epigenetic alteration mediated by an embryonic, brain-specific DNMT3B isoform.
Clonal distribution of BCR-ABL1 mutations and splice isoforms by single-molecule long-read RNA sequencing.
The evolution of mutations in the BCR-ABL1 fusion gene transcript renders chronic myeloid leukemia (CML) patients resistant to tyrosine kinase inhibitor (TKI)-based therapy. Thus, screening for BCR-ABL1 mutations is recommended, particularly in patients experiencing a poor response to treatment. Herein we describe a novel approach for the detection and surveillance of BCR-ABL1 mutations in CML patients. To detect mutations in the BCR-ABL1 transcript, we developed an assay based on Pacific Biosciences (PacBio) sequencing technology, which allows single-molecule long-read sequencing of BCR-ABL1 fusion transcript molecules. Samples from six patients with a poor response to therapy were analyzed both at diagnosis and at follow-up. cDNA was generated from total RNA, and a 1.6 kb fragment encompassing the BCR-ABL1 transcript was amplified using long-range PCR. To estimate the sensitivity of the assay, a serial dilution experiment was performed. Over 10,000 full-length BCR-ABL1 sequences were obtained for all samples studied. Through the serial dilution analysis, mutations in CML patient samples could be detected down to a level of at least 1%. Notably, the assay was sufficiently sensitive even in patients with low BCR-ABL1 transcript levels. PacBio sequencing successfully identified all mutations seen by standard methods. Importantly, we identified several mutations that escaped detection by routine clinical analysis. Resistance mutations were found in all but one of the patients. Owing to the long reads afforded by PacBio sequencing, compound mutations present in the same molecule were readily distinguished from independent alterations arising in different molecules. Moreover, several isoforms of the BCR-ABL1 transcript were identified in two of the CML patients. 
Finally, our assay allowed a quick turnaround time, with results reported within 2 days. In summary, the PacBio sequencing assay can be applied to detect BCR-ABL1 resistance mutations in both diagnostic and follow-up CML patient samples using a simple protocol applicable to routine diagnosis. Besides its sensitivity, the method gives a complete view of the clonal distribution of mutations, which is important when making therapy decisions.
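Because each PacBio read covers the whole amplicon, mutations can be grouped per molecule, which is what separates compound mutations from independent ones. The Python sketch below illustrates that bookkeeping under stated assumptions: per-read mutation calls are taken as given (variant calling itself is not shown), the read counts are invented, and the 1% reporting threshold simply mirrors the sensitivity reported above rather than any published calling rule. T315I and E255K are used only as example labels.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable

Clone = FrozenSet[str]


def clone_frequencies(read_mutations: Iterable[Iterable[str]]) -> Dict[Clone, float]:
    """Fraction of full-length reads carrying each exact combination of mutations."""
    counts = Counter(frozenset(muts) for muts in read_mutations)
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items()}


def allele_frequency(freqs: Dict[Clone, float], mutation: str) -> float:
    """Total fraction of reads carrying a mutation, alone or with others."""
    return sum(f for clone, f in freqs.items() if mutation in clone)


def compound_clones(freqs: Dict[Clone, float], min_freq: float = 0.01) -> Dict[Clone, float]:
    """Clones with two or more mutations on the same molecule, at or above min_freq."""
    return {c: f for c, f in freqs.items() if len(c) >= 2 and f >= min_freq}


# Hypothetical per-read calls from 100 full-length BCR-ABL1 reads.
reads = [[]] * 70 + [["T315I"]] * 20 + [["E255K", "T315I"]] * 8 + [["E255K"]] * 2
freqs = clone_frequencies(reads)
print(round(allele_frequency(freqs, "T315I"), 2))  # 0.28
print({tuple(sorted(c)): f for c, f in compound_clones(freqs).items()})
# {('E255K', 'T315I'): 0.08}
```

With short reads, only the per-mutation frequencies (28% and 10% here) would be visible; keeping each read's mutations together reveals that 8% of molecules carry both.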
Mixed fibrolamellar hepatocellular carcinoma (mFL-HCC) is a rare liver tumor defined by the presence of both pure FL-HCC and conventional HCC components; it represents up to 25% of FL-HCC cases and has been associated with a worse prognosis. Recent genomic characterization of pure FL-HCC identified a highly recurrent transcript fusion (DNAJB1:PRKACA) not found in conventional HCC. We performed exome and transcriptome sequencing of a case of mFL-HCC. A novel BAC-capture approach was developed to identify a 400 kb deletion as the underlying genomic mechanism for a DNAJB1:PRKACA fusion in this case. A sensitive NanoString Elements assay was used to screen for this transcript fusion in a second case of mFL-HCC, 112 additional HCC samples and 44 adjacent non-tumor liver samples. We report the first comprehensive genomic analysis of a case of mFL-HCC. No common HCC-associated mutations were identified. The very low mutation rate, the large number of mostly single-copy, long-range copy number variants, and the high expression of ERBB2 in this case were more consistent with previous reports of pure FL-HCC than of conventional HCC. In particular, the DNAJB1:PRKACA fusion transcript specifically associated with pure FL-HCC was detected at very high expression levels. Subsequent analysis revealed the presence of this fusion in all primary and metastatic samples, including those with mixed or conventional HCC pathology. A second case of mFL-HCC confirmed our finding that the fusion was detectable in conventional components. An expanded screen identified a third case of fusion-positive HCC, which, upon review, also had both conventional and fibrolamellar features. This screen confirmed the absence of the fusion in all conventional HCC and adjacent non-tumor liver samples. These results indicate that mFL-HCC is similar to pure FL-HCC at the genomic level and that the DNAJB1:PRKACA fusion can be used as a diagnostic tool for both pure FL-HCC and mFL-HCC. © The Author 2016. 
Published by Oxford University Press on behalf of the European Society for Medical Oncology.
Single-molecule, real-time sequencing developed by Pacific Biosciences offers longer read lengths than second-generation sequencing (SGS) technologies, making it well suited to unsolved problems in genome, transcriptome, and epigenetics research. Highly contiguous de novo assemblies from PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discovery of novel genes and novel isoforms of annotated genes, owing to its ability to sequence full-length transcripts or fragments of substantial length. Additionally, PacBio sequencing provides information useful for the direct detection of base modifications, such as methylation. Beyond PacBio sequencing alone, many hybrid sequencing strategies have been developed that combine more accurate short reads with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable than PacBio sequencing alone, especially for small laboratories. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.
Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing.
We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites, and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from MCF-7 breast cancer cells. Compared with existing tools, IDP-fusion detects fusion genes with higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Identification of a novel fusion transcript between human relaxin-1 (RLN1) and human relaxin-2 (RLN2) in prostate cancer.
Simultaneous expression of the highly homologous RLN1 and RLN2 genes in the prostate hampers their accurate delineation. We used PacBio SMRT sequencing and RNA-Seq in LNCaP cells to dissect the expression of RLN1 and RLN2 variants. We identified a novel fusion transcript comprising the RLN1 and RLN2 genes and found evidence of its expression in normal and prostate cancer (PCa) tissues. The RLN1-RLN2 fusion putatively encodes an RLN2 isoform lacking the secretory signal peptide. Identification of the fusion transcript allowed us to define regions unique to the RLN1-RLN2 fusion and to RLN1. The RLN1-RLN2 fusion was co-expressed with RLN1 in LNCaP cells, but the two gene products were inversely regulated by androgens. We showed that RLN1 is underrepresented in common PCa cell lines in comparison with normal and PCa tissue. The current study brings a highly relevant update to the relaxin field and will encourage further studies of RLN1 and RLN2 in PCa and beyond. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
Gene profiling of diffuse large B cell lymphoma (DLBCL) has revealed broad deregulation of gene expression compared with normal B cells. While many studies have interrogated well-known, annotated genes in DLBCL, none has yet performed a systematic analysis to uncover novel unannotated long non-coding RNAs (lncRNAs) in DLBCL. In this study, we sought to uncover these lncRNAs by examining RNA-seq data from primary DLBCL tumors, and we performed supporting analyses to identify their potential roles in DLBCL. We performed a systematic analysis of novel lncRNAs from the polyadenylated transcriptome of 116 primary DLBCL samples. RNA-seq data were processed with a de novo transcript assembly pipeline to discover novel lncRNAs in DLBCL. Systematic functional, mutational, cross-species, and co-expression analyses using numerous bioinformatics tools and statistical methods were performed to characterize these novel lncRNAs. We identified 2,632 novel, multi-exonic lncRNAs expressed in more than one tumor, two-thirds of which are not expressed in normal B cells. Long-read single-molecule sequencing supports the splicing structure of many of these lncRNAs. More than one-third of the novel lncRNAs are differentially expressed between the two major DLBCL subtypes, ABC and GCB. Novel lncRNAs are enriched at DLBCL super-enhancers, and a fraction of them are conserved between human and dog lymphomas. We observed overlap of transposable elements (TEs) with exonic regions; this overlap was particularly significant in the last exon of the novel lncRNAs, suggesting potential use of cryptic TE polyadenylation signals. We identified highly co-expressed protein-coding genes for at least 88% of the novel lncRNAs. Functional enrichment analysis of co-expressed genes predicts a potential function for about half of the novel lncRNAs. 
Finally, systematic structural analysis of candidate point mutations (SNVs) suggests that such mutations frequently stabilize lncRNA structures rather than destabilize them. Discovery of these 2,632 novel lncRNAs in DLBCL significantly expands the lymphoma transcriptome, and our analysis identifies potential roles of these lncRNAs in lymphomagenesis and/or tumor maintenance. For further studies, these novel lncRNAs also provide an abundant source of new targets for antisense oligonucleotide pharmacology, including targets shared between human and dog lymphomas.
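The co-expression step described above pairs each novel lncRNA with protein-coding genes whose expression tracks it across tumors. A minimal Python sketch using Pearson correlation follows; the correlation measure, the 0.7 cut-off, the gene names and the expression values are assumptions for illustration, and a real analysis would typically log-transform expression and correct for multiple testing.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def top_coexpressed(lnc: List[float], coding: Dict[str, List[float]],
                    min_r: float = 0.7, top_n: int = 5) -> List[Tuple[str, float]]:
    """Coding genes whose profiles correlate with the lncRNA at r >= min_r, best first."""
    scored = [(gene, pearson(lnc, expr)) for gene, expr in coding.items()]
    hits = [(gene, r) for gene, r in scored if r >= min_r]
    return sorted(hits, key=lambda t: t[1], reverse=True)[:top_n]


# Hypothetical expression of one lncRNA and three coding genes across five tumors.
lnc = [1.0, 2.0, 3.0, 4.0, 5.0]
coding = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # r = 1.0
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # r = -1.0
    "GENE_C": [1.0, 3.0, 2.0, 5.0, 4.0],   # r = 0.8
}
print(top_coexpressed(lnc, coding))  # [('GENE_A', 1.0), ('GENE_C', 0.8)]
```

Guilt-by-association of this kind is what allows functional enrichment of the partner genes to suggest a putative role for an otherwise uncharacterized lncRNA.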
Effective targeted cancer therapeutic development depends upon distinguishing disease-associated ‘driver’ mutations, which have causative roles in malignancy pathogenesis, from ‘passenger’ mutations, which are dispensable for cancer initiation and maintenance. Translational studies of clinically active targeted therapeutics can definitively discriminate driver from passenger lesions and provide valuable insights into human cancer biology. Activating internal tandem duplication (ITD) mutations in FLT3 (FLT3-ITD) are detected in approximately 20% of acute myeloid leukaemia (AML) patients and are associated with a poor prognosis. Abundant scientific and clinical evidence, including the lack of convincing clinical activity of early FLT3 inhibitors, suggests that FLT3-ITD probably represents a passenger lesion. Here we report point mutations at three residues within the kinase domain of FLT3-ITD that confer substantial in vitro resistance to AC220 (quizartinib), an active investigational inhibitor of FLT3, KIT, PDGFRA, PDGFRB and RET; evolution of AC220-resistant substitutions at two of these amino acid positions was observed in eight of eight FLT3-ITD-positive AML patients with acquired resistance to AC220. Our findings demonstrate that FLT3-ITD can represent a driver lesion and valid therapeutic target in human AML. AC220-resistant FLT3 kinase domain mutants represent high-value targets for future FLT3 inhibitor development efforts.