Early detection of colorectal cancer (CRC) and its precursor lesions (adenomas) is crucial to reduce mortality rates. The fecal immunochemical test (FIT) is a non-invasive CRC screening test that detects the blood-derived protein hemoglobin. However, FIT sensitivity is suboptimal, especially for detecting CRC precursor lesions. As adenoma-to-carcinoma progression is accompanied by alternative splicing, tumor-specific proteins derived from alternatively spliced RNA transcripts might serve as candidate biomarkers for CRC detection.
ASHG PacBio Workshop: SMRT Sequencing as a translational research tool to investigate germline, somatic and infectious diseases
Melissa Laird Smith discussed how the Icahn School of Medicine at Mount Sinai uses long-read sequencing for translational research. She gave several examples of targeted sequencing projects run on the…
AGBT Virtual Poster: Using the PacBio Iso-Seq method to search for novel colorectal cancer biomarkers
Early detection of colorectal cancer (CRC) and its precursor lesions (adenomas) is crucial to reduce mortality rates. The fecal immunochemical test (FIT) is a non-invasive CRC screening test that detects…
The recent advent of long-read sequencing technologies is expected to address genetic challenges that short-read sequencing cannot resolve, primarily its inability to accurately study structural variations, copy number variations, and homologous repeats in complex parts of the genome. However, long-read sequencing comes with higher rates of random short deletions and insertions and of single-nucleotide errors. The relatively higher accuracy of short-read sequencing has kept it the first choice for screening single-nucleotide variants and short deletions and insertions. Nevertheless, short-read sequencing still suffers from systematic errors that tend to occur at specific positions, where even a high read depth cannot always correct them. In this study, we compared the genotyping of mitochondrial DNA variants in three samples using PacBio’s Sequel (Pacific Biosciences Inc., Menlo Park, CA, USA) long-read sequencing and Illumina’s HiSeqX10 (Illumina Inc., San Diego, CA, USA) short-read sequencing data. We concluded that, despite the differences in the type and frequency of errors in long-read sequencing, its accuracy is still comparable to that of short reads for genotyping short nuclear variants; because errors in long reads are random, a lower coverage, around 37 reads, can be sufficient to correct for them.
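As a rough illustration of why random errors can be averaged out with modest coverage, the sketch below computes the probability that a simple majority vote over independent reads calls the wrong base. The 10% per-read error rate and the worst-case assumption that all erroneous reads agree on the same wrong base are illustrative assumptions, not values from the study, and the function name is hypothetical.

```python
from math import comb


def consensus_error(depth: int, per_read_error: float) -> float:
    """Probability that at least half of `depth` independent reads are wrong.

    Worst-case model (an assumption): every erroneous read shows the same
    wrong base, and a tie counts as a wrong consensus call.
    """
    first_bad = (depth + 1) // 2  # smallest k with 2 * k >= depth
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(first_bad, depth + 1)
    )


if __name__ == "__main__":
    # Illustrative depths only; 37 is the coverage quoted in the abstract.
    for depth in (1, 5, 15, 37):
        print(depth, f"{consensus_error(depth, 0.10):.3e}")
```

Under these assumptions the failure probability falls steeply as depth grows. This only applies to random errors: a systematic error that recurs at the same position is not removed by adding reads.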
Long-Read RNA Sequencing Identifies Alternative Splice Variants in Hepatocellular Carcinoma and Tumor-Specific Isoforms.
Alternative splicing (AS) allows generation of cell type-specific mRNA transcripts and contributes to hallmarks of cancer. Genome-wide analysis of AS in human hepatocellular carcinoma (HCC), however, is limited. We sought to obtain a comprehensive AS landscape in HCC and define tumor-associated variants. Single-molecule real-time long-read RNA sequencing was performed on patient-derived HCC cells, and the presence of splice junctions was defined by the SpliceMap-LSC-IDP algorithm. We obtained an all-inclusive map of annotated AS variants and further discovered 362 alternatively spliced variants not previously reported in any database (neither RefSeq nor GENCODE). They were mostly derived from intron retention and early termination codon with an in-frame open reading frame in 81.5%. We corroborated many of these predicted unannotated and annotated variants to be tumor specific in an independent cohort of primary HCC tumors and matching nontumoral liver. Using the combined Sanger sequencing and TaqMan junction assays, unique and common expressions of spliced variants including enzyme regulators (ARHGEF2, SERPINH1), chromatin modifiers (DEK, CDK9, RBBP7), RNA-binding proteins (SRSF3, RBM27, MATR3, YBX1), and receptors (ADRM1, CD44v8-10, vitamin D receptor, ROR1) were determined in HCC tumors. We further focused functional investigations on ARHGEF2 variants (v1 and v3) that arise from the common amplified site chr.1q22 of HCC. Their biological significance underscores two major cancer hallmarks, namely cancer stemness and epithelial-to-mesenchymal transition-mediated cell invasion and migration, although v3 is consistently more potent than v1. Conclusion: Alternative isoforms and tumor-specific isoforms that arise from aberrant splicing are common during liver tumorigenesis. Our results highlight insights gained from the analysis of AS in HCC. © 2019 The Authors. Hepatology published by Wiley Periodicals, Inc., on behalf of American Association for the Study of Liver Diseases.
Increasing evidence indicates that broadly neutralizing antibodies (bNAbs) play an important role in immune-mediated control of hepatitis C virus (HCV) infection, but the relative contribution of neutralizing antibodies targeting antigenic sites across the HCV envelope (E1 and E2) proteins is unclear. Here, we isolated thirteen E1E2-specific monoclonal antibodies (MAbs) from B cells of a single HCV-infected individual who cleared one genotype 1a infection and then became persistently infected with a second genotype 1a strain. These MAbs bound six distinct discontinuous antigenic sites on the E1 protein, the E2 protein, or the E1E2 heterodimer. Three antigenic sites, designated AS108, AS112 (an N-terminal E1 site), and AS146, were distinct from previously described antigenic regions (ARs) 1 to 5 and E1 sites. Antibodies targeting four sites (AR3, AR4-5, AS108, and AS146) were broadly neutralizing. These MAbs also displayed distinct patterns of relative neutralizing potency (i.e., neutralization profiles) across a panel of diverse HCV strains, which led to complementary neutralizing breadth when they were tested in combination. Overall, this study demonstrates that HCV bNAb epitopes are not restricted to previously described antigenic sites, expanding the number of sites that could be targeted for vaccine development. IMPORTANCE: Worldwide, more than 70 million people are infected with hepatitis C virus (HCV), which is a leading cause of hepatocellular carcinoma and liver transplantation. Despite the development of potent direct acting antivirals (DAAs) for HCV treatment, a vaccine is urgently needed due to the high cost of treatment and the possibility of reinfection after cure. Induction of multiple broadly neutralizing antibodies (bNAbs) that target distinct epitopes on the HCV envelope proteins is one approach to vaccine development. However, antigenic sites targeted by bNAbs in individuals with spontaneous control of HCV have not been fully defined. In this study, we characterize 13 monoclonal antibodies (MAbs) from a single person who cleared an HCV infection without treatment, and we identify 3 new sites targeted by neutralizing antibodies. The sites targeted by these MAbs could inform HCV vaccine development. Copyright © 2019 American Society for Microbiology.
Epstein-Barr virus (EBV) is a ubiquitous human pathogen associated with Burkitt’s lymphoma and nasopharyngeal carcinoma. Although the EBV genome harbors more than a hundred genes, a full transcription map with EBV polyadenylation profiles remains unknown. To elucidate the 3′ ends of all EBV transcripts genome-wide, we performed the first comprehensive analysis of viral polyadenylation sites (pA sites) using our previously reported polyadenylation sequencing (PA-seq) technology. We identified that EBV utilizes a total of 62 pA sites in JSC-1, 60 in Raji, and 53 in Akata cells for the expression of EBV genes from both plus and minus DNA strands; 42 of these pA sites are commonly used in all three cell lines. The majority of identified pA sites were mapped to the intergenic regions downstream of previously annotated EBV open reading frames (ORFs) and viral promoters. pA sites lacking an association with any known EBV genes were also identified, mostly for the minus DNA strand within the EBNA locus, a major locus responsible for maintenance of viral latency and cell transformation. The expression of these novel antisense transcripts to EBNA was verified by 3′ rapid amplification of cDNA ends (RACE) and Northern blot analyses in several EBV-positive (EBV+) cell lines. In contrast to EBNA RNA expressed during latency, expression of EBNA-antisense transcripts, which is restricted in latent cells, can be significantly induced by viral lytic infection, suggesting potential regulation of viral gene expression by EBNA-antisense transcription during lytic EBV infection. Our data provide the first evidence that EBV has an unrecognized mechanism that regulates EBV reactivation from latency. IMPORTANCE: Epstein-Barr virus represents an important human pathogen with an etiological role in the development of several cancers. By elucidation of a genome-wide polyadenylation landscape of EBV in JSC-1, Raji, and Akata cells, we have redefined the EBV transcriptome and mapped individual polymerase II (Pol II) transcripts of viral genes to each one of the mapped pA sites at single-nucleotide resolution as well as the depth of expression. By unveiling a new class of viral lytic RNA transcripts antisense to latent EBNAs, we provide a novel mechanism of how EBV might control the expression of viral latent genes and lytic infection. Thus, this report takes another step closer to understanding EBV gene structure and expression and paves a new path for antiviral approaches. This is a work of the U.S. Government and is not subject to copyright protection in the United States. Foreign copyrights may apply.
Circulating DNA in plasma consists of short DNA fragments. The biological processes generating such fragments are not well understood. DNASE1L3 is a secreted DNASE1-like nuclease capable of digesting DNA in chromatin, and its absence causes anti-DNA responses and autoimmunity in humans and mice. We found that the deletion of Dnase1l3 in mice resulted in aberrations in the fragmentation of plasma DNA. Such aberrations included an increase in short DNA molecules below 120 bp, which was positively correlated with anti-DNA antibody levels. We also observed an increase in long, multinucleosomal DNA molecules and decreased frequencies of the most common end motifs found in plasma DNA. These aberrations were independent of anti-DNA response, suggesting that they represented a primary effect of DNASE1L3 loss. Pregnant Dnase1l3-/- mice carrying Dnase1l3+/- fetuses showed a partial restoration of normal frequencies of plasma DNA end motifs, suggesting that DNASE1L3 from Dnase1l3-proficient fetuses could enter maternal systemic circulation and affect both fetal and maternal DNA fragmentation in a systemic as well as local manner. However, the observed shortening of circulating fetal DNA relative to maternal DNA was not affected by the deletion of Dnase1l3. Collectively, our findings demonstrate that DNASE1L3 plays a role in circulating plasma DNA homeostasis by enhancing fragmentation and influencing end-motif frequencies. These results support a distinct role of DNASE1L3 as a regulator of the physical form and availability of cell-free DNA and may have important implications for the mechanism whereby this enzyme prevents autoimmunity. Copyright © 2019 the Author(s). Published by PNAS.
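The two fragmentation readouts mentioned in this abstract, the share of short molecules and the frequency of end motifs, can be sketched as follows. The sketch assumes fragments are supplied as 5'-to-3' sequence strings and takes the first four bases as the end motif; the motif length, the function names, and the toy sequences are illustrative assumptions, while the 120 bp cutoff comes from the abstract.

```python
from collections import Counter


def short_fraction(fragments, cutoff=120):
    """Fraction of fragments strictly shorter than `cutoff` bases."""
    return sum(len(f) < cutoff for f in fragments) / len(fragments)


def end_motif_frequencies(fragments, k=4):
    """Relative frequency of each 5' end k-mer (k=4 is an assumption)."""
    motifs = Counter(f[:k].upper() for f in fragments if len(f) >= k)
    total = sum(motifs.values())
    return {motif: count / total for motif, count in motifs.items()}


if __name__ == "__main__":
    # Toy sequences for illustration only, not real plasma DNA data.
    toy = ["CCCAGGT" + "A" * 100, "CCCATT", "CCTGAAAA", "AAAA" * 40]
    print(short_fraction(toy))
    print(end_motif_frequencies(toy))
```

Comparing these two summaries between genotypes is one simple way to express the kind of shift the abstract describes; the study's actual analysis pipeline is not given here.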
α-Difluoromethylornithine reduces gastric carcinogenesis by causing mutations in Helicobacter pylori cagY.
Infection by Helicobacter pylori is the primary cause of gastric adenocarcinoma. The most potent H. pylori virulence factor is cytotoxin-associated gene A (CagA), which is translocated by a type 4 secretion system (T4SS) into gastric epithelial cells and activates oncogenic signaling pathways. The gene cagY encodes a key component of the T4SS and can undergo gene rearrangements. We have shown that the cancer chemopreventive agent α-difluoromethylornithine (DFMO), known to inhibit the enzyme ornithine decarboxylase, reduces H. pylori-mediated gastric cancer incidence in Mongolian gerbils. In the present study, we questioned whether DFMO might directly affect H. pylori pathogenicity. We show that H. pylori output strains isolated from gerbils treated with DFMO exhibit reduced ability to translocate CagA in gastric epithelial cells. Further, we frequently detected genomic modifications in the middle repeat region of the cagY gene of output strains from DFMO-treated animals, which were associated with alterations in the CagY protein. Gerbils did not develop carcinoma when infected with a DFMO output strain containing rearranged cagY or the parental strain in which the wild-type cagY was replaced by cagY with DFMO-induced rearrangements. Lastly, we demonstrate that in vitro treatment of H. pylori by DFMO induces oxidative DNA damage, expression of the DNA repair enzyme MutS2, and mutations in cagY, demonstrating that DFMO directly affects genomic stability. Deletion of mutS2 abrogated the ability of DFMO to induce cagY rearrangements directly. In conclusion, DFMO-induced oxidative stress in H. pylori leads to genomic alterations and attenuates virulence.
Hybrid sequencing-based personal full-length transcriptomic analysis implicates proteostatic stress in metastatic ovarian cancer.
Comprehensive molecular characterization of myriad somatic alterations and aberrant gene expression at the personal level is key to precision cancer therapy. Yet, limited by current short-read sequencing technology, an individualized catalog of complete genomic and transcriptomic features has thus far been elusive. Here, we integrated second- and third-generation sequencing platforms to generate a multidimensional dataset on a patient affected by metastatic epithelial ovarian cancer. Whole-genome and hybrid transcriptome dissection captured global genetic and transcriptional variants at previously unparalleled resolution. Particularly, single-molecule mRNA sequencing identified a vast array of unannotated transcripts, novel long noncoding RNAs and gene chimeras, permitting accurate determination of transcription start, splice, polyadenylation and fusion sites. Phylogenetic and enrichment inference of isoform-level measurements implicated early functional divergence and cytosolic proteostatic stress in shaping ovarian tumorigenesis. A complementary imaging-based high-throughput drug screen was performed and subsequently validated, which consistently pinpointed proteasome inhibitors as an effective therapeutic regimen that induces protein aggregates in ovarian cancer cells. Therefore, our study suggests that clinical application of the emerging long-read full-length analysis for improving molecular diagnostics is feasible and informative. An in-depth understanding of tumor transcriptome complexity, enabled by the hybrid sequencing approach, lays the basis for revealing novel and valid therapeutic vulnerabilities in advanced ovarian malignancies.
The use of bacteriophages represents a valid alternative to conventional antimicrobial treatments, overcoming the widespread bacterial antibiotic resistance phenomenon. In this work, we evaluated whether biomimetic hydroxyapatite (HA) nanocrystals are able to enhance some properties of bacteriophages. The final goal of this study was to demonstrate that biomimetic HA nanocrystals can be used for bacteriophage delivery in the context of bacterial infections and, at the same time, enhance some of the biological properties of the same bacteriophages, such as stability, preservation, and antimicrobial activity. Phage isolation and characterization were carried out by using Mitomycin C and following the double-layer agar technique. The biomimetic HA water suspension was synthesized in order to obtain nanocrystals with plate-like morphology and nanometric dimensions. The interaction of phages with the HA was investigated by dynamic light scattering and Zeta potential analyses. The cytotoxicity and intracellular killing activities of the phage-HA complex were evaluated in human hepatocellular carcinoma HepG2 cells. The bacterial inhibition capacity of the complex was assessed on chicken minced meat samples infected with Salmonella Rissen. Our data highlighted that the biomimetic HA nanocrystal-bacteriophage complex was more stable and more effective than phages alone in all tested experimental conditions. Our results evidenced the important contribution of biomimetic HA nanocrystals: they act as an excellent carrier for bacteriophage delivery and enhance its biological characteristics. This study confirmed the significant role of the mineral HA when it is complexed with biological entities like bacteriophages, as it has been shown for molecules such as lactoferrin.
TSD: A Computational Tool To Study the Complex Structural Variants Using PacBio Targeted Sequencing Data.
PacBio sequencing is a powerful approach to study DNA or RNA sequences over longer ranges. It is especially useful in exploring the complex structural variants generated by random integration or multiple rearrangements of endogenous or exogenous sequences. Here, we present a tool, TSD, for complex structural variant discovery using PacBio targeted sequencing data. It allows researchers to identify and visualize the genomic structures of targeted sequences by unlimited splitting, alignment and assembly of long PacBio reads. Application to the sequencing data derived from an HBV-integrated human cell line (PLC/PRF/5) indicated that TSD could recover the full profile of HBV integration events, especially for regions with complex human-HBV genome integrations and multiple HBV rearrangements. Compared to other long-read analysis tools, TSD showed better performance in detecting complex genomic structural variants. TSD is publicly available at: https://github.com/menggf/tsd. Copyright © 2019 Meng et al.
Genomics-driven discovery of a biosynthetic gene cluster required for the synthesis of BII-Rafflesfungin from the fungus Phoma sp. F3723.
Phomafungin is a recently reported broad spectrum antifungal compound but its biosynthetic pathway is unknown. We combed publicly available Phoma genomes but failed to find any putative biosynthetic gene cluster that could account for its biosynthesis. Therefore, we sequenced the genome of one of our Phoma strains (F3723) previously identified as having antifungal activity in a high-throughput screen. We found a biosynthetic gene cluster that was predicted to synthesize a cyclic lipodepsipeptide that differs in the amino acid composition compared to Phomafungin. Antifungal activity guided isolation yielded a new compound, BII-Rafflesfungin, the structure of which was determined. We describe the NRPS-t1PKS cluster ‘BIIRfg’ compatible with the synthesis of the cyclic lipodepsipeptide BII-Rafflesfungin [HMHDA-L-Ala-L-Glu-L-Asn-L-Ser-L-Ser-D-Ser-D-allo-Thr-Gly]. We report new Stachelhaus codes for Ala, Glu, Asn, Ser, Thr, and Gly. We propose a mechanism for BII-Rafflesfungin biosynthesis, which involves the formation of the lipid part by BIIRfg_PKS followed by activation and transfer of the lipid chain by a predicted AMP-ligase on to the first PCP domain of the BIIRfg_NRPS gene.
Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight.
The human genome contains “dark” gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these gene regions that may be relevant to human disease. Here, we identify regions with few mappable reads that we call dark by depth, and others that have ambiguous alignment, called camouflaged. We assess how well long-read or linked-read technologies resolve these regions. Based on standard whole-genome Illumina sequencing data, we identify 36,794 dark regions in 6054 gene bodies from pathways important to human health, development, and reproduction. Of these gene bodies, 8.7% are completely dark and 35.2% are ≥ 5% dark. We identify dark regions that are present in protein-coding exons across 748 genes. Linked-read or long-read sequencing technologies from 10x Genomics, PacBio, and Oxford Nanopore Technologies reduce dark protein-coding regions to approximately 50.5%, 35.6%, and 9.6%, respectively. We present an algorithm to resolve most camouflaged regions and apply it to the Alzheimer’s Disease Sequencing Project. We rescue a rare ten-nucleotide frameshift deletion in CR1, a top Alzheimer’s disease gene, found in disease cases but not in controls. While we could not formally assess the association of the CR1 frameshift mutation with Alzheimer’s disease due to insufficient sample size, we believe it merits investigation in a larger cohort. There remain thousands of potentially important genomic regions overlooked by short-read sequencing that are largely resolved by long-read technologies.
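The distinction between "dark by depth" and "camouflaged" regions can be illustrated with a small classifier. This is only a sketch: the abstract does not state cutoffs, so the depth and mapping-quality thresholds below are placeholder assumptions, the function names are hypothetical, and the authors' algorithm for rescuing camouflaged regions is not reproduced.

```python
def classify_region(mean_depth, low_mapq_fraction,
                    max_dark_depth=5.0, min_low_mapq_fraction=0.9):
    """Label a region from its read depth and its share of ambiguous reads.

    Both thresholds are illustrative assumptions, not values from the text.
    """
    if mean_depth <= max_dark_depth:
        return "dark_by_depth"      # too few mappable reads
    if low_mapq_fraction >= min_low_mapq_fraction:
        return "camouflaged"        # reads present, but alignment ambiguous
    return "resolved"


def percent_dark(labels):
    """Percentage of regions in a gene body that are not resolved."""
    dark = sum(label != "resolved" for label in labels)
    return 100.0 * dark / len(labels)


if __name__ == "__main__":
    # Toy (depth, low-MAPQ fraction) pairs for illustration only.
    regions = [(2, 0.0), (30, 0.95), (30, 0.1), (40, 0.0)]
    labels = [classify_region(depth, frac) for depth, frac in regions]
    print(labels, percent_dark(labels))
```

A per-gene summary like `percent_dark` corresponds to the abstract's description of gene bodies as completely dark or at least 5% dark.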