To bring personalized medicine to all patients, cancer researchers need more reliable and comprehensive views of somatic variants of all sizes that drive cancer biology.
Early detection of colorectal cancer (CRC) and its precursor lesions (adenomas) is crucial to reduce mortality rates. The fecal immunochemical test (FIT) is a non-invasive CRC screening test that detects the blood-derived protein hemoglobin. However, FIT sensitivity is suboptimal especially in detection of CRC precursor lesions. As adenoma-to-carcinoma progression is accompanied by alternative splicing, tumor-specific proteins derived from alternatively spliced RNA transcripts might serve as candidate biomarkers for CRC detection.
SMRT Sequencing of full-length androgen receptor isoforms in prostate cancer reveals previously hidden drug resistant variants
Prostate cancer is the most frequently diagnosed male cancer. For prostate cancer that has progressed to an advanced or metastatic stage, androgen deprivation therapy (ADT) is the standard of care. ADT inhibits activity of the androgen receptor (AR), a master regulator transcription factor in normal and cancerous prostate cells. The major limitation of ADT is the development of castration-resistant prostate cancer (CRPC), which is almost invariably due to transcriptional re-activation of the AR. One mechanism of AR transcriptional re-activation is expression of AR-V7, a truncated, constitutively active AR variant (AR-V) arising from alternative AR pre-mRNA splicing. Noteworthy, AR-V7 is being developed as a predictive biomarker of primary resistance to androgen receptor (AR)-targeted therapies in CRPC. Multiple additional AR-V species are expressed in clinical CRPC, but the extent to which these may be co-expressed with AR-V7 or predict resistance is not known.
ASHG PacBio Workshop: Identification and characterization of informative genetic structural variants for neurodegenerative diseases
Michael Lutz, from the Duke University Medical Center, discussed a recently published software tool that can now be used in a pipeline with SMRT Sequencing data to find structural variant…
AGBT Virtual Poster: Using the PacBio Iso-Seq method to search for novel colorectal cancer biomarkers
Early detection of colorectal cancer (CRC) and its precursor lesions (adenomas) is crucial to reduce mortality rates. The fecal immunochemical test (FIT) is a non-invasive CRC screening test that detects…
Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics.
Short read massive parallel sequencing has emerged as a standard diagnostic tool in the medical setting. However, short read technologies have inherent limitations such as GC bias, difficulties mapping to repetitive elements, trouble discriminating paralogous sequences, and difficulties in phasing alleles. Long read single molecule sequencers resolve these obstacles. Moreover, they offer higher consensus accuracies and can detect epigenetic modifications from native DNA. The first commercially available long read single molecule platform was the RS system based on PacBio’s single molecule real-time (SMRT) sequencing technology, which has since evolved into their RSII and Sequel systems. Here we capsulize how SMRT sequencing is revolutionizing constitutional, reproductive, cancer, microbial and viral genetic testing.© The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
Proteogenomics, i.e. comprehensive integration of genomics and proteomics data, is a powerful approach identifying novel protein biomarkers. This is especially the case for proteins that differ structurally between disease and control conditions. As tumor development is associated with aberrant splicing, we focus on this rich source of cancer specific biomarkers. To this end, we developed a proteogenomic pipeline, Splicify, which is able to detect differentially expressed protein isoforms. Splicify is based on integrating RNA massive parallel sequencing data and tandem mass spectrometry proteomics data to identify protein isoforms resulting from differential splicing between two conditions. Proof of concept was obtained by applying Splicify to RNA sequencing and mass spectrometry data obtained from colorectal cancer cell line SW480, before and after siRNA-mediated down-modulation of the splicing factors SF3B1 and SRSF1. These analyses revealed 2172 and 149 differentially expressed isoforms, respectively, with peptide confirmation upon knock-down of SF3B1 and SRSF1 compared to their controls. Splice variants identified included RAC1, OSBPL3, MKI67 and SYK. One additional sample was analyzed by PacBio Iso-Seq full-length transcript sequencing after SF3B1 down-modulation. This analysis verified the alternative splicing identified by Splicify and in addition identified novel splicing events that were not represented in the human reference genome annotation. Therefore, Splicify offers a validated proteogenomic data analysis pipeline for identification of disease specific protein biomarkers resulting from mRNA alternative splicing. Splicify is publicly available on GitHub (https://github.com/NKI-TGO/SPLICIFY) and suitable to address basic research questions using pre-clinical model systems as well as translational research questions using patient-derived samples, e.g. allowing to identify clinically relevant biomarkers. Copyright © 2017, The American Society for Biochemistry and Molecular Biology.
MCF-7 breast cancer cell line PacBio generated transcriptome has ~300 novel transcribed regions, un-annotated in both RefSeq and GENCODE, and absent in the liver, heart and brain transcriptomes
Illuminating the “dark” regions of the human genome remains an ongoing effort, a decade and a half after the human genome was sequenced – RefSeq and GENCODE being two of the major annotation databases. Pacific Biosciences (PacBio) has provided open access to the transcriptome of MCF-7, a breast cancer cell line that has provided significant therapeutic advancement in breast cancer research since the 1970s. PacBio sequencing generates much longer reads compared to second-generation sequencing technologies, with a trade-off of lower throughput, higher error rate and more cost per base. Here, this transcriptome was analyzed using the YeATS pipeline, with additionally introduced kmer based algorithms, reducing computational times to a few hours on a simple workstation. Out of ~300 transcripts that have no match in both RefSeq and GENCODE, ~250 are absent in the transcriptomes of the heart, liver and brain, also provided by PacBio. Also, ~200 transcripts are absent in a recent catalogue of un-annotated long non-coding RNAs from 6,503 samples (~43 Terabases of sequence data) , and only two present in common in an experimental workflow RACE-Seq that reported 2,556 novel transcripts . ~100 transcripts have >100 amino acid open reading frames, and have the potential of being protein coding genes. ORF based annotation also identified few bacterial transcripts in the PacBio database mapped to the human genome, and one human transcript that has been annotated as bacterial in the NCBI database. The current work reiterates the under-utilization of transcriptomes for annotating genomes. It also provides new leads for investigating breast cancer by virtue of exclusively expressed transcripts not expressed in other tissues, which have the prospects of breast cancer biomarkers based on further investigations.
Chronic oil pollution related to gas and oil drilling activities is increasing in the sea due to the rising offshore petroleum industry activity. Among marine organisms, zooplankton play a crucial role in the marine ecosystem and therefore understanding the effects of crude oil chronic exposure on zooplankton is needed to determine the impact of oil in marine environments. The present study reports on the effect of crude oil on adult northern krill, Meganyctiphanes norvegica, collected during three seasons. Their sensitivity to oil was examined with oil concentration of 0.01 versus 0.1 mg oil L- 1 and photo-modified oil in flowing seawater maintained in the dark for 2 weeks at in situ temperature. Oil (polycyclic aromatic hydrocarbons, PAHs) entered the krill (on average, 350 and 4400 µg·kg- 1 wet weight in low and medium oil treatments respectively) and a larger fraction of the krill exhibited digestive gland pathologies (enhanced apoptosis and pathology of digestive tubules) in oil treatments (27–80%) compared to a significantly lower fraction (7–13%) in treatments that received no oil. However, 2-week oil exposure at these concentrations did not significantly decrease survivorship or impair basic functioning such as feeding and respiration rates. Similarly, there were only limited changes in the transcription of 7 selected genes from head tissue. Additionally, although there was significant seasonal variation in krill total lipid content and fatty acid composition, there was no treatment effect on both these parameters, which suggests limited oxidative stress under experimental conditions. Furthermore, there was no significant treatment effect on two direct measures of oxidative stress (MDA: malondialdehyde and AOPP: advanced oxidation protein products) in any of the seasons. Nevertheless, histology clearly revealed enhanced digestive gland pathologies in krill even at low concentrations. Although krill with such pathologies continue to survive, their accumulation of PAHs may be transferred up the food chain, impacting their predators and the wider ecosystem.
Bacteroides dorei dominates gut microbiome prior to autoimmunity in Finnish children at high risk for type 1 diabetes.
The incidence of the autoimmune disease, type 1 diabetes (T1D), has increased dramatically over the last half century in many developed countries and is particularly high in Finland and other Nordic countries. Along with genetic predisposition, environmental factors are thought to play a critical role in this increase. As with other autoimmune diseases, the gut microbiome is thought to play a potential role in controlling progression to T1D in children with high genetic risk, but we know little about how the gut microbiome develops in children with high genetic risk for T1D. In this study, the early development of the gut microbiomes of 76 children at high genetic risk for T1D was determined using high-throughput 16S rRNA gene sequencing. Stool samples from children born in the same hospital in Turku, Finland were collected at monthly intervals beginning at 4-6 months after birth until 2.2 years of age. Of those 76 children, 29 seroconverted to T1D-related autoimmunity (cases) including 22 who later developed T1D, the remaining 47 subjects remained healthy (controls). While several significant compositional differences in low abundant species prior to seroconversion were found, one highly abundant group composed of two closely related species, Bacteroides dorei and Bacteroides vulgatus, was significantly higher in cases compared to controls prior to seroconversion. Metagenomic sequencing of samples high in the abundance of the B. dorei/vulgatus group before seroconversion, as well as longer 16S rRNA sequencing identified this group as Bacteroides dorei. The abundance of B. dorei peaked at 7.6 months in cases, over 8 months prior to the appearance of the first islet autoantibody, suggesting that early changes in the microbiome may be useful for predicting T1D autoimmunity in genetically susceptible infants. The cause of increased B. dorei abundance in cases is not known but its timing appears to coincide with the introduction of solid food.
Deletion of tumor-suppressor genes as well as other genomic rearrangements pervade cancer genomes across numerous types of solid tumor and hematologic malignancies. However, even for a specific rearrangement, the breakpoints may vary between individuals, such as the recurrent CDKN2A deletion. Characterizing the exact breakpoints for structural variants (SVs) is useful for designating patient-specific tumor biomarkers. We propose AmBre (Amplification of Breakpoints), a method to target SV breakpoints occurring in samples composed of heterogeneous tumor and germline DNA. Additionally, AmBre validates SVs called by whole-exome/genome sequencing and hybridization arrays. AmBre involves a PCR-based approach to amplify the DNA segment containing an SV’s breakpoint and then confirms breakpoints using sequencing by Pacific Biosciences RS. To amplify breakpoints with PCR, primers tiling specified target regions are carefully selected with a simulated annealing algorithm to minimize off-target amplification and maximize efficiency at capturing all possible breakpoints within the target regions. To confirm correct amplification and obtain breakpoints, PCR amplicons are combined without barcoding and simultaneously long-read sequenced using a single SMRT cell. Our algorithm efficiently separates reads based on breakpoints. Each read group supporting the same breakpoint corresponds with an amplicon and a consensus amplicon sequence is called. AmBre was used to discover CDKN2A deletion breakpoints in cancer cell lines: A549, CEM, Detroit562, MOLT4, MCF7, and T98G. Also, we successfully assayed RUNX1-RUNX1T1 reciprocal translocations by finding both breakpoints in the Kasumi-1 cell line. AmBre successfully targets SVs where DNA harboring the breakpoints are present in 1:1000 mixtures.
One of the most significant issues surrounding next generation sequencing is the cost and the difficulty assembling short read lengths. Targeted capture enrichment of longer fragments using single molecule sequencing (SMS) is expected to improve both sequence assembly and base-call accuracy but, at present, there are very few examples of successful application of these technologic advances in translational research and clinical testing. We developed a targeted single molecule sequencing (T-SMS) panel for genes implicated in ovarian response to controlled ovarian hyperstimulation (COH) for infertility.Target enrichment was carried out using droplet-base multiplex polymerase chain reaction (PCR) technology (RainDance®) designed to yield amplicons averaging 1 kb fragment size from candidate 44 loci (99.8% unique base-pair coverage). The total targeted sequence was 3.18 Mb per sample. SMS was carried out using single molecule, real-time DNA sequencing (SMRT® Pacific Biosciences®), average raw read length?=?1178 nucleotides, 5% of the amplicons >6000 nucleotides). After filtering with circular consensus (CCS) reads, the mean read length was 3200 nucleotides (97% CCS accuracy). Primary data analyses, alignment and filtering utilized the Pacific Biosciences® SMRT portal. Secondary analysis was conducted using the Genome Analysis Toolkit for SNP discovery l and wANNOVAR for functional analysis of variants. Filtered functional variants 18 of 19 (94.7%) were further confirmed using conventional Sanger sequencing. CCS reads were able to accurately detect zygosity. Coverage within GC rich regions (i.e.VEGFR; 72% GC rich) was achieved by capturing long genomic DNA (gDNA) fragments and reading into regions that flank the capture regions. As proof of concept, a non-synonymous LHCGR variant captured in two severe OHSS cases, and verified by conventional sequencing.Combining emulsion PCR-generated 1 kb amplicons and SMRT DNA sequencing permitted greater depth of coverage for T-SMS and facilitated easier sequence assembly. To the best of our knowledge, this is the first report combining emulsion PCR and T-SMS for long reads using human DNA samples, and NGS panel designed for biomarker discovery in OHSS.
Construction of a reference genetic map of Raphanus sativus based on genotyping by whole-genome resequencing.
This manuscript provides a genetic map of Raphanus sativus that has been used as a reference genetic map for an ongoing genome sequencing project. The map was constructed based on genotyping by whole-genome resequencing of mapping parents and F 2 population. Raphanus sativus is an annual vegetable crop species of the Brassicaceae family and is one of the key plants in the seed industry, especially in East Asia. Assessment of the R. sativus genome provides fundamental resources for crop improvement as well as the study of crop genome structure and evolution. With the goal of anchoring genome sequence assemblies of R. sativus cv. WK10039 whose genome has been sequenced onto the chromosomes, we developed a reference genetic map based on genotyping of two parents (maternal WK10039 and paternal WK10024) and 93 individuals of the F2 mapping population by whole-genome resequencing. To develop high-confidence genetic markers, ~83 Gb of parental lines and ~591 Gb of mapping population data were generated as Illumina 100 bp paired-end reads. High stringent sequence analysis of the reads mapped to the 344 Mb of genome sequence scaffolds identified a total of 16,282 SNPs and 150 PCR-based markers. Using a subset of the markers, a high-density genetic map was constructed from the analysis of 2,637 markers spanning 1,538 cM with 1,000 unique framework loci. The genetic markers integrated 295 Mb of genome sequences to the cytogenetically defined chromosome arms. Comparative analysis of the chromosome-anchored sequences with Arabidopsis thaliana and Brassica rapa revealed that the R. sativus genome has evident triplicated sub-genome blocks and the structure of gene space is highly similar to that of B. rapa. The genetic map developed in this study will serve as fundamental genomic resources for the study of R. sativus.