Accurate transcript structure and abundance inference from RNA sequencing (RNA-seq) data is foundational for molecular discovery. Here we present TACO, a computational method to reconstruct a consensus transcriptome from multiple RNA-seq data sets. TACO employs novel change-point detection to demarcate transcript start and end sites, leading to improved reconstruction accuracy compared with other tools in its class. The tool is available at http://tacorna.github.io and can be readily incorporated into RNA-seq analysis workflows.
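As a rough illustration of what change-point detection on read coverage can look like, the sketch below finds the single position that best splits a coverage profile into two constant-mean segments by minimizing the summed squared error. This is a generic textbook formulation and is not taken from TACO; the function name `best_change_point` and the toy coverage values are hypothetical and chosen only for illustration.

```python
def best_change_point(coverage):
    """Return (index, cost) for the best two-segment split of a coverage profile.

    index i means the segments are coverage[:i] and coverage[i:].
    cost is the summed squared error around each segment's own mean.
    Generic least-squares sketch; NOT the algorithm implemented in TACO.
    """
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions to place a change point")

    # Prefix sums let us compute any segment's squared error in O(1).
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in coverage:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def segment_sse(lo, hi):
        # Sum of squared deviations from the mean over coverage[lo:hi].
        m = hi - lo
        s = prefix[hi] - prefix[lo]
        sq = prefix_sq[hi] - prefix_sq[lo]
        return sq - s * s / m

    best_i, best_cost = None, float("inf")
    for i in range(1, n):
        cost = segment_sse(0, i) + segment_sse(i, n)
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i, best_cost


# Hypothetical per-base coverage: low background, then an expressed region.
coverage = [2, 3, 2, 3, 40, 42, 41, 39, 40]
split, cost = best_change_point(coverage)
print(split, round(cost, 2))
```

On this toy profile the split lands where coverage jumps from background to the expressed region, which is the kind of boundary a transcript start site would produce. A real method would also need to handle multiple change points, noise, and statistical significance.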
Long-read sequencing uncovers transcript features missed by short-read methods.
Long-read sequencing, coupled to cDNA capture, provides an unrivaled view of the transcriptome of chromosome 21, revealing surprises about the splicing of long noncoding RNAs. Copyright © 2018. Published by Elsevier Inc.
Alternative RNA splicing is a known phenomenon, but we still do not have a complete catalog of isoforms that explain variability in the human transcriptome. We have made significant progress in developing methods to study variability of the transcriptome, but we are far from having a complete picture of it. The initial methods to study gene expression were based on cloning of cDNAs and Sanger sequencing. The strategy was labor-intensive and expensive. With the development of microarrays, different methods based on exon arrays and tiling arrays provided valuable information about RNA expression. However, microarrays presented significant limitations. Most of the limitations became apparent by 2005, but it was not until 2008 that an alternative method to study the transcriptome was developed. RNA sequencing using next-generation sequencing (RNA-Seq) quickly became the technology of choice for gene expression profiling. Recently, the precision and sensitivity of RNA-Seq have come into question, especially for transcriptome reconstruction. This chapter describes a relatively new method, “Isoform Sequencing” (Iso-Seq). Iso-Seq was developed by Pacific Biosciences (PacBio), and it is capable of identifying new isoforms with extraordinary precision due to its long-read technology. The technique to create libraries is straightforward, and the PacBio RS II instrument generates the information in hours. The bioinformatics analysis is performed using the freely available SMRT® Portal software. The SMRT Portal is easy to use and capable of performing all the steps necessary to analyze the raw data and to generate high-quality full-length isoforms. For the universal acceptance of the Iso-Seq method, the capacity of the SMRT Cells needs to improve at least 10- to 100-fold to make the system affordable and attractive to users.
Over the past decade, the field of genomics has seen such drastic improvements in sequencing chemistries that high-throughput sequencing, or next-generation sequencing (NGS), is being applied to generate data across many disciplines. NGS instruments are becoming less expensive, faster, and smaller, and therefore are being adopted in an increasing number of laboratories, including clinical laboratories. Thus far, clinical use of NGS has been mostly focused on the human genome, for purposes such as characterizing the molecular basis of cancer or for diagnosing and understanding the basis of rare genetic disorders. There are, however, an increasing number of examples whereby NGS is employed to discover novel pathogens, and these cases provide precedent for the use of NGS in microbial diagnostics. NGS has many advantages over traditional microbial diagnostic methods, such as unbiased rather than pathogen-specific protocols, ability to detect fastidious or non-culturable organisms, and ability to detect co-infections. One of the most impressive advantages of NGS is that it requires little or no prior knowledge of the pathogen, unlike many other diagnostic assays; therefore for pathogen discovery, NGS is very valuable. However, despite these advantages, there are challenges involved in implementing NGS for routine clinical microbiological diagnosis. We discuss these advantages and challenges in the context of recently described research studies.
Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
Proteogenomic analysis reveals alternative splicing and translation as part of the abscisic acid response in Arabidopsis seedlings.
In eukaryotes, mechanisms such as alternative splicing (AS) and alternative translation initiation (ATI) contribute to organismal protein diversity. Specifically, splicing factors play crucial roles in responses to environmental and developmental cues; however, the underlying mechanisms are not well investigated in plants. Here, we report the parallel employment of short-read RNA sequencing, single-molecule long-read sequencing and proteomic identification to unravel AS isoforms and previously unannotated proteins in response to abscisic acid (ABA) treatment. Combining the data from the two sequencing methods, approximately 83.4% of intron-containing genes were alternatively spliced. Two AS types, which are referred to as alternative first exon (AFE) and alternative last exon (ALE), were more abundant than intron retention (IR); however, in contrast to AS events detected under normal conditions, differentially expressed AS isoforms were more likely to be translated. ABA extensively affects the AS pattern, as indicated by the increasing number of non-conventional splicing sites. This work also identified thousands of unannotated peptides and proteins by ATI based on mass spectrometry and a virtual peptide library deduced from both strands of coding regions within the Arabidopsis genome. The results enhance our understanding of AS and alternative translation mechanisms under normal conditions, and in response to ABA treatment. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants (even in genomes as well studied as rat and the great apes) and how these annotations improve cross-species RNA expression experiments. © 2018 Fiddes et al.; Published by Cold Spring Harbor Laboratory Press.
The diversity and complexity of the human brain are widely assumed to be encoded within a constant genome. Somatic gene recombination, which changes germline DNA sequences to increase molecular diversity, could theoretically alter this code but has not been documented in the brain, to our knowledge. Here we describe recombination of the Alzheimer’s disease-related gene APP, which encodes amyloid precursor protein, in human neurons, occurring mosaically as thousands of variant ‘genomic cDNAs’ (gencDNAs). gencDNAs lacked introns and ranged from full-length cDNA copies of expressed, brain-specific RNA splice variants to myriad smaller forms that contained intra-exonic junctions, insertions, deletions, and/or single nucleotide variations. DNA in situ hybridization identified gencDNAs within single neurons that were distinct from wild-type loci and absent from non-neuronal cells. Mechanistic studies supported neuronal ‘retro-insertion’ of RNA to produce gencDNAs; this process involved transcription, DNA breaks, reverse transcriptase activity, and age. Neurons from individuals with sporadic Alzheimer’s disease showed increased gencDNA diversity, including eleven mutations known to be associated with familial Alzheimer’s disease that were absent from healthy neurons. Neuronal gene recombination may allow ‘recording’ of neural activity for selective ‘playback’ of preferred gene variants whose expression bypasses splicing; this has implications for cellular diversity, learning and memory, plasticity, and diseases of the human brain.
Little is known about the function of most long non-coding RNAs. But a suite of new tools might change that.
The ability to rapidly adapt cellular bioenergetic capabilities to meet rapidly changing environmental conditions is mandatory for normal cellular function and for cancer progression. Any loss of this adaptive response has the potential to compromise cellular function and render the cell more susceptible to external stressors such as oxidative stress, radiation, chemotherapeutic drugs, and hypoxia. Mitochondria play a vital role in bioenergetic and biosynthetic pathways and can rapidly adjust to meet the metabolic needs of the cell. Increased demand is met by mitochondrial biogenesis and fusion of individual mitochondria into dynamic networks, whereas a decrease in demand results in the removal of superfluous mitochondria through fission and mitophagy. Effective communication between nucleus and mitochondria (mito-nuclear cross talk), involving the generation of different mitochondrial stress signals as well as the nuclear stress response pathways to deal with these stressors, maintains bioenergetic homeostasis under most conditions. However, when mitochondrial DNA (mtDNA) mutations accumulate and mito-nuclear cross talk falters, mitochondria fail to deliver critical functional outputs. Mutations in mtDNA have been implicated in neuromuscular and neurodegenerative mitochondriopathies and complex diseases such as diabetes, cardiovascular diseases, gastrointestinal disorders, skin disorders, aging, and cancer. In some cases, drastic measures such as acquisition of new mitochondria from donor cells occur to ensure cell survival. This review starts with a brief discussion of the evolutionary origin of mitochondria and summarizes how mutations in mtDNA lead to mitochondriopathies and other degenerative diseases. Mito-nuclear cross talk, including various stress signals generated by mitochondria and corresponding stress response pathways activated by the nucleus, is summarized. We also introduce and discuss a small family of recently discovered hormone-like mitopeptides that modulate body metabolism. Under conditions of severe mitochondrial stress, mitochondria have been shown to traffic between cells, replacing mitochondria in cells with damaged and malfunctional mtDNA. Understanding the processes involved in cellular bioenergetics and metabolic adaptation has the potential to generate new knowledge that will lead to improved treatment of many of the metabolic, degenerative, and age-related inflammatory diseases that characterize modern societies.
Since the discovery of the T cell receptor (TcR), immunologists have assigned somatic hypermutation (SHM) as a mechanism employed solely by B cells to diversify their antigen receptors. Remarkably, we found SHM acting in the thymus on the α chain locus of shark TcR. SHM in developing shark T cells likely is catalyzed by activation-induced cytidine deaminase (AID) and results in both point and tandem mutations that accumulate non-conservative amino acid replacements within complementarity-determining regions (CDRs). Mutation frequency at TcRα was as high as that seen at B cell receptor loci (BcR) in sharks and mammals, and the mechanism of SHM shares unique characteristics first detected at shark BcR loci. Additionally, fluorescence in situ hybridization showed the strongest AID expression in thymic corticomedullary junction and medulla. We suggest that TcRα utilizes SHM to broaden diversification of the primary αβ T cell repertoire in sharks, the first reported use in vertebrates. © 2018, Ott et al.
Modeling trophic dependencies and exchanges among insects’ bacterial symbionts in a host-simulated environment.
Individual organisms are linked to their communities and ecosystems via metabolic activities. Metabolic exchanges and co-dependencies have long been suggested to have a pivotal role in determining community structure. In phloem-feeding insects such metabolic interactions with bacteria enable complementation of their deprived nutrition. The phloem-feeding whitefly Bemisia tabaci (Hemiptera: Aleyrodidae) harbors an obligatory symbiotic bacterium, as well as varying combinations of facultative symbionts. This well-defined bacterial community in B. tabaci serves here as a case study for a comprehensive and systematic survey of metabolic interactions within the bacterial community and their associations with documented occurrences of bacterial combinations. We first reconstructed the metabolic networks of five common B. tabaci symbiont genera (Portiera, Rickettsia, Hamiltonella, Cardinium and Wolbachia), and then used network analysis approaches to predict: (1) species-specific metabolic capacities in a simulated bacteriocyte-like environment; (2) metabolic capacities of the corresponding species’ combinations; and (3) dependencies of each species on different media components. The predictions for metabolic capacities of the symbionts in the host environment were in general agreement with previously reported genome analyses, each focused on the single-species level. The analysis suggests several previously unreported routes for complementary interactions and estimates the dependency of each symbiont on specific host metabolites. No clear association was detected between metabolic co-dependencies and co-occurrence patterns. The analysis generated predictions for testable hypotheses of metabolic exchanges and co-dependencies in bacterial communities and, by crossing them with co-occurrence profiles, contextualized interaction patterns into a wider ecological perspective.
Raising the stakes: Loss of efflux pump regulation decreases meropenem susceptibility in Burkholderia pseudomallei
Burkholderia pseudomallei, the causative agent of the high-mortality disease melioidosis, is a gram-negative bacterium that is naturally resistant to many antibiotics. There is no vaccine for melioidosis, and effective eradication is reliant on biphasic and prolonged antibiotic administration. The carbapenem drug meropenem is the current gold standard option for treating severe melioidosis. Intrinsic B. pseudomallei resistance toward meropenem has not yet been documented; however, resistance could conceivably develop over the course of infection, leading to prolonged sepsis and treatment failure. We examined our 30-year clinical collection of melioidosis cases to identify B. pseudomallei isolates with reduced meropenem susceptibility. Isolates were subjected to minimum inhibitory concentration (MIC) testing toward meropenem. Paired isolates from patients who had evolved decreased susceptibility were subjected to whole-genome sequencing. Select agent-compliant genetic manipulation was carried out to confirm the molecular mechanisms conferring resistance. We identified 11 melioidosis cases where B. pseudomallei isolates developed decreased susceptibility toward meropenem during treatment, including 2 cases not treated with this antibiotic. Meropenem MICs increased from 0.5-0.75 µg/mL to 3-8 µg/mL. Comparative genomics identified multiple mutations affecting multidrug resistance-nodulation-division (RND) efflux pump regulators, with concomitant overexpression of their corresponding pumps. All cases were refractory to treatment despite aggressive, targeted therapy, and 2 were associated with a fatal outcome. This study confirms the role of RND efflux pumps in decreased meropenem susceptibility in B. pseudomallei. These findings have important ramifications for the diagnosis, treatment, and management of life-threatening melioidosis cases.
Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate.
DNA conformation may deviate from the classical B-form in ~13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: it decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and validate experimentally non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations. © 2018 Guiblet et al.; Published by Cold Spring Harbor Laboratory Press.