Accurate transcript structure and abundance inference from RNA sequencing (RNA-seq) data is foundational for molecular discovery. Here we present TACO, a computational method to reconstruct a consensus transcriptome from multiple RNA-seq data sets. TACO employs novel change-point detection to demarcate transcript start and end sites, leading to improved reconstruction accuracy compared with other tools in its class. The tool is available at http://tacorna.github.io and can be readily incorporated into RNA-seq analysis workflows.
Long-read sequencing uncovers transcript features missed by short-read methods.
Long-read sequencing, coupled to cDNA capture, provides an unrivaled view of the transcriptome of chromosome 21, revealing surprises about the splicing of long noncoding RNAs. Copyright © 2018. Published by Elsevier Inc.
Alternative RNA splicing is a known phenomenon, but we still do not have a complete catalog of isoforms that explain variability in the human transcriptome. We have made significant progress in developing methods to study variability of the transcriptome, but we are far from having a complete picture of it. The initial methods to study gene expression were based on cloning of cDNAs and Sanger sequencing. The strategy was labor-intensive and expensive. With the development of microarrays, different methods based on exon arrays and tiling arrays provided valuable information about RNA expression. However, the microarray presented significant limitations.…
Over the past decade, the field of genomics has seen such drastic improvements in sequencing chemistries that high-throughput sequencing, or next-generation sequencing (NGS), is being applied to generate data across many disciplines. NGS instruments are becoming less expensive, faster, and smaller, and therefore are being adopted in an increasing number of laboratories, including clinical laboratories. Thus far, clinical use of NGS has been mostly focused on the human genome, for purposes such as characterizing the molecular basis of cancer or for diagnosing and understanding the basis of rare genetic disorders. There are, however, an increasing number of examples whereby NGS…
Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the…
In eukaryotes, mechanisms such as alternative splicing (AS) and alternative translation initiation (ATI) contribute to organismal protein diversity. Specifically, splicing factors play crucial roles in responses to environment and development cues; however, the underlying mechanisms are not well investigated in plants. Here, we report the parallel employment of short-read RNA sequencing, single molecule long-read sequencing and proteomic identification to unravel AS isoforms and previously unannotated proteins in response to abscisic acid (ABA) treatment. Combining the data from the two sequencing methods, approximately 83.4% of intron-containing genes were alternatively spliced. Two AS types, which are referred to as alternative first exon…
The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate…
The diversity and complexity of the human brain are widely assumed to be encoded within a constant genome. Somatic gene recombination, which changes germline DNA sequences to increase molecular diversity, could theoretically alter this code but has not been documented in the brain, to our knowledge. Here we describe recombination of the Alzheimer’s disease-related gene APP, which encodes amyloid precursor protein, in human neurons, occurring mosaically as thousands of variant ‘genomic cDNAs’ (gencDNAs). gencDNAs lacked introns and ranged from full-length cDNA copies of expressed, brain-specific RNA splice variants to myriad smaller forms that contained intra-exonic junctions, insertions, deletions, and/or single…
Little is known about the function of most long non-coding RNAs. But a suite of new tools might change that.
The ability to rapidly adapt cellular bioenergetic capabilities to meet rapidly changing environmental conditions is mandatory for normal cellular function and for cancer progression. Any loss of this adaptive response has the potential to compromise cellular function and render the cell more susceptible to external stressors such as oxidative stress, radiation, chemotherapeutic drugs, and hypoxia. Mitochondria play a vital role in bioenergetic and biosynthetic pathways and can rapidly adjust to meet the metabolic needs of the cell. Increased demand is met by mitochondrial biogenesis and fusion of individual mitochondria into dynamic networks, whereas a decrease in demand results in the…
Since the discovery of the T cell receptor (TcR), immunologists have assigned somatic hypermutation (SHM) as a mechanism employed solely by B cells to diversify their antigen receptors. Remarkably, we found SHM acting in the thymus on the α chain locus of shark TcR. SHM in developing shark T cells likely is catalyzed by activation-induced cytidine deaminase (AID) and results in both point and tandem mutations that accumulate non-conservative amino acid replacements within complementarity-determining regions (CDRs). Mutation frequency at TcRα was as high as that seen at B cell receptor loci (BcR) in sharks and mammals, and the mechanism of SHM…
Individual organisms are linked to their communities and ecosystems via metabolic activities. Metabolic exchanges and co-dependencies have long been suggested to have a pivotal role in determining community structure. In phloem-feeding insects such metabolic interactions with bacteria enable complementation of their deprived nutrition. The phloem-feeding whitefly Bemisia tabaci (Hemiptera: Aleyrodidae) harbors an obligatory symbiotic bacterium, as well as varying combinations of facultative symbionts. This well-defined bacterial community in B. tabaci serves here as a case study for a comprehensive and systematic survey of metabolic interactions within the bacterial community and their associations with documented occurrences of bacterial combinations. We first…
Burkholderia pseudomallei, the causative agent of the high-mortality disease melioidosis, is a gram-negative bacterium that is naturally resistant to many antibiotics. There is no vaccine for melioidosis, and effective eradication is reliant on biphasic and prolonged antibiotic administration. The carbapenem drug meropenem is the current gold standard option for treating severe melioidosis. Intrinsic B. pseudomallei resistance toward meropenem has not yet been documented; however, resistance could conceivably develop over the course of infection, leading to prolonged sepsis and treatment failure. We examined our 30-year clinical collection of melioidosis cases to identify B. pseudomallei isolates with reduced meropenem susceptibility. Isolates were subjected…
DNA conformation may deviate from the classical B-form in ~13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: It decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and validate experimentally…