September 22, 2019

Transcription-associated mutation promotes RNA complexity in highly expressed genes – A major new source of selectable variation.

Alternatively spliced transcript isoforms are thought to play a critical role for functional diversity. However, the mechanism generating the enormous diversity of spliced transcript isoforms remains unknown, and its biological significance remains unclear. We analyzed transcriptomes in saker falcons, chickens, and mice to show that alternative splicing occurs more frequently, yielding more isoforms, in highly expressed genes. We focused on hemoglobin in the falcon, the most abundantly expressed genes in blood, finding that alternative splicing produces 10-fold more isoforms than expected from the number of splice junctions in the genome. These isoforms were produced mainly by alternative use of de novo splice sites generated by transcription-associated mutation (TAM), not by the RNA editing mechanism normally invoked. We found that high expression of globin genes increases mutation frequencies during transcription, especially on nontranscribed DNA strands. After DNA replication, transcribed strands inherit these somatic mutations, creating de novo splice sites, and generating multiple distinct isoforms in the cell clone. Bisulfite sequencing revealed that DNA methylation may counteract this process by suppressing TAM, suggesting DNA methylation can spatially regulate RNA complexity. RNA profiling showed that falcons living on the high Qinghai-Tibetan Plateau possess greater global gene expression levels and higher diversity of mean to high abundance isoforms (reads per kilobases per million mapped reads ≥ 18) than their low-altitude counterparts, and we speculate that this may enhance their oxygen transport capacity under low-oxygen environments. Thus, TAM-induced RNA diversity may be physiologically significant, providing an alternative strategy in lifestyle evolution.


September 22, 2019

Novel molecules lncRNAs, tRFs and circRNAs deciphered from next-generation sequencing/RNA sequencing: computational databases and tools.

Powerful next-generation sequencing (NGS) technologies, more specifically RNA sequencing (RNA-seq), have been pivotal toward the detection and analysis and hypotheses generation of novel biomolecules, long noncoding RNAs (lncRNAs), tRNA-derived fragments (tRFs) and circular RNAs (circRNAs). Experimental validation of the occurrence of these biomolecules inside the cell has been reported. Their differential expression and functionally important role in several cancer types as well as other diseases such as Alzheimer’s and cardiovascular diseases have garnered interest toward further studies in this research arena. In this review, starting from a brief relevant introduction to NGS and RNA-seq and the expression and role of lncRNAs, tRFs and circRNAs in cancer, we have comprehensively analyzed the current landscape of databases developed and computational software used for analyses and visualization for this emerging and highly interesting field of these novel biomolecules. Our review will help the end users and research investigators gain information on the existing databases and tools as well as an understanding of the specific features which these offer. This will be useful for the researchers in their proper usage thereby guiding them toward novel hypotheses generation and saving time and costs involved in extensive experimental processes in these three different novel functional RNAs. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.


September 22, 2019

A survey of the sorghum transcriptome using single-molecule long reads.

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ~11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.


September 22, 2019

Transcriptome sequencing reveals thousands of novel long non-coding RNAs in B cell lymphoma.

Gene profiling of diffuse large B cell lymphoma (DLBCL) has revealed broad gene expression deregulation compared to normal B cells. While many studies have interrogated well known and annotated genes in DLBCL, none have yet performed a systematic analysis to uncover novel unannotated long non-coding RNAs (lncRNA) in DLBCL. In this study we sought to uncover these lncRNAs by examining RNA-seq data from primary DLBCL tumors and performed supporting analysis to identify potential roles of these lncRNAs in DLBCL. We performed a systematic analysis of novel lncRNAs from the poly-adenylated transcriptome of 116 primary DLBCL samples. RNA-seq data were processed using a de novo transcript assembly pipeline to discover novel lncRNAs in DLBCL. Systematic functional, mutational, cross-species, and co-expression analyses using numerous bioinformatics tools and statistical analysis were performed to characterize these novel lncRNAs. We identified 2,632 novel, multi-exonic lncRNAs expressed in more than one tumor, two-thirds of which are not expressed in normal B cells. Long read single molecule sequencing supports the splicing structure of many of these lncRNAs. More than one-third of novel lncRNAs are differentially expressed between the two major DLBCL subtypes, ABC and GCB. Novel lncRNAs are enriched at DLBCL super-enhancers, with a fraction of them conserved between human and dog lymphomas. We see transposable element (TE) overlap in the exonic regions, particularly significant in the last exon of the novel lncRNAs, suggesting potential usage of cryptic TE polyadenylation signals. We identified highly co-expressed protein coding genes for at least 88% of the novel lncRNAs. Functional enrichment analysis of co-expressed genes predicts a potential function for about half of novel lncRNAs. Finally, systematic structural analysis of candidate point mutations (SNVs) suggests that such mutations frequently stabilize lncRNA structures instead of destabilizing them. Discovery of these 2,632 novel lncRNAs in DLBCL significantly expands the lymphoma transcriptome and our analysis identifies potential roles of these lncRNAs in lymphomagenesis and/or tumor maintenance. For further studies, these novel lncRNAs also provide an abundant source of new targets for antisense oligonucleotide pharmacology, including shared targets between human and dog lymphomas.


September 22, 2019

Long read reference genome-free reconstruction of a full-length transcriptome from Astragalus membranaceus reveals transcript variants involved in bioactive compound biosynthesis.

Astragalus membranaceus, also known as Huangqi in China, is one of the most widely used medicinal herbs in Traditional Chinese Medicine. Traditional Chinese Medicine formulations from Astragalus membranaceus have been used to treat a wide range of illnesses, such as cardiovascular disease, type 2 diabetes, nephritis and cancers. Pharmacological studies have shown that immunomodulating, anti-hyperglycemic, anti-inflammatory, antioxidant and antiviral activities exist in the extract of Astragalus membranaceus. Therefore, characterising the biosynthesis of bioactive compounds in Astragalus membranaceus, such as Astragalosides, Calycosin and Calycosin-7-O-ß-d-glucoside, is of particular importance for further genetic studies of Astragalus membranaceus. In this study, we reconstructed the Astragalus membranaceus full-length transcriptomes from leaf and root tissues using PacBio Iso-Seq long reads. We identified 27 975 and 22 343 full-length unique transcript models in each tissue respectively. Compared with previous studies that used short read sequencing, our reconstructed transcripts are longer, and are more likely to be full-length and include numerous transcript variants. Moreover, we also re-characterised and identified potential transcript variants of genes involved in Astragalosides, Calycosin and Calycosin-7-O-ß-d-glucoside biosynthesis. In conclusion, our study provides a practical pipeline to characterise the full-length transcriptome for species without a reference genome and a useful genomic resource for exploring the biosynthesis of active compounds in Astragalus membranaceus.


September 22, 2019

Isoform evolution in primates through independent combination of alternative RNA processing events.

Recent RNA-seq technology revealed thousands of splicing events that are under rapid evolution in primates, whereas the reliability of these events, as well as their combination on the isoform level, have not been adequately addressed due to its limited sequencing length. Here, we performed comparative transcriptome analyses in human and rhesus macaque cerebellum using single molecule long-read sequencing (Iso-seq) and matched RNA-seq. Besides 359 million RNA-seq reads, 4,165,527 Iso-seq reads were generated with a mean length of 14,875 bp, covering 11,466 human genes, and 10,159 macaque genes. With Iso-seq data, we substantially expanded the repertoire of alternative RNA processing events in primates, and found that intron retention and alternative polyadenylation are surprisingly more prevalent in primates than previously estimated. We then investigated the combinatorial mode of these alternative events at the whole-transcript level, and found that the combination of these events is largely independent along the transcript, leading to thousands of novel isoforms missed by current annotations. Notably, these novel isoforms are selectively constrained in general, and 1,119 isoforms have even higher expression than the previously annotated major isoforms in human, indicating that the complexity of the human transcriptome is still significantly underestimated. Comparative transcriptome analysis further revealed 502 genes encoding selectively constrained, lineage-specific isoforms in human but not in rhesus macaque, linking them to some lineage-specific functions. Overall, we propose that the independent combination of alternative RNA processing events has contributed to complex isoform evolution in primates, which provides a new foundation for the study of phenotypic difference among primates. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


September 22, 2019

Single molecule RNA sequencing uncovers trans-splicing and improves annotations in Anopheles stephensi.

Single molecule real-time (SMRT) sequencing has recently been used to obtain full-length cDNA sequences that improve genome annotation and reveal RNA isoforms. Here, we used one such method called isoform sequencing from Pacific Biosciences (PacBio) to sequence a cDNA library from the Asian malaria mosquito Anopheles stephensi. More than 600 000 full-length cDNAs, referred to as reads of insert, were identified. Owing to the inherently high error rate of PacBio sequencing, we tested different approaches for error correction. We found that error correction using Illumina RNA sequencing (RNA-seq) generated more data than using the default SMRT pipeline. The full-length error-corrected PacBio reads greatly improved the gene annotation of Anopheles stephensi: 4867 gene models were updated and 1785 alternatively spliced isoforms were added to the annotation. In addition, six trans-splicing events, where exons from different primary transcripts were joined together, were identified in An. stephensi. All six trans-splicing events appear to be conserved in Culicidae, as they are also found in Anopheles gambiae and Aedes aegypti. The proteins encoded by trans-splicing events are also highly conserved and the orthologues of these proteins are cis-spliced in outgroup species, indicating that trans-splicing may arise as a mechanism to rescue genes that broke up during evolution. © 2017 The Royal Entomological Society.


September 22, 2019

A human-specific switch of alternatively spliced AFMID isoforms contributes to TP53 mutations and tumor recurrence in hepatocellular carcinoma.

Pre-mRNA splicing can contribute to the switch of cell identity that occurs in carcinogenesis. Here, we analyze a large collection of RNA-seq data sets and report that splicing changes in hepatocyte-specific enzymes, such as AFMID and KHK, are associated with HCC patients’ survival and relapse. The switch of AFMID isoforms is an early event in HCC development and is associated with driver mutations in TP53 and ARID1A. The switch of AFMID isoforms is human-specific and not detectable in other species, including primates. Finally, we show that overexpression of the full-length AFMID isoform leads to a higher NAD+ level, lower DNA-damage response, and slower cell growth in HepG2 cells. The integrative analysis uncovered a mechanistic link between splicing switches, de novo NAD+ biosynthesis, driver mutations, and HCC recurrence. © 2018 Lin et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Resolving the complexity of human skin metagenomes using single-molecule sequencing.

Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. The species comprising a microbial community are often difficult to deconvolute due to technical limitations inherent to most short-read sequencing technologies. Here, we leverage new advances in sequencing technology, single-molecule sequencing, to significantly improve reconstruction of a complex human skin microbial community. With this long-read technology, we were able to reconstruct and annotate a closed, high-quality genome of a previously uncharacterized skin species. We demonstrate that hybrid approaches with short-read technology are sufficiently powerful to reconstruct even single-nucleotide polymorphism level variation of species in this community. Copyright © 2016 Tsai et al.


September 22, 2019

Contemporary evolution of a Lepidopteran species, Heliothis virescens, in response to modern agricultural practices.

Adaptation to human-induced environmental change has the potential to profoundly influence the genomic architecture of affected species. This is particularly true in agricultural ecosystems, where anthropogenic selection pressure is strong. Heliothis virescens primarily feeds on cotton in its larval stages, and US populations have been declining since the widespread planting of transgenic cotton, which endogenously expresses proteins derived from Bacillus thuringiensis (Bt). No physiological adaptation to Bt toxin has been found in the field, so adaptation in this altered environment could involve (i) shifts in host plant selection mechanisms to avoid cotton, (ii) changes in detoxification mechanisms required for cotton-feeding vs. feeding on other hosts or (iii) loss of resistance to previously used management practices including insecticides. Here, we begin to address whether such changes occurred in H. virescens populations between 1997 and 2012, as Bt-cotton cultivation spread through the agricultural landscape. For our study, we produced an H. virescens genome assembly and used this in concert with a ddRAD-seq-enabled genome scan to identify loci with significant allele frequency changes over the 15-year period. Genetic changes at a previously described H. virescens insecticide target of selection were detectable in our genome scan and increased our confidence in this methodology. Additional loci were also detected as being under selection, and we quantified the selection strength required to elicit observed allele frequency changes at each locus. Potential contributions of genes near loci under selection to adaptive phenotypes in the H. virescens cotton system are discussed. © 2017 John Wiley & Sons Ltd.


September 22, 2019

Complete genome sequence of Sphingobium baderi DE-13, an alkyl-substituted aniline-mineralizing bacterium.

Alkyl-substituted aniline is an important aniline derivative that may be associated with serious environmental risks. Previously, Sphingobium baderi DE-13, a bacterium that can mineralize alkyl substituted anilines such as 2,6-dimethylaniline, 2,6-diethylaniline, 2-methyl-6-ethylaniline, 2-methylaniline, and 2-ethylaniline, was isolated from active sludge. Here, we report the complete genome sequence of strain DE-13. It contains one circular chromosome and eight circular plasmids with total 4,583,422 bp and GC content of 62.41%. The reported and predicted genes involved in the catabolism of alkyl-substituted anilines are indicated. This study will provide insights into the bacterial catabolism of alkyl substituted anilines.


September 22, 2019

Genome analysis of Taraxacum kok-saghyz Rodin provides new insights into rubber biosynthesis

The Russian dandelion Taraxacum kok-saghyz Rodin (TKS), a member of the Composite family and a potential alternative source of natural rubber (NR) and inulin, is an ideal model system for studying rubber biosynthesis. Here we present the draft genome of TKS, the first assembled NR-producing weed plant. The draft TKS genome assembly has a length of 1.29 Gb, containing 46,731 predicted protein-coding genes and 68.56% repeats, in which the LTR-RT elements predominantly contribute to the genome enlargement. We analyzed the heterozygous regions/genes, suggesting their possible involvement in inbreeding depression. Through comparative studies between rubber-producing and non-rubber-producing plants, we found that enzymes of the mevalonate (MVA) pathway and rubber elongation might be critical for rubber biosynthesis, and several key isoforms that are predominantly expressed in the latex have been isolated, indicating their crucial functions in rubber biosynthesis. Moreover, for two important families in rubber elongation, the CPT/CPTL and REF/SRPP families, diverse evolutionary tracks have been revealed. These results provide valuable resources and new insights into the mechanism of NR biosynthesis, and facilitate the development of alternative NR producing crops.


September 22, 2019

Stalking a lethal superbug by whole-genome sequencing and phylogenetics: Influence on unraveling a major hospital outbreak of carbapenem-resistant Klebsiella pneumoniae.

From July 2010-April 2013, Leipzig University Hospital experienced the largest outbreak of a Klebsiella pneumoniae carbapenemase 2 (KPC-2)-producing Klebsiella pneumoniae (KPC-2-Kp) strain observed in Germany to date. After termination of the outbreak, we aimed to reconstruct transmission pathways by phylogenetics based on whole-genome sequencing (WGS). One hundred seventeen KPC-2-Kp isolates from 89 outbreak patients, 5 environmental KPC-2-Kp isolates, and 24 K pneumoniae strains not linked to the outbreak underwent WGS. Phylogenetic analysis was performed blinded to clinical data and based on the genomic reads. A patient from Greece was confirmed as the source of the outbreak. Transmission pathways for 11 out of 89 patients (12.4%) were plausibly explained by descriptive epidemiology, applying strict definitions. Five of these and an additional 15 (ie, 20 out of 89 patients [22.5%]) were confirmed by phylogenetics. The rate of phylogenetically confirmed transmissions increased significantly from 8 out of 66 (12.1% for the time period before) to 12 out of 23 patients (52.2% for the time period after; P < .001) after implementation of systematic screening for KPC-2-Kp (33,623 screening investigations within 11 months). Using descriptive epidemiology, systematic screening showed no significant effect (7 out of 66 [10.6%] vs 4 out of 23 [17.4%] patients; P = .465). The phylogenetic analysis supported the assumption that a contaminated positioning pillow served as a reservoir for the persistence of KPC-2-Kp. Effective phylogenetic identification of transmissions requires systematic microbiologic screening. Extensive screening and phylogenetic analysis based on WGS should be started as soon as possible in a bacterial outbreak situation. Copyright © 2018 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.


September 22, 2019

The genome sequence of the soft-rot fungus Penicillium purpurogenum reveals a high gene dosage for lignocellulolytic enzymes.

The high lignocellulolytic activity displayed by the soft-rot fungus Penicillium purpurogenum has made it a target for the study of novel lignocellulolytic enzymes. We have obtained a reference genome of 36.2 Mb of non-redundant sequence (11,057 protein-coding genes). The 49 largest scaffolds cover 90% of the assembly, and Core Eukaryotic Genes Mapping Approach (CEGMA) analysis reveals that our assembly captures almost all protein-coding genes. RNA-seq was performed and 93.1% of the reads aligned to the assembled genome. These data, plus the independent sequencing of a set of genes of lignocellulose-degrading enzymes, validate the quality of the genome sequence. P. purpurogenum shows a higher number of proteins with CAZy motifs, transcription factors and transporters as compared to other sequenced Penicillia. These results demonstrate the great potential for lignocellulolytic activity of this fungus and the possible use of its enzymes in related industrial applications.


September 22, 2019

Revisiting the contribution of gene duplication of blaOXA-23 in carbapenem-resistant Acinetobacter baumannii.

Gene duplication has been discovered for many antimicrobial resistance genes in bacterial genomes and has been considered a source of elevated antimicrobial resistance.1 The gene blaOXA-23 is a major determinant in the emergence of carbapenem-resistant Acinetobacter baumannii (CRAB).2–4 We have previously reported the widespread duplication of blaOXA-23 by surveying 113 clinical CRAB isolates in China.5 However, in these isolates the blaOXA-23 copy number did not correlate well with the MIC of imipenem. A similar phenomenon was also reported recently by Yoon et al.6 One reasonable explanation is that, in addition to gene duplications, other mechanisms might also impact on the MIC, such as the presence of specific outer membrane proteins and/or the over-expression of resistance–nodulation–division (RND)-type efflux pumps.7 Often, these mechanisms might vary in their performance when in different genomic contexts. Instead of making comparisons between clinical isolates, in this study we cultured A. baumannii under treatment with carbapenem, thus avoiding any interference induced in different genomic contexts. If an increase in the blaOXA-23 copy number or MIC were to occur within the same strain, the contribution of gene duplication to carbapenem resistance would be acknowledged.

