Menu
September 22, 2019

Chromosome-level reference genome and alternative splicing atlas of moso bamboo (Phyllostachys edulis).

Bamboo is one of the most important nontimber forestry products worldwide. However, a chromosome-level reference genome is lacking, and an evolutionary view of alternative splicing (AS) in bamboo remains unclear despite emerging omics data and improved technologies.Here, we provide a chromosome-level de novo genome assembly of moso bamboo (Phyllostachys edulis) using additional abundance sequencing data and a Hi-C scaffolding strategy. The significantly improved genome is a scaffold N50 of 79.90 Mb, approximately 243 times longer than the previous version. A total of 51,074 high-quality protein-coding loci with intact structures were identified using single-molecule real-time sequencing and manual verification. Moreover, we provide a comprehensive AS profile based on the identification of 266,711 unique AS events in 25,225 AS genes by large-scale transcriptomic sequencing of 26 representative bamboo tissues using both the Illumina and Pacific Biosciences sequencing platforms. Through comparisons with orthologous genes in related plant species, we observed that the AS genes are concentrated among more conserved genes that tend to accumulate higher transcript levels and share less tissue specificity. Furthermore, gene family expansion, abundant AS, and positive selection were identified in crucial genes involved in the lignin biosynthetic pathway of moso bamboo.These fundamental studies provide useful information for future in-depth analyses of comparative genome and AS features. Additionally, our results highlight a global perspective of AS during evolution and diversification in bamboo.


September 22, 2019

Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis.

RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, hundreds of analysis tools have been developed since it was debuted. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Surpassing the expression analysis scope, our work also includes assessment of RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome.RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.


September 22, 2019

Distinguishing highly similar gene isoforms with a clustering-based bioinformatics analysis of PacBio single-molecule long reads.

Gene isoforms are commonly found in both prokaryotes and eukaryotes. Since each isoform may perform a specific function in response to changing environmental conditions, studying the dynamics of gene isoforms is important in understanding biological processes and disease conditions. However, genome-wide identification of gene isoforms is technically challenging due to the high degree of sequence identity among isoforms. Traditional targeted sequencing approach, involving Sanger sequencing of plasmid-cloned PCR products, has low throughput and is very tedious and time-consuming. Next-generation sequencing technologies such as Illumina and 454 achieve high throughput but their short read lengths are a critical barrier to accurate assembly of highly similar gene isoforms, and may result in ambiguities and false joining during sequence assembly. More recently, the third generation sequencer represented by the PacBio platform offers sufficient throughput and long reads covering the full length of typical genes, thus providing a potential to reliably profile gene isoforms. However, the PacBio long reads are error-prone and cannot be effectively analyzed by traditional assembly programs.We present a clustering-based analysis pipeline integrated with PacBio sequencing data for profiling highly similar gene isoforms. This approach was first evaluated in comparison to de novo assembly of 454 reads using a benchmark admixture containing 10 known, cloned msg genes encoding the major surface glycoprotein of Pneumocystis jirovecii. All 10 msg isoforms were successfully reconstructed with the expected length (~1.5 kb) and correct sequence by the new approach, while 454 reads could not be correctly assembled using various assembly programs. When using an additional benchmark admixture containing 22 known P. jirovecii msg isoforms, this approach accurately reconstructed all but 4 these isoforms in their full-length (~3 kb); these 4 isoforms were present in low concentrations in the admixture. Finally, when applied to the original clinical sample from which the 22 known msg isoforms were cloned, this approach successfully identified not only all known isoforms accurately (~3 kb each) but also 48 novel isoforms.PacBio sequencing integrated with the clustering-based analysis pipeline achieves high-throughput and high-resolution discrimination of highly similar sequences, and can serve as a new approach for genome-wide characterization of gene isoforms and other highly repetitive sequences.


September 22, 2019

Genome-wide identification and analysis of the ALTERNATIVE OXIDASE gene family in diploid and hexaploid wheat.

A comprehensive understanding of wheat responses to environmental stress will contribute to the long-term goal of feeding the planet. ALERNATIVE OXIDASE (AOX) genes encode proteins involved in a bypass of the electron transport chain and are also known to be involved in stress tolerance in multiple species. Here, we report the identification and characterization of the AOX gene family in diploid and hexaploid wheat. Four genes each were found in the diploid ancestors Triticum urartu, and Aegilops tauschii, and three in Aegilops speltoides. In hexaploid wheat (Triticum aestivum), 20 genes were identified, some with multiple splice variants, corresponding to a total of 24 proteins for those with observed transcription and translation. These proteins were classified as AOX1a, AOX1c, AOX1e or AOX1d via phylogenetic analysis. Proteins lacking most or all signature AOX motifs were assigned to putative regulatory roles. Analysis of protein-targeting sequences suggests mixed localization to the mitochondria and other organelles. In comparison to the most studied AOX from Trypanosoma brucei, there were amino acid substitutions at critical functional domains indicating possible role divergence in wheat or grasses in general. In hexaploid wheat, AOX genes were expressed at specific developmental stages as well as in response to both biotic and abiotic stresses such as fungal pathogens, heat and drought. These AOX expression patterns suggest a highly regulated and diverse transcription and expression system. The insights gained provide a framework for the continued and expanded study of AOX genes in wheat for stress tolerance through breeding new varieties, as well as resistance to AOX-targeted herbicides, all of which can ultimately be used synergistically to improve crop yield.


September 22, 2019

PacBio sequencing of gene families – a case study with wheat gluten genes.

Amino acids in wheat (Triticum aestivum) seeds mainly accumulate in storage proteins called gliadins and glutenins. Gliadins contain a/ß-, ?- and ?-types whereas glutenins contain HMW- and LMW-types. Known gliadin and glutenin sequences were largely determined through cloning and sequencing by capillary electrophoresis. This time-consuming process prevents us to intensively study the variation of each orthologous gene copy among cultivars. The throughput and sequencing length of Pacific Bioscience RS (PacBio) single molecule sequencing platform make it feasible to construct contiguous and non-chimeric RNA sequences. We assembled 424 wheat storage protein transcripts from ten wheat cultivars by using just one single-molecule-real-time cell. The protein genes from wheat cultivar Chinese Spring are comparable to known sequences from NCBI. We demonstrated real-time sequencing of gene families with high-throughput and low-cost. This method can be applied to studies of gene amplification and copy number variation among species and cultivars. © 2013 Elsevier B.V. All rights reserved.


September 22, 2019

A high-quality annotated transcriptome of swine peripheral blood.

High throughput gene expression profiling assays of peripheral blood are widely used in biomedicine, as well as in animal genetics and physiology research. Accurate, comprehensive, and precise interpretation of such high throughput assays relies on well-characterized reference genomes and/or transcriptomes. However, neither the reference genome nor the peripheral blood transcriptome of the pig have been sufficiently assembled and annotated to support such profiling assays in this emerging biomedical model organism. We aimed to assemble published and novel RNA-seq data to provide a comprehensive, well-annotated blood transcriptome for pigs by integrating a de novo assembly with a genome-guided assembly.A de novo and a genome-guided transcriptome of porcine whole peripheral blood was assembled with ~162 million pairs of paired-end and ~183 million single-end, trimmed and normalized Illumina RNA-seq reads (~6 billion initial reads from 146 RNA-seq libraries) from five independent studies by using the Trinity and Cufflinks software, respectively. We then removed putative transcripts (PTs) of low confidence from both assemblies and merged the remaining PTs into an integrated transcriptome consisting of 132,928 PTs, with 126,225 (~95%) PTs from the de novo assembly and more than 91% of PTs spliced. In the integrated transcriptome, ~90% and 63% of PTs had significant sequence similarity to sequences in the NCBI NT and NR databases, respectively; 68,754 (~52%) PTs were annotated with 15,965 unique gene ontology (GO) terms; and 7618 PTs annotated with Enzyme Commission codes were assigned to 134 pathways curated by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Full exon-intron junctions of 17,528 PTs were validated by PacBio IsoSeq full-length cDNA reads from 3 other porcine tissues, NCBI pig RefSeq mRNAs and transcripts from Ensembl Sscrofa10.2 annotation. Completeness of the 5′ termini of 37,569 PTs was validated by public cap analysis of gene expression (CAGE) data. By comparison to the Ensembl transcripts, we found that (1) the deduced precursors of 54,402 PTs shared at least one intron or exon with those of 18,437 Ensembl transcripts; (2) 12,262 PTs had both longer 5′ and 3′ termini than their maximally overlapping Ensembl transcripts; and (3) 41,838 spliced PTs were totally missing from the Sscrofa10.2 annotation. Similar results were obtained when the PTs were compared to the pig NCBI RefSeq mRNA collection.We built, validated and annotated a comprehensive porcine blood transcriptome with significant improvement over the annotation of Ensembl Sscrofa10.2 and the pig NCBI RefSeq mRNAs, and laid a foundation for blood-based high throughput transcriptomic assays in pigs and for advancing annotation of the pig genome.


September 22, 2019

Isoform sequencing provides a more comprehensive view of the Panax ginseng transcriptome.

Korean ginseng (Panax ginseng C.A. Meyer) has been widely used for medicinal purposes and contains potent plant secondary metabolites, including ginsenosides. To obtain transcriptomic data that offers a more comprehensive view of functional genomics in P. ginseng, we generated genome-wide transcriptome data from four different P. ginseng tissues using PacBio isoform sequencing (Iso-Seq) technology. A total of 135,317 assembled transcripts were generated with an average length of 3.2 kb and high assembly completeness. Of those unigenes, 67.5% were predicted to be complete full-length (FL) open reading frames (ORFs) and exhibited a high gene annotation rate. Furthermore, we successfully identified unique full-length genes involved in triterpenoid saponin synthesis and plant hormonal signaling pathways, including auxin and cytokinin. Studies on the functional genomics of P. ginseng seedlings have confirmed the rapid upregulation of negative feed-back loops by auxin and cytokinin signaling cues. The conserved evolutionary mechanisms in the auxin and cytokinin canonical signaling pathways of P. ginseng are more complex than those in Arabidopsis thaliana. Our analysis also revealed a more detailed view of transcriptome-wide alternative isoforms for 88 genes. Finally, transposable elements (TEs) were also identified, suggesting transcriptional activity of TEs in P. ginseng. In conclusion, our results suggest that long-read, full-length or partial-unigene data with high-quality assemblies are invaluable resources as transcriptomic references in P. ginseng and can be used for comparative analyses in closely related medicinal plants.


September 22, 2019

A comprehensive analysis of alternative splicing in paleopolyploid maize.

Identifying and characterizing alternative splicing (AS) enables our understanding of the biological role of transcript isoform diversity. This study describes the use of publicly available RNA-Seq data to identify and characterize the global diversity of AS isoforms in maize using the inbred lines B73 and Mo17, and a related species, sorghum. Identification and characterization of AS within maize tissues revealed that genes expressed in seed exhibit the largest differential AS relative to other tissues examined. Additionally, differences in AS between the two genotypes B73 and Mo17 are greatest within genes expressed in seed. We demonstrate that changes in the level of alternatively spliced transcripts (intron retention and exon skipping) do not solely reflect differences in total transcript abundance, and we present evidence that intron retention may act to fine-tune gene expression across seed development stages. Furthermore, we have identified temperature sensitive AS in maize and demonstrate that drought-induced changes in AS involve distinct sets of genes in reproductive and vegetative tissues. Examining our identified AS isoforms within B73 × Mo17 recombinant inbred lines (RILs) identified splicing QTL (sQTL). The 43.3% of cis-sQTL regulated junctions are actually identified as alternatively spliced junctions in our analysis, while 10 Mb windows on each side of 48.2% of trans-sQTLs overlap with splicing related genes. Using sorghum as an out-group enabled direct examination of loss or conservation of AS between homeologous genes representing the two subgenomes of maize. We identify several instances where AS isoforms that are conserved between one maize homeolog and its sorghum ortholog are absent from the second maize homeolog, suggesting that these AS isoforms may have been lost after the maize whole genome duplication event. This comprehensive analysis provides new insights into the complexity of AS in maize.


September 22, 2019

Biogas production from hydrothermal liquefaction wastewater (HTLWW): Focusing on the microbial communities as revealed by high-throughput sequencing of full-length 16S rRNA genes.

Hydrothermal liquefaction (HTL) is an emerging and promising technology for the conversion of wet biomass into bio-crude, however, little attention has been paid to the utilization of hydrothermal liquefaction wastewater (HTLWW) with high concentration of organics. The present study investigated biogas production from wastewater obtained from HTL of straw for bio-crude production, with focuses on the analysis of the microbial communities and characterization of the organics. Batch experiments showed the methane yield of HTLWW (R-HTLWW) was 184 mL/g COD, while HTLWW after petroleum ether extraction (PE-HTLWW), to extract additional bio-crude, had higher methane yield (235 mL/g COD) due to the extraction of recalcitrant organic compounds. Sequential batch experiments further demonstrated the higher methane yield of PE-HTLWW. LC-TOF-MS, HPLC and gel filtration chromatography showed organics with molecular weight (MW) < 1000 were well degraded. Results from the high-throughput sequencing of full-length 16S rRNA genes analysis showed similar microbial community compositions were obtained for the reactors fed with either R-HTLWW or PE-HTLWW. The degradation of fatty acids were related with Mesotoga infera, Syntrophomonas wolfei et al. by species level identification. However, the species related to the degradation of other compounds (e.g. phenols) were not found, which could be due to the presence of uncharacterized microorganisms. It was also found previously proposed criteria (97% and 98.65% similarity) for species identification of 16S rRNA genes were not suitable for a fraction of 16S rRNA genes. Copyright © 2016 Elsevier Ltd. All rights reserved.


September 22, 2019

Exploiting single-molecule transcript sequencing for eukaryotic gene prediction.

We develop a method to predict and validate gene models using PacBio single-molecule, real-time (SMRT) cDNA reads. Ninety-eight percent of full-insert SMRT reads span complete open reading frames. Gene model validation using SMRT reads is developed as automated process. Optimized training and prediction settings and mRNA-seq noise reduction of assisting Illumina reads results in increased gene prediction sensitivity and precision. Additionally, we present an improved gene set for sugar beet (Beta vulgaris) and the first genome-wide gene set for spinach (Spinacia oleracea). The workflow and guidelines are a valuable resource to obtain comprehensive gene sets for newly sequenced genomes of non-model eukaryotes.


September 22, 2019

P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads.

Obtaining complete gene structures is one major goal of genome assembly. Some gene regions are fragmented in low quality and high-quality assemblies. Therefore, new approaches are needed to recover gene regions. Genomes are widely transcribed, generating messenger and non-coding RNAs. These widespread transcripts can be used to scaffold genomes and complete transcribed regions.We present P_RNA_scaffolder, a fast and accurate tool using paired-end RNA-sequencing reads to scaffold genomes. This tool aims to improve the completeness of both protein-coding and non-coding genes. After this tool was applied to scaffolding human contigs, the structures of both protein-coding genes and circular RNAs were almost completely recovered and equivalent to those in a complete genome, especially for long proteins and long circular RNAs. Tested in various species, P_RNA_scaffolder exhibited higher speed and efficiency than the existing state-of-the-art scaffolders. This tool also improved the contiguity of genome assemblies generated by current mate-pair scaffolding and third-generation single-molecule sequencing assembly.The P_RNA_scaffolder can improve the contiguity of genome assembly and benefit gene prediction. This tool is available at http://www.fishbrowser.org/software/P_RNA_scaffolder .


September 22, 2019

Single-Molecule Long-Read Sequencing of Zanthoxylum bungeanum Maxim. Transcriptome: Identification of Aroma-Related Genes

Zanthoxylum bungeanum Maxim. is an economically important tree species that is resistant to drought and infertility, and has potential medicinal and edible value. However, comprehensive genomic data are not yet available for this species, limiting its potential utility for medicinal use, breeding programs, and cultivation. Transcriptome sequencing provides an effective approach to remedying this shortcoming. Herein, single-molecule long-read sequencing and next-generation sequencingapproacheswereusedinparalleltoobtaintranscriptisoformstructureandgenefunctional informationinZ.bungeanum. Intotal, 282,101readsofinserts(ROIs)wereidentified, including134,074 full-length non-chimeric reads, among which 65,711 open reading frames (ORFs), 50,135 simple sequence repeats (SSRs), and 1492 long non-coding RNAs (lncRNAs) were detected. Functional annotation revealed metabolic pathways related to aroma components and color characteristics in Z. bungeanum. Unexpectedly, 30 transcripts were annotated as genes involved in regulating the pathogenesis of breast and colorectal cancers. This work provides a comprehensive transcriptome resource for Z. bungeanum, and lays a foundation for the further investigation and utilization of Zanthoxylum resources.


September 22, 2019

Full-length transcriptome survey and expression analysis of Cassia obtusifolia to discover putative genes related to aurantio-obtusin biosynthesis, seed formation and development, and stress response.

The seed is the pharmaceutical and breeding organ of Cassia obtusifolia, a well-known medical herb containing aurantio-obtusin (a kind of anthraquinone), food, and landscape. In order to understand the molecular mechanism of the biosynthesis of aurantio-obtusin, seed formation and development, and stress response of C. obtusifolia, it is necessary to understand the genomics information. Although previous seed transcriptome of C. obtusifolia has been carried out by short-read next-generation sequencing (NGS) technology, the vast majority of the resulting unigenes did not represent full-length cDNA sequences and supply enough gene expression profile information of the various organs or tissues. In this study, fifteen cDNA libraries, which were constructed from the seed, root, stem, leaf, and flower (three repetitions with each organ) of C. obtusifolia, were sequenced using hybrid approach combining single-molecule real-time (SMRT) and NGS platform. More than 4,315,774 long reads with 9.66 Gb sequencing data and 361,427,021 short reads with 108.13 Gb sequencing data were generated by SMRT and NGS platform, respectively. 67,222 consensus isoforms were clustered from the reads and 81.73% (61,016) of which were longer than 1000 bp. Furthermore, the 67,222 consensus isoforms represented 58,106 nonredundant transcripts, 98.25% (57,092) of which were annotated and 25,573 of which were assigned to specific metabolic pathways by KEGG. CoDXS and CoDXR genes were directly used for functional characterization to validate the accuracy of sequences obtained from transcriptome. A total of 658 seed-specific transcripts indicated their special roles in physiological processes in seed. Analysis of transcripts which were involved in the early stage of anthraquinone biosynthesis suggested that the aurantio-obtusin in C. obtusifolia was mainly generated from isochorismate and Mevalonate/methylerythritol phosphate (MVA/MEP) pathway, and three reactions catalyzed by Menaquinone-specific isochorismate synthase (ICS), 1-deoxy-d-xylulose-5-phosphate synthase (DXS) and isopentenyl diphosphate (IPPS) might be the limited steps. Several seed-specific CYPs, SAM-dependent methyltransferase, and UDP-glycosyltransferase (UDPG) supplied promising candidate genes in the late stage of anthraquinone biosynthesis. In addition, four seed-specific transcriptional factors including three MYB Transcription Factor (MYB) and one MADS-box Transcription Factor (MADS) transcriptional factors) and alternative splicing might be involved with seed formation and development. Meanwhile, most members of Hsp20 genes showed high expression level in seed and flower; seven of which might have chaperon activities under various abiotic stresses. Finally, the expressional patterns of genes with particular interests showed similar trends in both transcriptome assay and qRT-PCR. In conclusion, this is the first full-length transcriptome sequencing reported in Caesalpiniaceae family, and thus providing a more complete insight into aurantio-obtusin biosynthesis, seed formation and development, and stress response as well in C. obtusifolia.


September 22, 2019

CRISPR/Cas9 deletions in a conserved exon of Distal-less generates gains and losses in a recently acquired morphological novelty in flies.

Distal-less has been repeatedly co-opted for the development of many novel traits. Here, we document its curious role in the development of a novel abdominal appendage (“sternite brushes”) in sepsid flies. CRISPR/Cas9 deletions in the homeodomain result in losses of sternite brushes, demonstrating that Distal-less is necessary for their development. However, deletions in the upstream coding exon (Exon 2) produce losses or gains of brushes. A dissection of Exon 2 reveals that the likely mechanism for gains involves a deletion in an exon-splicing enhancer site that leads to exon skipping. Such contradictory phenotypes are also observed in butterflies, suggesting that mutations in the conserved upstream regions have the potential to generate phenotypic variability in insects that diverged 300 million years ago. Our results demonstrate the importance of Distal-less for the development of a novel abdominal appendage in insects and highlight how site-specific mutations in the same exon can produce contradictory phenotypes. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.


September 22, 2019

The habu genome reveals accelerated evolution of venom protein genes.

Evolution of novel traits is a challenging subject in biological research. Several snake lineages developed elaborate venom systems to deliver complex protein mixtures for prey capture. To understand mechanisms involved in snake venom evolution, we decoded here the ~1.4-Gb genome of a habu, Protobothrops flavoviridis. We identified 60 snake venom protein genes (SV) and 224 non-venom paralogs (NV), belonging to 18 gene families. Molecular phylogeny reveals early divergence of SV and NV genes, suggesting that one of the four copies generated through two rounds of whole-genome duplication was modified for use as a toxin. Among them, both SV and NV genes in four major components were extensively duplicated after their diversification, but accelerated evolution is evident exclusively in the SV genes. Both venom-related SV and NV genes are significantly enriched in microchromosomes. The present study thus provides a genetic background for evolution of snake venom composition.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.