September 22, 2019

Full-length transcriptome sequences and the identification of putative genes for flavonoid biosynthesis in safflower.

The flower of the safflower (Carthamus tinctorius L.) has been widely used in traditional Chinese medicine for its ability to improve cerebral blood flow. Flavonoids are the primary bioactive components in safflower, and their biosynthesis has attracted widespread interest. Previous studies mostly used second-generation sequencing platforms to survey the putative flavonoid biosynthesis genes. We carried out this study to gain a better understanding of the transcriptome data and of the putative genes involved in flavonoid biosynthesis in safflower. High-quality RNA was extracted from six types of safflower tissue. The RNAs from the different tissues were mixed in equal amounts and used to construct multiple size-fractionated libraries (1-2, 2-3 and 3-6 kb). Five cells were run (2 cells each for the 1-2 and 2-3 kb libraries and 1 cell for the 3-6 kb library). In total, 10.43 Gb of clean data and 38,302 de-redundant sequences were obtained. Forty-four unique isoforms were annotated as encoding enzymes involved in flavonoid biosynthesis. The full-length flavonoid genes were characterized, and their evolutionary relationships and expression patterns were analyzed. They can be divided into eight families, with large differences in tissue expression. Temporal expression under MeJA treatment was also measured: 9 genes were significantly up-regulated and 2 genes were significantly down-regulated. SSRs and lncRNAs were also analyzed. Full-length transcriptome sequences were used to predict the genes involved in flavonoid synthesis in safflower. Combined with the determination of flavonoids, the results indicate that CtC4H2, CtCHS3, CtCHI3, CtF3H3 and CtF3H1 mainly participate in the MeJA-promoted synthesis of flavonoids. Our results also provide a valuable resource for further study on safflower.


September 22, 2019

Transcriptome-wide survey of pseudorabies virus using next- and third-generation sequencing platforms.

Pseudorabies virus (PRV) is an alphaherpesvirus of swine. PRV has a large double-stranded DNA genome and, as the latest investigations have revealed, a very complex transcriptome. Here, we present a large RNA-Seq dataset, derived from both short- and long-read sequencing. The dataset contains 1.3 million 100 bp paired-end reads that were obtained from the Illumina random-primed libraries, as well as 10 million 50 bp single-end reads generated by the Illumina polyA-seq. The Pacific Biosciences RSII non-amplified method yielded 57,021 reads of inserts (ROIs) aligned to the viral genome, the amplified method resulted in 158,396 PRV-specific ROIs, while we obtained 12,555 ROIs using the Sequel platform. The Oxford Nanopore’s MinION device generated 44,006 reads using their regular cDNA-sequencing method, whereas 29,832 and 120,394 reads were produced by using the direct RNA-sequencing and the Cap-selection protocols, respectively. The raw reads were aligned to the PRV reference genome (KJ717942.1). Our dataset can be used to compare different sequencing approaches and library preparation methods, as well as to validate and test bioinformatic pipelines.


September 22, 2019

De novo assembly and characterizing of the culm-derived meta-transcriptome from the polyploid sugarcane genome based on coding transcripts

Sugarcane biomass has been used for sugar, bioenergy and biomaterial production. The majority of the sugarcane biomass comes from the culm, which makes it important to understand the genetic control of biomass production in this part of the plant. A meta-transcriptome of the culm was obtained in an earlier study by using about one billion paired-end (150 bp) reads of deep RNA sequencing of samples from 20 diverse sugarcane genotypes and combining de novo assemblies from different assemblers and different settings. Although many genes could be recovered, this resulted in a large combined assembly which created the need for clustering to reduce transcript redundancy while maintaining gene content. Here, we present a comprehensive analysis of the effect of different assembly settings and clustering methods on de novo assembly, annotation and transcript profiling focusing especially on the coding transcripts from the highly polyploid sugarcane genome. The new coding sequence-based transcript clustering resulted in a better representation of transcripts compared to the earlier approach, having 121,987 contigs, which included 78,052 main and 43,935 alternative transcripts. About 73%, 67%, 61% and 10% of the transcriptome was annotated against the NCBI NR protein database, GO terms, orthologous groups and KEGG orthologies, respectively. Using this set for a differential gene expression analysis between the young and mature sugarcane culm tissues, a total of 822 transcripts were found to be differentially expressed, including key transcripts involved in sugar/fiber accumulation in sugarcane. In the context of the lack of a whole genome sequence for sugarcane, the availability of a well annotated culm-derived meta-transcriptome through deep sequencing provides useful information on coding genes specific to the sugarcane culm and will certainly contribute to understanding the process of carbon partitioning, and biomass accumulation in the sugarcane culm.


September 22, 2019

Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform.

Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989, and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced by the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files, and combined with other datasets, they can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.


September 22, 2019

High-quality RNA isolation from wheat immature grains

Grain quality is one of the most important targets in wheat breeding. Transcriptome analyses of wheat developing grains and endosperm have been performed using the microarray and RNA sequencing (RNA-seq) approaches (Wan et al. 2008, 2009; Nemeth et al. 2010; Pellny et al. 2012; Dong et al. 2015). For RNA-seq analysis of the grain transcriptome and precise quantification of each transcript in developing grain and endosperm, high-quality RNA is essential. For microarray analysis, an RIN (RNA integrity number) value of ≥7.3 is required for RNA sample quality according to the Agilent microarray protocol. In the previous report on the transcriptome of wheat developing grains, total RNA samples with RIN values of ≥8.0 were used for the RNA-seq analysis based on the PacBio and Illumina platforms (Dong et al. 2015). Some RNA extraction buffers containing SDS, CTAB, or TRIzol® reagent (Thermo Fisher Scientific, Waltham, Massachusetts) and several commercial kits for RNA isolation have been used to isolate total RNA from wheat grain and endosperm (Kawakami et al. 1992; Wan et al. 2008; Kang et al. 2013). However, total RNA samples from developing and immature wheat grains are often damaged because of the high polysaccharide content and the high stickiness of the solution homogenized with the RNA extraction buffer, and thus extraction of high-quality RNA with a high RIN value is quite difficult. Here, we report a protocol for wheat grain RNA extraction using the Maxwell RSC Plant RNA Kit (Promega, Madison, Wisconsin).


September 22, 2019

Comparative transcriptomic and physiological analyses of Medicago sativa L. indicates that multiple regulatory networks are activated during continuous ABA treatment.

Alfalfa is the most extensively cultivated forage legume worldwide. However, the molecular mechanisms underlying alfalfa responses to exogenous abscisic acid (ABA) are still unknown. In this study, the first global transcriptome profiles of alfalfa roots under ABA treatments for 1, 3 and 12 h (three biological replicates for each time point, including the control group) were constructed using a BGISEQ-500 sequencing platform. A total of 50,742 isoforms with a mean length of 2541 bp were generated, and 4944 differentially expressed isoforms (DEIs) were identified after ABA deposition. Metabolic analyses revealed that these DEIs were involved in plant hormone signal transduction, transcriptional regulation, antioxidative defense and pathogen immunity. Notably, among several well-characterized hormone signaling pathways, the core ABA signaling pathway was activated, while the salicylic acid, jasmonate and ethylene signaling pathways were mainly suppressed by exogenous ABA. Moreover, the physiological work showed that catalase and peroxidase activity and glutathione and proline content were increased after ABA deposition, which is in accordance with the dynamic transcript profiles of the relevant genes in the antioxidative defense system. These results indicate that ABA has the potential to improve abiotic stress tolerance, but that it may negatively regulate pathogen resistance in alfalfa.


September 22, 2019

Characterization of the dynamic transcriptome of a herpesvirus with long-read Single Molecule Real-Time Sequencing.

Herpesvirus gene expression is co-ordinately regulated and sequentially ordered during productive infection. The viral genes can be classified into three distinct kinetic groups: immediate-early, early, and late classes. In this study, a massively parallel sequencing technique based on the PacBio Single Molecule Real-Time (SMRT) sequencing platform was used for quantifying the poly(A) fraction of the lytic transcriptome of pseudorabies virus (PRV) throughout a 12-hour interval of productive infection on PK-15 cells. Other approaches, including microarray, real-time RT-PCR and Illumina sequencing, are capable of detecting only the aggregate transcriptional activity of particular genomic regions, but not individual herpesvirus transcripts. However, SMRT sequencing allows for a distinction between transcript isoforms, including length and splice variants, as well as between overlapping polycistronic RNA molecules. The non-amplified Isoform Sequencing (Iso-Seq) method was used to analyse the kinetic properties of the lytic PRV transcripts and then to classify them accordingly. Additionally, the present study demonstrates the general utility of long-read sequencing for the time-course analysis of global gene expression in practically any organism.


September 22, 2019

Draft genome assembly of the poultry red mite, Dermanyssus gallinae.

The poultry red mite, Dermanyssus gallinae, is a major worldwide concern in the egg-laying industry. Here, we report the first draft genome assembly and gene prediction of Dermanyssus gallinae, based on combined PacBio and MinION long-read de novo sequencing. The ~959-Mb genome is predicted to encode 14,608 protein-coding genes.


September 22, 2019

Single-molecule real-time transcript sequencing facilitates common wheat genome annotation and grain transcriptome research.

The large and complex hexaploid genome has greatly hindered genomics studies of common wheat (Triticum aestivum, AABBDD). Here, we investigated transcripts in common wheat developing caryopses using the emerging single-molecule real-time (SMRT) sequencing technology PacBio RSII, and assessed the resultant data for improving common wheat genome annotation and grain transcriptome research. We obtained 197,709 full-length non-chimeric (FLNC) reads, 74.6% of which were estimated to carry a complete open reading frame. A total of 91,881 high-quality FLNC reads were identified and mapped to 16,188 chromosomal loci, corresponding to 13,162 known genes and 3026 new genes not annotated previously. Although some FLNC reads could not be unambiguously mapped to the current draft genome sequence, many of them are likely useful for studying highly similar homoeologous or paralogous loci or for improving chromosomal contig assembly in further research. The 91,881 high-quality FLNC reads represented 22,768 unique transcripts, 9591 of which were newly discovered. We found 180 transcripts each spanning two or three previously annotated adjacent loci, suggesting that they should be merged to form correct gene models. Finally, our data facilitated the identification of 6030 genes differentially regulated during caryopsis development, and full-length transcripts for 72 transcribed gluten gene members that are important for the end-use quality control of common wheat. Our work demonstrated the value of PacBio transcript sequencing for improving common wheat genome annotation through uncovering the loci and full-length transcripts not discovered previously. The resource obtained may aid further structural genomics and grain transcriptome studies of common wheat.


September 22, 2019

Two novel lncRNAs discovered in human mitochondrial DNA using PacBio full-length transcriptome data.

In this study, we established a general framework to use PacBio full-length transcriptome sequencing for the investigation of mitochondrial RNAs. As a result, we produced the first full-length human mitochondrial transcriptome using public PacBio data and characterized the human mitochondrial genome with more comprehensive and accurate information. Other results included determination of the H-strand primary transcript, identification of the ND5/ND6AS/tRNAGluAS transcript, discovery of palindrome small RNAs (psRNAs) and construction of the “mitochondrial cleavage” model. These results, reported for the first time in this study, fundamentally changed the annotation of the human mitochondrial genome and enriched knowledge in the field of animal mitochondrial studies. The most important finding was that two novel long non-coding RNAs (lncRNAs), MDL1 and MDL1AS, exist ubiquitously in animal mitochondrial genomes. Copyright © 2017. Published by Elsevier B.V.


September 22, 2019

Accurate characterization of the IFITM locus using MiSeq and PacBio sequencing shows genetic variation in Galliformes.

Interferon inducible transmembrane (IFITM) proteins are effectors of the immune system widely characterized for their role in restricting infection by diverse enveloped and non-enveloped viruses. The chicken IFITM (chIFITM) genes are clustered on chromosome 5 and to date four genes have been annotated, namely chIFITM1, chIFITM3, chIFITM5 and chIFITM10. However, due to poor assembly of this locus in the Gallus Gallus v4 genome, accurate characterization has so far proven problematic. Recently, a new chicken reference genome assembly, Gallus Gallus v5, was generated using Sanger, 454, Illumina and PacBio sequencing technologies, identifying considerable differences in the chIFITM locus compared with the previous genome releases. We re-sequenced the locus using both Illumina MiSeq and PacBio RS II sequencing technologies and we mapped RNA-seq data from the European Nucleotide Archive (ENA) to this finalized chIFITM locus. Using SureSelect capture probes designed to the finalized chIFITM locus, we sequenced the locus of a different chicken breed, namely a White Leghorn, and a turkey. We confirmed the Gallus Gallus v5 consensus except for two insertions of 5 and 1 base pair within the chIFITM3 and B4GALNT4 genes, respectively, and a single base pair deletion within the B4GALNT4 gene. The pull down revealed a single amino acid substitution of A63V in the CIL domain of IFITM2 compared to Red Jungle fowl and 13, 13 and 11 differences between IFITM1, 2 and 3 of chickens and turkeys, respectively. RNA-seq shows chIFITM2 and chIFITM3 expression in numerous tissue types of different chicken breeds and avian cell lines, while the expression of the putative chIFITM1 is limited to the testis, caecum and ileum tissues. Locus resequencing using these capture probes and RNA-seq based expression analysis will allow the further characterization of genetic diversity within Galliformes.


September 22, 2019

The dynamic landscape of fission yeast meiosis alternative-splice isoforms.

Alternative splicing increases the diversity of transcriptomes and proteomes in metazoans. The extent to which alternative splicing is active and functional in unicellular organisms is less understood. Here, we exploit a single-molecule long-read sequencing technique and develop an open-source software program called SpliceHunter to characterize the transcriptome in the meiosis of fission yeast. We reveal 14,353 alternative splicing events in 17,669 novel isoforms at different stages of meiosis, including antisense and read-through transcripts. Intron retention is the major type of alternative splicing, followed by alternate “intron in exon.” Seven hundred seventy novel transcription units are detected; 53 of the predicted proteins show homology in other species and form theoretical stable structures. We report the complexity of alternative splicing along isoforms, including 683 intra-molecularly co-associated intron pairs. We compare the dynamics of novel isoforms based on the number of supporting full-length reads with those of annotated isoforms and explore the translational capacity and quality of novel isoforms. The evaluation of these factors indicates that the majority of novel isoforms are unlikely to be both condition-specific and translatable but consistent with the possibility of biologically functional novel isoforms. Moreover, the co-option of these unusual transcripts into newly born genes seems likely. Together, the results of this study highlight the diversity and dynamics at the isoform level in the sexual development of fission yeast. © 2017 Kuang et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Analyses of alternative polyadenylation: from old school biochemistry to high-throughput technologies.

Alterations in the usage of polyadenylation sites during transcription termination yield multiple transcript isoforms from a gene. Recent findings of transcriptome-wide alternative polyadenylation (APA) as a molecular response to changes in biology position APA not only as a molecular event of early transcriptional termination but also as a cellular regulatory step affecting various biological pathways. With the development of high-throughput profiling technologies at single-nucleotide resolution and their applications targeted to the 3′-end of mRNAs, dynamics in the landscape of mRNA 3′-ends are measurable at a global scale. In this review, methods and technologies that have been adopted to study APA events are discussed. In addition, various bioinformatics algorithms for APA isoform analysis using publicly available RNA-seq datasets are introduced. [BMB Reports 2017; 50(4): 201-207].


September 22, 2019

The full transcription map of mouse papillomavirus type 1 (MmuPV1) in mouse wart tissues.

Mouse papillomavirus type 1 (MmuPV1) provides, for the first time, the opportunity to study infection and pathogenesis of papillomaviruses in the context of laboratory mice. In this report, we define the transcriptome of MmuPV1 genome present in papillomas arising in experimentally infected mice using a combination of RNA-seq, PacBio Iso-seq, 5′ RACE, 3′ RACE, primer-walking RT-PCR, RNase protection, Northern blot and in situ hybridization analyses. We demonstrate that the MmuPV1 genome is transcribed unidirectionally from five major promoters (P) or transcription start sites (TSS) and polyadenylates its transcripts at two major polyadenylation (pA) sites. We designate the P7503, P360 and P859 as “early” promoters because they give rise to transcripts mostly utilizing the polyadenylation signal at nt 3844 and therefore can only encode early genes, and P7107 and P533 as “late” promoters because they give rise to transcripts utilizing polyadenylation signals at either nt 3844 or nt 7047, the latter being able to encode late, capsid proteins. MmuPV1 genome contains five splice donor sites and three acceptor sites that produce thirty-six RNA isoforms deduced to express seven predicted early gene products (E6, E7, E1, E1^M1, E1^M2, E2 and E8^E2) and three predicted late gene products (E1^E4, L2 and L1). The majority of the viral early transcripts are spliced once from nt 757 to 3139, while viral late transcripts, which are predicted to encode L1, are spliced twice, first from nt 7243 to either nt 3139 (P7107) or nt 757 to 3139 (P533) and second from nt 3431 to nt 5372. Thirteen of these viral transcripts were detectable by Northern blot analysis, with the P533-derived late E1^E4 transcripts being the most abundant. The late transcripts could be detected in highly differentiated keratinocytes of MmuPV1-infected tissues as early as ten days after MmuPV1 inoculation and correlated with detection of L1 protein and viral DNA amplification. In mature warts, detection of L1 was also found in more poorly differentiated cells, as previously reported. Subclinical infections were also observed. The comprehensive transcription map of MmuPV1 generated in this study provides further evidence that MmuPV1 is similar to high-risk cutaneous beta human papillomaviruses. The knowledge revealed will facilitate the use of MmuPV1 as an animal virus model for understanding of human papillomavirus gene expression, pathogenesis and immunology.

