Menu
September 22, 2019

Transcription-associated mutation promotes RNA complexity in highly expressed genes – A major new source of selectable variation.

Alternatively spliced transcript isoforms are thought to play a critical role for functional diversity. However, the mechanism generating the enormous diversity of spliced transcript isoforms remains unknown, and its biological significance remains unclear. We analyzed transcriptomes in saker falcons, chickens, and mice to show that alternative splicing occurs more frequently, yielding more isoforms, in highly expressed genes. We focused on hemoglobin in the falcon, the most abundantly expressed genes in blood, finding that alternative splicing produces 10-fold more isoforms than expected from the number of splice junctions in the genome. These isoforms were produced mainly by alternative use of de novo splice sites generated by transcription-associated mutation (TAM), not by the RNA editing mechanism normally invoked. We found that high expression of globin genes increases mutation frequencies during transcription, especially on nontranscribed DNA strands. After DNA replication, transcribed strands inherit these somatic mutations, creating de novo splice sites, and generating multiple distinct isoforms in the cell clone. Bisulfate sequencing revealed that DNA methylation may counteract this process by suppressing TAM, suggesting DNA methylation can spatially regulate RNA complexity. RNA profiling showed that falcons living on the high Qinghai-Tibetan Plateau possess greater global gene expression levels and higher diversity of mean to high abundance isoforms (reads per kilobases per million mapped reads?=18) than their low-altitude counterparts, and we speculate that this may enhance their oxygen transport capacity under low-oxygen environments. Thus, TAM-induced RNA diversity may be physiologically significant, providing an alternative strategy in lifestyle evolution.


September 22, 2019

Single-molecule long-read sequencing facilitates shrimp transcriptome research.

Although shrimp are of great economic importance, few full-length shrimp transcriptomes are available. Here, we used Pacific Biosciences single-molecule real-time (SMRT) long-read sequencing technology to generate transcripts from the Pacific white shrimp (Litopenaeus vannamei). We obtained 322,600 full-length non-chimeric reads, from which we generated 51,367 high-quality unique full-length transcripts. We corrected errors in the SMRT sequences by comparison with Illumina-produced short reads. We successfully annotated 81.72% of all unique SMRT transcripts against the NCBI non-redundant database, 58.63% against Swiss-Prot, 45.38% against Gene Ontology, 32.57% against Clusters of Orthologous Groups of proteins (COG), and 47.83% against Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Across all transcripts, we identified 3,958 long non-coding RNAs (lncRNAs) and 80,650 simple sequence repeats (SSRs). Our study provides a rich set of full-length cDNA sequences for L. vannamei, which will greatly facilitate shrimp transcriptome research.


September 22, 2019

Long-read transcriptome data for improved gene prediction in Lentinula edodes

Lentinula edodes is one of the most popular edible mushrooms in the world and contains useful medicinal components such as lentinan. The whole-genome sequence of L. edodes has been determined with the objective of discovering candidate genes associated with agronomic traits, but experimental verification of gene models with correction of gene prediction errors is lacking. To improve the accuracy of gene prediction, we produced 12.6 Gb of long-read transcriptome data of variable lengths using PacBio single-molecule real-time (SMRT) sequencing and generated 36,946 transcript clusters with an average length of 2.2 kb. Evidence-driven gene prediction on the basis of long- and short-read RNA sequencing data was performed; a total of 16,610 protein-coding genes were predicted with error correction. Of the predicted genes, 42.2% were verified to be covered by full-length transcript clusters. The raw reads have been deposited in the NCBI SRA database under accession number PRJNA396788.


September 22, 2019

Bacterial microbiota composition of fermented fruit and vegetable juices (jiaosu) analyzed by single-molecule, real-time (SMRT) sequencing

Commercially manufactured ‘jiaosu’ (fermented fruit and vegetable juices) have gained popularity in Asia recently. Like other fermented products, they have a high microbial diversity and richness. However, no published study has yet described their microbiota composition. Thus, this work aimed to obtain the full-length 16S rRNA profiles of jiaosu using the PacBio single-molecule, real-time sequencing technology. We described the bacterial microbiota of three jiaosu products purchased from Taiwan and Japan. Bacterial sequences from all three samples distributed across seven different phyla, mainly Proteobacteria, Firmicutes, Actinobacteria, and Bacteroidetes. Forty-three genera were identified (e.g. Ochrobactrum, Lactobacillus, Mycobacterium, and Acinetobacter). Fifty- five species were identified (e.g. Ochrobactrum lupini, Mycobacterium abscessus, Acinetobacter john- sonii, Lactobacillus paracasei, Lactobacillus delbrueckii, and Petrobacter succinatimandens). No patho- gen sequences were identified within the entire dataset. Moreover, only a low proportion of sequences represented common skin microflora and the food hygiene indicator Escherichia/ Shigella, suggesting overall acceptable sanitary conditions during the manufacturing process.


September 22, 2019

Multiplatform next-generation sequencing identifies novel RNA molecules and transcript isoforms of the endogenous retrovirus isolated from cultured cells.

In this study, we applied short- and long-read RNA sequencing techniques, as well as PCR analysis to investigate the transcriptome of the porcine endogenous retrovirus (PERV) expressed from cultured porcine kidney cell line PK-15. This analysis has revealed six novel transcripts and eight transcript isoforms, including five length and three splice variants. We were able to establish whether a deletion in a transcript is the result of the splicing of mRNAs or of genomic deletion in one of the PERV clones. Additionally, we re-annotated the formerly identified RNA molecules. Our analysis revealed a higher complexity of PERV transcriptome than it was earlier believed.© FEMS 2018. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


September 22, 2019

Bacterial microbiota of Kazakhstan cheese revealed by single molecule real time (SMRT) sequencing and its comparison with Belgian, Kalmykian and Italian artisanal cheeses

In Kazakhstan, traditional artisanal cheeses have a long history and are widely consumed. The unique characteristics of local artisanal cheeses are almost completely preserved. However, their microbial communities have rarely been reported. The current study firstly generated the Single Molecule, Real-Time (SMRT) sequencing bacterial diversity profiles of 6 traditional artisanal cheese samples of Kazakhstan origin, followed by comparatively analyzed the microbiota composition between the current dataset and those from cheeses originated from Belgium, Russian Republic of Kalmykia (Kalmykia) and Italy.


September 22, 2019

A survey of the sorghum transcriptome using single-molecule long reads.

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ~11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.


September 22, 2019

Long read reference genome-free reconstruction of a full-length transcriptome from Astragalus membranaceus reveals transcript variants involved in bioactive compound biosynthesis.

Astragalus membranaceus, also known as Huangqi in China, is one of the most widely used medicinal herbs in Traditional Chinese Medicine. Traditional Chinese Medicine formulations from Astragalus membranaceus have been used to treat a wide range of illnesses, such as cardiovascular disease, type 2 diabetes, nephritis and cancers. Pharmacological studies have shown that immunomodulating, anti-hyperglycemic, anti-inflammatory, antioxidant and antiviral activities exist in the extract of Astragalus membranaceus. Therefore, characterising the biosynthesis of bioactive compounds in Astragalus membranaceus, such as Astragalosides, Calycosin and Calycosin-7-O-ß-d-glucoside, is of particular importance for further genetic studies of Astragalus membranaceus. In this study, we reconstructed the Astragalus membranaceus full-length transcriptomes from leaf and root tissues using PacBio Iso-Seq long reads. We identified 27 975 and 22 343 full-length unique transcript models in each tissue respectively. Compared with previous studies that used short read sequencing, our reconstructed transcripts are longer, and are more likely to be full-length and include numerous transcript variants. Moreover, we also re-characterised and identified potential transcript variants of genes involved in Astragalosides, Calycosin and Calycosin-7-O-ß-d-glucoside biosynthesis. In conclusion, our study provides a practical pipeline to characterise the full-length transcriptome for species without a reference genome and a useful genomic resource for exploring the biosynthesis of active compounds in Astragalus membranaceus.


September 22, 2019

Isoform evolution in primates through independent combination of alternative RNA processing events.

Recent RNA-seq technology revealed thousands of splicing events that are under rapid evolution in primates, whereas the reliability of these events, as well as their combination on the isoform level, have not been adequately addressed due to its limited sequencing length. Here, we performed comparative transcriptome analyses in human and rhesus macaque cerebellum using single molecule long-read sequencing (Iso-seq) and matched RNA-seq. Besides 359 million RNA-seq reads, 4,165,527 Iso-seq reads were generated with a mean length of 14,875?bp, covering 11,466 human genes, and 10,159 macaque genes. With Iso-seq data, we substantially expanded the repertoire of alternative RNA processing events in primates, and found that intron retention and alternative polyadenylation are surprisingly more prevalent in primates than previously estimated. We then investigated the combinatorial mode of these alternative events at the whole-transcript level, and found that the combination of these events is largely independent along the transcript, leading to thousands of novel isoforms missed by current annotations. Notably, these novel isoforms are selectively constrained in general, and 1,119 isoforms have even higher expression than the previously annotated major isoforms in human, indicating that the complexity of the human transcriptome is still significantly underestimated. Comparative transcriptome analysis further revealed 502 genes encoding selectively constrained, lineage-specific isoforms in human but not in rhesus macaque, linking them to some lineage-specific functions. Overall, we propose that the independent combination of alternative RNA processing events has contributed to complex isoform evolution in primates, which provides a new foundation for the study of phenotypic difference among primates.© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


September 22, 2019

Assessment of an organ-specific de novo transcriptome of the nematode trap-crop, Solanum sisymbriifolium

Solanum sisymbriifolium, also known as “Litchi Tomato” or “Sticky Nightshade,” is an undomesticated and poorly researched plant related to potato and tomato. Unlike the latter species, S. sisymbriifolium induces eggs of the cyst nematode, Globodera pallida, to hatch and migrate into its roots, but then arrests further nematode maturation. In order to provide researchers with a partial blueprint of its genetic make-up so that the mechanism of this response might be identified, we used single molecule real time (SMRT) sequencing to compile a high quality de novo transcriptome of 41,189 unigenes drawn from individually sequenced bud, root, stem, and leaf RNA populations. Functional annotation and BUSCO analysis showed that this transcriptome was surprisingly complete, even though it represented genes expressed at a single time point. By sequencing the 4 organ libraries separately, we found we could get a reliable snapshot of transcript distributions in each organ. A divergent site analysis of the merged transcriptome indicated that this species might have undergone a recent genome duplication and re-diploidization. Further analysis indicated that the plant then retained a disproportionate number of genes associated with photosynthesis and amino acid metabolism in comparison to genes with characteristics of R-proteins or involved in secondary metabolism. The former processes may have given S. sisymbriifolium a bigger competitive advantage than the latter did. Copyright © 2018 Wixom et al.


September 22, 2019

Prey range and genome evolution of Halobacteriovorax marinus predatory bacteria from an estuary

Halobacteriovorax strains are saltwater-adapted predatory bacteria that attack Gram-negative bacteria and may play an important role in shaping microbial communities. To understand how Halobacteriovorax strains impact ecosystems and develop them as biocontrol agents, it is important to characterize variation in predation phenotypes and investigate Halobacteriovorax genome evolution. We isolated Halobacteriovorax marinus BE01 from an estuary in Rhode Island using Vibrio from the same site as prey. Small, fast-moving, attack-phase BE01 cells attach to and invade prey cells, consistent with the intraperiplasmic predation strategy of the H. marinus type strain, SJ. BE01 is a prey generalist, forming plaques on Vibrio strains from the estuary, Pseudomonas from soil, and Escherichia coli. Genome analysis revealed extremely high conservation of gene order and amino acid sequences between BE01 and SJ, suggesting strong selective pressure to maintain the genome in this H. marinus lineage. Despite this, we identified two regions of gene content difference that likely resulted from horizontal gene transfer. Analysis of modal codon usage frequencies supports the hypothesis that these regions were acquired from bacteria with different codon usage biases than H. marinus. In one of these regions, BE01 and SJ carry different genes associated with mobile genetic elements. Acquired functions in BE01 include the dnd operon, which encodes a pathway for DNA modification, and a suite of genes involved in membrane synthesis and regulation of gene expression that was likely acquired from another Halobacteriovorax lineage. This analysis provides further evidence that horizontal gene transfer plays an important role in genome evolution in predatory bacteria. IMPORTANCE Predatory bacteria attack and digest other bacteria and therefore may play a role in shaping microbial communities. To investigate phenotypic and genotypic variation in saltwater-adapted predatory bacteria, we isolated Halobacteriovorax marinus BE01 from an estuary in Rhode Island, assayed whether it could attack different prey bacteria, and sequenced and analyzed its genome. We found that BE01 is a prey generalist, attacking bacteria from different phylogenetic groups and environments. Gene order and amino acid sequences are highly conserved between BE01 and the H. marinus type strain, SJ. By comparative genomics, we detected two regions of gene content difference that likely occurred via horizontal gene transfer events. Acquired genes encode functions such as modification of DNA, membrane synthesis and regulation of gene expression. Understanding genome evolution and variation in predation phenotypes among predatory bacteria will inform their development as biocontrol agents and clarify how they impact microbial communities.


September 22, 2019

Genome sequence of the Japanese oak silk moth, Antheraea yamamai: the first draft genome in the family Saturniidae.

Antheraea yamamai, also known as the Japanese oak silk moth, is a wild species of silk moth. Silk produced by A. yamamai, referred to as tensan silk, shows different characteristics such as thickness, compressive elasticity, and chemical resistance compared with common silk produced from the domesticated silkworm, Bombyx mori. Its unique characteristics have led to its use in many research fields including biotechnology and medical science, and the scientific as well as economic importance of the wild silk moth continues to gradually increase. However, no genomic information for the wild silk moth, including A. yamamai, is currently available.In order to construct the A. yamamai genome, a total of 147G base pairs using Illumina and Pacbio sequencing platforms were generated, providing 210-fold coverage based on the 700-Mb estimated genome size of A. yamamai. The assembled genome of A. yamamai was 656 Mb (>2 kb) with 3675 scaffolds, and the N50 length of assembly was 739 Kb with a 34.07% GC ratio. Identified repeat elements covered 37.33% of the total genome, and the completeness of the constructed genome assembly was estimated to be 96.7% by Benchmarking Universal Single-Copy Orthologs v2 analysis. A total of 15 481 genes were identified using Evidence Modeler based on the gene prediction results obtained from 3 different methods (ab initio, RNA-seq-based, known-gene-based) and manual curation.Here we present the genome sequence of A. yamamai, the first genome sequence of the wild silk moth. These results provide valuable genomic information, which will help enrich our understanding of the molecular mechanisms relating to not only specific phenotypes such as wild silk itself but also the genomic evolution of Saturniidae.© The Authors 2017. Published by Oxford University Press.


September 22, 2019

Extreme haplotype variation in the desiccation-tolerant clubmoss Selaginella lepidophylla.

Plant genome size varies by four orders of magnitude, and most of this variation stems from dynamic changes in repetitive DNA content. Here we report the small 109?Mb genome of Selaginella lepidophylla, a clubmoss with extreme desiccation tolerance. Single-molecule sequencing enables accurate haplotype assembly of a single heterozygous S. lepidophylla plant, revealing extensive structural variation. We observe numerous haplotype-specific deletions consisting of largely repetitive and heavily methylated sequences, with enrichment in young Gypsy LTR retrotransposons. Such elements are active but rapidly deleted, suggesting “bloat and purge” to maintain a small genome size. Unlike all other land plant lineages, Selaginella has no evidence of a whole-genome duplication event in its evolutionary history, but instead shows unique tandem gene duplication patterns reflecting adaptation to extreme drying. Gene expression changes during desiccation in S. lepidophylla mirror patterns observed across angiosperm resurrection plants.


September 22, 2019

Egg case silk gene sequences from Argiope spiders: Evidence for multiple loci and a loss of function between paralogs.

Spiders swath their eggs with silk to protect developing embryos and hatchlings. Egg case silks, like other fibrous spider silks, are primarily composed of proteins called spidroins (spidroin = spider-fibroin). Silks, and thus spidroins, are important throughout the lives of spiders, yet the evolution of spidroin genes has been relatively understudied. Spidroin genes are notoriously difficult to sequence because they are typically very long (= 10 kb of coding sequence) and highly repetitive. Here, we investigate the evolution of spider silk genes through long-read sequencing of Bacterial Artificial Chromosome (BAC) clones. We demonstrate that the silver garden spiderArgiope argentatahas multiple egg case spidroin loci with a loss of function at one locus. We also use degenerate PCR primers to search the genomic DNA of congeneric species and find evidence for multiple egg case spidroin loci in otherArgiopespiders. Comparative analyses show that these multiple loci are more similar at the nucleotide level within a species than between species. This pattern is consistent with concerted evolution homogenizing gene copies within a genome. More complicated explanations include convergent evolution or recent independent gene duplications within each species. Copyright © 2018 Chaw et al.


September 22, 2019

Reference assembly and annotation of the Pyrenophora teres f. teres isolate 0-1.

Pyrenophora teres f.teres, the causal agent of net form net blotch (NFNB) of barley, is a destructive pathogen in barley-growing regions throughout the world. Typical yield losses due to NFNB range from 10 to 40%; however, complete loss has been observed on highly susceptible barley lines where environmental conditions favor the pathogen. Currently, genomic resources for this economically important pathogen are limited to a fragmented draft genome assembly and annotation, with limited RNA support of theP. teresf.teresisolate 0-1. This research presents an updated 0-1 reference assembly facilitated by long-read sequencing and scaffolding with the assistance of genetic linkage maps. Additionally, genome annotation was mediated by RNAseq analysis using three infection time points and a pure culture sample, resulting in 11,541 high-confidence gene models. The 0-1 genome assembly and annotation presented here now contains the majority of the repetitive content of the genome. Analysis of the 0-1 genome revealed classic characteristics of a “two-speed” genome, being compartmentalized into GC-equilibrated and AT-rich compartments. The assembly of repetitive AT-rich regions will be important for future investigation of genes known as effectors, which often reside in close proximity to repetitive regions. These effectors are responsible for manipulation of the host defense during infection. This updatedP. teresf.teresisolate 0-1 reference genome assembly and annotation provides a robust resource for the examination of the barley-P. teresf.tereshost-pathogen coevolution. Copyright © 2018 Wyatt et al.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.