September 22, 2019

Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon

A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence, as well as isoform structure, can be highly variable across individuals. Existing de novo techniques for assaying the sequences of such highly-similar gene families fall short of reconstructing end-to-end transcripts with nucleotide-level precision or assigning alternatively spliced transcripts to their respective gene copies. We present IsoCon, a high-precision method using long PacBio Iso-Seq reads to tackle this challenge. We apply IsoCon to nine Y chromosome ampliconic gene families and show that it outperforms existing methods on both experimental and simulated data. IsoCon has allowed us to detect an unprecedented number of novel isoforms and has opened the door for unraveling the structure of many multigene families and gaining a deeper understanding of genome evolution and human diseases.


September 22, 2019

Long non-coding RNA identification: comparing machine learning based tools for long non-coding transcripts discrimination

Long noncoding RNA (lncRNA) is a class of noncoding RNA longer than 200 nucleotides that has attracted growing interest in recent years. Many studies have confirmed that the human genome contains many thousands of lncRNAs, which exert great influence over critical regulators of cellular processes. With the advent of high-throughput sequencing technologies, a vast quantity of sequence data awaits exploitation, and many programs have therefore been developed to discriminate between coding and long noncoding transcripts. Different programs are generally designed for different circumstances, so it is sensible and practical to select a method suited to the situation at hand. In this review, several popular methods are summarised along with their advantages, disadvantages, and scopes of application, to help researchers choose a suitable method and obtain more reliable results.
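Many coding-potential tools draw on sequence features such as open reading frame (ORF) length. As a minimal sketch, assuming forward-strand, ATG-initiated ORFs that end in a standard stop codon, the function below computes the longest ORF length. The `naive_label` rule and its 300-nt cutoff are hypothetical illustrations only. They are not the method of any tool covered in the review, and such tools typically combine several features in a trained model.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_length(seq: str) -> int:
    """Length (nt, including the stop codon) of the longest complete
    ATG-initiated ORF on the forward strand, across all three frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # close the ORF at the stop
                start = None
    return best

def naive_label(seq: str, min_orf_nt: int = 300) -> str:
    """Hypothetical single-feature rule, for illustration only."""
    if len(seq) <= 200:  # lncRNAs are defined as longer than 200 nt
        return "too short for lncRNA"
    if longest_orf_length(seq) >= min_orf_nt:
        return "coding"
    return "lncRNA candidate"
```

For example, `naive_label("ATG" + "GCC" * 100 + "TAA")` finds a 306-nt ORF and returns `"coding"`. Real tools replace the fixed cutoff with a model trained on several features.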


September 22, 2019

Retention of seed trees fails to lifeboat ectomycorrhizal fungal diversity in harvested Scots pine forests.

Over the past decades, Fennoscandian forestry has shifted from natural regeneration of forests towards replanting of clear-cuts, which negatively impacts ectomycorrhizal fungal (EMF) diversity. Retention of trees during harvesting enables EMF survival. We therefore expected EMF communities after regeneration using seed trees to be more similar to those in old natural stands than communities after full clear-cutting and replanting. We sequenced fungal internal transcribed spacer 2 (ITS2) amplicons to assess EMF communities in 10- to 60-year-old Scots pine stands regenerated either using seed trees or by replanting clear-cuts, with old natural stands as reference. We also investigated local EMF communities around retained old trees. We found that retention of seed trees failed to mitigate the impact of harvesting on EMF community composition and diversity. With increasing stand age, EMF communities became increasingly similar to those in old natural stands, and permanently retained trees maintained EMF locally. From our observations, we conclude that post-harvest EMF communities, at least their common species, are influenced more by environmental filtering resulting from harvest-induced environmental changes than by the continuity of trees. These results suggest that retaining intact forest patches is a more efficient way to conserve EMF diversity than retaining dispersed single trees. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


September 22, 2019

The microbiota of freshwater fish and freshwater niches contain omega-3 producing Shewanella species.

Approximately 30 years ago, it was discovered that free-living bacteria isolated from cold ocean depths could produce polyunsaturated fatty acids (PUFA) such as eicosapentaenoic acid (EPA) (20:5n-3) or docosahexaenoic acid (DHA) (22:6n-3), two PUFA essential for human health. Numerous laboratories have also discovered that EPA- and/or DHA-producing bacteria, many of them members of the Shewanella genus, could be isolated from the intestinal tracts of omega-3 fatty acid-rich marine fish. If bacteria contribute omega-3 fatty acids to the host fish in general or if they assist some bacterial species in adaptation to cold, then cold freshwater fish or habitats should also harbor these producers. Thus, we undertook a study to see if these niches also contained omega-3 fatty acid producers. We were successful in isolating and characterizing unique EPA-producing strains of Shewanella from three strictly freshwater native fish species, i.e., lake whitefish (Coregonus clupeaformis), lean lake trout (Salvelinus namaycush), and walleye (Sander vitreus), and from two other freshwater nonnative fish, i.e., coho salmon (Oncorhynchus kisutch) and seeforellen brown trout (Salmo trutta). We were also able to isolate four unique free-living strains of EPA-producing Shewanella from freshwater habitats. Phylogenetic and phenotypic analyses suggest that one producer is clearly a member of the Shewanella morhuae species and another is sister to members of the marine PUFA-producing Shewanella baltica species. However, the remaining isolates have more ambiguous relationships, sharing a common ancestor with non-PUFA-producing Shewanella putrefaciens isolates rather than marine S. baltica isolates despite having a phenotype more consistent with S. baltica strains. Copyright © 2015, American Society for Microbiology. All Rights Reserved.


September 22, 2019

Full-length transcriptome sequences and the identification of putative genes for flavonoid biosynthesis in safflower.

The flower of the safflower (Carthamus tinctorius L.) has been widely used in traditional Chinese medicine for its ability to improve cerebral blood flow. Flavonoids are the primary bioactive components in safflower, and their biosynthesis has attracted widespread interest. Previous studies mostly used second-generation sequencing platforms to survey the putative flavonoid biosynthesis genes. We carried out this study to better understand the transcriptome and the putative genes involved in flavonoid biosynthesis in safflower. High-quality RNA was extracted from six types of safflower tissue. The RNAs of the different tissues were mixed in equal amounts and used to construct multiple size-fractionated libraries (1-2, 2-3 and 3-6 k). Five cells were sequenced: two cells each for the 1-2 k and 2-3 k libraries and one cell for the 3-6 k library. In total, 10.43 Gb of clean data and 38,302 de-redundant sequences were obtained. Forty-four unique isoforms were annotated as encoding enzymes involved in flavonoid biosynthesis. The full-length flavonoid genes were characterized, and their evolutionary relationships and expression patterns were analyzed. These genes fall into eight families, with large differences in tissue expression. Temporal expression under MeJA treatment was also measured: 9 genes were significantly up-regulated and 2 genes were significantly down-regulated. SSRs and lncRNAs were also analyzed. In this study, full-length transcriptome sequences were used to predict the genes involved in flavonoid synthesis in safflower. Combined with the determination of flavonoids, the results indicate that CtC4H2, CtCHS3, CtCHI3, CtF3H3 and CtF3H1 are mainly involved in the MeJA-promoted synthesis of flavonoids. Our results also provide a valuable resource for further study of safflower.


September 22, 2019

Soil drying procedure affects the DNA quantification of Lactarius vinosus but does not change the fungal community composition.

Drying soil samples before DNA extraction is common practice in specific fungal DNA quantification and metabarcoding studies. However, the impact of different drying procedures on both the quantity of specific fungal DNA and the fungal community composition has not been analyzed. We tested three drying procedures (freeze-drying, oven-drying, and drying at room temperature) on 12 different soil samples. Our goals were to determine (a) the soil mycelium biomass of the ectomycorrhizal species Lactarius vinosus using qPCR with a specifically designed TaqMan® probe and (b) the fungal community composition and diversity using the PacBio® RS II sequencing platform. Mycelium biomass of L. vinosus was significantly greater in freeze-dried soil samples than in samples dried in an oven or at room temperature. However, drying procedures had no effect on fungal community composition or fungal diversity. In addition, the proportions of fungi grouped by functional role (moulds vs. mycorrhizal species) did not differ significantly among drying procedures. Only six of 1139 operational taxonomic units (OTUs) increased in relative proportion after soil drying at room temperature, and five of these OTUs were classified as mould or yeast species. However, the magnitude of these changes was small, with an overall increase in the relative abundance of these OTUs of approximately 2%. These results suggest that DNA degradation may occur, especially after drying soil samples at room temperature. This degradation, however, affects nearly all fungi equally and therefore causes no significant differences in diversity or community composition. Although the drying procedures had minimal effects on fungal community composition, freeze-drying yielded higher concentrations of L. vinosus DNA and prevented potential colonization by opportunistic species.
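Relative abundance here is the fraction of an OTU's reads among all reads in a sample. The sketch below uses hypothetical OTU names and read counts. It converts counts to relative abundances and sums the change for a chosen subset of OTUs, which is one way to express a combined shift like the roughly 2% reported above. It does not reproduce the study's statistical analysis.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-OTU read counts into fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {otu: n / total for otu, n in counts.items()}

def combined_shift(before: dict[str, int], after: dict[str, int],
                   otus: list[str]) -> float:
    """Summed change in relative abundance (as a fraction) for the given OTUs.
    An OTU absent from a sample is treated as having abundance 0."""
    rb, ra = relative_abundance(before), relative_abundance(after)
    return sum(ra.get(o, 0.0) - rb.get(o, 0.0) for o in otus)
```

Multiplying the result by 100 expresses it in percentage points. Comparing fractions rather than raw counts keeps samples with different sequencing depths comparable.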


September 22, 2019

Analyses of alternative polyadenylation: from old school biochemistry to high-throughput technologies.

Alterations in the usage of polyadenylation sites during transcription termination yield different transcript isoforms from a single gene. Recent findings show that transcriptome-wide alternative polyadenylation (APA) occurs as a molecular response to biological changes. These findings position APA not only as a molecular event of early transcriptional termination but also as a cellular regulatory step affecting various biological pathways. High-throughput profiling technologies with single-nucleotide resolution, applied to the 3′ end of mRNAs, now make it possible to measure the dynamics of the mRNA 3′-end landscape at a global scale. In this review, methods and technologies that have been adopted to study APA events are discussed. In addition, various bioinformatics algorithms for APA isoform analysis using publicly available RNA-seq datasets are introduced. [BMB Reports 2017; 50(4): 201-207].


September 22, 2019

The full transcription map of mouse papillomavirus type 1 (MmuPV1) in mouse wart tissues.

Mouse papillomavirus type 1 (MmuPV1) provides, for the first time, the opportunity to study infection and pathogenesis of papillomaviruses in laboratory mice. In this report, we define the transcriptome of the MmuPV1 genome present in papillomas arising in experimentally infected mice. We used a combination of RNA-seq, PacBio Iso-seq, 5′ RACE, 3′ RACE, primer-walking RT-PCR, RNase protection, Northern blot and in situ hybridization analyses. We demonstrate that the MmuPV1 genome is transcribed unidirectionally from five major promoters (P) or transcription start sites (TSS) and polyadenylates its transcripts at two major polyadenylation (pA) sites. We designate P7503, P360 and P859 as "early" promoters because they give rise to transcripts that mostly use the polyadenylation signal at nt 3844 and therefore can only encode early genes. We designate P7107 and P533 as "late" promoters because they give rise to transcripts that use polyadenylation signals at either nt 3844 or nt 7047, the latter being able to encode late capsid proteins. The MmuPV1 genome contains five splice donor sites and three acceptor sites that produce thirty-six RNA isoforms. These isoforms are deduced to express seven predicted early gene products (E6, E7, E1, E1^M1, E1^M2, E2 and E8^E2) and three predicted late gene products (E1^E4, L2 and L1). The majority of the viral early transcripts are spliced once, from nt 757 to 3139. In contrast, viral late transcripts predicted to encode L1 are spliced twice: first from nt 7243 to nt 3139 (P7107) or from nt 757 to 3139 (P533), and second from nt 3431 to nt 5372. Thirteen of these viral transcripts were detectable by Northern blot analysis, with the P533-derived late E1^E4 transcripts being the most abundant. The late transcripts could be detected in highly differentiated keratinocytes of MmuPV1-infected tissues as early as ten days after MmuPV1 inoculation and correlated with detection of L1 protein and viral DNA amplification.
In mature warts, detection of L1 was also found in more poorly differentiated cells, as previously reported. Subclinical infections were also observed. The comprehensive transcription map of MmuPV1 generated in this study provides further evidence that MmuPV1 is similar to high-risk cutaneous beta human papillomaviruses. The knowledge revealed will facilitate the use of MmuPV1 as an animal virus model for understanding of human papillomavirus gene expression, pathogenesis and immunology.


September 22, 2019

Transcriptome profiling using Illumina- and SMRT-based RNA-seq of hot pepper for in-depth understanding of genes involved in CMV infection.

Hot pepper (Capsicum annuum L.) is becoming an increasingly important vegetable crop worldwide. Cucumber mosaic virus (CMV) is a destructive virus that can cause leaf distortion and fruit lesions, affecting pepper production. However, studies on the response to CMV infection in pepper at the transcriptional level are limited. In this study, the transcript profiles of pepper leaves after CMV infection were investigated using Illumina and single-molecule real-time (SMRT) RNA-sequencing (RNA-seq). A total of 2143 differentially expressed genes (DEGs) were identified at five different stages. Gene ontology (GO) and KEGG analysis revealed that these DEGs were involved in the response to stress, defense response and plant-pathogen interaction pathways. Among these DEGs, several key genes that consistently appear in studies of plant-pathogen interactions showed increased transcript abundance after inoculation. These included chitinase, pathogenesis-related (PR) protein, TMV resistance protein, WRKY transcription factor and jasmonate ZIM-domain protein. Four of these DEGs were further validated by quantitative real-time RT-PCR (qRT-PCR). Furthermore, a total of 73,597 alternative splicing (AS) events, distributed across 12,615 genes, were identified in pepper leaves after CMV infection. Intron retention in WRKY33 (Capana09g001251) might be involved in regulating the response to CMV infection. Taken together, our study provides transcriptome-wide insight into the molecular basis of resistance to CMV infection in pepper leaves and identifies potential candidate genes for breeding resistant cultivars. Copyright © 2018 Elsevier B.V. All rights reserved.


September 22, 2019

High-resolution phylogenetic microbial community profiling.

Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structures at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake’s water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.


September 22, 2019

Genome-wide analysis of complex wheat gliadins, the dominant carriers of celiac disease epitopes.

Gliadins, specified by six compound chromosomal loci (Gli-A1/B1/D1 and Gli-A2/B2/D2) in hexaploid bread wheat, are the dominant carriers of celiac disease (CD) epitopes. Because of their complexity, genome-wide characterization of gliadins is a major challenge. Here, we approached this challenge by combining transcriptomic, proteomic and bioinformatic investigations. Through third-generation RNA sequencing, full-length transcripts were identified for 52 gliadin genes in the bread wheat cultivar Xiaoyan 81. Of these, 42 were active and predicted to encode 25 α-, 11 γ-, one δ- and five ω-gliadins. Comparative proteomic analysis between Xiaoyan 81 and six newly developed mutants, each lacking one Gli locus, indicated the accumulation of 38 gliadins in the mature grains. A novel group of α-gliadins (the CSTT group) was found to contain very few or no CD epitopes. The δ-gliadins identified here or previously did not carry CD epitopes. Finally, the mutant lacking Gli-D2 showed significant reductions in the most celiac-toxic α-gliadins and derivative CD epitopes. The insights and resources generated here should aid further studies on gliadin functions in CD and the breeding of healthier wheat.


September 22, 2019

Single-cell RNAseq for the study of isoforms-how is that possible?

Single-cell RNAseq and alternative splicing studies have recently become two of the most prominent applications of RNAseq. However, combining the two is still challenging, and few research efforts have been dedicated to the intersection between them. Cell-level insight into isoform expression is required to fully understand the biology of alternative splicing, but it is still an open question to what extent isoform expression analysis at the single-cell level is actually feasible. Here, we establish a set of four conditions required for a successful single-cell-level isoform study and evaluate how well current single-cell technologies meet these conditions in published research.


September 22, 2019

De novo assembly of a Chinese soybean genome.

Soybean was domesticated in China and has become one of the most important oilseed crops. Due to bottlenecks in their introduction and dissemination, soybeans from different geographic areas exhibit extensive genetic diversity. Asia is the largest soybean market; therefore, a high-quality soybean reference genome from this area is critical for soybean research and breeding. Here, we report the de novo assembly and sequence analysis of a Chinese soybean genome for "Zhonghuang 13" using a combination of SMRT, Hi-C and optical mapping data. The assembled genome size is 1.025 Gb, with a contig N50 of 3.46 Mb and a scaffold N50 of 51.87 Mb. Comparisons between this genome and the previously reported reference genome (cv. Williams 82) uncovered more than 250,000 structural variations. A total of 52,051 protein-coding genes and 36,429 transposable elements were annotated for this genome, and a gene co-expression network including 39,967 genes was also established. This high-quality Chinese soybean genome and its sequence analysis will provide valuable information for future soybean improvement.
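Contig and scaffold N50 values summarize assembly contiguity. N50 is the length L such that sequences of length L or longer together cover at least half of the total assembly length. The sketch below assumes the contig (or scaffold) lengths are already available as a list of integers, for example parsed from a FASTA index.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths supplied")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # at least half of the total assembly size is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise AssertionError("unreachable for non-empty input")
```

For example, lengths 2 through 10 sum to 54. The running total first reaches half of that (27) at the contig of length 8, so `n50(list(range(2, 11)))` returns 8. The same function applies unchanged to scaffold lengths.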


September 22, 2019

Recent insights into the tick microbiome gained through next-generation sequencing.

The tick microbiome comprises communities of microorganisms, including viruses, bacteria and eukaryotes, and is being elucidated through modern molecular techniques. The advent of next-generation sequencing (NGS) technologies has enabled the genes and genomes within these microbial communities to be explored rapidly and cost-effectively. For investigating microbiomes, NGS offers advantages over traditional non-molecular methods, which are limited in sensitivity, and over conventional molecular approaches, which are limited in scalability. In recent years, the number of studies using NGS to investigate the microbial diversity and composition of ticks has grown. Here, we review NGS strategies for tick microbiome studies and discuss recent findings from tick NGS investigations, including bacterial diversity and composition, influential factors, and implications of the tick microbiome.


September 22, 2019

Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing.

Zea mays is an important genetic model for elucidating transcriptional networks. Uncertainties about the complete structure of mRNA transcripts limit the progress of research in this system. Here, using single-molecule sequencing technology, we produce 111,151 transcripts from 6 tissues capturing ~70% of the genes annotated in maize RefGen_v3 genome. A large proportion of transcripts (57%) represent novel, sometimes tissue-specific, isoforms of known genes and 3% correspond to novel gene loci. In other cases, the identified transcripts have improved existing gene models. Averaging across all six tissues, 90% of the splice junctions are supported by short reads from matched tissues. In addition, we identified a large number of novel long non-coding RNAs and fusion transcripts and found that DNA methylation plays an important role in generating various isoforms. Our results show that characterization of the maize B73 transcriptome is far from complete, and that maize gene expression is more complex than previously thought.
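A junction-support figure like the 90% above could be computed along the lines sketched below. This sketch makes several assumptions. Junctions are represented as hypothetical `(chrom, donor, acceptor, strand)` tuples, and short-read support is a per-junction read count, for example parsed from an aligner's junction table. The `min_reads` threshold is also an assumption, since the study's exact support criteria are not stated here.

```python
from statistics import mean

# Hypothetical junction key: (chromosome, donor position, acceptor position, strand).
Junction = tuple[str, int, int, str]

def junction_support_fraction(long_read_junctions: list[Junction],
                              short_read_counts: dict[Junction, int],
                              min_reads: int = 1) -> float:
    """Fraction of distinct long-read junctions observed in >= min_reads short reads."""
    junctions = set(long_read_junctions)  # count each junction once
    if not junctions:
        raise ValueError("no junctions supplied")
    supported = sum(1 for j in junctions if short_read_counts.get(j, 0) >= min_reads)
    return supported / len(junctions)

def mean_support_across_tissues(
        per_tissue: dict[str, tuple[list[Junction], dict[Junction, int]]],
        min_reads: int = 1) -> float:
    """Average the per-tissue support fractions (tissue -> (junctions, counts))."""
    return mean(junction_support_fraction(j, c, min_reads)
                for j, c in per_tissue.values())
```

Matching long-read junctions against short reads from the same tissue, then averaging per tissue, mirrors the "averaging across all six tissues" phrasing. An exact-coordinate match is the strictest choice; a real pipeline might allow a small positional tolerance.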

