Read length Archives

September 22, 2019

Analysis of gut microbiota – An ever changing landscape.

In the last two decades, the field of metagenomics has greatly expanded due to improvements in sequencing technologies, allowing for a more comprehensive characterization of microbial communities. The use of these technologies has led to an unprecedented understanding of human, animal, and environmental microbiomes and has shown that the gut microbiota are comparable to an organ that is intrinsically linked with a variety of diseases. Characterization of microbial communities using next-generation sequencing-by-synthesis approaches has revealed important shifts in microbiota associated with debilitating diseases such as Clostridium difficile infection. But due to limitations in sequence read length, primer biases, and the quality of databases, genus- and species-level classification has been difficult. Third-generation technologies, such as Pacific Biosciences’ single molecule, real-time (SMRT) approach, allow for unbiased, more specific identification of species that are likely clinically relevant. Comparison of Illumina next-generation characterization and SMRT sequencing of samples from patients treated for C. difficile infection revealed similarities in community composition at the phylum and family levels, but SMRT sequencing further allowed for species-level characterization, permitting a better understanding of the microbial ecology of this disease. Thus, as sequencing technologies continue to advance, new species-level insights can be gained in the study of complex and clinically relevant microbial communities.

September 22, 2019

SparseIso: a novel Bayesian approach to identify alternatively spliced isoforms from RNA-seq data.

Recent advances in high-throughput RNA sequencing (RNA-seq) technologies have made it possible to reconstruct the full transcriptome of various types of cells. It is important to accurately assemble transcripts or identify isoforms for an improved understanding of molecular mechanisms in biological systems. We have developed a novel Bayesian method, SparseIso, to reliably identify spliced isoforms from RNA-seq data. A spike-and-slab prior is incorporated into the Bayesian model to enforce the sparsity for isoform identification, effectively alleviating the problem of overfitting. A Gibbs sampling procedure is further developed to simultaneously identify and quantify transcripts from RNA-seq data. With the sampling approach, SparseIso estimates the joint distribution of all candidate transcripts, resulting in a significantly improved performance in detecting lowly expressed transcripts and multiple expressed isoforms of genes. Both simulation study and real data analysis have demonstrated that the proposed SparseIso method significantly outperforms existing methods for improved transcript assembly and isoform identification. The SparseIso package is available at http://github.com/henryxushi/SparseIso. Contact: xuan@vt.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
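
For readers unfamiliar with the term, a spike-and-slab prior can be sketched in its generic textbook form as follows. This is only an illustration of the general idea, not the parameterization used by SparseIso: the symbols (an inclusion indicator \gamma_j for candidate isoform j, an abundance \beta_j, an inclusion probability \pi, and an unspecified slab density p_slab) are assumptions introduced here.

```latex
\begin{aligned}
\gamma_j &\sim \mathrm{Bernoulli}(\pi), \\
\beta_j \mid \gamma_j &\sim (1-\gamma_j)\,\delta_0 + \gamma_j\,p_{\mathrm{slab}}(\beta_j).
\end{aligned}
```

Here \delta_0 is a point mass at zero (the "spike"), so a candidate with \gamma_j = 0 is excluded outright, which is how a prior of this kind encourages sparse solutions. A Gibbs sampler would then alternate between drawing each \gamma_j and each \beta_j from their conditional distributions.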

September 22, 2019

Saliva and tooth biofilm bacterial microbiota in adolescents in a low caries community.

The oral cavity harbours a complex microbiome that is linked to dental diseases and serves as a route to other parts of the body. Here, the aims were to characterize the oral microbiota by deep sequencing in a low-caries population with regular dental care since childhood and search for association with caries prevalence and incidence. Saliva and tooth biofilm from 17-year-olds and mock bacteria communities were analysed using 16S rDNA Illumina MiSeq (v3-v4) and PacBio SMRT (v1-v8) sequencing including validity and reliability estimates. Caries was scored at 17 and 19 years of age. Both sequencing platforms revealed that Firmicutes dominated in the saliva, whereas Firmicutes and Actinobacteria abundances were similar in tooth biofilm. Saliva microbiota discriminated caries-affected from caries-free adolescents, with enumeration of Scardovia wiggsiae, Streptococcus mutans, Bifidobacterium longum, Leptotrichia sp. HOT498, and Selenomonas spp. in caries-affected participants. Adolescents with B. longum in saliva had significantly higher 2-year caries increment. PacBio SMRT revealed Corynebacterium matruchotii as the most prevalent species in tooth biofilm. In conclusion, both sequencing methods were reliable and valid for oral samples, and saliva microbiota was associated with cross-sectional caries prevalence, especially S. wiggsiae, S. mutans, and B. longum; the latter also with the 2-year caries incidence.

September 22, 2019

Differential expression analysis of olfactory genes based on a combination of sequencing platforms and behavioral investigations in Aphidius gifuensis.

Aphidius gifuensis Ashmead is a dominant endoparasitoid of aphids, such as Myzus persicae and Sitobion avenae, and plays an important role in controlling aphids in various habitats, including tobacco plants and wheat in China. A. gifuensis has been successfully applied for the biological control of aphids, especially M. persicae, in greenhouses and fields in China. The corresponding parasites, as well as its mate-searching behaviors, are subjects of considerable interest. Previous A. gifuensis transcriptome studies have relied on short-read next-generation sequencing (NGS), and the vast majority of the resulting isotigs do not represent full-length cDNA. Here, we employed a combination of NGS and single-molecule real-time (SMRT) sequencing of virgin females (VFs), mated females (MFs), virgin males (VMs), and mated males (MMs) to comprehensively study the A. gifuensis transcriptome. Behavioral responses to the aphid alarm pheromone ((E)-β-farnesene, EBF) as well as to A. gifuensis of the opposite sex were also studied. VMs were found to be attracted by female wasps and MFs were repelled by male wasps, whereas MMs and VFs did not respond to the opposite sex. In addition, VFs, MFs, and MMs were attracted by EBF, while VMs did not respond. According to these results, we performed a personalized differential gene expression analysis of olfactory gene sets (66 odorant receptors, 25 ionotropic receptors, 16 odorant-binding proteins, and 12 chemosensory proteins) in virgin and mated A. gifuensis of both sexes, and identified 13 candidate genes whose expression levels were highly consistent with behavioral test results, suggesting potential functions for these genes in pheromone perception.

September 22, 2019

An intact gut microbiota may be required for lactoferrin-driven immunomodulation in rats

Lactoferrin can modulate both the host immunity and gut microbiota. However, whether the immune modulation requires the gut microbiota has not been directly shown. Thus, our study compared (1) lactoferrin-driven immunomodulation profiles and (2) changes in fecal phylogenic metagenome with and without antibiotics-induced dysbiosis in rats. Rats receiving lactoferrin alone, but not those receiving both lactoferrin and antibiotics, had a Th1-type cytokine serum profile. Significant differences were detected between the fecal microbiota of the lactoferrin and control groups at day 19 and/or day 33 but not initially, with a shift in the major contributors for community dissimilarity to Clostridium, Lactobacillus, and Oscillibacter valericigenes. The antibiotics-induced dysbiosis enriched the proinflammatory phyla, Proteobacteria and Deferribacteres, together with the anti-inflammatory species, Akkermansia muciniphila, while suppressing some butyrate producers from the Firmicutes phylum. Our study shows that an intact microbiota is necessary for lactoferrin-driven immunomodulation.

September 22, 2019

A comprehensive quality evaluation system for complex herbal medicine using PacBio sequencing, PCR-denaturing gradient gel electrophoresis, and several chemical approaches.

Herbal medicine is a major component of complementary and alternative medicine, contributing significantly to the health of many people and communities. Quality control of herbal medicine is crucial to ensure that it is safe and sound for use. Here, we investigated a comprehensive quality evaluation system for a classic herbal medicine, Danggui Buxue Formula, by applying genetic-based and analytical chemistry approaches to authenticate and evaluate the quality of its samples. For authenticity, we successfully applied two novel technologies, third-generation sequencing and PCR-DGGE (denaturing gradient gel electrophoresis), to analyze the ingredient composition of the tested samples. For quality evaluation, we used high performance liquid chromatography assays to determine the content of chemical markers to help estimate the dosage relationship between its two raw materials, plant roots of Huangqi and Danggui. A series of surveys were then conducted against several exogenous contaminations, aiming to further assess the efficacy and safety of the samples. In conclusion, the quality evaluation system demonstrated here can potentially address the authenticity, quality, and safety of herbal medicines, thus providing novel insight for enhancing their overall quality control. Highlight: We established a comprehensive quality evaluation system for herbal medicine by combining two genetic-based approaches, third-generation sequencing and DGGE (denaturing gradient gel electrophoresis), with analytical chemistry approaches to achieve the authentication and quality connotation of the samples.

September 22, 2019

cDNA library enrichment of full length transcripts for SMRT long read sequencing.

The utility of genome assemblies does not only rely on the quality of the assembled genome sequence, but also on the quality of the gene annotations. The Pacific Biosciences Iso-Seq technology is a powerful support for accurate eukaryotic gene model annotation as it allows for direct readout of full-length cDNA sequences without the need for noisy short read-based transcript assembly. We propose the implementation of the TeloPrime Full Length cDNA Amplification kit to the Pacific Biosciences Iso-Seq technology in order to enrich for genuine full-length transcripts in the cDNA libraries. We provide evidence that TeloPrime outperforms the commonly used SMARTer PCR cDNA Synthesis Kit in identifying transcription start and end sites in Arabidopsis thaliana. Furthermore, we show that TeloPrime-based Pacific Biosciences Iso-Seq can be successfully applied to the polyploid genome of bread wheat (Triticum aestivum) not only to efficiently annotate gene models, but also to identify novel transcription sites, gene homeologs, splicing isoforms and previously unidentified gene loci.

September 22, 2019

Full-length transcriptome sequences and splice variants obtained by a combination of sequencing platforms applied to different root tissues of Salvia miltiorrhiza and tanshinone biosynthesis.

Danshen, Salvia miltiorrhiza Bunge, is one of the most widely used herbs in traditional Chinese medicine, wherein its rhizome/roots are particularly valued. The corresponding bioactive components include the tanshinone diterpenoids, the biosynthesis of which is a subject of considerable interest. Previous investigations of the S. miltiorrhiza transcriptome have relied on short-read next-generation sequencing (NGS) technology, and the vast majority of the resulting isotigs do not represent full-length cDNA sequences. Moreover, these efforts have been targeted at either whole plants or hairy root cultures. Here, we demonstrate that the tanshinone pigments are produced and accumulate in the root periderm, and apply a combination of NGS and single-molecule real-time (SMRT) sequencing to various root tissues, particularly including the periderm, to provide a more complete view of the S. miltiorrhiza transcriptome, with further insight into tanshinone biosynthesis as well. In addition, the use of SMRT long-read sequencing offered the ability to examine alternative splicing, which was found to occur in approximately 40% of the detected gene loci, including several involved in isoprenoid/terpenoid metabolism. © 2015 The Authors. The Plant Journal © 2015 John Wiley & Sons Ltd.

September 22, 2019

Using PacBio long-read high-throughput microbial gene amplicon sequencing to evaluate infant formula safety.

Infant formula (IF) requires a strict microbiological standard because of the high vulnerability of infants to foodborne diseases. The current study used the PacBio single molecule real-time (SMRT) sequencing platform to generate full-length 16S rRNA-based bacterial microbiota profiles of thirty Chinese domestic and imported IF samples. A total of 600 species were identified, dominated by Streptococcus thermophilus, Lactococcus lactis and Lactococcus piscium. Distinctive bacterial profiles were observed between the two sample groups, as confirmed with both principal coordinate analysis and multivariate analysis of variance. Moreover, the product whey protein nitrogen index (WPNI), representing the degree of preheating, negatively correlated with the relative abundances of the Bacillus genus. Our study has demonstrated the application of the PacBio SMRT sequencing platform in assessing the bacterial contamination of IF products, which is of interest to the dairy industry for effective monitoring of microbial quality and safety during production.

September 22, 2019

Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis.

RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Surpassing the expression analysis scope, our work also includes assessment of RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome. RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.

September 22, 2019

Distinguishing highly similar gene isoforms with a clustering-based bioinformatics analysis of PacBio single-molecule long reads.

Gene isoforms are commonly found in both prokaryotes and eukaryotes. Since each isoform may perform a specific function in response to changing environmental conditions, studying the dynamics of gene isoforms is important in understanding biological processes and disease conditions. However, genome-wide identification of gene isoforms is technically challenging due to the high degree of sequence identity among isoforms. The traditional targeted sequencing approach, involving Sanger sequencing of plasmid-cloned PCR products, has low throughput and is very tedious and time-consuming. Next-generation sequencing technologies such as Illumina and 454 achieve high throughput, but their short read lengths are a critical barrier to accurate assembly of highly similar gene isoforms and may result in ambiguities and false joining during sequence assembly. More recently, the third-generation sequencer represented by the PacBio platform offers sufficient throughput and long reads covering the full length of typical genes, thus providing a potential to reliably profile gene isoforms. However, the PacBio long reads are error-prone and cannot be effectively analyzed by traditional assembly programs. We present a clustering-based analysis pipeline integrated with PacBio sequencing data for profiling highly similar gene isoforms. This approach was first evaluated in comparison to de novo assembly of 454 reads using a benchmark admixture containing 10 known, cloned msg genes encoding the major surface glycoprotein of Pneumocystis jirovecii. All 10 msg isoforms were successfully reconstructed with the expected length (~1.5 kb) and correct sequence by the new approach, while 454 reads could not be correctly assembled using various assembly programs. When using an additional benchmark admixture containing 22 known P. jirovecii msg isoforms, this approach accurately reconstructed all but 4 of these isoforms in their full length (~3 kb); these 4 isoforms were present in low concentrations in the admixture. Finally, when applied to the original clinical sample from which the 22 known msg isoforms were cloned, this approach successfully identified not only all known isoforms accurately (~3 kb each) but also 48 novel isoforms. PacBio sequencing integrated with the clustering-based analysis pipeline achieves high-throughput and high-resolution discrimination of highly similar sequences, and can serve as a new approach for genome-wide characterization of gene isoforms and other highly repetitive sequences.

September 22, 2019

Analysis of microbial community structure of pit mud for Chinese strong-flavor liquor fermentation using next generation DNA sequencing of full-length 16S rRNA

The pit is the necessary bioreactor for the brewing process of Chinese strong-flavor liquor. Pit mud in pits contains a large number of microorganisms and is a complex ecosystem. The analysis of bacterial flora in pit mud is of great significance for understanding liquor fermentation mechanisms. To overcome taxonomic limitations of short reads in 16S rRNA variable region sequencing, we used high-throughput DNA sequencing of the near full-length 16S rRNA gene to analyze microbial compositions of different types of pit mud that produce different qualities of strong-flavor liquor. The results showed that the main species in pit mud were Pseudomonas extremaustralis 14-3, Pseudomonas veronii, Serratia marcescens WW4, and Clostridium leptum in Ruminiclostridium. The microbial diversity of pit mud of different quality was significantly different. From poor to good quality of pit mud (and thus of liquor), the relative abundances of Ruminiclostridium and Syntrophomonas in Firmicutes increased, and the relative abundance of Olsenella in Actinobacteria also increased, but the relative abundances of Pseudomonas and Serratia in Proteobacteria decreased. The surprising findings of this study include that the diversity of the intermediate-quality pit mud N was the lowest, and the diversity levels of high-quality pit mud G and poor-quality pit mud B were similar. Correlation analysis showed that there were high positive correlations (r > 0.8) among different microbial groups in the flora. Based on the analysis of the microbial structures of pit mud of different quality, the good quality pit mud has a higher microbial diversity, but how this higher diversity and differential microbial compositions contribute to better quality of liquor fermentation remains obscure.

September 22, 2019

PacBio sequencing of gene families – a case study with wheat gluten genes.

Amino acids in wheat (Triticum aestivum) seeds mainly accumulate in storage proteins called gliadins and glutenins. Gliadins contain α/β-, γ- and ω-types whereas glutenins contain HMW- and LMW-types. Known gliadin and glutenin sequences were largely determined through cloning and sequencing by capillary electrophoresis. This time-consuming process prevents intensive study of the variation of each orthologous gene copy among cultivars. The throughput and sequencing length of the Pacific Biosciences RS (PacBio) single molecule sequencing platform make it feasible to construct contiguous and non-chimeric RNA sequences. We assembled 424 wheat storage protein transcripts from ten wheat cultivars by using just one single-molecule-real-time cell. The protein genes from wheat cultivar Chinese Spring are comparable to known sequences from NCBI. We demonstrated real-time sequencing of gene families with high throughput and low cost. This method can be applied to studies of gene amplification and copy number variation among species and cultivars. © 2013 Elsevier B.V. All rights reserved.

September 22, 2019

A high-quality annotated transcriptome of swine peripheral blood.

High throughput gene expression profiling assays of peripheral blood are widely used in biomedicine, as well as in animal genetics and physiology research. Accurate, comprehensive, and precise interpretation of such high throughput assays relies on well-characterized reference genomes and/or transcriptomes. However, neither the reference genome nor the peripheral blood transcriptome of the pig has been sufficiently assembled and annotated to support such profiling assays in this emerging biomedical model organism. We aimed to assemble published and novel RNA-seq data to provide a comprehensive, well-annotated blood transcriptome for pigs by integrating a de novo assembly with a genome-guided assembly. A de novo and a genome-guided transcriptome of porcine whole peripheral blood was assembled with ~162 million pairs of paired-end and ~183 million single-end, trimmed and normalized Illumina RNA-seq reads (~6 billion initial reads from 146 RNA-seq libraries) from five independent studies by using the Trinity and Cufflinks software, respectively. We then removed putative transcripts (PTs) of low confidence from both assemblies and merged the remaining PTs into an integrated transcriptome consisting of 132,928 PTs, with 126,225 (~95%) PTs from the de novo assembly and more than 91% of PTs spliced. In the integrated transcriptome, ~90% and 63% of PTs had significant sequence similarity to sequences in the NCBI NT and NR databases, respectively; 68,754 (~52%) PTs were annotated with 15,965 unique gene ontology (GO) terms; and 7618 PTs annotated with Enzyme Commission codes were assigned to 134 pathways curated by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Full exon-intron junctions of 17,528 PTs were validated by PacBio IsoSeq full-length cDNA reads from 3 other porcine tissues, NCBI pig RefSeq mRNAs and transcripts from Ensembl Sscrofa10.2 annotation. Completeness of the 5′ termini of 37,569 PTs was validated by public cap analysis of gene expression (CAGE) data. By comparison to the Ensembl transcripts, we found that (1) the deduced precursors of 54,402 PTs shared at least one intron or exon with those of 18,437 Ensembl transcripts; (2) 12,262 PTs had both longer 5′ and 3′ termini than their maximally overlapping Ensembl transcripts; and (3) 41,838 spliced PTs were totally missing from the Sscrofa10.2 annotation. Similar results were obtained when the PTs were compared to the pig NCBI RefSeq mRNA collection. We built, validated and annotated a comprehensive porcine blood transcriptome with significant improvement over the annotation of Ensembl Sscrofa10.2 and the pig NCBI RefSeq mRNAs, and laid a foundation for blood-based high throughput transcriptomic assays in pigs and for advancing annotation of the pig genome.
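
The rounded percentages quoted in this abstract can be reproduced from the raw counts it reports. The short Python sketch below does only that arithmetic; the counts are copied from the abstract, while the helper name `percent_of_total` and the dictionary labels are illustrative and not part of the original work.

```python
# Counts copied from the abstract; names below are illustrative only.
TOTAL_PTS = 132_928  # putative transcripts in the integrated transcriptome


def percent_of_total(count: int, total: int = TOTAL_PTS) -> float:
    """Return count expressed as a percentage of total."""
    return 100.0 * count / total


reported = {
    "PTs from the de novo assembly": 126_225,
    "PTs annotated with GO terms": 68_754,
}

for label, count in reported.items():
    print(f"{label}: {count:,} / {TOTAL_PTS:,} = {percent_of_total(count):.1f}%")
```

Rounded to whole numbers, the two ratios come out at 95% and 52%, in line with the approximate figures given in the abstract.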

September 22, 2019

Isoform sequencing provides a more comprehensive view of the Panax ginseng transcriptome.

Korean ginseng (Panax ginseng C.A. Meyer) has been widely used for medicinal purposes and contains potent plant secondary metabolites, including ginsenosides. To obtain transcriptomic data that offers a more comprehensive view of functional genomics in P. ginseng, we generated genome-wide transcriptome data from four different P. ginseng tissues using PacBio isoform sequencing (Iso-Seq) technology. A total of 135,317 assembled transcripts were generated with an average length of 3.2 kb and high assembly completeness. Of those unigenes, 67.5% were predicted to be complete full-length (FL) open reading frames (ORFs) and exhibited a high gene annotation rate. Furthermore, we successfully identified unique full-length genes involved in triterpenoid saponin synthesis and plant hormonal signaling pathways, including auxin and cytokinin. Studies on the functional genomics of P. ginseng seedlings have confirmed the rapid upregulation of negative feed-back loops by auxin and cytokinin signaling cues. The conserved evolutionary mechanisms in the auxin and cytokinin canonical signaling pathways of P. ginseng are more complex than those in Arabidopsis thaliana. Our analysis also revealed a more detailed view of transcriptome-wide alternative isoforms for 88 genes. Finally, transposable elements (TEs) were also identified, suggesting transcriptional activity of TEs in P. ginseng. In conclusion, our results suggest that long-read, full-length or partial-unigene data with high-quality assemblies are invaluable resources as transcriptomic references in P. ginseng and can be used for comparative analyses in closely related medicinal plants.
