September 22, 2019

A survey of transcriptome complexity in Sus scrofa using single-molecule long-read sequencing.

Alternative splicing (AS) and fusion transcripts greatly expand the diversity of transcriptomes and proteomes. However, the reliability of these events and the extent of their epigenetic regulation have not been adequately addressed, owing to uncertainty about the complete structure of mRNAs. Here we combined single-molecule real-time sequencing, Illumina RNA-seq and DNA methylation data to characterize the landscape of DNA methylation in relation to AS, fusion isoform formation and lncRNA features, and thereby to unveil the complexity of the pig transcriptome. Our analysis identified an unprecedented number of high-quality full-length isoforms, including over 28,127 novel isoforms from 26,881 novel genes. More than 92,000 novel AS events were detected; intron retention was the predominant AS mode, followed by exon skipping. Interestingly, we found that DNA methylation played an important role in generating various AS isoforms by regulating splice sites, promoter regions and first exons. Furthermore, we identified a large number of fusion transcripts and novel lncRNAs, and found that DNA methylation of the promoter and gene body could regulate lncRNA expression. Our results substantially improve existing pig gene models and reveal that AS and epigenetic modification in pig are more complex than previously thought.
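Intron retention and exon skipping, the two most common AS modes reported above, can be distinguished by comparing the exon coordinates of two isoforms from the same gene. The following is a minimal sketch, assuming each isoform is given as a sorted list of non-overlapping (start, end) exon coordinates on one strand. The function names and coordinates are hypothetical, and this is not the classification pipeline used in the study; dedicated AS callers apply stricter splice-junction matching rules.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive genomic coordinates

def introns(exons: List[Exon]) -> List[Exon]:
    """Introns between consecutive exons of a sorted, non-overlapping exon list."""
    return [(a[1] + 1, b[0] - 1) for a, b in zip(exons, exons[1:])]

def has_intron_retention(ref: List[Exon], alt: List[Exon]) -> bool:
    """True if a single exon of `alt` spans an entire intron of `ref`."""
    return any(s <= i_start and e >= i_end
               for i_start, i_end in introns(ref)
               for s, e in alt)

def has_exon_skipping(ref: List[Exon], alt: List[Exon]) -> bool:
    """True if an internal exon of `ref` falls entirely inside an intron of `alt`."""
    alt_introns = introns(alt)
    return any(i_s <= s and e <= i_e
               for s, e in ref[1:-1]
               for i_s, i_e in alt_introns)

# Hypothetical three-exon reference isoform and two alternative isoforms
ref = [(100, 200), (300, 400), (500, 600)]
retained = [(100, 400), (500, 600)]   # keeps intron 201-299 inside one exon
skipped = [(100, 200), (500, 600)]    # joins exon 1 directly to exon 3

print(has_intron_retention(ref, retained), has_exon_skipping(ref, retained))  # True False
print(has_intron_retention(ref, skipped), has_exon_skipping(ref, skipped))    # False True
```

Comparing each full-length isoform against a reference isoform in this way is what makes long reads useful here: a single read carries the complete exon chain, so the retained intron or skipped exon does not have to be inferred from short fragments.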


September 22, 2019

Comprehensive transcriptome analysis of Sarcophaga peregrina, a forensically important fly species.

Sarcophaga peregrina (flesh fly) is a frequently found fly species in the Palaearctic, Oriental, and Australasian regions that can be used to estimate minimal postmortem intervals in forensic investigations. Despite its forensic importance, the genome of S. peregrina has not been fully described. Therefore, we generated a comprehensive gene expression dataset using RNA sequencing and carried out de novo assembly to characterize the S. peregrina transcriptome. We obtained precise sequence information for RNA transcripts using two different sequencing methods. Based on the primary sequence information, we identified sets of assembled unigenes and predicted coding sequences. Functional annotation of the aligned unigenes was performed using the UniProt, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes databases. In total, 26,580,352 and 83,221 raw reads were obtained using the Illumina MiSeq and PacBio RS II Iso-Seq sequencing applications, respectively. From these reads, 55,730 contigs were successfully annotated. The present study thus provides genetic information on S. peregrina that is valuable for forensic applications.


September 22, 2019

Normalized long read RNA sequencing in chicken reveals transcriptome complexity similar to human.

Despite the significance of chicken as a model organism, our understanding of the chicken transcriptome is limited compared to that of human. This issue is common to all non-human vertebrate annotations because of the difficulty of identifying transcripts from short-read RNA-seq data. While previous studies have used single-molecule long-read sequencing for transcript discovery, they did not perform RNA normalization and 5′-cap selection, which may have resulted in lower transcriptome coverage and truncated transcript sequences. We sequenced normalised chicken brain and embryo RNA libraries with Pacific Biosciences Iso-Seq. 5′-cap selection was performed on the embryo library to provide a methodological comparison. From these Iso-Seq sequencing projects, we identified 60 k transcripts and 29 k genes within the chicken transcriptome. Of these, more than 20 k are novel lncRNA transcripts, with ~3 k classified as sense exonic overlapping lncRNA, a class that is underrepresented in many vertebrate annotations. The relative proportions of alternative transcription events revealed striking similarities between the chicken and human transcriptomes while also providing explanations for previously observed genomic differences. Our results indicate that the chicken transcriptome is similar in complexity to the human transcriptome, and they provide insights into other vertebrate biology. Our methodology demonstrates the potential of Iso-Seq sequencing to rapidly expand our knowledge of transcriptomics.


September 22, 2019

100K Pathogen Genome Project.

The 100K Pathogen Genome Project is producing draft and closed genome sequences from diverse pathogens. The project has expanded globally to capture a snapshot of worldwide bacterial genome diversity. The genomes form a sequence database with a variety of uses, from systematics to public health. Copyright © 2017 Weimer.


September 22, 2019

Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection.

Productivity of ruminant livestock depends on the rumen microbiota, which ferment indigestible plant polysaccharides into nutrients used for growth. Understanding the functions carried out by the rumen microbiota is important for reducing greenhouse gas production by ruminants and for developing biofuels from lignocellulose. We present 410 cultured bacteria and archaea, together with their reference genomes, representing every cultivated rumen-associated archaeal and bacterial family. We evaluate polysaccharide degradation, short-chain fatty acid production and methanogenesis pathways, and assign specific taxa to functions. A total of 336 organisms were present in available rumen metagenomic data sets, and 134 were present in human gut microbiome data sets. Comparison with the human microbiome revealed rumen-specific enrichment for genes encoding de novo synthesis of vitamin B12, ongoing evolution by gene loss and potential vertical inheritance of the rumen microbiome based on underrepresentation of markers of environmental stress. We estimate that our Hungate genome resource represents ~75% of the genus-level bacterial and archaeal taxa present in the rumen.


September 22, 2019

The state of play in higher eukaryote gene annotation.

A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe – or ‘annotate’ – genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists – from clinicians to evolutionary biologists – need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.


September 22, 2019

Application of circular consensus sequencing and network analysis to characterize the bovine IgG repertoire.

Vertebrate immune systems generate diverse repertoires of antibodies capable of mediating responses to a variety of antigens. Next-generation sequencing methods provide unique approaches to a number of immunology research areas, including antibody discovery and engineering, disease surveillance, and host immune response to vaccines. In particular, single-molecule circular consensus sequencing permits the sequencing of antibody repertoires at previously unattainable depths of coverage and accuracy. We approached the bovine immunoglobulin G (IgG) repertoire with the objective of characterizing the diversity of expressed IgG transcripts. Here we present single-molecule real-time sequencing data of the expressed IgG heavy-chain repertoires of four individual cattle. We describe the diversity observed within antigen binding regions and visualize this diversity using a network-based approach. We generated 49,945 high-quality cDNA sequences, each spanning the entire IgG variable region, from four Bos taurus calves. From these sequences we identified 49,521 antigen binding regions using the automated Paratome web server. Approximately 9% of all unique complementarity-determining region 2 (CDR2) sequences were of variable lengths. A bimodal distribution of unique CDR3 sequence lengths was observed, with common lengths of 5-6 and 21-25 amino acids. The average number of cysteine residues in CDR3s increased with CDR3 length, and we observed that cysteine residues were centrally located in CDR3s. We identified 19 extremely long CDR3 sequences (up to 62 amino acids in length) within IgG transcripts. Network analyses revealed distinct patterns among the expressed IgG antigen binding repertoires of the examined individuals. We used circular consensus sequencing technology to provide baseline data on the expressed bovine IgG repertoire that can be used in future studies important to livestock research.
Somatic mutation resulting in base insertions and deletions in CDR2 further diversifies the bovine antibody repertoire. In contrast to previous studies, our data indicate that unusually long CDR3 sequences are not unique to IgM antibodies in cattle. Centrally located cysteine residues in bovine CDR3s provide further evidence that disulfide bond formation is likely of structural importance. We hypothesize that network or cluster-based analyses of expressed antibody repertoires from controlled challenge experiments will help identify novel natural antigen binding solutions to specific pathogens of interest.
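Two of the CDR3 observations above, that mean cysteine content rises with CDR3 length and that cysteines sit near the centre of the loop, reduce to simple per-sequence counts. The sketch below assumes CDR3 amino-acid sequences have already been extracted (for example by a tool such as Paratome); the function names, the 5-residue bin width and the toy sequences are hypothetical and are not the authors' analysis code or data.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List

def mean_cys_by_length_bin(cdr3s: Iterable[str], bin_width: int = 5) -> Dict[int, float]:
    """Mean number of cysteine ('C') residues per unique CDR3, grouped into
    length bins that start at multiples of bin_width."""
    bins: Dict[int, List[int]] = defaultdict(list)
    for seq in set(cdr3s):  # count each unique sequence once
        bins[(len(seq) // bin_width) * bin_width].append(seq.count("C"))
    return {start: fmean(counts) for start, counts in sorted(bins.items())}

def relative_cys_positions(cdr3: str) -> List[float]:
    """Cysteine positions scaled to [0, 1] along the CDR3 (0.5 = centre)."""
    if len(cdr3) < 2:
        return []
    return [i / (len(cdr3) - 1) for i, aa in enumerate(cdr3) if aa == "C"]

# Hypothetical toy CDR3 amino-acid sequences (not data from the study)
toy = ["ARDYW", "AKDYW", "ARDYW", "ARGCYCDGSCYGYCVDAW"]
print(mean_cys_by_length_bin(toy))    # {5: 0.0, 15: 4.0}
print(relative_cys_positions("ACA"))  # [0.5]
```

Restricting the counts to unique sequences mirrors the abstract's focus on unique CDR3s, so a clonally expanded sequence observed many times does not dominate its length bin.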


September 22, 2019

Assessment of the physicochemical properties and bacterial composition of Lactobacillus plantarum and Enterococcus faecium-fermented Astragalus membranaceus using single molecule, real-time sequencing technology.

We investigated whether fermentation with probiotic cultures could improve the production of health-promoting biological compounds in Astragalus membranaceus. We tested the probiotics Enterococcus faecium, Lactobacillus plantarum and Enterococcus faecium + Lactobacillus plantarum, and applied PacBio single-molecule, real-time sequencing technology (SMRT) to evaluate the quality of Astragalus fermentation. We found that the production rates of acetic acid, methylacetic acid, ethylacetic acid and lactic acid using E. faecium + L. plantarum were 1866.24 mg/kg on day 15, 203.80 mg/kg on day 30, 996.04 mg/kg on day 15, and 3081.99 mg/kg on day 20, respectively. Other production rates using E. faecium, L. plantarum and E. faecium + L. plantarum, respectively, were: polysaccharides, 9.43%, 8.51%, and 7.59% on day 10; saponins, 19.6912 mg/g, 21.6630 mg/g and 20.2084 mg/g on day 15; and flavonoids, 1.9032 mg/g, 2.0835 mg/g, and 1.7086 mg/g on day 20. SMRT was used to analyze microbial composition, and we found that E. faecium and L. plantarum were the most prevalent species after 3 days of fermentation. E. faecium + L. plantarum had more beneficial effects than either single strain in the Astragalus solid-state fermentation process. Our data demonstrate that the SMRT sequencing platform is applicable to quality assessment of Astragalus fermentation.


September 22, 2019

Extensive horizontal gene transfer in cheese-associated bacteria.

Acquisition of genes through horizontal gene transfer (HGT) allows microbes to rapidly gain new capabilities and adapt to new or changing environments. Identifying widespread HGT regions within multispecies microbiomes can pinpoint the molecular mechanisms that play key roles in microbiome assembly. We sought to identify horizontally transferred genes within a model microbiome, the cheese rind. Comparing 31 newly sequenced and 134 previously sequenced bacterial isolates from cheese rinds, we identified over 200 putative horizontally transferred genomic regions containing 4733 protein coding genes. The largest of these regions are enriched for genes involved in siderophore acquisition, and are widely distributed in cheese rinds in both Europe and the US. These results suggest that HGT is prevalent in cheese rind microbiomes, and that identification of genes that are frequently transferred in a particular environment may provide insight into the selective forces shaping microbial communities.


September 22, 2019

The first whole transcriptomic exploration of pre-oviposited early chicken embryos using single and bulked embryonic RNA-sequencing.

The chicken is a valuable model organism, especially in evolutionary and embryology research, because its embryonic development occurs in the egg. However, despite its scientific importance, no transcriptome data have been generated to decipher the early developmental stages of the chicken because of practical and technical constraints in accessing pre-oviposited embryos. Here, we determine for the first time the entire transcriptome of pre-oviposited avian embryos, including the oocyte, zygote, and intrauterine embryos from Eyal-Giladi and Kochav stage I (EGK.I) to EGK.X, collected using a noninvasive approach. We also compare RNA-sequencing data obtained using bulked-embryo sequencing and single-embryo/single-cell sequencing techniques. The raw sequencing data were preprocessed with two genome builds, Galgal4 and Galgal5, and the expression of 17,108 and 26,102 genes was quantified in the respective builds. There were some differences between the two techniques, as well as between the two genome builds, and these were affected by the emergence of long intergenic noncoding RNA annotations. These first transcriptome datasets of pre-oviposited early chicken embryos, based on bulked and single-embryo sequencing techniques, will serve as a valuable resource for investigating early avian embryogenesis, for comparative studies among vertebrates, and for novel gene annotation in the chicken genome.


September 22, 2019

PCR and omics-based techniques to study the diversity, ecology and biology of anaerobic fungi: Insights, challenges and opportunities.

Anaerobic fungi (phylum Neocallimastigomycota) are common inhabitants of the digestive tract of mammalian herbivores and, in the rumen, can account for up to 20% of the microbial biomass. Anaerobic fungi play a primary role in the degradation of lignocellulosic plant material. They also have a syntrophic interaction with methanogenic archaea, which increases their fiber degradation activity. To date, nine anaerobic fungal genera have been described, with further novel taxonomic groupings known to exist based on culture-independent molecular surveys. However, their true diversity may be underestimated to an even greater extent, as anaerobic fungi continue to be discovered in as-yet unexplored gut and non-gut environments. Additionally, many studies are now known to have used primers that provide incomplete coverage of the Neocallimastigomycota. For ecological studies, the internal transcribed spacer 1 (ITS1) region has been the taxonomic marker of choice, but because of various limitations the large subunit rRNA (LSU) is now increasingly used. How the continued expansion of our knowledge of anaerobic fungal diversity will affect our understanding of their biology and ecological role remains unclear, particularly as it is becoming apparent that anaerobic fungi display niche differentiation. As a consequence, there is a need to move beyond the broad generalization of anaerobic fungi as fiber degraders and to explore the fundamental differences that underpin their ability to exist in distinct ecological niches. Applying genomics, transcriptomics, proteomics and metabolomics to pure and mixed cultures and to environmental samples will be invaluable in this process. To date, the genomes and transcriptomes of several characterized anaerobic fungal isolates have been successfully generated. In contrast, the application of proteomics and metabolomics to anaerobic fungal analysis is still in its infancy.
A central problem for all analyses, however, is the limited functional annotation of anaerobic fungal sequence data. There is therefore an urgent need to expand information held within publicly available reference databases. Once this challenge is overcome, along with improved sample collection and extraction, the application of these techniques will be key in furthering our understanding of the ecological role and impact of anaerobic fungi in the wide range of environments they inhabit.


September 22, 2019

Complete genome sequences of two genotype A2 small ruminant lentiviruses isolated from infected U.S. sheep.

Two distinct subgroups of genotype A2 small ruminant lentiviruses (SRLVs) have been identified in the United States that infect sheep with specific host transmembrane protein 154 (TMEM154) diplotypes. Here, we report the first two complete genome sequences of SRLV strains infecting U.S. sheep belonging to genotype A2, subgroups 1 and 2. Copyright © 2017 Workman et al.


September 22, 2019

Improving eukaryotic genome annotation using single molecule mRNA sequencing.

The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. Here we use these attributes to improve the genome annotation of the parasitic hookworm Ancylostoma ceylanicum using PacBio RNA-Seq. We sequenced 192,888 circular consensus sequences (CCS) derived from cDNAs generated using the Clontech SMARTer system. These SMARTer-SMRT libraries were normalized and size-selected, providing a robust population of expressed structural genes for subsequent genome annotation. Compared with annotation based on conventional sequencing-by-synthesis alone, annotation incorporating the PacBio mRNA sequences identified 1609 (9.2%) new genes, extended the length of 3965 (26.7%) genes and increased the total genomic exon length by 1.9 Mb (12.4%). Representation of non-coding sequence (primarily UTRs, owing to oligo(dT)-primed reverse transcription) was particularly improved, increasing fifteen-fold in total length through increases in both the length and the number of UTR exons. In addition, the UTR data provided by these CCS allowed the identification of a novel SL2 splice leader sequence for A. ceylanicum and an increase in the number and proportion of functionally annotated genes. RNA-seq data also confirmed some of the newly annotated genes and gene features. Overall, PacBio data supported a significant improvement in gene annotation in this genome, and PacBio sequencing is an appealing alternative or complement to other transcript sequencing technologies for genome annotation.


September 22, 2019

Analysis of transcripts and splice isoforms in red clover (Trifolium pratense L.) by single-molecule long-read sequencing.

Red clover (Trifolium pratense L.) is an important cool-season legume and the most widely planted forage legume after alfalfa. Although a draft genome sequence has been published, the sequences and complete structures of mRNA transcripts remain unclear, which limits further research on red clover. In this study, the red clover transcriptome was sequenced using single-molecule long-read sequencing to identify full-length splice isoforms, and 29,730 novel isoforms from known genes and 2194 novel isoforms from novel genes were identified. A total of 5492 alternative splicing events were identified, and the majority of alternative splicing events in red clover were intron retention events. In addition, of the 15,229 genes detected by SMRT, 8719 genes, comprising 186,517 transcripts, have at least one poly(A) site. Furthermore, we identified 4333 long non-coding RNAs and 3762 fusion transcripts. We analyzed the full-length transcriptome of red clover with PacBio SMRT sequencing. These findings provide important information for improving the red clover draft genome annotation and for fully characterizing the red clover transcriptome.
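Fusion transcripts such as the 3762 reported above are typically flagged as full-length reads whose alignments split across two or more distinct genes. The sketch below illustrates that idea only, assuming reads are already aligned and each aligned segment has been assigned to an overlapping gene. The `Segment` class, the `candidate_fusions` function and the 50 bp minimum segment length are hypothetical choices, not the criteria used in this study; production fusion callers also check breakpoint support and distance between loci.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Segment:
    chrom: str
    start: int           # inclusive
    end: int             # inclusive
    gene: Optional[str]  # gene overlapped by this aligned segment, None if intergenic

def candidate_fusions(alignments: Dict[str, List[Segment]],
                      min_segment_len: int = 50) -> Dict[str, List[str]]:
    """Return reads whose sufficiently long aligned segments hit two or more genes."""
    fusions: Dict[str, List[str]] = {}
    for read_id, segments in alignments.items():
        genes: List[str] = []
        for seg in segments:
            long_enough = seg.end - seg.start + 1 >= min_segment_len
            if seg.gene is not None and long_enough and seg.gene not in genes:
                genes.append(seg.gene)
        if len(genes) >= 2:
            fusions[read_id] = genes
    return fusions

# Hypothetical alignments of three full-length reads
reads = {
    "read1": [Segment("chr1", 1000, 1800, "geneA"), Segment("chr3", 5000, 5900, "geneB")],
    "read2": [Segment("chr1", 1000, 2500, "geneA")],
    "read3": [Segment("chr1", 1000, 1800, "geneA"), Segment("chr2", 10, 30, "geneC")],
}
print(candidate_fusions(reads))  # {'read1': ['geneA', 'geneB']}
```

In this toy example read3 is not flagged because its second segment is only 21 bp long, which shows why a minimum segment length helps suppress spurious calls from short, possibly misaligned fragments.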


September 22, 2019

Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers.

Amplicon sequencing on next-generation platforms has substantially transformed how research is conducted, particularly in microbial ecology. However, primer and sequencing-platform biases can confound or change the way scientists interpret these data. The Pacific Biosciences RSII instrument may also preferentially load smaller fragments, which may also be a function of PCR product exhaustion during sequencing. To further examine these biases, data are provided from 16S rRNA rumen community analyses. Specifically, data on the relative phylum-level abundances of the ruminal bacterial community are provided to determine between-sample variability. Direct sequencing of metagenomic DNA was conducted to circumvent primer-associated biases in 16S rRNA reads, and rarefaction curves were generated to demonstrate adequate coverage of each amplicon. PCR products were also subjected to reduced amplification and pooling to reduce the likelihood of PCR product exhaustion during sequencing on the Pacific Biosciences platform. The taxonomic profiles for the relative phylum-level and genus-level abundances of rumen microbiota as a function of PCR pooling for sequencing on the Pacific Biosciences RSII platform are provided. For more information, see "Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers" P.R. Myer, M. Kim, H.C. Freetly, T.P.L. Smith (2016) [1].
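Rarefaction curves like those mentioned above plot the expected number of taxa observed against sequencing depth; a curve that flattens suggests that additional reads would reveal few new taxa. A minimal sketch of the standard hypergeometric rarefaction formula follows, E[S_n] = Σ_i (1 − C(N − N_i, n) / C(N, n)), where N is the total read count, N_i the count for taxon i, and n the subsample size. It assumes per-taxon read counts for one sample are already available; the function names and counts are illustrative, and the software the authors actually used is not specified here.

```python
from math import comb
from typing import Dict, Iterable, List

def expected_richness(counts: List[int], n: int) -> float:
    """Expected number of taxa seen in a random subsample of n reads drawn
    without replacement: E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n))."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total read count")
    denom = comb(total, n)
    # comb(a, b) returns 0 when b > a, so a taxon that every subsample
    # of size n must contain contributes exactly 1
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)

def rarefaction_curve(counts: List[int], depths: Iterable[int]) -> Dict[int, float]:
    """Expected richness at each requested subsampling depth."""
    return {d: expected_richness(counts, d) for d in depths}

# Hypothetical read counts for four taxa in one sample (10 reads in total)
counts = [5, 3, 1, 1]
print(rarefaction_curve(counts, [0, 5, 10]))
```

The exact formula is used here instead of repeated random subsampling because it gives the expected value directly without Monte Carlo noise; Python's arbitrary-precision integers keep `comb` exact even for large read counts.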

