September 22, 2019

ISOdb: A comprehensive database of full-length isoforms generated by Iso-Seq.

The accurate landscape of transcript isoforms plays an important role in the understanding of gene function and gene regulation. However, building complete transcripts is very challenging for short reads generated using next-generation sequencing. Fortunately, isoform sequencing (Iso-Seq) using single-molecule sequencing technologies, such as PacBio SMRT, provides long reads spanning entire transcript isoforms which do not require assembly. Therefore, we have developed ISOdb, a comprehensive resource database for hosting and carrying out an in-depth analysis of Iso-Seq datasets and visualising the full-length transcript isoforms. The current version of ISOdb has collected 93 publicly available Iso-Seq samples from eight species and presents the samples in two levels: (1) sample level, including metainformation, long read distribution, isoform numbers, and alternative splicing (AS) events of each sample; (2) gene level, including the total isoforms, novel isoform number, novel AS number, and isoform visualisation of each gene. In addition, ISOdb provides a user interface in the website for uploading sample information to facilitate the collection and analysis of researchers’ datasets. Currently, ISOdb is the first repository that offers comprehensive resources and convenient public access for hosting, analysing, and visualising Iso-Seq data, which is freely available.


September 22, 2019

Event analysis: Using transcript events to improve estimates of abundance in RNA-seq data.

Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result in both the construction of transcripts that do not exist and the failure to identify true transcripts. An alternative approach is to catalog the events that make up isoforms (splice junctions and exons). We present here the Event Analysis (EA) approach, where we project transcripts onto the genome and identify overlapping/unique regions and junctions. In addition, all possible logical junctions are assembled into a catalog. Transcripts are filtered before quantitation based on simple measures: the proportion of the events detected, and the coverage. We find that mapping to a junction catalog is more efficient at detecting novel junctions than mapping in a splice-aware manner. We identify 99.8% of true transcripts while iReckon identifies 82% of the true transcripts and creates more transcripts not included in the simulation than were initially used in the simulation. Using PacBio Iso-Seq data from a mouse neural progenitor cell model, EA detects 60% of the novel junctions that are combinations of existing exons while only 43% are detected by STAR. EA further detects ~5,000 annotated junctions missed by STAR. Filtering transcripts based on the proportion of the transcript detected and the number of reads on average supporting that transcript captures 95% of the PacBio transcriptome. Filtering the reference transcriptome before quantitation results in a more stable estimate of isoform abundance, with improved correlation between replicates.
This was particularly evident when EA was applied to an RNA-seq study of type 1 diabetes (T1D), where the coefficient of variation among subjects (n = 81) in the transcript abundance estimates was substantially reduced compared to the estimation using the full reference. EA focuses on individual transcriptional events. These events can be quantitated and analyzed directly or used to identify the probable set of expressed transcripts. Simple rules based on detected events and coverage used in filtering result in a dramatic improvement in isoform estimation without the use of ancillary data (e.g., ChIP, long reads) that may not be available for many studies. Copyright © 2018 Newman et al.
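
The filtering rule described in this abstract (keep a transcript only if enough of its events are detected and its coverage is sufficient) can be illustrated with a short Python sketch. This is not the authors' implementation: the `Transcript` class, the `filter_transcripts` function, the event identifiers, and the threshold values are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transcript:
    name: str
    events: List[str]  # IDs of the exonic regions and junctions in this transcript


def filter_transcripts(
    transcripts: List[Transcript],
    event_coverage: Dict[str, float],
    min_prop_detected: float = 0.75,  # hypothetical threshold, not from the paper
    min_mean_coverage: float = 1.0,   # hypothetical threshold, not from the paper
) -> List[str]:
    """Return names of transcripts that pass both simple filters."""
    kept = []
    for t in transcripts:
        if not t.events:
            continue  # nothing to evaluate
        covs = [event_coverage.get(e, 0.0) for e in t.events]
        # An event counts as detected if it has any coverage at all (assumption).
        prop_detected = sum(c > 0 for c in covs) / len(covs)
        mean_cov = sum(covs) / len(covs)
        if prop_detected >= min_prop_detected and mean_cov >= min_mean_coverage:
            kept.append(t.name)
    return kept


transcripts = [
    Transcript("T1", ["e1", "j1", "e2"]),
    Transcript("T2", ["e1", "j2", "e3"]),
]
coverage = {"e1": 10.0, "j1": 4.0, "e2": 8.0, "j2": 0.0, "e3": 0.0}
kept = filter_transcripts(transcripts, coverage)
print(kept)
```

In this toy input, T2 shares one exon with T1 but its junction and second exon have no reads, so only a third of its events are detected and it is dropped before quantitation.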


September 22, 2019

The genomic and functional landscapes of developmental plasticity in the American cockroach.

Many cockroach species have adapted to urban environments, and some have been serious pests of public health in the tropics and subtropics. Here, we present the 3.38-Gb genome and a consensus gene set of the American cockroach, Periplaneta americana. We report insights from both genomic and functional investigations into the underlying basis of its adaptation to urban environments and developmental plasticity. In comparison with other insects, expansions of gene families in P. americana exist for most core gene families likely associated with environmental adaptation, such as chemoreception and detoxification. Multiple pathways regulating metamorphic development are well conserved, and RNAi experiments inform on key roles of 20-hydroxyecdysone, juvenile hormone, insulin, and decapentaplegic signals in regulating plasticity. Our analyses reveal a high level of sequence identity in genes between the American cockroach and two termite species, advancing it as a valuable model to study the evolutionary relationships between cockroaches and termites.


September 22, 2019

Mesoscale variability of the summer bloom over the northern Ross Sea shelf: A tale of two banks

Multi-year satellite records indicate an asymmetric spatial pattern in the summer bloom in the Northern Ross Sea, with the largest blooms over the shallows of Pennell Bank compared to Mawson Bank. In 2010–2011, high-resolution spatiotemporal in situ sampling focused on these two banks to better understand factors contributing to this pattern. Dissolved and particulate Fe profiles suggested similar surface water depletion of dissolved Fe on both banks. The surface sediments and velocity observations indicate a more energetic water column over Mawson Bank. Consequently, the surface mixed layer over Pennell Bank was more homogeneous and shallower. Over Mawson Bank we observed a thicker more homogeneous bottom boundary layer resulting from stronger tidal and sub-tidal currents. These stronger currents scour the seafloor resulting in sediments less likely to release additional sedimentary iron. Estimates of the quantum yield of photosynthesis and the initial slope of the photosynthesis-irradiance response were lower over Mawson Bank, indicating higher iron stress over Mawson Bank. Overall, the apparent additional sedimentary source of iron to, and longer surface residence time over Pennell Bank, as well as the reduced fluxes from the more isolated bottom mixed layer over Mawson Bank, sustain the observed asymmetric pattern across both banks.


September 22, 2019

SMRT sequencing of full-length transcriptome of flea beetle Agasicles hygrophila (Selman and Vogt).

This study was aimed at generating the full-length transcriptome of flea beetle Agasicles hygrophila (Selman and Vogt) using single-molecule real-time (SMRT) sequencing. Four developmental stages of A. hygrophila, including eggs, larvae, pupae, and adults, were harvested for isolating total RNA. The mixed samples were used for SMRT sequencing to generate the full-length transcriptome. Based on the obtained transcriptome data, alternative splicing event prediction, simple sequence repeat (SSR) analysis, coding sequence prediction, transcript functional annotation, and lncRNA prediction were performed. A total of 9.45 Gb of clean reads were generated, including 335,045 reads of insert (ROI) and 158,085 full-length non-chimeric (FLNC) reads. Transcript clustering analysis of FLNC reads identified 40,004 consensus isoforms, including 31,015 high-quality ones. After removing redundant reads, 28,982 transcripts were obtained. A total of 145 alternative splicing events were predicted. Additionally, 12,753 SSRs and 16,205 coding sequences were identified based on SSR analysis. Furthermore, 24,031 transcripts were annotated in eight functional databases, and 4,198 lncRNAs were predicted. This is the first study to perform SMRT sequencing of the full-length transcriptome of A. hygrophila. The obtained transcriptome may facilitate further exploration of the genetic data of A. hygrophila and uncover the interactions between this insect and the ecosystem.


September 22, 2019

Single molecule, full-length transcript sequencing provides insight into the extreme metabolism of ruby-throated hummingbird Archilochus colubris

Hummingbirds oxidize ingested nectar sugars directly to fuel foraging but cannot sustain this fuel use during fasting periods, such as during the night or during long-distance migratory flights. Instead, fasting hummingbirds switch to oxidizing stored lipids, derived from ingested sugars. The hummingbird liver plays a key role in moderating energy homeostasis and this remarkable capacity for fuel switching. Additionally, the liver is the principal location of de novo lipogenesis, which can occur at exceptionally high rates, such as during premigratory fattening. Yet understanding how this tissue and the whole organism moderate energy turnover is hampered by a lack of information regarding how relevant enzymes differ in sequence, expression, and regulation. We generated a de novo transcriptome of the hummingbird liver using PacBio full-length cDNA sequencing (Iso-Seq), yielding a total of 8.6 Gb of sequencing data, or 2.6M reads from 4 different size fractions. We analyzed data using the SMRTAnalysis v3.1 Iso-Seq pipeline, then clustered isoforms into gene families to generate de novo gene contigs using Cogent. We performed orthology analysis to identify closely related sequences between our transcriptome and other avian and human gene sets. Finally, we closely examined homology of critical lipid metabolism genes between our transcriptome data and avian and human genomes. We confirmed high levels of sequence divergence within hummingbird lipogenic enzymes, suggesting a high probability of adaptive divergent function in the hepatic lipogenic pathways. Our results leverage cutting-edge technology and a novel bioinformatics pipeline to provide a first direct look at the transcriptome of this incredible organism.


September 22, 2019

Atmospheric N deposition alters connectance, but not functional potential among saprotrophic bacterial communities.

The use of co-occurrence patterns to investigate interactions between micro-organisms has provided novel insight into organismal interactions within microbial communities. However, anthropogenic impacts on microbial co-occurrence patterns and ecosystem function remain an important gap in our ecological knowledge. In a northern hardwood forest ecosystem located in Michigan, USA, 20 years of experimentally increased atmospheric N deposition has reduced forest floor decay and increased soil C storage. This ecosystem-level response occurred concomitantly with compositional changes in saprophytic fungi and bacteria. Here, we investigated the influence of experimental N deposition on biotic interactions among forest floor bacterial assemblages by employing phylogenetic and molecular ecological network analysis. When compared to the ambient treatment, the forest floor bacterial community under experimental N deposition was less rich, more phylogenetically dispersed and exhibited a more clustered co-occurrence network topology. Together, our observations reveal the presence of increased biotic interactions among saprotrophic bacterial assemblages under future rates of N deposition. Moreover, they support the hypothesis that nearly two decades of experimental N deposition can modify the organization of microbial communities and provide further insight into why anthropogenic N deposition has reduced decomposition, increased soil C storage and accelerated phenolic DOC production in our field experiment. © 2015 John Wiley & Sons Ltd.


September 22, 2019

Gill bacteria enable a novel digestive strategy in a wood-feeding mollusk.

Bacteria play many important roles in animal digestive systems, including the provision of enzymes critical to digestion. Typically, complex communities of bacteria reside in the gut lumen in direct contact with the ingested materials they help to digest. Here, we demonstrate a previously undescribed digestive strategy in the wood-eating marine bivalve Bankia setacea, wherein digestive bacteria are housed in a location remote from the gut. These bivalves, commonly known as shipworms, lack a resident microbiota in the gut compartment where wood is digested but harbor endosymbiotic bacteria within specialized cells in their gills. We show that this comparatively simple bacterial community produces wood-degrading enzymes that are selectively translocated from gill to gut. These enzymes, which include just a small subset of the predicted wood-degrading enzymes encoded in the endosymbiont genomes, accumulate in the gut to the near exclusion of other endosymbiont-made proteins. This strategy of remote enzyme production provides the shipworm with a mechanism to capture liberated sugars from wood without competition from an endogenous gut microbiota. Because only those proteins required for wood digestion are translocated to the gut, this newly described system reveals which of many possible enzymes and enzyme combinations are minimally required for wood degradation. Thus, although it has historically had negative impacts on human welfare, the shipworm digestive process now has the potential to have a positive impact on industries that convert wood and other plant biomass to renewable fuels, fine chemicals, food, feeds, textiles, and paper products.


September 22, 2019

Complete genome sequences of two human oral microbiome commensals, Streptococcus salivarius ATCC 25975 and S. salivarius ATCC 27945.

Streptococcus salivarius strains are significant contributors to the human oral microbiome. Some possess unique fimbriae that give them the ability to coaggregate and colonize particular oral structures. We present here the complete genomes of Streptococcus salivarius Lancefield K(-)/K(+) strains ATCC 25975 and ATCC 27945, which can and cannot, respectively, produce fimbriae. Copyright © 2017 Butler et al.


September 22, 2019

Towards long-read metagenomics: complete assembly of three novel genomes from bacteria dependent on a diazotrophic cyanobacterium in a freshwater lake co-culture.

Here we report three complete bacterial genome assemblies from a PacBio shotgun metagenome of a co-culture from Upper Klamath Lake, OR. Genome annotations and culture conditions indicate these bacteria are dependent on carbon and nitrogen fixation from the cyanobacterium Aphanizomenon flos-aquae, whose genome was assembled to draft-quality. Due to their taxonomic novelty relative to previously sequenced bacteria, we have temporarily designated these bacteria as incertae sedis Hyphomonadaceae strain UKL13-1 (3,501,508 bp and 56.12% GC), incertae sedis Betaproteobacterium strain UKL13-2 (3,387,087 bp and 54.98% GC), and incertae sedis Bacteroidetes strain UKL13-3 (3,236,529 bp and 37.33% GC). Each genome consists of a single circular chromosome with no identified plasmids. When compared with binned Illumina assemblies of the same three genomes, there was ~7% discrepancy in total genome length. Gaps where Illumina assemblies broke were often due to repetitive elements. Within these missing sequences were essential genes and genes associated with a variety of functional categories. Annotated gene content reveals that both Proteobacteria are aerobic anoxygenic phototrophs, with Betaproteobacterium UKL13-2 potentially capable of phototrophic oxidation of sulfur compounds. Both proteobacterial genomes contain transporters suggesting they are scavenging fixed nitrogen from A. flos-aquae in the form of ammonium. Bacteroidetes UKL13-3 has few completely annotated biosynthetic pathways, and has a comparatively higher proportion of unannotated genes. The genomes were detected in only a few other freshwater metagenomes, suggesting that these bacteria are not ubiquitous in freshwater systems. Our results indicate that long-read sequencing is a viable method for sequencing dominant members from low-diversity microbial communities, and should be considered for environmental metagenomics when conditions meet these requirements.


September 22, 2019

High resolution annotation of zebrafish transcriptome using long-read sequencing.

With the emergence of zebrafish as an important model organism, a concerted effort has been made to study its transcriptome. This effort is limited, however, by gaps in zebrafish annotation, which are especially pronounced concerning transcripts dynamically expressed during zygotic genome activation (ZGA). To date, short-read sequencing has been the principal technology for zebrafish transcriptome annotation. In part because these sequence reads are too short for assembly methods to resolve the full complexity of the transcriptome, the current annotation is rudimentary. By providing direct observation of full-length transcripts, recently refined long-read sequencing platforms can dramatically improve annotation coverage and accuracy. Here, we leveraged the SMRT platform to study the transcriptome of zebrafish embryos before and after ZGA. Our analysis revealed additional novelty and complexity in the [...] high-confidence novel transcripts that originated from previously unannotated loci and 1835 high-confidence new isoforms in previously annotated genes. We validated these findings using a suite of computational approaches including structural prediction, sequence homology, and functional conservation analyses, as well as by confirmatory transcript quantification with short-read sequencing data. Our analyses provided insight into new homologs and paralogs of functionally important proteins and noncoding RNAs, isoform switching occurrences, and different classes of novel splicing events. Several novel isoforms representing distinct splicing events were validated through PCR experiments, including the discovery and validation of a novel 8-kb transcript spanning multiple mir-430 elements, an important driver of early development. Our study provides a significantly improved zebrafish transcriptome annotation resource. © 2018 Nudelman et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Improved high-quality genome assembly and annotation of Tibetan hulless barley

Background: The Tibetan hulless barley (Hordeum vulgare L. var. nudum), also called "Qingke" in Chinese and "Ne" in Tibetan, is the staple food for Tibetans and an important livestock feed in the Tibetan Plateau. Tibetan hulless barley has about 3500 years of cultivation history in China and is mainly produced in Tibet, Qinghai, Sichuan, Yunnan and other areas. In addition, Tibetan hulless barley has rich nutritional value and outstanding health effects, with contents of beta glucan, dietary fiber, amylopectin, and trace elements that are higher than in any other cereal crop. Findings: Here, we report an improved high-quality assembly of the Tibetan hulless barley genome, 4.0 Gb in size. We employed the falcon assembly package together with scaffolding and error correction tools to complete the improvement using PacBio long-read sequencing technology, obtaining contig and scaffold N50 lengths of 1.563 Mb and 4.006 Mb, respectively, which makes the assembly nearly two orders of magnitude more continuous than the original Tibetan hulless barley genome. We also re-annotated the new assembly and report 61,303 stringent confident putative protein-coding genes, of which 40,457 are HC genes. We have developed a new Tibetan hulless barley genome database (THBGD) that makes the data easy to download and use and helps to better manage the information on Tibetan hulless barley genetic resources. Conclusions: The availability of the new Tibetan hulless barley genome and annotations will take the genetics of Tibetan hulless barley to a new level and will greatly simplify breeders' efforts. It will also enrich the granary of the Tibetan people. Abbreviations: BLAST, Basic Local Alignment Search Tool; BUSCO, Benchmarking Universal Single-Copy Orthologs; QV, quality value; PacBio, Pacific Biosciences; RNA-seq, RNA sequencing; NGS, next generation sequencing; TGS, third generation sequencing; THBGD, Tibetan hulless barley Genome Database.
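
The contig and scaffold N50 values quoted in this abstract follow the standard definition of N50: the length L such that sequences of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition is given below; the function name `n50` and the toy contig lengths are illustrative only and are not taken from the barley assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


toy_contigs = [2, 3, 4, 5, 6]  # toy lengths, total = 20
print(n50(toy_contigs))
```

For the toy lengths, the two longest contigs (6 and 5) already cover 11 of 20 bases, so the N50 is 5.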


September 22, 2019

Generation and comparative analysis of full-length transcriptomes in sweetpotato and its putative wild ancestor I. trifida.

Sweetpotato [Ipomoea batatas (L.) Lam.] is one of the most important crops in many developing countries and provides a candidate source of bioenergy. However, both a high-quality reference genome and large-scale full-length cDNA sequences for this outcrossing hexaploid are still lacking, which in turn impedes progress in research studies in sweetpotato functional genomics and molecular breeding. In this study, we apply a combination of second- and third-generation sequencing technologies to sequence full-length transcriptomes in sweetpotato and its putative ancestor I. trifida. In total, we obtained 53,861/51,184 high-quality transcripts, which include 34,963/33,637 putative full-length cDNA sequences, from sweetpotato/I. trifida. Among these, we identified 104,540/94,174 open reading frames, 1476/1475 transcription factors, 25,315/27,090 simple sequence repeats, and 417/531 long non-coding RNAs in the sweetpotato/I. trifida datasets. By utilizing publicly available genomic contigs, we analyzed the gene features (including exon number, exon size, intron number, intron size, exon-intron structure) of 33,119 and 32,793 full-length transcripts in sweetpotato and I. trifida, respectively. Furthermore, comparative analysis between our transcript datasets and other large-scale cDNA datasets from different plant species enabled us to assess the quality of public datasets, estimate the genetic similarity across related species, and survey the evolutionary pattern of genes. Overall, our study provides, for the first time, fundamental resources of large-scale full-length transcripts in sweetpotato and its putative ancestor, and will facilitate structural, functional and comparative genomics studies in this important crop.


September 22, 2019

Combination of novel and public RNA-seq datasets to generate an mRNA expression atlas for the domestic chicken.

The domestic chicken (Gallus gallus) is widely used as a model in developmental biology and is also an important livestock species. We describe a novel approach to data integration to generate an mRNA expression atlas for the chicken spanning major tissue types and developmental stages, using a diverse range of publicly-archived RNA-seq datasets and new data derived from immune cells and tissues. Randomly down-sampling RNA-seq datasets to a common depth and quantifying expression against a reference transcriptome using the mRNA quantitation tool Kallisto ensured that disparate datasets explored comparable transcriptomic space. The network analysis tool Graphia was used to extract clusters of co-expressed genes from the resulting expression atlas, many of which were tissue or cell-type restricted, contained transcription factors that have previously been implicated in their regulation, or were otherwise associated with biological processes, such as the cell cycle. The atlas provides a resource for the functional annotation of genes that currently have only a locus ID. We cross-referenced the RNA-seq atlas to a publicly available embryonic Cap Analysis of Gene Expression (CAGE) dataset to infer the developmental time course of organ systems, and to identify a signature of the expansion of tissue macrophage populations during development. Expression profiles obtained from public RNA-seq datasets – despite being generated by different laboratories using different methodologies – can be made comparable to each other. This meta-analytic approach to RNA-seq can be extended with new datasets from novel tissues, and is applicable to any species.
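
The down-sampling step mentioned in this abstract can be sketched in a few lines of Python. The abstract does not say how the common depth was chosen or which tool performed the sampling, so the sketch below makes two labeled assumptions: the common depth is the size of the smallest dataset, and reads are sampled uniformly without replacement. The function names and the toy read identifiers are hypothetical.

```python
import random
from typing import Dict, List


def downsample(reads: List[str], depth: int, seed: int = 0) -> List[str]:
    """Draw `depth` reads uniformly at random, without replacement."""
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of reads")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(reads, depth)


def downsample_to_common_depth(
    datasets: Dict[str, List[str]], seed: int = 0
) -> Dict[str, List[str]]:
    """Down-sample every dataset to the size of the smallest one (assumption)."""
    depth = min(len(reads) for reads in datasets.values())
    return {name: downsample(reads, depth, seed) for name, reads in datasets.items()}


datasets = {
    "liver": [f"liver_read_{i}" for i in range(100)],
    "spleen": [f"spleen_read_{i}" for i in range(40)],
}
balanced = downsample_to_common_depth(datasets)
print({name: len(reads) for name, reads in balanced.items()})
```

After this step every dataset contributes the same number of reads, so differences in sequencing depth between laboratories no longer drive differences in the expression estimates.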


September 22, 2019

Metagenomic binning of a marine sponge microbiome reveals unity in defense but metabolic specialization.

Marine sponges are ancient metazoans that are populated by distinct and highly diverse microbial communities. In order to obtain deeper insights into the functional gene repertoire of the Mediterranean sponge Aplysina aerophoba, we combined Illumina short-read and PacBio long-read sequencing followed by un-targeted metagenomic binning. We identified a total of 37 high-quality bins representing 11 bacterial phyla and two candidate phyla. Statistical comparison of symbiont genomes with selected reference genomes revealed a significant enrichment of genes related to bacterial defense (restriction-modification systems, toxin-antitoxin systems) as well as genes involved in host colonization and extracellular matrix utilization in sponge symbionts. A within-symbionts genome comparison revealed a nutritional specialization of at least two symbiont guilds, where one appears to metabolize carnitine and the other sulfated polysaccharides, both of which are abundant molecules in the sponge extracellular matrix. A third guild of symbionts may be viewed as nutritional generalists that perform largely the same metabolic pathways but lack such extraordinary numbers of the relevant genes. This study characterizes the genomic repertoire of sponge symbionts at an unprecedented resolution and it provides greater insights into the molecular mechanisms underlying microbial-sponge symbiosis.

