September 22, 2019

GMAP and GSNAP for genomic sequence alignment: enhancements to speed, accuracy, and functionality.

The programs GMAP and GSNAP, for aligning RNA-Seq and DNA-Seq datasets to genomes, have evolved along with advances in biological methodology to handle longer reads, larger volumes of data, and new types of biological assays. The genomic representation has been improved to include linear genomes that can compare sequences using single-instruction multiple-data (SIMD) instructions, compressed genomic hash tables with fast access using SIMD instructions, handling of large genomes with more than four billion bp, and enhanced suffix arrays (ESAs) with novel data structures for fast access. Improvements to the algorithms have included a greedy match-and-extend algorithm using suffix arrays, segment chaining using genomic hash tables, diagonalization using segmental hash tables, and nucleotide-level dynamic programming procedures that use SIMD instructions and eliminate the need for F-loop calculations. Enhancements to the functionality of the programs include standardization of indel positions, handling of ambiguous splicing, clipping and merging of overlapping paired-end reads, and alignments to circular chromosomes and alternate scaffolds. The programs have been adapted for use in pipelines by integrating their usage into R/Bioconductor packages such as gmapR and HTSeqGenie, and these pipelines have facilitated the discovery of numerous biological phenomena.


September 22, 2019

Shift in fungal communities and associated enzyme activities along an age gradient of managed Pinus sylvestris stands.

Forestry reshapes ecosystems with respect to tree age structure, soil properties and vegetation composition. These changes are likely to be paralleled by shifts in microbial community composition with potential feedbacks on ecosystem functioning. Here, we assessed fungal communities across a chronosequence of managed Pinus sylvestris stands and investigated correlations between taxonomic composition and extracellular enzyme activities. Not surprisingly, clear-cutting had a negative effect on ectomycorrhizal fungal abundance and diversity. In contrast, clear-cutting favoured proliferation of saprotrophic fungi correlated with enzymes involved in holocellulose decomposition. During stand development, the re-establishing ectomycorrhizal fungal community shifted in composition from dominance by Atheliaceae in younger stands to Cortinarius and Russula species in older stands. Late successional ectomycorrhizal taxa correlated with enzymes involved in mobilisation of nutrients from organic matter, indicating intensified nutrient limitation. Our results suggest that maintenance of functional diversity in the ectomycorrhizal fungal community may sustain long-term forest production by retaining a capacity for symbiosis-driven recycling of organic nutrient pools.


September 22, 2019

A comparative transcriptional landscape of maize and sorghum obtained by single-molecule sequencing.

Maize and sorghum are both important crops with similar overall plant architectures, but they have key differences, especially in regard to their inflorescences. To better understand these two organisms at the molecular level, we compared expression profiles of both protein-coding and noncoding transcripts in 11 matched tissues using single-molecule, long-read, deep RNA sequencing. This comparative analysis revealed large numbers of novel isoforms in both species. Evolutionarily young genes were likely to be generated in reproductive tissues and usually had fewer isoforms than old genes. We also observed similarities and differences in alternative splicing patterns and activities, both among tissues and between species. The maize subgenomes exhibited no bias in isoform generation; however, genes in the B genome were more highly expressed in pollen tissue, whereas genes in the A genome were more highly expressed in endosperm. We also identified a number of splicing events conserved between maize and sorghum. In addition, we generated comprehensive and high-resolution maps of poly(A) sites, revealing similarities and differences in mRNA cleavage between the two species. Overall, our results reveal considerable splicing and expression diversity between sorghum and maize, well beyond what was reported in previous studies, likely reflecting the differences in architecture between these two species. © 2018 Wang et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Accurate determination of bacterial abundances in human metagenomes using full-length 16S sequencing reads

DNA sequencing of PCR-amplified marker genes, especially but not limited to the 16S rRNA gene, is perhaps the most common approach for profiling microbial communities. Due to technological constraints of commonly available DNA sequencing, these approaches usually take the form of short reads sequenced from a narrow, targeted variable region, with a corresponding loss of taxonomic resolution relative to the full-length marker gene. We use Pacific Biosciences single-molecule, real-time circular consensus sequencing to sequence amplicons spanning the entire length of the 16S rRNA gene. However, this sequencing technology suffers from a high sequencing error rate that needs to be addressed in order to take full advantage of the longer sequence. Here, we present a method to model the sequencing error process using a generalized pair hidden Markov chain model and estimate bacterial abundances in microbial samples. We demonstrate, with simulated and real data, that our model and its associated estimation procedure give accurate estimates at the species (or subspecies) level and are more flexible than existing methods like SImple Non-Bayesian TAXonomy (SINTAX).


September 22, 2019

Meeting report: 31st International Mammalian Genome Conference, Mammalian Genetics and Genomics: From Molecular Mechanisms to Translational Applications.

High on the Heidelberg hills, inside the Advanced Training Centre of the European Molecular Biology Laboratory (EMBL) campus with its unique double-helix staircase, scientists gathered for the EMBL conference “Mammalian Genetics and Genomics: From Molecular Mechanisms to Translational Applications,” organized in cooperation with the International Mammalian Genome Society (IMGS) and the Mouse Molecular Genetics (MMG) group. The conference attracted 205 participants from 30 countries, representing 6 of the 7 continents (all except Antarctica). It was a richly diverse group of geneticists, clinicians, and bioinformaticians, with presentations by established and junior investigators, including many trainees. From the 24th to the 27th of October 2017, they shared exciting advances in mammalian genetics and genomics research, from the introduction of cutting-edge technologies to descriptions of translational studies involving highly relevant models of human disease.


September 22, 2019

Young genes have distinct gene structure, epigenetic profiles, and transcriptional regulation.

Species-specific, new, or “orphan” genes account for 10%-30% of eukaryotic genomes. Although initially considered to have limited function, an increasing number of orphan genes have been shown to provide important phenotypic innovation. How new genes acquire regulatory sequences for proper temporal and spatial expression is unknown. Orphan gene regulation may rely in part on origination in open chromatin adjacent to preexisting promoters, although this has not yet been assessed by genome-wide analysis of chromatin states. Here, we combine taxon-rich nematode phylogenies with Iso-Seq, RNA-seq, ChIP-seq, and ATAC-seq to identify the gene structure and epigenetic signature of orphan genes in the satellite model nematode Pristionchus pacificus. Consistent with previous findings, we find young genes are shorter, contain fewer exons, and are on average less strongly expressed than older genes. However, the subset of orphan genes that are expressed exhibit distinct chromatin states from similarly expressed conserved genes. Orphan gene transcription is determined by a lack of repressive histone modifications, confirming long-held hypotheses that open chromatin is important for new gene formation. Yet orphan gene start sites more closely resemble enhancers defined by H3K4me1, H3K27ac, and ATAC-seq peaks, in contrast to conserved genes that exhibit traditional promoters defined by H3K4me3 and H3K27ac. Although the majority of orphan genes are located on chromosome arms that contain high recombination rates and repressive histone marks, strongly expressed orphan genes are more randomly distributed. Our results support a model of new gene origination by rare integration into open chromatin near enhancers. © 2018 Werner et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

A community-based culture collection for targeting novel plant growth-promoting bacteria from the sugarcane microbiome.

The soil-plant ecosystem harbors an immense microbial diversity that challenges investigative approaches to study traits underlying plant-microbe association. Studies solely based on culture-dependent techniques have overlooked most microbial diversity. Here we describe the concomitant use of culture-dependent and -independent techniques to target plant-beneficial microbial groups from the sugarcane microbiome. The community-based culture collection (CBC) approach was used to access microbes from roots and stalks. The CBC recovered 399 unique bacteria representing 15.9% of the rhizosphere core microbiome and 61.6-65.3% of the endophytic core microbiomes of stalks. By cross-referencing the CBC (culture-dependent) with the sugarcane microbiome profile (culture-independent), we designed a synthetic community composed of naturally occurring, highly abundant bacterial groups from roots and stalks, most of which have been poorly explored so far. We then used maize as a model to probe the abundance-based synthetic inoculant. We show that when inoculated in maize plants, members of the synthetic community efficiently colonize plant organs, displace the natural microbiota and dominate at 53.9% of the rhizosphere microbial abundance. As a result, inoculated plants increased biomass by 3.4-fold as compared to uninoculated plants. The results demonstrate that abundance-based synthetic inoculants can be successfully applied to recover beneficial plant microbes from plant microbiota.


September 22, 2019

Metabolism of toxic sugars by strains of the bee gut symbiont Gilliamella apicola.

Social bees collect carbohydrate-rich food to support their colonies, and yet, certain carbohydrates present in their diet or produced through the breakdown of pollen are toxic to bees. The gut microbiota of social bees is dominated by a few core bacterial species, including the Gram-negative species Gilliamella apicola. We isolated 42 strains of G. apicola from guts of honey bees and bumble bees and sequenced their genomes. All of the G. apicola strains share high 16S rRNA gene similarity, but they vary extensively in gene repertoires related to carbohydrate metabolism. Predicted abilities to utilize different sugars were verified experimentally. Some strains can utilize mannose, arabinose, xylose, or rhamnose (monosaccharides that can cause toxicity in bees) as their sole carbon and energy source. All of the G. apicola strains possess a manO-associated mannose family phosphotransferase system; phylogenetic analyses suggest that this was acquired from Firmicutes through horizontal gene transfer. The metabolism of mannose is specifically dependent on the presence of mannose-6-phosphate isomerase (MPI). Neither growth rates nor the utilization of glucose and fructose are affected in the presence of mannose when the gene encoding MPI is absent from the genome, suggesting that mannose is not taken up by G. apicola strains which harbor the phosphotransferase system but do not encode the MPI. Given their ability to simultaneously utilize glucose, fructose, and mannose, as well as the ability of many strains to break down other potentially toxic carbohydrates, G. apicola bacteria may have key roles in improving dietary tolerances and maintaining the health of their bee hosts.

Bees are important pollinators of agricultural plants. Our study documents the ability of Gilliamella apicola, a dominant gut bacterium in honey bees and bumble bees, to utilize several sugars that are harmful to bee hosts. Using genome sequencing and growth assays, we found that the ability to metabolize certain toxic carbohydrates is directly correlated with the presence of their respective degradation pathways, indicating that metabolic potential can be accurately predicted from genomic data in these gut symbionts. Strains vary considerably in their range of utilizable carbohydrates, which likely reflects historical horizontal gene transfer and gene deletion events. Unlike their bee hosts, G. apicola bacteria are not detrimentally affected by growth on mannose-containing medium, even in strains that cannot metabolize this sugar. These results suggest that G. apicola may be an important player in modulating nutrition in the bee gut, with ultimate effects on host health. Copyright © 2016 Zheng et al.


September 22, 2019

Transcriptional fates of human-specific segmental duplications in brain.

Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits. © 2018 Dougherty et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

The state of play in higher eukaryote gene annotation.

A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe – or ‘annotate’ – genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists – from clinicians to evolutionary biologists – need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.


September 22, 2019

Targeted combinatorial alternative splicing generates brain region-specific repertoires of neurexins.

Molecular diversity of surface receptors has been hypothesized to provide a mechanism for selective synaptic connectivity. Neurexins are highly diversified receptors that drive the morphological and functional differentiation of synapses. Using a single cDNA sequencing approach, we detected 1,364 unique neurexin-α and 37 neurexin-β mRNAs produced by alternative splicing of neurexin pre-mRNAs. This molecular diversity results from near-exhaustive combinatorial use of alternative splice insertions in Nrxn1α and Nrxn2α. By contrast, Nrxn3α exhibits several highly stereotyped exon selections that incorporate novel elements for posttranscriptional regulation of a subset of transcripts. Complexity of Nrxn1α repertoires correlates with the cellular complexity of neuronal tissues, and a specific subset of isoforms is enriched in a purified cell type. Our analysis defines the molecular diversity of a critical synaptic receptor and provides evidence that neurexin diversity is linked to cellular diversity in the nervous system. Copyright © 2014 Elsevier Inc. All rights reserved.


September 22, 2019

Full-length transcriptome of Misgurnus anguillicaudatus provides insights into evolution of genus Misgurnus.

Reconstruction and annotation of transcripts, particularly for a species without a reference genome, play a critical role in gene discovery, investigation of genomic signatures, and genome annotation in the pre-genomic era. This study generated 33,330 full-length transcripts of diploid M. anguillicaudatus using PacBio SMRT Sequencing. A total of 6,918 gene families were identified with two or more isoforms, and 26,683 complete ORFs with an average length of 1,497 bp were detected. In total, 1,208 high-confidence lncRNAs were identified, and most of these appeared to be precursor transcripts of miRNAs or snoRNAs. A phylogenetic tree of the Misgurnus species was inferred based on 1,905 single-copy orthologous genes. The tetraploid and diploid M. anguillicaudatus grouped into a clade, and M. bipartitus showed a closer relationship with M. anguillicaudatus. The overall evolutionary rates of tetraploid M. anguillicaudatus were significantly higher than those of other Misgurnus species. Meanwhile, 28 positively selected genes were identified in the M. anguillicaudatus clade. These positively selected genes may play critical roles in the adaptation of M. anguillicaudatus to various habitat environments. This study could facilitate further exploration of the genomic signatures of M. anguillicaudatus and provide potential insights into the evolutionary history of tetraploid loach.


September 22, 2019

Application of circular consensus sequencing and network analysis to characterize the bovine IgG repertoire.

Vertebrate immune systems generate diverse repertoires of antibodies capable of mediating response to a variety of antigens. Next generation sequencing methods provide unique approaches to a number of immuno-based research areas including antibody discovery and engineering, disease surveillance, and host immune response to vaccines. In particular, single-molecule circular consensus sequencing permits the sequencing of antibody repertoires at previously unattainable depths of coverage and accuracy. We approached the bovine immunoglobulin G (IgG) repertoire with the objective of characterizing diversity of expressed IgG transcripts. Here we present single-molecule real-time sequencing data of expressed IgG heavy-chain repertoires of four individual cattle. We describe the diversity observed within antigen binding regions and visualize this diversity using a network-based approach.

We generated 49,945 high quality cDNA sequences, each spanning the entire IgG variable region from four Bos taurus calves. From these sequences we identified 49,521 antigen binding regions using the automated Paratome web server. Approximately 9% of all unique complementarity determining region 2 (CDR2) sequences were of variable lengths. A bimodal distribution of unique CDR3 sequence lengths was observed, with common lengths of 5-6 and 21-25 amino acids. The average number of cysteine residues in CDR3s increased with CDR3 length and we observed that cysteine residues were centrally located in CDR3s. We identified 19 extremely long CDR3 sequences (up to 62 amino acids in length) within IgG transcripts. Network analyses revealed distinct patterns among the expressed IgG antigen binding repertoires of the examined individuals.

We utilized circular consensus sequencing technology to provide baseline data of the expressed bovine IgG repertoire that can be used for future studies important to livestock research. Somatic mutation resulting in base insertions and deletions in CDR2 further diversifies the bovine antibody repertoire. In contrast to previous studies, our data indicate that unusually long CDR3 sequences are not unique to IgM antibodies in cattle. Centrally located cysteine residues in bovine CDR3s provide further evidence that disulfide bond formation is likely of structural importance. We hypothesize that network or cluster-based analyses of expressed antibody repertoires from controlled challenge experiments will help identify novel natural antigen binding solutions to specific pathogens of interest.


September 22, 2019

A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing.

It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, obtaining full-length transcripts remains a major challenge because of the difficulties of short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). In total, we obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not yet been annotated within the current reference genome. Furthermore, about 17% of transcripts are computationally predicted to be non-coding RNAs. Up to 24,797 alternative splicing (AS) events and 11,184 alternative polyadenylation (APA) events are detected within this de novo constructed transcriptome. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of the rabbit genome.


September 22, 2019

Analyses of intestinal microbiota: culture versus sequencing.

Analyzing human as well as animal microbiota composition has gained growing interest because structural components and metabolites of microorganisms fundamentally influence all aspects of host physiology. Although the exploration of these ecosystems was originally dominated by culture-dependent methods, the development of molecular techniques such as high-throughput sequencing has dramatically increased our knowledge. Because many studies of the microbiota are based on the bacterial 16S ribosomal RNA (rRNA) gene targets, they can, at least in principle, be compared to determine the role of the microbiome composition for developmental processes, host metabolism, and physiology as well as different diseases. In our review, we will summarize differences and pitfalls in current experimental protocols, including all steps from nucleic acid extraction to bioinformatical analysis, which may produce variation that outweighs subtle biological differences. Future developments, such as integration of metabolomic, transcriptomic, and metagenomic data sets and standardization of the procedures, will be discussed. © The Author 2015. Published by Oxford University Press on behalf of the Institute for Laboratory Animal Research. All rights reserved. For permissions, please email: journals.permissions@oup.com.

