September 22, 2019

Transcriptional fates of human-specific segmental duplications in brain.

Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits. © 2018 Dougherty et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

The state of play in higher eukaryote gene annotation.

A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe – or ‘annotate’ – genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists – from clinicians to evolutionary biologists – need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.


September 22, 2019

Full-length transcriptome of Misgurnus anguillicaudatus provides insights into evolution of genus Misgurnus.

Reconstruction and annotation of transcripts, particularly for a species without a reference genome, plays a critical role in gene discovery, investigation of genomic signatures, and genome annotation in the pre-genomic era. This study generated 33,330 full-length transcripts of diploid M. anguillicaudatus using PacBio SMRT Sequencing. A total of 6,918 gene families were identified with two or more isoforms, and 26,683 complete ORFs with an average length of 1,497 bp were detected. In total, 1,208 high-confidence lncRNAs were identified, and most of these appeared to be precursor transcripts of miRNAs or snoRNAs. A phylogenetic tree of the Misgurnus species was inferred based on the 1,905 single-copy orthologous genes. The tetraploid and diploid M. anguillicaudatus grouped into a clade, and M. bipartitus showed a closer relationship with M. anguillicaudatus. The overall evolutionary rates of tetraploid M. anguillicaudatus were significantly higher than those of other Misgurnus species. Meanwhile, 28 positively selected genes were identified in the M. anguillicaudatus clade. These positively selected genes may play critical roles in the adaptation of M. anguillicaudatus to various habitat environments. This study could facilitate further exploration of the genomic signatures of M. anguillicaudatus and provide potential insights into the evolutionary history of tetraploid loach.


September 22, 2019

A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing.

It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, obtaining full-length transcripts remains a major challenge because of the difficulties of short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). In total, we obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not yet been annotated within the current reference genome. Furthermore, about 17% of transcripts are computationally predicted to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events, respectively, are detected within this de novo constructed transcriptome. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of the rabbit genome.


September 22, 2019

Analyses of intestinal microbiota: culture versus sequencing.

Analyzing human as well as animal microbiota composition has gained growing interest because structural components and metabolites of microorganisms fundamentally influence all aspects of host physiology. Although the exploration of these ecosystems was originally dominated by culture-dependent methods, the development of molecular techniques such as high-throughput sequencing has dramatically increased our knowledge. Because many studies of the microbiota are based on the bacterial 16S ribosomal RNA (rRNA) gene targets, they can, at least in principle, be compared to determine the role of the microbiome composition for developmental processes, host metabolism, and physiology as well as different diseases. In our review, we will summarize differences and pitfalls in current experimental protocols, including all steps from nucleic acid extraction to bioinformatic analysis, which may produce variation that outweighs subtle biological differences. Future developments, such as integration of metabolomic, transcriptomic, and metagenomic data sets and standardization of the procedures, will be discussed. © The Author 2015. Published by Oxford University Press on behalf of the Institute for Laboratory Animal Research. All rights reserved. For permissions, please email: journals.permissions@oup.com.


September 22, 2019

Molecular mechanisms of acclimatization to phosphorus starvation and recovery underlying full-length transcriptome profiling in barley (Hordeum vulgare L.).

A lack of phosphorus (P) in plants can severely constrain growth and development. Barley, one of the earliest domesticated crops, is extensively planted in poor soil around the world. To date, the molecular mechanisms by which barley endures low phosphorus are still unclear at the transcriptional level. In the present study, two barley genotypes with contrasting phosphorus efficiency (GN121 and GN42) were used to reveal adaptations to low phosphorus stress, at three time points, at the morphological, physiological, biochemical, and transcriptome level. GN121 growth was less affected by phosphorus starvation and recovery than that of GN42. The biomass and inorganic phosphorus concentration of GN121 and GN42 declined under the low phosphorus-induced stress and increased after recovery with normal phosphorus. However, the range of these parameters was higher in GN42 than in GN121. Subsequently, a more complete genome annotation was obtained by correcting with the data sequenced on the Illumina HiSeq X 10 and PacBio RSII SMRT platforms. A total of 6,182 and 5,270 differentially expressed genes (DEGs) were identified in GN121 and GN42, respectively. The majority of these DEGs were involved in phosphorus metabolism such as phospholipid degradation, hydrolysis of phosphoric enzymes, sucrose synthesis, phosphorylation/dephosphorylation and post-transcriptional regulation; expression of these genes was significantly different between GN121 and GN42. Specifically, six and seven DEGs were annotated as phosphorus transporters in roots and leaves, respectively. Furthermore, a putative model was constructed relying on key metabolic pathways related to phosphorus to illustrate the higher phosphorus efficiency of GN121 compared to GN42 under low phosphorus conditions. Results from this study provide a multi-transcriptome database and candidate genes for further study on phosphorus use efficiency (PUE).


September 22, 2019

SuperTranscripts: a data driven reference for analysis and visualisation of transcriptomes.

Numerous methods have been developed to analyse RNA sequencing (RNA-seq) data, but most rely on the availability of a reference genome, making them unsuitable for non-model organisms. Here we present superTranscripts, a substitute for a reference genome, where each gene with multiple transcripts is represented by a single sequence. The Lace software is provided to construct superTranscripts from any set of transcripts, including de novo assemblies. We demonstrate how superTranscripts enable visualisation, variant detection and differential isoform detection in non-model organisms. We further use Lace to combine reference and assembled transcriptomes for chicken and recover hundreds of gaps in the reference genome.


September 22, 2019

Somatic APP gene recombination in Alzheimer’s disease and normal neurons.

The diversity and complexity of the human brain are widely assumed to be encoded within a constant genome. Somatic gene recombination, which changes germline DNA sequences to increase molecular diversity, could theoretically alter this code but has not been documented in the brain, to our knowledge. Here we describe recombination of the Alzheimer’s disease-related gene APP, which encodes amyloid precursor protein, in human neurons, occurring mosaically as thousands of variant ‘genomic cDNAs’ (gencDNAs). gencDNAs lacked introns and ranged from full-length cDNA copies of expressed, brain-specific RNA splice variants to myriad smaller forms that contained intra-exonic junctions, insertions, deletions, and/or single nucleotide variations. DNA in situ hybridization identified gencDNAs within single neurons that were distinct from wild-type loci and absent from non-neuronal cells. Mechanistic studies supported neuronal ‘retro-insertion’ of RNA to produce gencDNAs; this process involved transcription, DNA breaks, reverse transcriptase activity, and age. Neurons from individuals with sporadic Alzheimer’s disease showed increased gencDNA diversity, including eleven mutations known to be associated with familial Alzheimer’s disease that were absent from healthy neurons. Neuronal gene recombination may allow ‘recording’ of neural activity for selective ‘playback’ of preferred gene variants whose expression bypasses splicing; this has implications for cellular diversity, learning and memory, plasticity, and diseases of the human brain.


September 22, 2019

MEGAN-LR: new algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs.

There are numerous computational tools for taxonomic or functional analysis of microbiome samples, optimized to run on hundreds of millions of short, high quality sequencing reads. Programs such as MEGAN allow the user to interactively navigate these large datasets. Long read sequencing technologies continue to improve and produce increasing numbers of longer reads (of varying lengths in the range of 10k-1M bps, say), but of low quality. There is an increasing interest in using long reads in microbiome sequencing, and there is a need to adapt short read tools to long read datasets. We describe a new LCA-based algorithm for taxonomic binning, and an interval-tree based algorithm for functional binning, that are explicitly designed for long reads and assembled contigs. We provide a new interactive tool for investigating the alignment of long reads against reference sequences. For taxonomic and functional binning, we propose to use LAST to compare long reads against the NCBI-nr protein reference database so as to obtain frame-shift aware alignments, and then to process the results using our new methods. All presented methods are implemented in the open source edition of MEGAN, and we refer to this new extension as MEGAN-LR (MEGAN long read). We evaluate the LAST+MEGAN-LR approach in a simulation study, and on a number of mock community datasets consisting of Nanopore reads, PacBio reads and assembled PacBio reads. We also illustrate the practical application on a Nanopore dataset that we sequenced from an anammox bioreactor community. This article was reviewed by Nicola Segata together with Moreno Zolfo, Pete James Lockhart and Serghei Mangul. This work extends the applicability of the widely-used metagenomic analysis software MEGAN to long reads. Our study suggests that the presented LAST+MEGAN-LR pipeline is sufficiently fast and accurate.
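As a rough illustration of what "LCA-based" taxonomic binning means in general, the sketch below assigns a read to the lowest common ancestor of all taxa it aligns to. This is the basic, naive form of the idea and is not the new long-read algorithm the abstract refers to, whose details are not given here. The toy taxonomy in `PARENT`, the function names, and the example hits are all hypothetical.

```python
from typing import Dict, List

ROOT = "root"

# Hypothetical toy taxonomy stored as child -> parent links.
PARENT: Dict[str, str] = {
    "Bacteria": ROOT,
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    if path[-1] != ROOT:
        raise KeyError(f"{taxon!r} is not connected to the root")
    path.reverse()
    return path


def naive_lca(taxa: List[str], parent: Dict[str, str]) -> str:
    """Lowest common ancestor of all taxa hit by one read (naive LCA)."""
    if not taxa:
        raise ValueError("need at least one taxon")
    lineages = [lineage(t, parent) for t in taxa]
    lca = ROOT
    # Walk down from the root while every lineage agrees on the node.
    for level in zip(*lineages):
        if all(node == level[0] for node in level):
            lca = level[0]
        else:
            break
    return lca


# A read whose alignments hit two genera in the same phylum is binned
# at that phylum; adding a hit from another phylum pushes it up further.
print(naive_lca(["Escherichia", "Salmonella"], PARENT))
print(naive_lca(["Escherichia", "Salmonella", "Bacillus"], PARENT))
```

The naive rule treats every alignment equally, which is one reason it tends to place reads with many scattered hits at uninformative high-level nodes.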


September 22, 2019

Fusion of TTYH1 with the C19MC microRNA cluster drives expression of a brain-specific DNMT3B isoform in the embryonal brain tumor ETMR.

Embryonal tumors with multilayered rosettes (ETMRs) are rare, deadly pediatric brain tumors characterized by high-level amplification of the microRNA cluster C19MC. We performed integrated genetic and epigenetic analyses of 12 ETMR samples and identified, in all cases, C19MC fusions to TTYH1 driving expression of the microRNAs. ETMR tumors, cell lines and xenografts showed a specific DNA methylation pattern distinct from those of other tumors and normal tissues. We detected extreme overexpression of a previously uncharacterized isoform of DNMT3B originating at an alternative promoter that is active only in the first weeks of neural tube development. Transcriptional and immunohistochemical analyses suggest that C19MC-dependent DNMT3B deregulation is mediated by RBL2, a known repressor of DNMT3B. Transfection with individual C19MC microRNAs resulted in DNMT3B upregulation and RBL2 downregulation in cultured cells. Our data suggest a potential oncogenic re-engagement of an early developmental program in ETMR via epigenetic alteration mediated by an embryonic, brain-specific DNMT3B isoform.


September 22, 2019

Ecological genomics of tropical trees: how local population size and allelic diversity of resistance genes relate to immune responses, cosusceptibility to pathogens, and negative density dependence

In tropical forests, rarer species show increased sensitivity to species-specific soil pathogens and more negative effects of conspecific density on seedling survival (NDD). These patterns suggest a connection between ecology and immunity, perhaps because small population size disproportionately reduces genetic diversity of hyperdiverse loci such as immunity genes. In an experiment examining seedling roots from six species in one tropical tree community, we found that smaller populations have reduced amino acid diversity in pathogen resistance (R) genes but not the transcriptome in general. Normalized R gene amino acid diversity varied with local abundance and prior measures of differences in sensitivity to conspecific soil and NDD. After exposure to live soil, species with lower R gene diversity had reduced defence gene induction, more cosusceptibility of maternal cohorts to colonization by potentially pathogenic fungi, reduced root growth arrest (an R gene-mediated response) and their root-associated fungi showed lower induction of self-defence (antioxidants). Local abundance was not related to the ability to induce immune responses when pathogen recognition was bypassed by application of salicylic acid, a phytohormone that activates defence responses downstream of R gene signalling. These initial results support the hypothesis that smaller local tree populations have reduced R gene diversity and recognition-dependent immune responses, along with greater cosusceptibility to species-specific pathogens that may facilitate disease transmission and NDD. Locally rare species may be less able to increase their equilibrium abundance without genetic boosts to defence via immigration of novel R gene alleles from a larger and more diverse regional population.


September 22, 2019

Genome re-annotation of the wild strawberry Fragaria vesca using extensive Illumina- and SMRT-based RNA-seq datasets

The genome of the wild diploid strawberry species Fragaria vesca, an ideal model system of cultivated strawberry (Fragaria × ananassa, octoploid) and other Rosaceae family crops, was first published in 2011 and followed by a new assembly (Fvb). However, the annotation for Fvb mainly relied on ab initio predictions and included only predicted coding sequences, therefore an improved annotation is highly desirable. Here, a new annotation version named v2.0.a2 was created for the Fvb genome by a pipeline utilizing one PacBio library, 90 Illumina RNA-seq libraries, and 9 small RNA-seq libraries. Altogether, 18,641 genes (55.6% out of 33,538 genes) were augmented with information on the 5′ and/or 3′ UTRs, 13,168 (39.3%) protein-coding genes were modified or newly identified, and 7,370 genes were found to possess alternative isoforms. In addition, 1,938 long non-coding RNAs, 171 miRNAs, and 51,714 small RNA clusters were integrated into the annotation. This new annotation of F. vesca is substantially improved in both accuracy and integrity of gene predictions, beneficial to the gene functional studies in strawberry and to the comparative genomic analysis of other horticultural crops in Rosaceae family.
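The percentages quoted above can be checked directly from the reported counts. The short sketch below does that arithmetic, assuming (as the text states for the first figure, and as the numbers imply for the second) that both percentages use the 33,538 total genes as the denominator. The variable and function names are illustrative only.

```python
# Counts taken from the abstract above.
TOTAL_GENES = 33_538
UTR_AUGMENTED = 18_641          # genes gaining 5' and/or 3' UTR information
MODIFIED_OR_NEW_CODING = 13_168  # protein-coding genes modified or newly identified


def percent_of_total(count: int, total: int = TOTAL_GENES) -> float:
    """Share of `total` represented by `count`, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(percent_of_total(UTR_AUGMENTED))           # 55.6
print(percent_of_total(MODIFIED_OR_NEW_CODING))  # 39.3
```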


September 22, 2019

Transcriptome analysis of distinct cold tolerance strategies in the rubber tree (Hevea brasiliensis)

Natural rubber is an indispensable commodity used in approximately 40,000 products and is fundamental to the tire industry. Among the species that produce latex, the rubber tree [Hevea brasiliensis (Willd. ex Adr. de Juss.) Muell-Arg.], a species native to the Amazon rainforest, is the major producer of latex used worldwide. The Amazon Basin presents optimal conditions for rubber tree growth, but the occurrence of South American leaf blight, which is caused by the fungus Microcyclus ulei (P. Henn) v. Arx, limits rubber tree production. Currently, rubber tree plantations are located in escape regions that exhibit suboptimal conditions such as high winds and cold temperatures. Rubber tree breeding programs aim to identify clones that are adapted to these stress conditions. However, rubber tree breeding is time-consuming, taking more than 20 years to develop a new variety. It is also expensive and requires large field areas. Thus, genetic studies could optimize field evaluations, thereby reducing the time and area required for these experiments. Transcriptome sequencing using next-generation sequencing (RNA-seq) is a powerful tool for identifying a full set of transcripts and for evaluating gene expression in model and non-model species. In this study, we constructed a comprehensive transcriptome to evaluate the cold response strategies of the RRIM600 (cold-resistant) and GT1 (cold-tolerant) genotypes. Furthermore, we identified putative microsatellite (SSR) and single-nucleotide polymorphism (SNP) markers. Alternative splicing, which is an important mechanism for plant adaptation under abiotic stress, was further identified, providing an important database for further studies of cold tolerance.


September 22, 2019

The first whole transcriptomic exploration of pre-oviposited early chicken embryos using single and bulked embryonic RNA-sequencing.

The chicken is a valuable model organism, especially in evolutionary and embryology research because its embryonic development occurs in the egg. However, despite its scientific importance, no transcriptome data have been generated for deciphering the early developmental stages of the chicken because of practical and technical constraints in accessing pre-oviposited embryos. Here, we determine the entire transcriptome of pre-oviposited avian embryos, including oocyte, zygote, and intrauterine embryos from Eyal-Giladi and Kochav stage I (EGK.I) to EGK.X collected using a noninvasive approach for the first time. We also compare RNA-sequencing data obtained using a bulked embryo sequencing and single embryo/cell sequencing technique. The raw sequencing data were preprocessed with two genome builds, Galgal4 and Galgal5, and the expression of 17,108 and 26,102 genes was quantified in the respective builds. There were some differences between the two techniques, as well as between the two genome builds, and these were affected by the emergence of long intergenic noncoding RNA annotations. The first transcriptome datasets of pre-oviposited early chicken embryos based on bulked and single embryo sequencing techniques will serve as a valuable resource for investigating early avian embryogenesis, for comparative studies among vertebrates, and for novel gene annotation in the chicken genome.


September 22, 2019

Circular RNA architecture and differentiation during leaf bud to young leaf development in tea (Camellia sinensis).

Circular RNA (circRNA) discovery, expression patterns and experimental validation in developing tea leaves indicate its correlation with circRNA-parental genes and potential roles in the ceRNA interaction network. Circular RNAs (circRNAs) have recently emerged as a novel class of abundant, endogenous, stable RNAs with regulatory potential that are produced by circularization. However, identification of circRNAs in plants, especially in non-model plants with large genomes, is challenging. In this study, we undertook a systematic identification of circRNAs from tissues at different stages of tea plant (Camellia sinensis) leaf development using rRNA-depleted circular RNA-seq. By combining two state-of-the-art detection tools, we characterized 3174 circRNAs, of which 342 were shared by both approaches and thus considered high-confidence circRNAs. A few predicted circRNAs were randomly chosen, and 20 out of 24 were experimentally confirmed by PCR and Sanger sequencing. As in other plants, tissue-specific expression was also observed for many C. sinensis circRNAs. In addition, we found that circRNA abundances were positively correlated with the mRNA transcript abundances of their parental genes. qRT-PCR validated the differential expression patterns of circRNAs between leaf bud and young leaf, which also indicated the low expression abundance of circRNAs compared to the standard mRNAs from the parental genes. We predicted the circRNA-microRNA interaction networks, and 54 of the differentially expressed circRNAs were found to have potential tea plant miRNA binding sites. The gene sets encoding circRNAs were significantly enriched in chloroplast-related GO terms and photosynthesis/metabolite biosynthesis-related KEGG pathways, suggesting candidate roles of circRNAs in photosynthetic machinery and metabolite biosynthesis during leaf development.

