The accurate and comprehensive identification of functional regulatory sequences in mammalian genomes remains a major challenge. Here we describe site-specific integration fluorescence-activated cell sorting followed by sequencing (SIF-seq), an unbiased, medium-throughput functional assay for the discovery of distant-acting enhancers. Targeted single-copy genomic integration into pluripotent cells, reporter assays and flow cytometry are coupled with high-throughput DNA sequencing to enable parallel screening of large numbers of DNA sequences. By functionally interrogating >500 kilobases (kb) of mouse and human sequence for enhancer activity in mouse embryonic stem cells, we identified enhancers at pluripotency loci including NANOG. In in vitro-differentiated cardiomyocytes and neural progenitor cells, we identified cardiac enhancers and neuronal enhancers, respectively. SIF-seq is a powerful and flexible method for de novo functional identification of mammalian enhancers in a potentially wide variety of cell types.
Fusion of TTYH1 with the C19MC microRNA cluster drives expression of a brain-specific DNMT3B isoform in the embryonal brain tumor ETMR.
Embryonal tumors with multilayered rosettes (ETMRs) are rare, deadly pediatric brain tumors characterized by high-level amplification of the microRNA cluster C19MC. We performed integrated genetic and epigenetic analyses of 12 ETMR samples and identified, in all cases, C19MC fusions to TTYH1 driving expression of the microRNAs. ETMR tumors, cell lines and xenografts showed a specific DNA methylation pattern distinct from those of other tumors and normal tissues. We detected extreme overexpression of a previously uncharacterized isoform of DNMT3B originating at an alternative promoter that is active only in the first weeks of neural tube development. Transcriptional and immunohistochemical analyses suggest that C19MC-dependent DNMT3B deregulation is mediated by RBL2, a known repressor of DNMT3B. Transfection with individual C19MC microRNAs resulted in DNMT3B upregulation and RBL2 downregulation in cultured cells. Our data suggest a potential oncogenic re-engagement of an early developmental program in ETMR via epigenetic alteration mediated by an embryonic, brain-specific DNMT3B isoform.
The chicken has long served as an important model organism in many fields, and continues to aid our understanding of animal development. Functional genomics studies aimed at probing the mechanisms that regulate development require high-quality genomes and transcript annotations. The quality of these resources has improved dramatically over the last several years, but many isoforms and genes have yet to be identified. We hope to contribute to the improvement of these resources with the data presented here: a set of long cDNA sequencing reads, and a curated set of new genes and transcript isoforms not yet represented in the most up-to-date genome annotation available to the community of researchers who rely on the chicken genome.
The architecture of a scrambled genome reveals massive levels of genomic rearrangement during development.
Programmed DNA rearrangements in the single-celled eukaryote Oxytricha trifallax completely rewire its germline into a somatic nucleus during development. This elaborate, RNA-mediated pathway eliminates noncoding DNA sequences that interrupt gene loci and reorganizes the remaining fragments by inversions and permutations to produce functional genes. Here, we report the Oxytricha germline genome and compare it to the somatic genome to present a global view of its massive scale of genome rearrangements. The remarkably encrypted genome architecture contains >3,500 scrambled genes, as well as >800 predicted germline-limited genes expressed, and some posttranslationally modified, during genome rearrangements. Gene segments for different somatic loci often interweave with each other. Single gene segments can contribute to multiple, distinct somatic loci. Terminal precursor segments from neighboring somatic loci map extremely close to each other, often overlapping. This genome assembly provides a draft of a scrambled genome and a powerful model for studies of genome rearrangement. Copyright © 2014 Elsevier Inc. All rights reserved.
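The reordering step described in this abstract (inversions and permutations of retained gene segments) can be illustrated with a toy sketch. It assumes the interrupting noncoding sequences have already been removed and that each retained segment carries a known somatic position and orientation; the tuple layout, the function names and the example sequences are hypothetical and are not taken from the paper.

```python
# Toy model of unscrambling: germline segments are stored out of order
# and some are inverted relative to the somatic gene.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def unscramble(segments):
    """Rebuild a somatic sequence from scrambled germline segments.

    segments: list of (somatic_index, sequence, inverted) tuples,
    given in germline order. This layout is an illustrative assumption.
    """
    # Permutation: restore the somatic order of the segments.
    ordered = sorted(segments, key=lambda s: s[0])
    # Inversion: flip segments stored on the opposite strand.
    return "".join(
        reverse_complement(seq) if inverted else seq
        for _, seq, inverted in ordered
    )


# Germline order is 2, 0, 1; segment 1 is inverted.
germline = [(2, "GGA", False), (0, "ATG", False), (1, "TTG", True)]
somatic = unscramble(germline)
print(somatic)
```

The sketch covers only a single locus; it does not model the interweaving of segments from different loci or segments shared by multiple loci that the abstract reports.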
It has been widely accepted that 5-methylcytosine is the only form of DNA methylation in mammalian genomes. Here we identify N(6)-methyladenine as another form of DNA modification in mouse embryonic stem cells. Alkbh1 encodes a demethylase for N(6)-methyladenine. An increase of N(6)-methyladenine levels in Alkbh1-deficient cells leads to transcriptional silencing. N(6)-methyladenine deposition is inversely correlated with the evolutionary age of LINE-1 transposons; its deposition is strongly enriched at young (<1.5 million years old) but not old (>6 million years old) L1 elements. The deposition of N(6)-methyladenine correlates with epigenetic silencing of such LINE-1 transposons, together with their neighbouring enhancers and genes, thereby resisting the gene activation signals during embryonic stem cell differentiation. As young full-length LINE-1 transposons are strongly enriched on the X chromosome, genes located on the X chromosome are also silenced. Thus, N(6)-methyladenine developed a new role in epigenetic silencing in mammalian evolution distinct from its role in gene activation in other organisms. Our results demonstrate that N(6)-methyladenine constitutes a crucial component of the epigenetic regulation repertoire in mammalian genomes.
The Lingula genome provides insights into brachiopod evolution and the origin of phosphate biomineralization.
The evolutionary origins of lingulid brachiopods and their calcium phosphate shells have been obscure. Here we decode the 425-Mb genome of Lingula anatina to gain insights into brachiopod evolution. Comprehensive phylogenomic analyses place Lingula close to molluscs, but distant from annelids. The Lingula gene number has increased to ~34,000 by extensive expansion of gene families. Although Lingula and vertebrates have superficially similar hard tissue components, our genomic, transcriptomic and proteomic analyses show that Lingula lacks genes involved in bone formation, indicating an independent origin of their phosphate biominerals. Several genes involved in Lingula shell formation are shared by molluscs. However, Lingula has independently undergone domain combinations to produce shell matrix collagens with EGF domains and carries lineage-specific shell matrix proteins. Gene family expansion, domain shuffling and co-option of genes appear to be the genomic background of Lingula’s unique biomineralization. This Lingula genome provides resources for further studies of lophotrochozoan evolution.
Comparative genome sequencing reveals genomic signature of extreme desiccation tolerance in the anhydrobiotic midge.
Anhydrobiosis represents an extreme example of tolerance adaptation to water loss, where an organism can survive in an ametabolic state until water returns. Here we report the first comparative analysis examining the genomic background of extreme desiccation tolerance, which is exclusively found in larvae of the only anhydrobiotic insect, Polypedilum vanderplanki. We compare the genomes of P. vanderplanki and a congeneric desiccation-sensitive midge P. nubifer. We determine that the genome of the anhydrobiotic species specifically contains clusters of multi-copy genes with products that act as molecular shields. In addition, the genome possesses several groups of genes with high similarity to known protective proteins. However, these genes are located in distinct paralogous clusters in the genome apart from the classical orthologues of the corresponding genes shared by both chironomids and other insects. The transcripts of these clustered paralogues contribute to a large majority of the mRNA pool in the desiccating larvae and most likely define successful anhydrobiosis. Comparison of expression patterns of orthologues between two chironomid species provides evidence for the existence of desiccation-specific gene expression systems in P. vanderplanki.
Extracellular factors belonging to the TGF-β family play pivotal roles in the formation and patterning of germ layers during early Xenopus embryogenesis. Here, we show that the vg1 and nodal3 genes of Xenopus laevis are present in gene clusters on chromosomes XLA1L and XLA3L, respectively, and that both gene clusters have been completely lost from the syntenic S chromosome regions. The presence of gene clusters and chromosome-specific gene loss were confirmed by cDNA FISH analyses. Sequence and expression analyses revealed that paralogous genes in the vg1 and nodal3 clusters on the L chromosomes were also altered compared to their Xenopus tropicalis orthologs. X. laevis vg1 and nodal3 paralogs have potentially become pseudogenes or sub-functionalized genes and are expressed at different levels. As X. tropicalis has a single vg1 gene on chromosome XTR1, the ancestral vg1 gene in X. laevis appears to have been expanded on XLA1L. Of note, two reported vg1 genes, vg1(S20) and vg1(P20), reside in the cluster on XLA1L. The nodal3 gene cluster is also present on X. tropicalis chromosome XTR3, but phylogenetic analysis indicates that nodal3 genes in X. laevis and X. tropicalis were independently expanded and/or evolved in concert within each cluster by gene conversion. These findings provide insights into the function and molecular evolution of TGF-β family genes in response to allotetraploidization. Copyright © 2016 Elsevier Inc. All rights reserved.
Population scale mapping of transposable element diversity reveals links to gene regulation and epigenomic variation.
Variation in the presence or absence of transposable elements (TEs) is a major source of genetic variation between individuals. Here, we identified 23,095 TE presence/absence variants among 216 Arabidopsis accessions. Most TE variants were rare, and we found that these rare variants were associated with local extremes of gene expression and DNA methylation levels within the population. Of the common alleles identified, two thirds were not in linkage disequilibrium with nearby SNPs, implicating these variants as a source of novel genetic diversity. Many common TE variants were associated with significantly altered expression of nearby genes, and a major fraction of inter-accession DNA methylation differences were associated with nearby TE insertions. Overall, these results demonstrate that TE variants are a rich source of genetic diversity that likely plays an important role in facilitating epigenomic and transcriptional differences between individuals, and indicate a strong genetic basis for epigenetic variation.
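As an illustration of what testing a TE variant for linkage disequilibrium with a nearby SNP could involve, the sketch below computes the standard r² statistic between two biallelic markers. It assumes each accession is coded as a single 0/1 allele per site (a simplification for largely homozygous inbred lines); the function name and the example genotype vectors are hypothetical, and the abstract does not state which LD measure or threshold the authors used.

```python
def r_squared(a, b):
    """Linkage disequilibrium r^2 between two biallelic markers.

    a, b: equal-length lists of 0/1 alleles, one entry per accession
    (for example, TE absent/present and SNP reference/alternate).
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    p_a = sum(a) / n  # frequency of allele 1 at the first site
    p_b = sum(b) / n  # frequency of allele 1 at the second site
    # Frequency of accessions carrying allele 1 at both sites.
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic site carries no LD information
    return d * d / denom


# Hypothetical genotypes across eight accessions.
te_variant = [1, 1, 0, 0, 1, 0, 1, 0]
linked_snp = [1, 1, 0, 0, 1, 0, 1, 0]    # perfectly correlated with the TE
unlinked_snp = [1, 0, 1, 0, 1, 0, 1, 0]  # no better than chance

print(r_squared(te_variant, linked_snp))
print(r_squared(te_variant, unlinked_snp))
```

A TE variant with low r² to every nearby SNP would not be tagged by SNP genotyping alone, which is the sense in which the abstract describes such variants as a source of novel genetic diversity.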
Vascular and haematopoietic cells organize into specialized tissues during early embryogenesis to supply essential nutrients to all organs and thus play critical roles in development and disease. At the top of the haemato-vascular specification cascade lies cloche, a gene that when mutated in zebrafish leads to the striking phenotype of loss of most endothelial and haematopoietic cells and a significant increase in cardiomyocyte numbers. Although this mutant has been analysed extensively to investigate mesoderm diversification and differentiation and continues to be broadly used as a unique avascular model, the isolation of the cloche gene has been challenging due to its telomeric location. Here we used a deletion allele of cloche to identify several new cloche candidate genes within this genomic region, and systematically genome-edited each candidate. Through this comprehensive interrogation, we succeeded in isolating the cloche gene and discovered that it encodes a PAS-domain-containing bHLH transcription factor, and that it is expressed in a highly specific spatiotemporal pattern starting during late gastrulation. Gain-of-function experiments show that it can potently induce endothelial gene expression. Epistasis experiments reveal that it functions upstream of etv2 and tal1, the earliest expressed endothelial and haematopoietic transcription factor genes identified to date. A mammalian cloche orthologue can also rescue blood vessel formation in zebrafish cloche mutants, indicating a highly conserved role in vertebrate vasculogenesis and haematopoiesis. The identification of this master regulator of endothelial and haematopoietic fate enhances our understanding of early mesoderm diversification and may lead to improved protocols for the generation of endothelial and haematopoietic cells in vivo and in vitro.