N6-methyladenosine (m6A) is a widespread RNA modification that influences nearly every aspect of the messenger RNA lifecycle. Our understanding of m6A has been facilitated by the development of global m6A mapping methods, which use antibodies to immunoprecipitate methylated RNA. However, these methods have several limitations, including high input RNA requirements and cross-reactivity to other RNA modifications. Here, we present DART-seq (deamination adjacent to RNA modification targets), an antibody-free method for detecting m6A sites. In DART-seq, the cytidine deaminase APOBEC1 is fused to the m6A-binding YTH domain. APOBEC1-YTH expression in cells induces C-to-U deamination at sites adjacent to m6A residues, which are detected using standard RNA-seq. DART-seq identifies thousands of m6A sites in cells from as little as 10 ng of total RNA and can detect m6A accumulation in cells over time. Additionally, we use long-read DART-seq to gain insights into m6A distribution along the length of individual transcripts.
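As a rough illustration of how C-to-U events might be read out of standard RNA-seq data, the sketch below scans per-position base counts and reports reference cytidines where a fraction of reads carry T (U appears as T in sequenced cDNA). The function name, the input layout, and both thresholds (`min_depth`, `min_fraction`) are assumptions made for this example and are not taken from the DART-seq analysis pipeline.

```python
def c_to_u_sites(reference, pileup, min_depth=10, min_fraction=0.1):
    """Return (position, edit_fraction) for reference C positions with C-to-T mismatches.

    reference: reference sequence as a string (DNA alphabet).
    pileup: one dict per reference position mapping base -> read count.
    min_depth, min_fraction: hypothetical filters chosen for illustration only.
    """
    sites = []
    for pos, (ref_base, counts) in enumerate(zip(reference, pileup)):
        if ref_base != "C":
            continue  # only cytidines can be deaminated to uridine
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate an editing fraction
        fraction = counts.get("T", 0) / depth
        if fraction >= min_fraction:
            sites.append((pos, fraction))
    return sites


# Toy data: position 3 is a reference C where 5 of 20 reads show T.
reference = "GGACT"
pileup = [{"G": 20}, {"G": 20}, {"A": 20}, {"C": 15, "T": 5}, {"T": 20}]
print(c_to_u_sites(reference, pileup))
```

A real analysis would also need to separate induced editing from SNPs and endogenous editing, for instance by comparing against suitable control samples; that step is omitted here.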
A high-quality genome sequence of any model organism is an essential starting point for genetic and other studies. Older clone-based methods are slow and expensive, whereas faster, cheaper short-read-only assemblies can be incomplete and highly fragmented, which limits their usefulness. The last few years have seen the introduction of many new technologies for genome assembly. These new technologies and associated new algorithms are typically benchmarked on microbial genomes or, if they scale appropriately, on larger (e.g., human) genomes. However, plant genomes can be much more repetitive and larger than the human genome, and plant biochemistry often makes obtaining high-quality DNA that is free from contaminants difficult. Reflecting their challenging nature, we observe that plant genome assembly statistics are typically poorer than for vertebrates. Here, we compare Illumina short read, Pacific Biosciences long read, 10x Genomics linked reads, Dovetail Hi-C, and BioNano Genomics optical maps, singly and combined, in producing high-quality long-range genome assemblies of the potato species Solanum verrucosum. We benchmark the assemblies for completeness and accuracy, as well as for DNA and compute requirements and sequencing costs. The field of genome sequencing and assembly is reaching maturity, and the differences we observe between assemblies are surprisingly small. We expect that our results will be helpful to other genome projects, and that these datasets will be used in benchmarking by assembly algorithm developers. © The Author(s) 2019. Published by Oxford University Press.
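The abstract does not say which assembly statistics were compared. One widely used contiguity measure is N50, the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The following is a minimal sketch of that calculation; the function name and the toy contig lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths, or 0 if empty."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # toy lengths, total 32
print(n50(contigs))  # 8, because 8 + 8 covers half of 32
```

N50 rewards long contigs regardless of correctness, which is one reason to pair it with completeness and accuracy checks of the kind the abstract mentions.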
DNA methylation patterns among genes, individuals, and populations appear to be highly variable among taxa, but our understanding of the functional significance of this variation is still incomplete. Here we present the first whole genome bisulfite sequencing of a chelicerate species, the social spider Stegodyphus dumicola. We show that DNA methylation occurs mainly in CpG context and is concentrated in genes. This is a pattern also documented in other invertebrates. We present RNA sequence data to investigate the role of DNA methylation in gene regulation and show that, within individuals, methylated genes are more highly expressed than genes that are not methylated and that methylated genes are more stably expressed across individuals than unmethylated genes. Although no causal association is shown, this supports a role for DNA CpG methylation in regulating gene expression in invertebrates. Differential DNA methylation between populations showed a small but significant correlation with differential gene expression. This is consistent with a possible role of DNA methylation in local adaptation. Based on indirect inference of the presence and pattern of DNA methylation in chelicerate species whose genomes have been sequenced, we performed a comparative phylogenetic analysis. We found strong evidence for exon DNA methylation in the horseshoe crab Limulus polyphemus and in all spider and scorpion species, while most Parasitiformes and Acariformes species seem to have lost DNA methylation.
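In bisulfite sequencing, reads that retain C at a cytosine indicate methylation, while reads converted to T indicate an unmethylated base. A gene-level methylation level can therefore be sketched as the pooled fraction of unconverted reads over its CpG sites. The function name and input layout below are assumptions for illustration and are not the study's pipeline.

```python
def methylation_level(cpg_counts):
    """Pooled methylation level for one gene.

    cpg_counts: list of (methylated_reads, unmethylated_reads), one pair per CpG site.
    Returns a fraction in [0, 1], or None when no reads cover the gene.
    """
    methylated = sum(m for m, _ in cpg_counts)
    total = sum(m + u for m, u in cpg_counts)
    return methylated / total if total else None


# Two CpG sites: 8 of 10 and 6 of 10 reads unconverted, so 14 of 20 overall.
print(methylation_level([(8, 2), (6, 4)]))
```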
Recombination between loci underlying mate choice and ecological traits is a major evolutionary force acting against speciation with gene flow. The evolution of linkage disequilibrium between such loci is therefore a fundamental step in the origin of species. Here, we show that this process can take place in the absence of physical linkage in hamlets, a group of closely related reef fishes from the wider Caribbean that differ essentially in colour pattern and are reproductively isolated through strong visually-based assortative mating. Using full-genome analysis, we identify four narrow genomic intervals that are consistently differentiated among sympatric species in a backdrop of extremely low genomic divergence. These four intervals include genes involved in pigmentation (sox10), axial patterning (hoxc13a), photoreceptor development (casz1) and visual sensitivity (SWS and LWS opsins) that develop islands of long-distance and inter-chromosomal linkage disequilibrium as species diverge. The relatively simple genomic architecture of species differences facilitates the evolution of linkage disequilibrium in the presence of gene flow.
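Linkage disequilibrium between two biallelic loci is commonly summarized by the coefficient D = p_AB - p_A p_B and its normalized square r². The sketch below computes both from phased haplotypes coded 0/1. It is a textbook calculation shown for orientation, not the estimator used in the study, and the function name is illustrative.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B), each coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0         # undefined for fixed loci; report 0
    return d, r2


# Perfect association: allele 1 at A always travels with allele 1 at B.
print(linkage_disequilibrium([(1, 1), (1, 1), (0, 0), (0, 0)]))
# No association: all four haplotypes equally common.
print(linkage_disequilibrium([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

Because the formula only needs haplotype frequencies, it applies equally to loci on the same chromosome and to loci on different chromosomes, which is the inter-chromosomal case the abstract highlights.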
The antibody repertoire of Bos taurus is characterized by a subset of variable heavy (VH) chain regions with ultralong third complementarity determining regions (CDR3) which, compared to other species, can provide a potent response to challenging antigens like HIV env. These unusual CDR3 can range to over seventy highly diverse amino acids in length and form unique β-ribbon ‘stalk’ and disulfide bonded ‘knob’ structures, far from the typical antigen binding site. The genetic components and processes for forming these unusual cattle antibody VH CDR3 are not well understood. Here we analyze sequences of Bos taurus antibody VH domains and find that the subset with ultralong CDR3 exclusively uses a single variable gene, IGHV1-7 (VHBUL) rearranged to the longest diversity gene, IGHD8-2. An eight nucleotide duplication at the 3′ end of IGHV1-7 encodes a longer V-region producing an extended F β-strand that contributes to the stalk in a rearranged CDR3. A low amino acid variability was observed in CDR1 and CDR2, suggesting that antigen binding for this subset most likely only depends on the CDR3. Importantly, a novel, potentially AID mediated, deletional diversification mechanism of the B. taurus VH ultralong CDR3 knob was discovered, in which interior codons of the IGHD8-2 region are removed while maintaining integral structural components of the knob and descending strand of the stalk in place. These deletions serve to further diversify cysteine positions, and thus disulfide bonded loops. Hence, both germline and somatic genetic factors and processes appear to be involved in diversification of this structurally unusual cattle VH ultralong CDR3 repertoire.
Contrasting Roles of Transcription Factors Spineless and EcR in the Highly Dynamic Chromatin Landscape of Butterfly Wing Metamorphosis.
Development requires highly coordinated changes in chromatin accessibility in order for proper gene regulation to occur. Here, we identify factors associated with major, discrete changes in chromatin accessibility during butterfly wing metamorphosis. By combining mRNA sequencing (mRNA-seq), assay for transposase-accessible chromatin using sequencing (ATAC-seq), and machine learning analysis of motifs, we show that distinct sets of transcription factors are predictive of chromatin opening at different developmental stages. Our data suggest an important role for nuclear hormone receptors early in metamorphosis, whereas PAS-domain transcription factors are strongly associated with later chromatin opening. Chromatin immunoprecipitation sequencing (ChIP-seq) validation of select candidate factors showed spineless binding to be a major predictor of opening chromatin. Surprisingly, binding of ecdysone receptor (EcR), a candidate accessibility factor in Drosophila, was not predictive of opening but instead marked persistent sites. This work characterizes the chromatin dynamics of insect wing metamorphosis, identifies candidate chromatin remodeling factors in insects, and presents a genome assembly of the model butterfly Junonia coenia. Copyright © 2019 The Authors. Published by Elsevier Inc. All rights reserved.
Resource Concentration Modulates the Fate of Dissimilated Nitrogen in a Dual-Pathway Actinobacterium.
Respiratory ammonification and denitrification are two evolutionarily unrelated dissimilatory nitrogen (N) processes central to the global N cycle, the activity of which is thought to be controlled by the carbon (C) to nitrate (NO3-) ratio. Here we find that Intrasporangium calvum C5, a novel dual-pathway denitrifier/respiratory ammonifier, disproportionately utilizes ammonification rather than denitrification when grown under low C concentrations, even at low C:NO3- ratios. This finding is in conflict with the paradigm that high C:NO3- ratios promote ammonification and low C:NO3- ratios promote denitrification. We find that the protein atomic composition for denitrification modules (NirK) is significantly cost minimized for C and N compared to ammonification modules (NrfA), indicating that limitation for C and N is a major evolutionary selective pressure imprinted in the architecture of these proteins. The evolutionary precedent for these findings suggests ecological importance for microbial activity as evidenced by higher growth rates when I. calvum grows predominantly using its ammonification pathway and by assimilating its end-product (ammonium) for growth under ammonium-free conditions. Genomic analysis of I. calvum further reveals a versatile ecophysiology to cope with nutrient stress and redox conditions. Metabolite and transcriptional profiles during growth indicate that enzyme modules, NrfAH and NirK, are not constitutively expressed but rather induced by nitrite production via NarG. Mechanistically, our results suggest that pathway selection is driven by intracellular redox potential (redox poise), which may be lowered when resource concentrations are low, thereby decreasing catalytic activity of upstream electron transport steps (i.e., the bc1 complex) needed for denitrification enzymes. Our work advances our understanding of the biogeochemical flexibility of N-cycling organisms, pathway evolution, and ecological food-webs.
Is there foul play in the leaf pocket? The metagenome of floating fern Azolla reveals endophytes that do not fix N2 but may denitrify.
Dinitrogen fixation by Nostoc azollae residing in specialized leaf pockets supports prolific growth of the floating fern Azolla filiculoides. To evaluate contributions by further microorganisms, the A. filiculoides microbiome and nitrogen metabolism in bacteria persistently associated with Azolla ferns were characterized. A metagenomic approach was taken, complemented by detection of N2O released and nitrogen isotope determinations of fern biomass. Ribosomal RNA genes in sequenced DNA of natural ferns, their enriched leaf pockets and water filtrate from the surrounding ditch established that bacteria of A. filiculoides differed entirely from surrounding water and revealed species of the order Rhizobiales. Analyses of seven cultivated Azolla species confirmed persistent association with Rhizobiales. Two distinct nearly full-length Rhizobiales genomes were identified in leaf-pocket-enriched samples from ditch grown A. filiculoides. Their annotation revealed genes for denitrification but not N2-fixation. 15N2 incorporation was active in ferns with N. azollae but not in ferns without. N2O was not detectably released from surface-sterilized ferns with the Rhizobiales. N2-fixing N. azollae, we conclude, dominated the microbiome of Azolla ferns. The persistent but less abundant heterotrophic Rhizobiales bacteria possibly contributed to lowering O2 levels in leaf pockets but did not release detectable amounts of the strong greenhouse gas N2O. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
The Epstein-Barr virus (EBV) miR-BHRF1 microRNA (miRNA) cluster has been shown to facilitate B-cell transformation and promote the rapid growth of the resultant lymphoblastoid cell lines (LCLs). However, we find that expression of physiological levels of the miR-BHRF1 miRNAs in LCLs transformed with a miR-BHRF1 null mutant (Δ123) fails to increase their growth rate. We demonstrate that the pri-miR-BHRF1-2 and 1-3 stem-loops are present in the 3′UTR of transcripts encoding EBNA-LP and that excision of pre-miR-BHRF1-2 and 1-3 by Drosha destabilizes these mRNAs and reduces expression of the encoded protein. Therefore, mutational inactivation of pri-miR-BHRF1-2 and 1-3 in the Δ123 mutant upregulates the expression of not only EBNA-LP but also EBNA-LP-regulated mRNAs and proteins, including LMP1. We hypothesize that this overexpression causes the reduced transformation capacity of the Δ123 EBV mutant. Thus, in addition to regulating cellular mRNAs in trans, miR-BHRF1-2 and 1-3 also regulate EBNA-LP mRNA expression in cis. Copyright © 2017 Elsevier Inc. All rights reserved.
Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur at different frequencies across tissues and samples. Here, we develop BIISQ, a Bayesian nonparametric model for isoform discovery and individual-specific quantification from short-read RNA-seq data. BIISQ does not require isoform reference sequences but instead estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimates and demonstrate superior precision and recall for simulations compared to state-of-the-art isoform reconstruction methods. BIISQ shows the most gains for low abundance isoforms, with 36% more isoforms correctly inferred at low coverage versus a multi-sample method and 170% more versus single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios.
Candidatus Dactylopiibacterium carminicum, a nitrogen-fixing symbiont of Dactylopius cochineal insects (Hemiptera: Coccoidea: Dactylopiidae)
The domesticated carmine cochineal Dactylopius coccus (scale insect) has commercial value and has been used for more than 500 years for natural red pigment production. Besides the domesticated cochineal, other wild Dactylopius species such as Dactylopius opuntiae are found in the Americas, all feeding on nutrient poor sap from native cacti. To compensate for nutritional deficiencies, many insects harbor symbiotic bacteria which provide essential amino acids or vitamins to their hosts. Here, we characterized a symbiont from the carmine cochineal insects, Candidatus Dactylopiibacterium carminicum (betaproteobacterium, Rhodocyclaceae family) and found it in D. coccus and in D. opuntiae ovaries by fluorescent in situ hybridization, suggesting maternal inheritance. Bacterial genomes recovered from metagenomic data derived from whole insects or tissues both from D. coccus and from D. opuntiae were around 3.6 Mb in size. Phylogenomics showed that dactylopiibacteria constituted a closely related clade neighbor to nitrogen-fixing bacteria from soil or from various plants including rice and other grass endophytes. Metabolic capabilities were inferred from genomic analyses, showing a complete operon for nitrogen fixation, biosynthesis of amino acids and vitamins and putative traits of anaerobic or microoxic metabolism as well as genes for plant interaction. Dactylopiibacterium nif gene expression and acetylene reduction activity detecting nitrogen fixation were evidenced in D. coccus hemolymph and ovaries, in congruence with the endosymbiont fluorescent in situ hybridization location. Dactylopiibacterium symbionts may compensate for the nitrogen deficiency in the cochineal diet. In addition, this symbiont may provide essential amino acids, recycle uric acid, and increase the cochineal life span.
Next-generation approaches to advancing eco-immunogenomic research in critically endangered primates.
High-throughput sequencing platforms are generating massive amounts of genomic data from nonmodel species, and these data sets are valuable resources that can be mined to advance a number of research areas. An example is the growing amount of transcriptome data that allow for examination of gene expression in nonmodel species. Here, we show how publicly available transcriptome data from nonmodel primates can be used to design novel research focused on immunogenomics. We mined transcriptome data from the world’s most endangered group of primates, the lemurs of Madagascar, for sequences corresponding to immunoglobulins. Our results confirmed homology between strepsirrhine and haplorrhine primate immunoglobulins and allowed for high-throughput sequencing of expressed antibodies (Ig-seq) in Coquerel’s sifaka (Propithecus coquereli). Using both Pacific Biosciences RS and Ion Torrent PGM sequencing, we performed Ig-seq on two individuals of Coquerel’s sifaka. We generated over 150 000 sequences of expressed antibodies, allowing for molecular characterization of the antigen-binding region. Our analyses suggest that similar VDJ expression patterns exist across all primates, with sequences closely related to the human VH3 immunoglobulin family being heavily represented in sifaka antibodies. Moreover, the antigen-binding region of sifaka antibodies exhibited similar amino acid variation with respect to haplorrhine primates. Our study represents the first attempt to characterize sequence diversity of the expressed antibody repertoire in a species of lemur. We anticipate that methods similar to ours will provide the framework for investigating the adaptive immune response in wild populations of other nonmodel organisms and can be used to advance the burgeoning field of eco-immunology. © 2014 John Wiley & Sons Ltd.
Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
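The core idea of using short, high-fidelity reads to correct a noisy long read can be illustrated with a deliberately simplified majority vote. The sketch assumes the short reads have already been aligned to the long read without gaps and handles substitutions only. The published approach must also deal with the insertions and deletions typical of single-molecule reads, so this is a toy model rather than the authors' algorithm, and every name in it is hypothetical.

```python
from collections import Counter


def correct_long_read(long_read, aligned_short_reads):
    """Replace each long-read base with the majority base among overlapping short reads.

    long_read: noisy sequence as a string.
    aligned_short_reads: list of (offset, sequence) pairs, gap-free alignments (assumed given).
    Positions without short-read coverage keep the original base.
    """
    votes = [Counter() for _ in long_read]
    for offset, seq in aligned_short_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                votes[pos][base] += 1

    corrected = []
    for original, counter in zip(long_read, votes):
        if counter:
            corrected.append(counter.most_common(1)[0][0])  # consensus base
        else:
            corrected.append(original)                       # uncovered: leave as-is
    return "".join(corrected)


# The long read has one substitution error at position 3 (T instead of A).
noisy = "ACGTTCGA"
short_reads = [(0, "ACGAT"), (2, "GATCG"), (3, "ATCGA")]
print(correct_long_read(noisy, short_reads))  # ACGATCGA
```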
Next generation multilocus sequence typing (NGMLST) and the analytical software program MLSTEZ enable efficient, cost-effective, high-throughput, multilocus sequencing typing.
Multilocus sequence typing (MLST) has become the preferred method for genotyping many biological species, and it is especially useful for analyzing haploid eukaryotes. MLST is rigorous, reproducible, and informative, and MLST genotyping has been shown to identify major phylogenetic clades, molecular groups, or subpopulations of a species, as well as individual strains or clones. MLST molecular types often correlate with important phenotypes. Conventional MLST involves the extraction of genomic DNA and the amplification by PCR of several conserved, unlinked gene sequences from a sample of isolates of the taxon under investigation. In some cases, as few as three loci are sufficient to yield definitive results. The amplicons are sequenced, aligned, and compared by phylogenetic methods to distinguish statistically significant differences among individuals and clades. Although MLST is simpler, faster, and less expensive than whole genome sequencing, it is more costly and time-consuming than less reliable genotyping methods (e.g. amplified fragment length polymorphisms). Here, we describe a new MLST method that uses next-generation sequencing, a multiplexing protocol, and appropriate analytical software to provide accurate, rapid, and economical MLST genotyping of 96 or more isolates in a single assay. We demonstrate this methodology by genotyping isolates of the well-characterized, human pathogenic yeast Cryptococcus neoformans. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
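The bookkeeping behind MLST can be sketched as follows: each distinct sequence at a locus receives an allele number, the tuple of allele numbers across loci forms an isolate's profile, and isolates with identical profiles share a sequence type. The function, locus names, and sequences below are hypothetical and do not reflect the MLSTEZ implementation. Real schemes take allele and sequence-type numbers from curated databases rather than assigning them in order of first appearance, as this toy does.

```python
def assign_sequence_types(isolates, loci):
    """Map each isolate to (allele_profile, sequence_type).

    isolates: dict of isolate name -> dict of locus name -> consensus sequence.
    loci: ordered list of locus names defining the profile.
    Numbers are assigned in order of first appearance (illustrative only).
    """
    allele_ids = {locus: {} for locus in loci}   # per locus: sequence -> allele number
    profile_to_st = {}                           # allele profile -> sequence type
    result = {}
    for name, seqs in isolates.items():
        profile = []
        for locus in loci:
            known = allele_ids[locus]
            # Reuse the allele number if this sequence was seen before, else mint a new one.
            profile.append(known.setdefault(seqs[locus], len(known) + 1))
        profile = tuple(profile)
        st = profile_to_st.setdefault(profile, len(profile_to_st) + 1)
        result[name] = (profile, st)
    return result


isolates = {
    "iso1": {"locA": "ACGT", "locB": "GGCC"},
    "iso2": {"locA": "ACGT", "locB": "GGCC"},  # identical to iso1 at both loci
    "iso3": {"locA": "ACGA", "locB": "GGCC"},  # differs at locA only
}
for name, (profile, st) in assign_sequence_types(isolates, ["locA", "locB"]).items():
    print(name, profile, st)
```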
High-resolution expression map of the Arabidopsis root reveals alternative splicing and lincRNA regulation.
The extent to which alternative splicing and long intergenic noncoding RNAs (lincRNAs) contribute to the specialized functions of cells within an organ is poorly understood. We generated a comprehensive dataset of gene expression from individual cell types of the Arabidopsis root. Comparisons across cell types revealed that alternative splicing tends to remove parts of coding regions from a longer, major isoform, providing evidence for a progressive mechanism of splicing. Cell-type-specific intron retention suggested a possible origin for this common form of alternative splicing. Coordinated alternative splicing across developmental stages pointed to a role in regulating differentiation. Consistent with this hypothesis, distinct isoforms of a transcription factor were shown to control developmental transitions. lincRNAs were generally lowly expressed at the level of individual cell types, but co-expression clusters provided clues as to their function. Our results highlight insights gained from analysis of expression at the level of individual cell types. Copyright © 2016 Elsevier Inc. All rights reserved.