N6-methyladenosine (m6A) is a widespread RNA modification that influences nearly every aspect of the messenger RNA lifecycle. Our understanding of m6A has been facilitated by the development of global m6A mapping methods, which use antibodies to immunoprecipitate methylated RNA. However, these methods have several limitations, including high input RNA requirements and cross-reactivity to other RNA modifications. Here, we present DART-seq (deamination adjacent to RNA modification targets), an antibody-free method for detecting m6A sites. In DART-seq, the cytidine deaminase APOBEC1 is fused to the m6A-binding YTH domain. APOBEC1-YTH expression in cells induces C-to-U deamination at sites adjacent to m6A residues, which are detected using standard RNA-seq. DART-seq identifies thousands of m6A sites in cells from as little as 10 ng of total RNA and can detect m6A accumulation in cells over time. Additionally, we use long-read DART-seq to gain insights into m6A distribution along the length of individual transcripts.
Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
A comparison of immunoglobulin IGHV, IGHD and IGHJ genes in wild-derived and classical inbred mouse strains.
The genomes of classical inbred mouse strains include genes derived from all three major subspecies of the house mouse, Mus musculus. We recently posited that genetic diversity in the immunoglobulin heavy chain (IGH) gene loci of C57BL/6 and BALB/c mice reflects differences in subspecies origin. To investigate this hypothesis, we conducted high-throughput sequencing of IGH gene rearrangements to document IGH variable (IGHV), joining (IGHJ), and diversity (IGHD) genes in four inbred wild-derived mouse strains (CAST/EiJ, LEWES/EiJ, MSM/MsJ, and PWD/PhJ), and a single disease model strain (NOD/ShiLtJ), collectively representing genetic backgrounds of several major mouse subspecies. A total of 341 germline IGHV sequences were inferred in the wild-derived strains, including 247 not curated in the International Immunogenetics Information System. In contrast, 83/84 inferred NOD IGHV genes had previously been observed in C57BL/6 mice. Variability among the strains examined was observed for only a single IGHJ gene, involving a description of a novel allele. In contrast, unexpected variation was found in the IGHD gene loci, with four previously unreported IGHD gene sequences being documented. Very few IGHV sequences of C57BL/6 and BALB/c mice were shared with strains representing major subspecies, suggesting that their IGH loci may be complex mosaics of genes of disparate origins. This suggests a similar level of diversity is likely present in the IGH loci of other classical inbred strains. This must now be documented if we are to properly understand inter-strain variation in models of antibody-mediated disease. This article is protected by copyright. All rights reserved.
Cultured Epidermal Autografts from Clinically Revertant Skin as a Potential Wound Treatment for Recessive Dystrophic Epidermolysis Bullosa.
Inherited skin disorders have been reported recently to have sporadic normal-looking areas, where a portion of the keratinocytes have recovered from causative gene mutations (revertant mosaicism). We observed a case of recessive dystrophic epidermolysis bullosa treated with cultured epidermal autografts (CEAs), whose CEA-grafted site remained epithelized for 16 years. We proved that the CEA product and the grafted area included cells with revertant mosaicism. Based on these findings, we conducted an investigator-initiated clinical trial of CEAs from clinically revertant skin for recessive dystrophic epidermolysis bullosa. The donor sites were analyzed by genetic analysis, immunofluorescence, electron microscopy, and quantification of the reverted mRNA with deep sequencing. The primary endpoint was the ulcer epithelization rate per patient at 4 weeks after the last CEA application. Three patients with recessive dystrophic epidermolysis bullosa with 8 ulcers were enrolled, and the epithelization rate for each patient at the primary endpoint was 87.7%, 100%, and 57.0%, respectively. The clinical effects were found to persist for at least 76 weeks after CEA transplantation. One of the three patients had apparent revertant mosaicism in the donor skin and in the post-transplanted area. CEAs from clinically normal skin are a potentially well-tolerated treatment for recessive dystrophic epidermolysis bullosa. Copyright © 2019 The Authors. Published by Elsevier Inc. All rights reserved.
We present high-quality, phased genome assemblies representative of taurine and indicine cattle, subspecies that differ markedly in productivity-related traits and environmental adaptation. We report a new haplotype-aware scaffolding and polishing pipeline using contigs generated by the trio binning method to produce haplotype-resolved, chromosome-level genome assemblies of Angus (taurine) and Brahman (indicine) cattle breeds. These assemblies were used to identify structural and copy number variants that differentiate the subspecies, and we found variant detection was sensitive to the specific reference genome chosen. Six gene families with immune-related functions are expanded in the indicine lineage. Assembly of the genomes of both subspecies from a single individual enabled transcripts to be phased to detect allele-specific expression, and to study genome-wide selective sweeps. An indicus-specific extra copy of fatty acid desaturase is under positive selection and may contribute to indicine adaptation to heat and drought.
We report reference-quality genome assemblies and annotations for two accessions of soybean (Glycine max) and one of Glycine soja, the closest wild relative of G. max. The G. max assemblies are for widely used U.S. cultivars: the northern line ‘Williams 82’ (Wm82) and the southern line ‘Lee’. The Wm82 assembly improves the prior published assembly, and the Lee and G. soja assemblies are new for these accessions. Comparisons among the three accessions show generally high structural conservation, but nucleotide differences of 1.7 SNPs/kb between Wm82 and Lee, and 4.7 SNPs/kb between these lines and G. soja. SNP distributions and comparisons with genotypes of the Lee and Wm82 parents highlight patterns of introgressions and haplotype structure. Comparisons against the U.S. germplasm collection show the placement of the sequenced accessions relative to global soybean diversity. Analysis of a pan-gene collection shows generally high conservation, with variation occurring primarily in genomically clustered gene families. We found ~40-42 inversions per chromosome between either Lee or Wm82v4 and G. soja, and ~32 inversions per chromosome between Wm82 and Lee. We also investigated five domestication loci. For each locus, we found two different alleles with functional differences between G. soja and the two domesticated accessions. The genome assemblies for multiple cultivated accessions and for the closest wild ancestor of soybean provide a valuable set of resources for identifying causal variants that underlie traits for soybean’s domestication and improvement, serving as a basis for future research and crop improvement efforts for this important crop species. This article is protected by copyright. All rights reserved.