Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions such as when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.
Genome-wide association studies (GWAS) have identified many genomic loci associated with risk for schizophrenia, but unambiguous identification of the relationship between disease-associated variants and specific genes, and in particular their effect on risk conferring transcripts, has proven difficult. To better understand the specific molecular mechanism(s) at the schizophrenia locus in 11q25, we undertook cis expression quantitative trait loci (cis-eQTL) mapping for this 2 megabase genomic region using postmortem human brain samples. To comprehensively assess the effects of genetic risk upon local expression, we evaluated multiple transcript features: genes, exons, and exon-exon junctions in multiple brain regions-dorsolateral prefrontal cortex (DLPFC), hippocampus, and caudate. Genetic risk variants strongly associated with expression of SNX19 transcript features that tag multiple rare classes of SNX19 transcripts, whereas they only weakly affected expression of an exon-exon junction that tags the majority of abundant transcripts. The most prominent class of SNX19 risk-associated transcripts is predicted to be overexpressed, defined by an exon-exon splice junction between exons 8 and 10 (junc8.10) and that is predicted to encode proteins that lack the characteristic nexin C terminal domain. Risk alleles were also associated with either increased or decreased expression of multiple additional classes of transcripts. With RACE, molecular cloning, and long read sequencing, we found a number of novel SNX19 transcripts that further define the set of potential etiological transcripts. We explored epigenetic regulation of SNX19 expression and found that DNA methylation at CpG sites near the primary transcription start site and within exon 2 partially mediate the effects of risk variants on risk-associated expression. ATAC sequencing revealed that some of the most strongly risk-associated SNPs are located within a region of open chromatin, suggesting a nearby regulatory element is involved. These findings indicate a potentially complex molecular etiology, in which risk alleles for schizophrenia generate epigenetic alterations and dysregulation of multiple classes of SNX19 transcripts.
Rapid antigen diversification through mitotic recombination in the human malaria parasite Plasmodium falciparum.
Malaria parasites possess the remarkable ability to maintain chronic infections that fail to elicit a protective immune response, characteristics that have stymied vaccine development and cause people living in endemic regions to remain at risk of malaria despite previous exposure to the disease. These traits stem from the tremendous antigenic diversity displayed by parasites circulating in the field. For Plasmodium falciparum, the most virulent of the human malaria parasites, this diversity is exemplified by the variant gene family called var, which encodes the major surface antigen displayed on infected red blood cells (RBCs). This gene family exhibits virtually limitless diversity when var gene repertoires from different parasite isolates are compared. Previous studies indicated that this remarkable genome plasticity results from extensive ectopic recombination between var genes during mitotic replication; however, the molecular mechanisms that direct this process to antigen-encoding loci while the rest of the genome remains relatively stable were not determined. Using targeted DNA double-strand breaks (DSBs) and long-read whole-genome sequencing, we show that a single break within an antigen-encoding region of the genome can result in a cascade of recombination events leading to the generation of multiple chimeric var genes, a process that can greatly accelerate the generation of diversity within this family. We also found that recombinations did not occur randomly, but rather high-probability, specific recombination products were observed repeatedly. These results provide a molecular basis for previously described structured rearrangements that drive diversification of this highly polymorphic gene family.
One Aeromonas salmonicida subsp. salmonicida isolate with a pAsa5 variant bearing antibiotic resistance and a pRAS3 variant making a link with a swine pathogen.
The Gram-negative bacterium Aeromonas salmonicida subsp. salmonicida is an aquatic pathogen which causes furunculosis to salmonids, especially in fish farms. The emergence of strains of this bacterium exhibiting antibiotic resistance is increasing, limiting the effectiveness of antibiotherapy as a treatment against this worldwide disease. In the present study, we discovered an isolate of A. salmonicida subsp. salmonicida that harbors two novel plasmids variants carrying antibiotic resistance genes. The use of long-read sequencing (PacBio) allowed us to fully characterize those variants, named pAsa5-3432 and pRAS3-3432, which both differ from their classic counterpart through their content in mobile genetic elements. The plasmid pAsa5-3432 carries a new multidrug region composed of multiple mobile genetic elements, including a Class 1 integron similar to an integrated element of Salmonella enterica. With this new region, probably acquired through plasmid recombination, pAsa5-3432 is the first reported plasmid of this bacterium that bears both an essential virulence factor (the type three secretion system) and multiple antibiotic resistance genes. As for pRAS3-3432, compared to the classic pRAS3, it carries a new mobile element that has only been identified in Chlamydia suis. Hence, with the identification of those two novel plasmids harboring mobile genetic elements that are normally encountered in other bacterial species, the present study puts emphasis on the important impact of mobile genetic elements in the genomic plasticity of A. salmonicida subsp. salmonicida and suggests that this aquatic bacterium could be an important reservoir of antibiotic resistance genes that can be exchanged with other bacteria, including human and animal pathogens. Copyright © 2019 Elsevier B.V. All rights reserved.
Sucrose accumulation and decreased photosynthesis are early symptoms of yellow canopy syndrome (YCS) in sugarcane (Saccharum spp.), and precede the visual yellowing of the leaves. To investigate broad-scale gene expression changes during YCS-onset, transcriptome analyses coupled to metabolome analyses were performed. Across leaf tissues, the greatest number of differentially expressed genes related to the chloroplast, and the metabolic processes relating to nitrogen and carbohydrates. Five genes represented 90% of the TPM (Transcripts Per Million) associated with the downregulation of transcription during YCS-onset, which included PSII D1 (PsbA). This differential expression was consistent with a feedback regulatory effect upon photosynthesis. Broad-scale gene expression analyses did not reveal a cause for leaf sugar accumulation during YCS-onset. Interestingly, the midrib showed the greatest accumulation of sugars, followed by symptomatic lamina. To investigate if phloem loading/reloading may be compromised on a gene expression level – to lead to leaf sucrose accumulation – sucrose transport-related proteins of SWEETs, Sucrose Transporters (SUTs), H+-ATPases and H+-pyrophosphatases (H+-PPases) were characterised from a sugarcane transcriptome and expression analysed. Two clusters of Type I H+-PPases, with one upregulated and the other downregulated, were evident. Although less pronounced, a similar pattern of change was observed for the H+-ATPases. The disaccharide transporting SWEETs were downregulated after visual symptoms were present, and a monosaccharide transporting SWEET upregulated preceding, as well as after, symptom development. SUT gene expression was the least responsive to YCS development. The results are consistent with a reduction of photoassimilate movement through the phloem leading to sucrose build-up in the leaf.
Infectious disease is both a major force of selection in nature and a prime cause of yield loss in agriculture. In plants, disease resistance is often conferred by nucleotide-binding leucine-rich repeat (NLR) proteins, intracellular immune receptors that recognize pathogen proteins and their effects on the host. Consistent with extensive balancing and positive selection, NLRs are encoded by one of the most variable gene families in plants, but the true extent of intraspecific NLR diversity has been unclear. Here, we define a nearly complete species-wide pan-NLRome in Arabidopsis thaliana based on sequence enrichment and long-read sequencing. The pan-NLRome largely saturates with approximately 40 well-chosen wild strains, with half of the pan-NLRome being present in most accessions. We chart NLR architectural diversity, identify new architectures, and quantify selective forces that act on specific NLRs and NLR domains. Our study provides a blueprint for defining pan-NLRomes.Copyright © 2019 The Author(s). Published by Elsevier Inc. All rights reserved.
Wild almond species accumulate the bitter and toxic cyanogenic diglucoside amygdalin. Almond domestication was enabled by the selection of genotypes harboring sweet kernels. We report the completion of the almond reference genome. Map-based cloning using an F1 population segregating for kernel taste led to the identification of a 46-kilobase gene cluster encoding five basic helix-loop-helix transcription factors, bHLH1 to bHLH5. Functional characterization demonstrated that bHLH2 controls transcription of the P450 monooxygenase-encoding genes PdCYP79D16 and PdCYP71AN24, which are involved in the amygdalin biosynthetic pathway. A nonsynonymous point mutation (Leu to Phe) in the dimerization domain of bHLH2 prevents transcription of the two cytochrome P450 genes, resulting in the sweet kernel trait. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
A global survey of full-length transcriptome of Ginkgo biloba reveals transcript variants involved in flavonoid biosynthesis
Ginkgo biloba, which contains flavonoids as bioactive components, is widely used in traditional Chinese medicine. Increasing the flavonoid production of medicinal plants through genetic engineering generally focuses on the key genes involved in flavonoid biosynthesis. However, the molecular mechanisms underlying such biosynthesis are not yet well understood. To understand these mechanisms, a combination of second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing was applied to G. biloba. Eight tissues were sampled for SMRT sequencing to generate a high-quality, full-length transcriptome database. From 23.36 Gb clean reads, 12,954 alternative polyadenylation events, 12,290 alternative splicing events, 929 fusion transcripts, 2,286 novel transcripts, and 1,270 lncRNAs were predicted by removing redundant reads. Further studies reveal that 7 AS, 5 lncRNA, and 6 fusion gene events were identified in flavonoid biosynthesis. A total of 12 gene modules were revealed to be involved in flavonoid metabolism structural genes and transcription factors by constructing co-expression networks. Weighted gene coexpression network analysis (WGCNA) analysis reveals that some hub genes operate during the biosynthesis by identifying transcription factors (TFs) and structure genes. Seven key hub genes were also identified by analyzing the correlation between gene expression level and flavonoids content. The results highlight the importance of SMRT sequencing of the full-length transcriptome in improving genome annotation and elucidating the gene regulation of flavonoid biosynthesis in G. biloba by providing a comprehensive set of reference transcripts.
The development of clustered regularly interspaced short-palindromic repeat (CRISPR)-Cas systems for genome editing has transformed the way life science research is conducted and holds enormous potential for the treatment of disease as well as for many aspects of biotech- nology. Here, I provide a personal perspective on the development of CRISPR-Cas9 for genome editing within the broader context of the field and discuss our work to discover novel Cas effectors and develop them into additional molecular tools. The initial demonstra- tion of Cas9-mediated genome editing launched the development of many other technologies, enabled new lines of biological inquiry, and motivated a deeper examination of natural CRISPR-Cas systems, including the discovery of new types of CRISPR-Cas systems. These new discoveries in turn spurred further technological developments. I review these exciting discoveries and technologies as well as provide an overview of the broad array of applications of these technologies in basic research and in the improvement of human health. It is clear that we are only just beginning to unravel the potential within microbial diversity, and it is quite likely that we will continue to discover other exciting phenomena, some of which it may be possible to repurpose as molecular technologies. The transformation of mysterious natural phenomena to powerful tools, however, takes a collective effort to discover, characterize, and engineer them, and it has been a privilege to join the numerous researchers who have contributed to this transformation of CRISPR-Cas systems.
Alternative splicing performs a central role in expanding genomic coding capacity and proteomic diversity. However, programming of splicing patterns in engineered biological systems remains underused. Synthetic approaches thus far have predominantly focused on controlling expression of a single protein through alternative splicing. Here, we describe a modular and extensible platform for regulating four programmable exons that undergo a mutually exclusive alternative splicing event to generate multiple functionally-distinct proteins. We present an intron framework that enforces the mutual exclusivity of two internal exons and demonstrate a graded series of consensus sequence elements of varying strengths that set the ratio of two mutually exclusive isoforms. We apply this framework to program the DNA-binding domains of modular transcription factors to differentially control downstream gene activation. This splicing platform advances an approach for generating diverse isoforms and can ultimately be applied to program modular proteins and increase coding capacity of synthetic biological systems.
Whole genome sequence and de novo assembly revealed genomic architecture of Indian Mithun (Bos frontalis).
Mithun (Bos frontalis), also called gayal, is an endangered bovine species, under the tribe bovini with 2n?=?58 XX chromosome complements and reared under the tropical rain forests region of India, China, Myanmar, Bhutan and Bangladesh. However, the origin of this species is still disputed and information on its genomic architecture is scanty so far. We trust that availability of its whole genome sequence data and assembly will greatly solve this problem and help to generate many information including phylogenetic status of mithun. Recently, the first genome assembly of gayal, mithun of Chinese origin, was published. However, an improved reference genome assembly would still benefit in understanding genetic variation in mithun populations reared under diverse geographical locations and for building a superior consensus assembly. We, therefore, performed deep sequencing of the genome of an adult female mithun from India, assembled and annotated its genome and performed extensive bioinformatic analyses to produce a superior de novo genome assembly of mithun.We generated ˜300 Gigabyte (Gb) raw reads from whole-genome deep sequencing platforms and assembled the sequence data using a hybrid assembly strategy to create a high quality de novo assembly of mithun with 96% recovered as per BUSCO analysis. The final genome assembly has a total length of 3.0 Gb, contains 5,015 scaffolds with an N50 value of 1?Mb. Repeat sequences constitute around 43.66% of the assembly. The genomic alignments between mithun to cattle showed that their genomes, as expected, are highly conserved. Gene annotation identified 28,044 protein-coding genes presented in mithun genome. The gene orthologous groups of mithun showed a high degree of similarity in comparison with other species, while fewer mithun specific coding sequences were found compared to those in cattle.Here we presented the first de novo draft genome assembly of Indian mithun having better coverage, less fragmented, better annotated, and constitutes a reasonably complete assembly compared to the previously published gayal genome. This comprehensive assembly unravelled the genomic architecture of mithun to a great extent and will provide a reference genome assembly to research community to elucidate the evolutionary history of mithun across its distinct geographical locations.
The transcriptome of Darwin’s bark spider silk glands predicts proteins contributing to dragline silk toughness.
Darwin’s bark spider (Caerostris darwini) produces giant orb webs from dragline silk that can be twice as tough as other silks, making it the toughest biological material. This extreme toughness comes from increased extensibility relative to other draglines. We show C. darwini dragline-producing major ampullate (MA) glands highly express a novel silk gene transcript (MaSp4) encoding a protein that diverges markedly from closely related proteins and contains abundant proline, known to confer silk extensibility, in a unique GPGPQ amino acid motif. This suggests C. darwini evolved distinct proteins that may have increased its dragline’s toughness, enabling giant webs. Caerostris darwini’s MA spinning ducts also appear unusually long, potentially facilitating alignment of silk proteins into extremely tough fibers. Thus, a suite of novel traits from the level of genes to spinning physiology to silk biomechanics are associated with the unique ecology of Darwin’s bark spider, presenting innovative designs for engineering biomaterials.
Characterization of a male specific region containing a candidate sex determining gene in Atlantic cod.
The genetic mechanisms determining sex in teleost fishes are highly variable and the master sex determining gene has only been identified in few species. Here we characterize a male-specific region of 9?kb on linkage group 11 in Atlantic cod (Gadus morhua) harboring a single gene named zkY for zinc knuckle on the Y chromosome. Diagnostic PCR test of phenotypically sexed males and females confirm the sex-specific nature of the Y-sequence. We identified twelve highly similar autosomal gene copies of zkY, of which eight code for proteins containing the zinc knuckle motif. 3D modeling suggests that the amino acid changes observed in six copies might influence the putative RNA-binding specificity. Cod zkY and the autosomal proteins zk1 and zk2 possess an identical zinc knuckle structure, but only the Y-specific gene zkY was expressed at high levels in the developing larvae before the onset of sex differentiation. Collectively these data suggest zkY as a candidate master masculinization gene in Atlantic cod. PCR amplification of Y-sequences in Arctic cod (Arctogadus glacialis) and Greenland cod (Gadus macrocephalus ogac) suggests that the male-specific region emerged in codfishes more than 7.5 million years ago.
Resource Concentration Modulates the Fate of Dissimilated Nitrogen in a Dual-Pathway Actinobacterium.
Respiratory ammonification and denitrification are two evolutionarily unrelated dissimilatory nitrogen (N) processes central to the global N cycle, the activity of which is thought to be controlled by carbon (C) to nitrate (NO3-) ratio. Here we find that Intrasporangium calvum C5, a novel dual-pathway denitrifier/respiratory ammonifier, disproportionately utilizes ammonification rather than denitrification when grown under low C concentrations, even at low C:NO3- ratios. This finding is in conflict with the paradigm that high C:NO3- ratios promote ammonification and low C:NO3- ratios promote denitrification. We find that the protein atomic composition for denitrification modules (NirK) are significantly cost minimized for C and N compared to ammonification modules (NrfA), indicating that limitation for C and N is a major evolutionary selective pressure imprinted in the architecture of these proteins. The evolutionary precedent for these findings suggests ecological importance for microbial activity as evidenced by higher growth rates when I. calvum grows predominantly using its ammonification pathway and by assimilating its end-product (ammonium) for growth under ammonium-free conditions. Genomic analysis of I. calvum further reveals a versatile ecophysiology to cope with nutrient stress and redox conditions. Metabolite and transcriptional profiles during growth indicate that enzyme modules, NrfAH and NirK, are not constitutively expressed but rather induced by nitrite production via NarG. Mechanistically, our results suggest that pathway selection is driven by intracellular redox potential (redox poise), which may be lowered when resource concentrations are low, thereby decreasing catalytic activity of upstream electron transport steps (i.e., the bc1 complex) needed for denitrification enzymes. Our work advances our understanding of the biogeochemical flexibility of N-cycling organisms, pathway evolution, and ecological food-webs.