More than 3,000 species of octocorals (Cnidaria, Anthozoa) inhabit an expansive range of environments, from shallow tropical seas to the deep-ocean floor. They are important foundation species that create coral “forests,” which provide unique niches and 3-dimensional living space for other organisms. The octocoral genus Renilla inhabits sandy continental shelves in the subtropical and tropical Atlantic and eastern Pacific Oceans. Renilla is especially interesting because it produces secondary metabolites for defense, exhibits bioluminescence, and produces a luciferase that is widely used in dual-reporter assays in molecular biology. Although several anthozoan genomes are currently available, the majority of these are hexacorals. Here, we present a de novo assembly of an azooxanthellate shallow-water octocoral, Renilla muelleri. We generated a hybrid de novo assembly using MaSuRCA v.3.2.6. The final assembly included 4,825 scaffolds and a haploid genome size of 172 megabases (Mb). A BUSCO assessment found 88% of metazoan orthologs present in the genome. An Augustus ab initio gene prediction found 23,660 genes, of which 66% (15,635) had detectable similarity to annotated genes from the starlet sea anemone, Nematostella vectensis, or to the UniProt database. Although the R. muelleri genome may be smaller (172 Mb minimum size) than other publicly available coral genomes (256-448 Mb), it is similar to other coral genomes in terms of the number of complete metazoan BUSCOs and predicted gene models. The R. muelleri hybrid genome provides a novel resource for researchers to investigate the evolution of genes and gene families within Octocorallia and more widely across Anthozoa. It will be a key resource for future comparative genomics with other corals and for understanding the genomic basis of coral diversity. © The Author(s) 2019. Published by Oxford University Press.
Clinicogenomics is the exploitation of genome sequence data for diagnostic, therapeutic, and public health purposes. Central to this field is the high-throughput DNA sequencing of genomes and metagenomes. The role of clinicogenomics in infectious disease diagnostics and public health microbiology was the topic of discussion during a recent symposium (session 161) presented at the 115th general meeting of the American Society for Microbiology that was held in New Orleans, LA. What follows is a collection of the most salient and promising aspects from each presentation at the symposium. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study.
High-throughput RNA sequencing (RNA-seq) greatly expands the potential for genomics discoveries, but the wide variety of platforms, protocols and performance capabilities has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We carried out replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (poly-A-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS and Roche 454). The results show high intraplatform (Spearman rank R > 0.86) and inter-platform (R > 0.83) concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection across all platforms. For intact RNA, gene expression profiles from rRNA depletion and poly-A enrichment are similar. In addition, rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.
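The concordance values above are Spearman rank correlations between expression measures. The following is a minimal sketch of how such a coefficient can be computed for two expression profiles. It assumes that tied values receive the average of their ranks and computes Spearman's rho as the Pearson correlation of the two rank vectors. The gene values are invented for illustration and are not ABRF-NGS data.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical expression values for five genes measured at two sites.
site_a = [120.0, 5.0, 48.0, 48.0, 900.0]
site_b = [110.0, 7.0, 52.0, 40.0, 850.0]
print(round(spearman(site_a, site_b), 3))  # prints 0.975
```

Rank correlation compares only the ordering of genes. Two sites can therefore agree closely even if their absolute expression scales differ.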
As researchers open up to the reality of RNA modification, an expanded epitranscriptomics toolbox takes shape.
Gene profiling of diffuse large B cell lymphoma (DLBCL) has revealed broad gene expression deregulation compared to normal B cells. While many studies have interrogated well-known, annotated genes in DLBCL, none has yet performed a systematic analysis to uncover novel, unannotated long non-coding RNAs (lncRNAs) in DLBCL. In this study, we sought to uncover these lncRNAs by examining RNA-seq data from primary DLBCL tumors, and we performed supporting analyses to identify potential roles of these lncRNAs in DLBCL. We performed a systematic analysis of novel lncRNAs from the polyadenylated transcriptome of 116 primary DLBCL samples. RNA-seq data were processed using a de novo transcript assembly pipeline to discover novel lncRNAs in DLBCL. Systematic functional, mutational, cross-species, and co-expression analyses using numerous bioinformatics tools and statistical methods were performed to characterize these novel lncRNAs. We identified 2,632 novel, multi-exonic lncRNAs expressed in more than one tumor, two-thirds of which are not expressed in normal B cells. Long-read single-molecule sequencing supports the splicing structure of many of these lncRNAs. More than one-third of the novel lncRNAs are differentially expressed between the two major DLBCL subtypes, ABC and GCB. Novel lncRNAs are enriched at DLBCL super-enhancers, and a fraction of them are conserved between human and dog lymphomas. Transposable elements (TEs) overlap the exonic regions of the novel lncRNAs. This overlap is particularly significant in the last exon, which suggests potential use of cryptic TE polyadenylation signals. We identified highly co-expressed protein-coding genes for at least 88% of the novel lncRNAs. Functional enrichment analysis of the co-expressed genes predicts a potential function for about half of the novel lncRNAs.
Finally, systematic structural analysis of candidate point mutations (SNVs) suggests that such mutations frequently stabilize lncRNA structures instead of destabilizing them. Discovery of these 2,632 novel lncRNAs in DLBCL significantly expands the lymphoma transcriptome, and our analysis identifies potential roles of these lncRNAs in lymphomagenesis and/or tumor maintenance. For further studies, these novel lncRNAs also provide an abundant source of new targets for antisense oligonucleotide pharmacology, including targets shared between human and dog lymphomas.
The African Bullfrog (Pyxicephalus adspersus) genome unites the two ancestral ingredients for making vertebrate sex chromosomes
Heteromorphic sex chromosomes have evolved repeatedly among vertebrate lineages despite largely deleterious reductions in gene dose. Understanding how this gene dose problem is overcome is hampered by the lack of genomic information at the base of tetrapods and of comparisons across the evolutionary history of vertebrates. To address this problem, we produced a chromosome-level genome assembly for the African Bullfrog (Pyxicephalus adspersus), an amphibian with heteromorphic ZW sex chromosomes. We discovered that the Bullfrog Z is surprisingly homologous to substantial portions of the human X. Using this new reference genome, we identified ancestral synteny among the sex chromosomes of major vertebrate lineages. Non-mammalian sex chromosomes are strongly associated with a single vertebrate ancestral chromosome, while mammalian sex chromosomes are associated with another that displays increased haploinsufficiency. The sex chromosomes of the African Bullfrog, however, share genomic blocks with both humans and non-mammalian vertebrates, connecting the two ancestral chromosome sequences that repeatedly characterize vertebrate sex chromosomes. Our results highlight the consistency of sex-linked sequences despite the lability of sex determination systems and reveal the repeated use of two major genomic sequence blocks during vertebrate sex chromosome evolution.
Prevailing dogma holds that ribosomes are uniform in composition and function. Here, we show that nutrient limitation-induced stress in E. coli changes the relative expression of rDNA operons to alter the rRNA composition within the actively translating ribosome pool. The most upregulated operon encodes the unique 16S rRNA, rrsH, distinguished by conserved sequence variation within the small ribosomal subunit. rrsH-bearing ribosomes affect the expression of functionally coherent gene sets and alter the levels of the RpoS sigma factor, the master regulator of the general stress response. These impacts are associated with phenotypic changes in antibiotic sensitivity, biofilm formation, and cell motility and are regulated by stress response proteins, RelA and RelE, as well as the metabolic enzyme and virulence-associated protein, AdhE. These findings establish that endogenously encoded, naturally occurring rRNA sequence variation can modulate ribosome function, central aspects of gene expression regulation, and cellular physiology. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
We describe the landscape of somatic genomic alterations of 66 chromophobe renal cell carcinomas (ChRCCs) on the basis of multidimensional and comprehensive characterization, including mtDNA and whole-genome sequencing. The results are consistent with ChRCC originating from the distal nephron, in contrast to other kidney cancers with more proximal origins. Combined mtDNA and gene expression analysis implicates changes in mitochondrial function as a component of the disease biology, while suggesting alternative roles for mtDNA mutations in cancers relying on oxidative phosphorylation. Genomic rearrangements lead to recurrent structural breakpoints within the TERT promoter region. These breakpoints correlate with highly elevated TERT expression and manifestation of kataegis, representing a mechanism of TERT upregulation in cancer distinct from previously observed amplifications and point mutations. Copyright © 2014 Elsevier Inc. All rights reserved.
We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.
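Scaffold N50 is the contiguity statistic quoted above. The following is a minimal sketch of the standard definition, assuming the scaffold lengths are already known. It returns the length L such that scaffolds of length >= L together hold at least half of all assembled bases; some tools use a strict ">" at the halfway point instead. The example lengths are hypothetical and do not come from the assembly described here.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    if not ordered or any(length <= 0 for length in ordered):
        raise ValueError("lengths must be a non-empty collection of positive integers")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length  # cumulative bases covered so far
        if running >= half:
            return length
    return ordered[-1]  # not reached: the full sum always covers half


# Hypothetical scaffold lengths in Mb (total 100 Mb, so half is 50 Mb).
print(n50([40, 30, 20, 5, 5]))  # prints 30
```

Because N50 depends only on the largest sequences that reach the halfway mark, joining contigs into longer scaffolds raises it even when the total assembled length is unchanged.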
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
Insights into the preservation of the homomorphic sex-determining chromosome of Aedes aegypti from the discovery of a male-biased gene tightly linked to the M-locus.
The preservation of a homomorphic sex-determining chromosome in some organisms without transformation into a heteromorphic sex chromosome is a long-standing enigma in evolutionary biology. A dominant sex-determining locus (or M-locus) in an undifferentiated homomorphic chromosome confers the male phenotype in the yellow fever mosquito Aedes aegypti. Genetic evidence suggests that the M-locus is in a nonrecombining region. However, the molecular nature of the M-locus has not been characterized. Using a recently developed approach based on Illumina sequencing of male and female genomic DNA, we identified a novel gene, myo-sex, that is present almost exclusively in the male genome but can sporadically be found in the female genome due to recombination. For simplicity, we define sequences that are primarily found in the male genome as male-biased. Fluorescence in situ hybridization (FISH) on A. aegypti chromosomes demonstrated that the myo-sex probe localized to region 1q21, the established location of the M-locus. Myo-sex is a duplicated myosin heavy chain gene that is highly expressed in the pupa and adult male. Myo-sex shares 83% nucleotide identity and 97% amino acid identity with its closest autosomal paralog, consistent with ancient duplication followed by strong purifying selection. Compared with males, myo-sex is expressed at very low levels in the females that acquired it, indicating that myo-sex may be sexually antagonistic. This study establishes a framework to discover male-biased sequences within a homomorphic sex-determining chromosome and offers new insights into the evolutionary forces that have impeded the expansion of the nonrecombining M-locus in A. aegypti.
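One way to make "male-biased" quantitative is to compare read depth from the two sexes. This is an assumption here, since the abstract does not spell out the exact statistic. The approach aligns female and male genomic reads to each candidate sequence and computes a depth-normalized female:male ratio, in the spirit of chromosome-quotient methods. A sequence found almost exclusively in males then has a ratio near zero rather than exactly zero, which allows for the sporadic female copies caused by recombination. The following sketch uses hypothetical field names, thresholds, and counts.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate sequence with aligned read counts from each sex (hypothetical fields)."""
    name: str
    female_reads: int  # female gDNA reads aligned to this sequence
    male_reads: int    # male gDNA reads aligned to this sequence


def female_to_male_ratio(c: Candidate, female_total: int = 1, male_total: int = 1) -> float:
    """Depth-normalized female:male ratio; values near 0 mean found mostly in males."""
    female_rate = c.female_reads / female_total
    male_rate = c.male_reads / male_total
    if male_rate == 0:
        return float("inf")  # no male reads, so it cannot be male-biased
    return female_rate / male_rate


def male_biased(
    candidates: list[Candidate],
    female_total: int = 1,
    male_total: int = 1,
    threshold: float = 0.1,    # hypothetical cutoff
    min_male_reads: int = 20,  # hypothetical minimum male coverage
) -> list[str]:
    """Names of well-covered candidates whose normalized ratio is below threshold."""
    hits = []
    for c in candidates:
        if c.male_reads < min_male_reads:
            continue  # too few reads to judge reliably
        if female_to_male_ratio(c, female_total, male_total) < threshold:
            hits.append(c.name)
    return hits


# Hypothetical counts, assuming equal sequencing depth for both libraries.
candidates = [
    Candidate("seq_a", female_reads=1, male_reads=200),
    Candidate("seq_b", female_reads=95, male_reads=100),
    Candidate("seq_c", female_reads=0, male_reads=5),
]
print(male_biased(candidates))  # prints ['seq_a']
```

Normalizing by library size matters because a deeper male library alone would push every ratio toward zero. Candidates flagged this way would still need physical confirmation, such as the FISH localization described above.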
Recent studies have found N6-methyladenosine (m6A) in thousands of mammalian genes, and this modification is most pronounced near the beginning of the 3′ UTR. We present a perspective on current work and new single-molecule sequencing methods for detecting RNA base modifications.
Population genomic analysis of 1,777 extended-spectrum beta-lactamase-producing Klebsiella pneumoniae isolates, Houston, Texas: unexpected abundance of clonal group 307.
Klebsiella pneumoniae is a major human pathogen responsible for high morbidity and mortality rates. The emergence and spread of strains resistant to multiple antimicrobial agents and documented large nosocomial outbreaks are especially concerning. To develop new therapeutic strategies for K. pneumoniae, it is imperative to understand the population genomic structure of strains causing human infections. To address this knowledge gap, we sequenced the genomes of 1,777 extended-spectrum beta-lactamase-producing K. pneumoniae strains cultured from patients in the 2,000-bed Houston Methodist Hospital system between September 2011 and May 2015, representing a comprehensive, population-based strain sample. Strains of largely uncharacterized clonal group 307 (CG307) caused more infections than those of well-studied epidemic CG258. Strains varied markedly in gene content and had an extensive array of small and very large plasmids, often containing antimicrobial resistance genes. Some patients with multiple strains cultured over time were infected with genetically distinct clones. We identified 15 strains expressing the New Delhi metallo-beta-lactamase 1 (NDM-1) enzyme that confers broad resistance to nearly all beta-lactam antibiotics. Transcriptome sequencing analysis of 10 phylogenetically diverse strains showed that the global transcriptome of each strain was unique and highly variable. Experimental mouse infection provided new information about immunological parameters of host-pathogen interaction. We exploited the large data set to develop whole-genome sequence-based classifiers that accurately predict clinical antimicrobial resistance for 12 of the 16 antibiotics tested. We conclude that analysis of large, comprehensive, population-based strain samples can assist understanding of the molecular diversity of these organisms and contribute to enhanced translational research. 
IMPORTANCE Klebsiella pneumoniae causes human infections that are increasingly difficult to treat because many strains are resistant to multiple antibiotics. Clonal group 258 (CG258) organisms have caused outbreaks in health care settings worldwide. Using a comprehensive population-based sample of extended-spectrum beta-lactamase (ESBL)-producing K. pneumoniae strains, we show that a relatively uncommon clonal type, CG307, caused the plurality of ESBL-producing K. pneumoniae infections in our patients. We discovered that CG307 strains have been abundant in Houston for many years. As assessed by experimental mouse infection, CG307 strains were as virulent as pandemic CG258 strains. Our results may portend the emergence of an especially successful clonal group of antibiotic-resistant K. pneumoniae. Copyright © 2017 Long et al.
Instances of recent and rapid speciation are suitable for associating phenotypes with their causal genotypes, especially if gene flow homogenizes areas of the genome that are not under divergent selection. We study a rapid radiation of nine sympatric bird species known as capuchino seedeaters, which are differentiated in sexually selected characters of male plumage and song. We sequenced the genomes of a phenotypically diverse set of species to search for differentiated genomic regions. Capuchinos show differences in a small proportion of their genomes, yet selection has acted independently on the same targets in different members of this radiation. Many divergent regions contain genes involved in the melanogenesis pathway, with the strongest signal originating from putative regulatory regions. Selection has acted on these same genomic regions in different lineages, likely shaping the evolution of cis-regulatory elements, which control how more conserved genes are expressed and thereby generate diversity in classically sexually selected traits.
Genome sequence and analysis of Escherichia coli MRE600, a colicinogenic, nonmotile strain that lacks RNase I and the type I methyltransferase, EcoKI.
Escherichia coli strain MRE600 was originally identified for its low RNase I activity and has therefore been widely adopted by the biomedical research community as a preferred source for the expression and purification of transfer RNAs and ribosomes. Despite its widespread use, surprisingly little information about its genome or genetic content exists. Here, we present the first de novo assembly and description of the MRE600 genome and epigenome. To provide context to these studies of MRE600, we include comparative analyses with E. coli K-12 MG1655 (K12). Pacific Biosciences Single Molecule, Real-Time sequencing reads were assembled into one large chromosome (4.83 Mb) and three smaller plasmids (89.1, 56.9, and 7.1 kb). Interestingly, the 7.1-kb plasmid possesses genes encoding a colicin E1 protein and its associated immunity protein. The MRE600 genome has a G + C content of 50.8% and contains a total of 5,181 genes, including 4,913 protein-encoding genes and 268 RNA genes. We identified 41,469 modified DNA bases (0.83% of total) and found that MRE600 lacks the gene for the type I methyltransferase EcoKI. Phylogenetic, taxonomic, and genetic analyses demonstrate that MRE600 is a divergent E. coli strain that displays features of the closely related genus Shigella. Nevertheless, comparative analyses between MRE600 and E. coli K12 show that these two strains have nearly identical ribosomal proteins and ribosomal RNAs and highly homologous tRNA species. Substantiating prior suggestions that MRE600 lacks RNase I activity, the RNase I-encoding gene, rna, contains a single premature stop codon early in its open reading frame. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
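The G + C content reported above is the fraction of G and C among the assembled bases. The following is a minimal sketch that assumes the sequence is available as a plain string. It also assumes that ambiguous symbols such as N are excluded from the denominator, a convention that differs between tools. The example sequence is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases (case-insensitive)."""
    counts = {base: 0 for base in "ACGT"}
    for base in seq.upper():
        if base in counts:  # skip N and other ambiguity codes
            counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Hypothetical sequence: 8 unambiguous bases, 4 of them G or C; the two Ns are ignored.
print(gc_content("ATGCGCNNat"))  # prints 50.0
```

For a genome with several replicons, such as a chromosome plus plasmids, the counts would be summed over all records before dividing. Averaging per-record percentages would overweight the small plasmids.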