September 22, 2019  |  

Combination of novel and public RNA-seq datasets to generate an mRNA expression atlas for the domestic chicken.

The domestic chicken (Gallus gallus) is widely used as a model in developmental biology and is also an important livestock species. We describe a novel approach to data integration to generate an mRNA expression atlas for the chicken spanning major tissue types and developmental stages, using a diverse range of publicly archived RNA-seq datasets and new data derived from immune cells and tissues.

Randomly down-sampling RNA-seq datasets to a common depth and quantifying expression against a reference transcriptome using the mRNA quantitation tool Kallisto ensured that disparate datasets explored comparable transcriptomic space. The network analysis tool Graphia was used to extract clusters of co-expressed genes from the resulting expression atlas. Many of these clusters were tissue- or cell-type-restricted, contained transcription factors that have previously been implicated in their regulation, or were otherwise associated with biological processes such as the cell cycle. The atlas provides a resource for the functional annotation of genes that currently have only a locus ID. We cross-referenced the RNA-seq atlas to a publicly available embryonic Cap Analysis of Gene Expression (CAGE) dataset to infer the developmental time course of organ systems and to identify a signature of the expansion of tissue macrophage populations during development.

Expression profiles obtained from public RNA-seq datasets, despite being generated by different laboratories using different methodologies, can be made comparable to each other. This meta-analytic approach to RNA-seq can be extended with new datasets from novel tissues, and is applicable to any species.
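
The down-sampling step can be illustrated with reservoir sampling, which draws a uniform random subset of a fixed number of reads from a stream of unknown length in a single pass. This is a minimal sketch, assuming reads are held as plain strings; the abstract does not say which tool or algorithm the authors used to down-sample, and the function name `downsample_reads` is hypothetical.

```python
import random
from typing import Iterable, List, Optional


def downsample_reads(reads: Iterable[str], depth: int,
                     seed: Optional[int] = None) -> List[str]:
    """Return a uniform random sample of `depth` reads (reservoir sampling, Algorithm R).

    Hypothetical helper for illustration. If the input has fewer than
    `depth` reads, every read is returned unchanged.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, read in enumerate(reads):
        if i < depth:
            # Fill the reservoir with the first `depth` reads.
            reservoir.append(read)
        else:
            # Keep read i with probability depth / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < depth:
                reservoir[j] = read
    return reservoir


if __name__ == "__main__":
    # Two hypothetical libraries of very different sizes, reduced to one common depth.
    library_a = [f"a_read{i}" for i in range(50_000)]
    library_b = [f"b_read{i}" for i in range(8_000)]
    common_depth = 5_000
    for name, lib in [("A", library_a), ("B", library_b)]:
        sample = downsample_reads(lib, common_depth, seed=1)
        print(name, len(sample))
```

After down-sampling, each library would be quantified against the same reference transcriptome (the abstract names Kallisto for this). For paired-end data, a real pipeline would need to sample read pairs together rather than individual reads.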


Genomic microdiversity of Bifidobacterium pseudocatenulatum underlying differential strain-level responses to dietary carbohydrate intervention.

The genomic basis of the response of beneficial human gut bacteria to dietary intervention remains elusive, which hinders precise manipulation of the microbiota for human health. After receiving a dietary intervention enriched with nondigestible carbohydrates for 105 days, a genetically obese child with Prader-Willi syndrome lost 18.4% of his body weight and showed significant improvement in his bioclinical parameters. We obtained five isolates (C1, C15, C55, C62, and C95) of one of the most abundantly promoted beneficial species, Bifidobacterium pseudocatenulatum, from a postintervention fecal sample. Intriguingly, these five B. pseudocatenulatum strains showed differential responses during the dietary intervention. Two strains were largely unaffected, while the other three were promoted to different extents by the changes in dietary carbohydrate resources. The differential responses of these strains were consistent with their functional clustering based on the COGs (Clusters of Orthologous Groups), including those involved with the ABC-type sugar transport systems, suggesting that strain-specific genomic variations may have contributed to niche adaptation. In particular, B. pseudocatenulatum C15, which had the most diverse types and highest gene copy numbers of carbohydrate-active enzymes targeting plant polysaccharides, had the highest abundance after the dietary intervention. These studies show the importance of understanding the genomic diversity of specific members of the gut microbiota if precise nutrition approaches are to be realized.

IMPORTANCE The manipulation of the gut microbiota via dietary approaches is a promising option for improving human health. Our findings showed differential responses of multiple B. pseudocatenulatum strains isolated from the same habitat to the dietary intervention, as well as strain-specific correlations with bioclinical parameters of the host. Comparative genomics revealed genome-level microdiversity in related functional genes, which may have contributed to these differences. These results highlight the necessity of understanding strain-level differences if precise manipulation of gut microbiota through dietary approaches is to be realized. Copyright © 2017 Wu et al.


Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza.

The genus Oryza is a model system for the study of molecular evolution over time scales ranging from a few thousand to 15 million years. Using 13 reference genomes spanning the Oryza species tree, we show that despite few large-scale chromosomal rearrangements, rapid species diversification is mirrored by lineage-specific emergence and turnover of many novel elements, including transposons and potential new coding and noncoding genes. Our study resolves controversial areas of the Oryza phylogeny, showing a complex history of introgression among different chromosomes in the young ‘AA’ subclade containing the two domesticated species. This study highlights the prevalence of functionally coupled disease resistance genes and identifies many new haplotypes of potential use for future crop protection. Finally, this study marks a milestone in modern rice research with the release of a complete long-read assembly of IR 8 ‘Miracle Rice’, which relieved famine and drove the Green Revolution in Asia 50 years ago.


Simulating the dynamics of targeted capture sequencing with CapSim.

Targeted sequencing using capture probes has become increasingly popular in clinical applications due to its scalability and cost-effectiveness. The approach also allows for higher sequencing coverage of the targeted regions, resulting in greater statistical power for analysis. However, because of the dynamics of the hybridization process, it is difficult to evaluate the efficiency of a probe design prior to the experiments, which are time-consuming and costly.

We developed CapSim, a software package for simulation of targeted sequencing. Given a genome sequence and a set of probes, CapSim simulates the fragmentation, the dynamics of probe hybridization, and the sequencing of the captured fragments on Illumina and PacBio sequencing platforms. The simulated data can be used for evaluating the performance of the analysis pipeline, as well as the efficiency of the probe design. Parameters of the various stages in the sequencing process can also be evaluated in order to optimize the experiments.

CapSim is publicly available under the BSD license at https://github.com/Devika1/capsim. Contact: l.coin@imb.uq.edu.au. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
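
To make the first two simulated stages concrete, here is a toy sketch: random fragmentation of a genome followed by "capture" of fragments that overlap a probe by a minimum number of bases. It is not CapSim's model. Hybridization dynamics are replaced by a simple overlap threshold, sequencing is omitted, and the names `fragment_genome`, `capture`, `mean_len`, and `min_overlap` are hypothetical.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genome coordinates


def fragment_genome(genome_len: int, mean_len: int, rng: random.Random) -> List[Interval]:
    """Cut [0, genome_len) into consecutive fragments with exponentially distributed lengths."""
    fragments: List[Interval] = []
    start = 0
    while start < genome_len:
        # Draw a fragment length; clamp to at least 1 bp so the loop always advances.
        length = max(1, int(rng.expovariate(1.0 / mean_len)))
        end = min(genome_len, start + length)
        fragments.append((start, end))
        start = end
    return fragments


def overlap(a: Interval, b: Interval) -> int:
    """Length of the overlap between two half-open intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def capture(fragments: List[Interval], probes: List[Interval], min_overlap: int) -> List[Interval]:
    """Keep fragments that overlap at least one probe by >= min_overlap bases."""
    return [f for f in fragments if any(overlap(f, p) >= min_overlap for p in probes)]


if __name__ == "__main__":
    rng = random.Random(42)
    frags = fragment_genome(genome_len=100_000, mean_len=500, rng=rng)
    probes = [(10_000, 10_120), (55_000, 55_120)]  # two hypothetical 120-bp probes
    captured = capture(frags, probes, min_overlap=60)
    on_target = sum(end - start for start, end in captured)
    print(f"{len(captured)} of {len(frags)} fragments captured ({on_target} bp)")
```

Varying `mean_len` or `min_overlap` in a sketch like this shows how fragment size and capture stringency change the number of on-target bases, which is the kind of parameter exploration the abstract describes.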


Effect of plasmid design and type of integration event on recombinant protein expression in Pichia pastoris.

Pichia pastoris (syn. Komagataella phaffii) is one of the most common eukaryotic expression systems for heterologous protein production. Expression cassettes are typically integrated in the genome to obtain stable expression strains. In contrast to Saccharomyces cerevisiae, where short overhangs are sufficient to target highly specific integration, long overhangs are more efficient in P. pastoris, and ectopic integration of foreign DNA can occur. Here, we aimed to elucidate the influence of ectopic integration by high-throughput screening of >700 transformants and whole-genome sequencing of 27 transformants. Different vector designs and linearization approaches were used to mimic the most common integration events targeted in P. pastoris. Fluorescence of an enhanced green fluorescent protein (eGFP) reporter protein was highly uniform among transformants when the expression cassettes were correctly integrated in the targeted locus. Surprisingly, most nonspecifically integrated transformants showed highly uniform expression that was comparable to specific integration, suggesting that nonspecific integration does not necessarily influence expression. However, a few clones (<10%) harboring ectopically integrated cassettes showed greater variation, spanning a 25-fold range and surpassing specifically integrated reference strains by up to 6-fold. High-expression strains showed a correlation between increased gene copy numbers and high reporter protein fluorescence levels. Our results suggest that, for comparing expression levels between strains, the integration locus can be neglected as long as a sufficient number of transformed strains are compared. For expression optimization of highly expressible proteins, increasing copy number appears to be the dominant positive influence rather than the integration locus, genomic rearrangements, deletions, or single-nucleotide polymorphisms (SNPs).

IMPORTANCE Yeasts are commonly used as biotechnological production hosts for proteins and metabolites. In the yeast Saccharomyces cerevisiae, expression cassettes carrying foreign genes integrate highly specifically at the targeted sites in the genome. In contrast, cassettes often integrate at random genomic positions in nonconventional yeasts, such as Pichia pastoris (syn. Komagataella phaffii). Hence, cells from the same transformation event often behave differently, with significant clonal variation necessitating the screening of large numbers of strains. The importance of this study is that we systematically investigated the influence of integration events in more than 700 strains. Our findings provide novel insight into clonal variation in P. pastoris and, thus, how to avoid pitfalls and obtain reliable results. The underlying mechanisms may also play a role in other yeasts and hence could be generally relevant for recombinant yeast protein production strains. Copyright © 2018 American Society for Microbiology.


A mosaic monoploid reference sequence for the highly complex genome of sugarcane.

Sugarcane (Saccharum spp.) is a major crop for sugar and bioenergy production. Its highly polyploid, aneuploid, heterozygous, and interspecific genome poses major challenges for producing a reference sequence. We exploited colinearity with sorghum to produce a BAC-based monoploid genome sequence of sugarcane. A minimum tiling path of 4,660 sugarcane BACs that best covers the gene-rich part of the sorghum genome was selected based on whole-genome profiling, sequenced, and assembled into a single 382-Mb tiling path of high-quality sequence. A total of 25,316 protein-coding gene models are predicted, 17% of which display no colinearity with their sorghum orthologs. We show that the two species involved in modern cultivars, S. officinarum and S. spontaneum, differ in their transposable elements and in a few large chromosomal rearrangements, explaining their distinct genome sizes and distinct basic chromosome numbers while also suggesting that polyploidization arose in both lineages after their divergence.
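
In its simplest one-dimensional form, choosing a minimum tiling path is a minimum interval-cover problem: pick the fewest clones whose mapped intervals cover a target region. A classic greedy algorithm solves that case exactly. This sketch assumes each BAC has already been placed on reference (for example, sorghum) coordinates as a closed interval; the paper's selection used whole-genome profiling, and the function `min_tiling_path` is hypothetical, not the authors' procedure.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # closed [start, end] placement of a clone on the reference


def min_tiling_path(clones: List[Interval], target: Interval) -> Optional[List[Interval]]:
    """Greedy minimum set of clones covering `target`, or None if a gap remains.

    At each step, among clones starting at or before the first uncovered
    position, take the one that reaches farthest to the right.
    """
    ordered = sorted(clones)
    start, stop = target
    path: List[Interval] = []
    covered = start  # first position not yet tiled
    i = 0
    while covered <= stop:
        best: Optional[Interval] = None
        while i < len(ordered) and ordered[i][0] <= covered:
            if best is None or ordered[i][1] > best[1]:
                best = ordered[i]
            i += 1
        if best is None or best[1] < covered:
            return None  # no clone spans position `covered`
        path.append(best)
        covered = best[1] + 1
    return path


if __name__ == "__main__":
    bacs = [(0, 100), (50, 180), (90, 150), (170, 300), (290, 400)]
    print(min_tiling_path(bacs, (0, 380)))
```

The greedy choice is safe because any clone skipped at a step ends no farther right than the one chosen, so it can never extend coverage later. Real tiling-path selection also has to weigh clone quality and overlap evidence, which this sketch ignores.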


npInv: accurate detection and genotyping of inversions using long read sub-alignment.

Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR)-mediated non-allelic homologous recombination (NAHR) inversions largely unexplored.

We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using sub-alignment of long-read sequencing data. We benchmark npInv against other tools on both simulated and real data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR-mediated inversions in NA12878 with flanking IRs shorter than 7 kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of these novel inversions. We show that there is a near-linear relationship between the length of flanking IR and the minimum inversion size, without inverted repeats.

The application of npInv shows high accuracy in both simulated and real data. The results give deeper insight into inversions.
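
The basic signal behind sub-alignment-based inversion detection can be sketched simply: a long read spanning an inversion aligns in pieces, with a middle segment on the opposite strand from its flanks. The sketch below flags such a pattern and reports the reference interval of the middle segment. It assumes sub-alignments are already computed and lie on one chromosome; the `SubAlignment` type and `find_inversion` function are hypothetical and do not reproduce npInv's NAHR-specific breakpoint handling or genotyping.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SubAlignment:
    read_start: int  # where this segment begins within the read
    ref_start: int   # reference start (inclusive)
    ref_end: int     # reference end (exclusive)
    strand: str      # '+' or '-'


def find_inversion(segments: List[SubAlignment]) -> Optional[Tuple[int, int]]:
    """Return the reference interval of a flanked opposite-strand segment, if any.

    Scans segments in read order for a strand pattern X, not-X, X; the middle
    segment is reported as the candidate inverted region.
    """
    segs = sorted(segments, key=lambda s: s.read_start)
    for left, mid, right in zip(segs, segs[1:], segs[2:]):
        if left.strand == right.strand and mid.strand != left.strand:
            return (mid.ref_start, mid.ref_end)
    return None


if __name__ == "__main__":
    read = [
        SubAlignment(0, 1_000, 2_000, "+"),
        SubAlignment(1_000, 2_000, 5_000, "-"),  # inverted middle segment
        SubAlignment(4_000, 5_000, 8_000, "+"),
    ]
    print(find_inversion(read))
```

Reads whose segments show this pattern would support the inverted allele, while reads aligning contiguously across the same region would support the reference allele; counting both is one simple basis for genotyping.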


A reference genome of the Chinese hamster based on a hybrid assembly strategy.

Accurate and complete genome sequences are essential in biotechnology to facilitate genome-based cell engineering efforts. The current genome assemblies for Cricetulus griseus, the Chinese hamster, are fragmented and replete with gap sequences and misassemblies, consistent with most short-read-based assemblies. Here, we completely resequenced C. griseus using single-molecule real-time (SMRT) sequencing and merged this with Illumina-based assemblies. This generated a more contiguous and complete genome assembly than either technology alone, reducing the number of scaffolds by >28-fold, with 90% of the sequence in the 122 longest scaffolds. Most genes are now found in single scaffolds, including up- and downstream regulatory elements, enabling improved study of noncoding regions. With >95% of the gap sequence filled, important Chinese hamster ovary (CHO) cell mutations that lay in gaps of the draft assembly have now been detected. This new assembly will be an invaluable resource for continued basic and pharmaceutical research. © 2018 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals, Inc.
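
The contiguity figure quoted above ("90% of the sequence in the 122 longest scaffolds") is the standard L90 statistic: the smallest number of the longest scaffolds whose combined length reaches 90% of the assembly. A minimal sketch of that calculation follows; the function name `l_statistic` and the toy lengths are hypothetical.

```python
from typing import Iterable


def l_statistic(lengths: Iterable[int], fraction: float = 0.9) -> int:
    """Smallest number of longest sequences whose total length reaches `fraction` of the assembly."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("assembly has zero total length")
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= fraction * total:
            return count
    return len(ordered)  # only reached if fraction > 1


if __name__ == "__main__":
    toy_scaffolds = [60, 25, 8, 4, 3]  # hypothetical scaffold lengths
    print("L90 =", l_statistic(toy_scaffolds))
    print("L50 =", l_statistic(toy_scaffolds, 0.5))
```

For the toy lengths, the three longest scaffolds (60 + 25 + 8 = 93 of 100) are the first to pass 90%, so L90 is 3; in the assembly described above, the same calculation gives 122.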


Analysis of the draft genome of the red seaweed Gracilariopsis chorda provides insights into genome size evolution in Rhodophyta.

Red algae (Rhodophyta) underwent two phases of large-scale genome reduction during their early evolution. The red seaweeds did not attain genome sizes or gene inventories typical of other multicellular eukaryotes. We generated a high-quality 92.1-Mb draft genome assembly from the red seaweed Gracilariopsis chorda, including methylation and small (s)RNA data. We analyzed these and other Archaeplastida genomes to address three questions: 1) What is the role of repeats and transposable elements (TEs) in explaining Rhodophyta genome size variation? 2) What is the history of genome duplication and gene family expansion/reduction in these taxa? 3) Is there evidence for TE suppression in red algae? We find that the number of predicted genes in red algae is relatively small (4,803-13,125 genes), particularly when compared with land plants, with no evidence of polyploidization. Genome size variation is primarily explained by TE expansion, with the red seaweeds having the largest genomes. Long terminal repeat elements and DNA repeats are the major contributors to genome size growth. About 8.3% of the G. chorda genome undergoes cytosine methylation in gene bodies, promoters, and TEs, and 71.5% of TEs contain methylated DNA, with 57% of these regions associated with sRNAs. These latter results suggest a role for TE-associated sRNAs in RNA-dependent DNA methylation to facilitate silencing. We postulate that the evolution of genome size in red algae is the result of the combined action of TE spread and the concomitant emergence of its epigenetic suppression, together with other important factors such as changes in population size.


Linking genotype and phenotype in an economically viable propionic acid biosynthesis process

Propionic acid (PA) is used as a food preservative and, increasingly, as a precursor for the synthesis of monomers. PA is produced mainly through hydrocarboxylation of ethylene, also known as the ‘oxo-process’. Propionibacterium species, which natively produce PA as their main fermentation product, are promising biological PA producers. However, for fermentation to be competitive, a PA yield of at least 0.6 g/g is required.
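
The yield threshold is a mass ratio of PA produced to substrate consumed. As a worked illustration with hypothetical quantities (the abstract does not specify the carbon source or titers):

```latex
Y_{P/S} = \frac{m_{\mathrm{PA}}}{m_{\mathrm{substrate}}}, \qquad
\text{e.g.}\quad \frac{30\ \mathrm{g\ PA}}{50\ \mathrm{g\ substrate}} = 0.6\ \mathrm{g/g}
```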


Discovery of new genes involved in curli production by a uropathogenic Escherichia coli strain from the highly virulent O45:K1:H7 lineage.

Curli are bacterial surface-associated amyloid fibers that bind to the dye Congo red (CR) and facilitate uropathogenic Escherichia coli (UPEC) biofilm formation and protection against host innate defenses. Here we sequenced the genome of the curli-producing UPEC pyelonephritis strain MS7163 and showed that it belongs to the highly virulent O45:K1:H7 neonatal meningitis-associated clone. MS7163 produced curli at human physiological temperature, and this correlated with biofilm growth, resistance of sessile cells to the human cationic peptide cathelicidin, and enhanced colonization of the mouse bladder. We devised a forward genetic screen using CR staining as a proxy for curli production and identified 41 genes that were required for optimal CR binding, of which 19 genes were essential for curli synthesis. Ten of these genes were novel or poorly characterized with respect to curli synthesis and included genes involved in purine de novo biosynthesis, a regulator that controls the Rcs phosphorelay system, and a novel repressor of curli production (referred to as rcpA). The involvement of these genes in curli production was confirmed by the construction of defined mutants and their complementation. The mutants did not express the curli major subunit CsgA and failed to produce curli based on CR binding. Mutation of purF (the first gene in the purine biosynthesis pathway) and rcpA also led to attenuated colonization of the mouse bladder. Overall, this work has provided new insight into the regulation of curli and the role of these amyloid fibers in UPEC biofilm formation and pathogenesis.

IMPORTANCE Uropathogenic Escherichia coli (UPEC) strains are the most common cause of urinary tract infection, a disease increasingly associated with escalating antibiotic resistance. UPEC strains possess multiple surface-associated factors that enable their colonization of the urinary tract, including fimbriae, curli, and autotransporters.
Curli are extracellular amyloid fibers that enhance UPEC virulence and promote biofilm formation. Here we examined the function and regulation of curli in a UPEC pyelonephritis strain belonging to the highly virulent O45:K1:H7 neonatal meningitis-associated clone. Curli expression at human physiological temperature led to increased biofilm formation, resistance of sessile cells to the human cationic peptide LL-37, and enhanced bladder colonization. Using a comprehensive genetic screen, we identified multiple genes involved in curli production, including several that were novel or poorly characterized with respect to curli synthesis. In total, this study demonstrates an important role for curli as a UPEC virulence factor that promotes biofilm formation, resistance, and pathogenesis. Copyright © 2018 Nhu et al.


Variant O89 O-antigen of E. coli is associated with group 1 capsule loci and multidrug resistance.

Bacterial surface polysaccharides play significant roles in fitness and virulence. In Gram-negative bacteria such as Escherichia coli, the major surface polysaccharides are lipopolysaccharide (LPS) and capsule, representing O- and K-antigens, respectively. There are multiple combinations of O:K types, many of which are well characterized and can be related to ecotype or pathotype. In this investigation, we have identified a novel O:K permutation resulting from major genome reorganization in a clade of E. coli. A multidrug-resistant, extended-spectrum β-lactamase (ESBL)-producing strain, E. coli 26561, represented a prototype of strains combining a locus variant of O89 and a group 1 capsular polysaccharide. Specifically, the variant O89 locus in this strain was truncated at gnd, flanked by insertion sequences, and located between nfsB and ybdK; we apply the term O89m to this variant. The prototype lacked the colanic acid and O-antigen loci between yegH and hisI; this tandem polysaccharide locus was replaced by a group 1 capsule (G1C) locus that, rather than matching a recognized E. coli capsule type, matched the Klebsiella K10 capsule type. A genomic survey identified more than 200 E. coli strains that possessed the O89m locus variant with one of a variety of G1C types. Isolates from our collection with the combination of O89m and G1C all displayed a mucoid phenotype, and E. coli 26561 was unusual in exhibiting a mucoviscous phenotype more commonly recognized as characteristic of Klebsiella strains. Despite the locus truncation and novel location, all O89m:G1C strains examined showed a ladder pattern typifying smooth LPS and also showed high-molecular-weight, alcian blue-staining polysaccharide in cellular and/or extracellular fractions. Expression of both the O-antigen and capsule biosynthesis loci was confirmed in prototype strain 26561 through quantitative proteome analysis. Further in silico exploration of more than 200 E. coli strains possessing the O89m:G1C combination identified a very high prevalence of multidrug resistance (MDR): 85% possessed resistance to three or more antibiotic classes, and a high proportion (58%) of these carried ESBL and/or carbapenemase genes. The increasing isolation of O89m:G1C isolates from extra-intestinal infection sites suggests that these represent an emergent clade of invasive, MDR E. coli.


Discovery of mcr-1-mediated colistin resistance in a highly virulent Escherichia coli lineage.

Resistance to last-line polymyxins mediated by the plasmid-borne mobile colistin resistance gene (mcr-1) represents a new threat to global human health. Here we present the complete genome sequence of an mcr-1-positive multidrug-resistant Escherichia coli strain (MS8345). We show that MS8345 belongs to serotype O2:K1:H4, has a large 241,164-bp IncHI2 plasmid that carries 15 other antibiotic resistance genes (including the extended-spectrum β-lactamase blaCTX-M-1) and 3 putative multidrug efflux systems, and contains 14 chromosomally encoded antibiotic resistance genes. MS8345 also carries a large ColV-like virulence plasmid that has been associated with E. coli bacteremia. Whole-genome phylogeny revealed that MS8345 clusters within a discrete clade in the sequence type 95 (ST95) lineage, and MS8345 is very closely related to the highly virulent O45:K1:H7 clone associated with neonatal meningitis. Overall, the acquisition of a plasmid carrying resistance to colistin and multiple other antibiotics in this virulent E. coli lineage is concerning and might herald an era in which the empirical treatment of ST95 infections becomes increasingly difficult.

IMPORTANCE Escherichia coli ST95 is a globally disseminated clone frequently associated with bloodstream infections and neonatal meningitis. However, the ST95 lineage is characterized by low levels of drug resistance among clinical isolates, which normally allows uncomplicated treatment. Here, we provide the first detailed genomic analysis of an E. coli ST95 isolate that has both high virulence potential and resistance to multiple antibiotics. Using the genome, we predicted its virulence and antibiotic resistance mechanisms, which include resistance to last-line antibiotics mediated by the plasmid-borne mcr-1 gene.
Finding an ST95 isolate resistant to nearly all antibiotics that also has a high virulence potential is of major clinical importance and underscores the need to monitor new and emerging trends in antibiotic resistance development in this important global lineage. Copyright © 2018 Forde et al.


Streptococcus suis contains multiple phase-variable methyltransferases that show a discrete lineage distribution.

Streptococcus suis is a major pathogen of swine, responsible for a number of chronic and acute infections, and is also emerging as a major zoonotic pathogen, particularly in South-East Asia. Our study of a diverse population of S. suis shows that this organism contains both Type I and Type III phase-variable methyltransferases. In all previous examples, phase variation of methyltransferases results in genome-wide methylation differences and in differential regulation of multiple genes, a system known as the phasevarion (phase-variable regulon). We hypothesized that each variant in the Type I and Type III systems encoded a methyltransferase with a unique specificity, and could therefore control a distinct phasevarion, either by recombination-driven shuffling between different specificities (Type I) or by biphasic on-off switching via simple sequence repeats (Type III). Here, we present the identification of the target specificities for each Type III allelic variant from S. suis using single-molecule real-time (SMRT) methylome analysis. We demonstrate that phase variation occurs in both Type I and Type III methyltransferases, and show a distinct association of methyltransferase type and presence with population clades. In addition, we show that the phase-variable Type I methyltransferase was likely acquired at the origin of a highly virulent zoonotic sub-population.
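
The on-off switching described for the Type III systems can be illustrated with a simple reading-frame check: when a simple sequence repeat inside a coding region gains or loses repeat units, the downstream frame is preserved only if the net length change is a multiple of three nucleotides. This sketch assumes a hypothetical methyltransferase gene that is ON (in frame) at a reference repeat count; the function `mod_phase`, the 4-nt unit, and the counts are illustrative and not taken from S. suis.

```python
def mod_phase(repeat_count: int, unit_len: int, ref_count: int) -> str:
    """Return 'ON' if the repeat tract keeps the downstream reading frame intact, else 'OFF'.

    Assumes (hypothetically) that the gene is in frame when the tract has
    `ref_count` units, so any length change that is a multiple of 3 nt
    preserves the frame.
    """
    shift_nt = (repeat_count - ref_count) * unit_len
    # Python's % returns a non-negative result, so deletions are handled too.
    return "ON" if shift_nt % 3 == 0 else "OFF"


if __name__ == "__main__":
    # Hypothetical tetranucleotide repeat, in frame at 10 units.
    for count in (9, 10, 11, 13):
        print(count, mod_phase(count, unit_len=4, ref_count=10))
```

Because a 4-nt unit shifts the frame by one position per unit gained, only every third repeat count restores the ON state; stochastic slippage during replication therefore flips expression between two states, which is the biphasic behavior the abstract refers to.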


Genome sequences of two diploid wild relatives of cultivated sweetpotato reveal targets for genetic improvement

Sweetpotato [Ipomoea batatas (L.) Lam.] is a globally important staple food crop, especially for sub-Saharan Africa. Agronomic improvement of sweetpotato has lagged behind that of other major food crops due to a lack of genomic and genetic resources and inherent challenges in breeding a heterozygous, clonally propagated polyploid. Here, we report the genome sequences of its two diploid relatives, I. trifida and I. triloba, and show that these high-quality genome assemblies are robust references for hexaploid sweetpotato. Comparative and phylogenetic analyses reveal insights into the ancient whole-genome triplication history of Ipomoea and evolutionary relationships within the Batatas complex. Using resequencing data from 16 genotypes widely used in African breeding programs, genes and alleles associated with carotenoid biosynthesis in storage roots are identified, which may enable efficient breeding of varieties with high provitamin A content. These resources will facilitate genome-enabled breeding in this important food security crop.

