Case Study: Pioneering a pan-genome reference collection

At DuPont Pioneer, DNA sequencing is paramount for R&D to reveal the genetic basis for traits of interest in commercial crops such as maize, soybean, sorghum, sunflower, alfalfa, canola, wheat, rice, and others. They cannot afford to wait the years it has historically taken for high-quality reference genomes to be produced. Nor can they rely on a single reference to represent the genetic diversity in their germplasm.

PAG Conference: Dawn of the crop pangenome era

To make improvements to crops like corn, soybeans, and canola, scientists at Corteva are building a compendium of crop genomics resources to provide actionable sequence info for genetic discovery, gene-editing,…

Pangenome analysis of Bifidobacterium longum and site-directed mutagenesis through by-pass of restriction-modification systems.

Bifidobacterial genome analysis has provided insights as to how these gut commensals adapt to and persist in the human GIT, while also revealing genetic diversity among members of a given bifidobacterial (sub)species. Bifidobacteria are notoriously recalcitrant to genetic modification, which prevents exploration of their genomic functions, including those that convey (human) health benefits. PacBio SMRT sequencing was used to determine the whole genome sequences of two B. longum subsp. longum strains. The B. longum pan-genome was computed using PGAP v1.2 and the core B. longum phylogenetic tree was constructed using a maximum-likelihood based approach in PhyML v3.0. M.blmNCII was cloned in E. coli and an internal fragment of arfB was cloned into pORI19 for insertion mutagenesis. In this study we present the complete genome sequences of two Bifidobacterium longum subsp. longum strains. Comparative analysis with thirty-one publicly available B. longum genomes allowed the definition of the B. longum core and dispensable genomes. This analysis also highlighted differences in particular metabolic abilities between members of the B. longum subspecies infantis, longum and suis. Furthermore, phylogenetic analysis of the B. longum core genome indicated the existence of a novel subspecies. Methylome data, coupled to the analysis of restriction-modification systems, allowed us to substantially increase the genetic accessibility of B. longum subsp. longum NCIMB 8809 to a level that was shown to permit site-directed mutagenesis. Comparative genomic analysis of thirty-three B. longum representatives revealed a closed pan-genome for this bifidobacterial species. Phylogenetic analysis of the B. longum core genome also provides evidence for a novel fifth B. longum subspecies. Finally, we improved genetic accessibility for the strain B. longum subsp. longum NCIMB 8809, which allowed the generation of a mutant of this strain.
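
The split into core and dispensable genomes mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes that genes have already been clustered into gene families (the step a tool such as PGAP performs); the strain names, family labels, and the function name partition_pan_genome are hypothetical and are not taken from the study.

```python
from collections import Counter


def partition_pan_genome(genomes):
    """Split gene families into core (in every genome) and dispensable (in only some)."""
    counts = Counter()
    for families in genomes.values():
        # set() guards against a family being listed twice for one genome
        counts.update(set(families))
    n_genomes = len(genomes)
    core = {family for family, count in counts.items() if count == n_genomes}
    dispensable = set(counts) - core
    return core, dispensable


# Hypothetical toy input: strain name -> set of gene family identifiers
genomes = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam3", "fam5"},
}

core, dispensable = partition_pan_genome(genomes)
print(sorted(core))         # ['fam1', 'fam2']
print(sorted(dispensable))  # ['fam3', 'fam4', 'fam5']
```

Real analyses usually relax the "present in every genome" rule to tolerate draft assemblies; the strict rule is used here only to keep the example short.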

Comprehensive genome analysis of carbapenemase-producing Enterobacter spp.: new insights into phylogeny, population structure and resistance mechanisms.

Knowledge regarding the genomic structure of Enterobacter spp., the second most prevalent carbapenemase-producing Enterobacteriaceae, remains limited. Here we sequenced 97 clinical Enterobacter species isolates that were both carbapenem susceptible and resistant from various geographic regions to decipher the molecular origins of carbapenem resistance and to understand the changing phylogeny of these emerging and drug-resistant pathogens. Of the carbapenem-resistant isolates, 30 possessed blaKPC-2, 40 had blaKPC-3, 2 had blaKPC-4, and 2 had blaNDM-1. Twenty-three isolates were carbapenem susceptible. Six genomes were sequenced to completion, and their sizes ranged from 4.6 to 5.1 Mbp. Phylogenomic analysis placed 96 of these genomes, 351 additional Enterobacter genomes downloaded from NCBI GenBank, and six newly sequenced type strains into 19 phylogenomic groups: 18 groups (A to R) in the Enterobacter cloacae complex and Enterobacter aerogenes. Diverse mechanisms underlying the molecular evolutionary trajectory of these drug-resistant Enterobacter spp. were revealed, including the acquisition of an antibiotic resistance plasmid, followed by clonal spread, horizontal transfer of blaKPC-harboring plasmids between different phylogenomic groups, and repeated transposition of the blaKPC gene among different plasmid backbones. Group A, which comprises multilocus sequence type 171 (ST171), was the most commonly identified (23% of isolates). Genomic analysis showed that ST171 isolates evolved from a common ancestor and formed two different major clusters, each acquiring unique blaKPC-harboring plasmids, followed by clonal expansion.
The data presented here represent the first comprehensive study of phylogenomic interrogation and the relationship between antibiotic resistance and plasmid discrimination among carbapenem-resistant Enterobacter spp., demonstrating the genetic diversity and complexity of the molecular mechanisms driving antibiotic resistance in this genus. Enterobacter spp., especially carbapenemase-producing Enterobacter spp., have emerged as a clinically significant cause of nosocomial infections. However, only limited information is available on the distribution of carbapenem resistance across this genus. Augmenting this problem is an erroneous identification of Enterobacter strains because of ambiguous typing methods and imprecise taxonomy. In this study, we used a whole-genome-based comparative phylogenetic approach to (i) revisit and redefine the genus Enterobacter and (ii) unravel the emergence and evolution of the Klebsiella pneumoniae carbapenemase-harboring Enterobacter spp. Using genomic analysis of 447 sequenced strains, we developed an improved understanding of the species designations within this complex genus and identified the diverse mechanisms driving the molecular evolution of carbapenem resistance. The findings in this study provide a solid genomic framework that will serve as an important resource in the future development of molecular diagnostics and in supporting drug discovery programs. Copyright © 2016 Chavda et al.

Comparative and functional genomics of the Lactococcus lactis taxon; insights into evolution and niche adaptation.

Lactococcus lactis is among the most widely studied lactic acid bacterial species due to its long history of safe use and economic importance to the dairy industry, where it is exploited as a starter culture in cheese production. In the current study, we report on the complete sequencing of 16 L. lactis subsp. lactis and L. lactis subsp. cremoris genomes. The chromosomal features of these 16 L. lactis strains in conjunction with 14 completely sequenced, publicly available lactococcal chromosomes were assessed with particular emphasis on discerning the L. lactis subspecies division, evolution and niche adaptation. The deduced pan-genome of L. lactis was found to be closed, indicating that the representative data sets employed for this analysis are sufficient to fully describe the genetic diversity of the taxon. Niche adaptation appears to play a significant role in governing the genetic content of each L. lactis subspecies, while (differential) genome decay and redundancy in the dairy niche is also highlighted.
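
Whether a pan-genome looks closed is commonly judged from an accumulation curve: the number of distinct gene families seen as genomes are added one at a time, averaged over many random orderings. The Python sketch below illustrates that idea only; it is not the procedure used in the study, and the toy data and the function name accumulation_curve are hypothetical. A curve that flattens as genomes are added is consistent with a closed pan-genome, while one that keeps climbing suggests an open one.

```python
import random


def accumulation_curve(genomes, n_permutations=100, seed=0):
    """Mean number of distinct gene families after adding 1, 2, ..., N genomes."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    names = list(genomes)
    totals = [0.0] * len(names)
    for _ in range(n_permutations):
        rng.shuffle(names)  # a new random order of genome addition
        seen = set()
        for i, name in enumerate(names):
            seen |= genomes[name]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical toy input: strain name -> set of gene family identifiers
genomes = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam3", "fam5"},
}

curve = accumulation_curve(genomes)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, round(n_families, 2))
```

The last point of the curve always equals the total pan-genome size, because every ordering ends with all genomes included; the shape of the approach to that value is what carries the information.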

micropan: an R-package for microbial pan-genomics.

A pan-genome is defined as the set of all unique gene families found in one or more strains of a prokaryotic species. Due to the extensive within-species diversity in the microbial world, the pan-genome is often many times larger than a single genome. Studies of pan-genomes have become popular due to the easy access to whole-genome sequence data for prokaryotes. A pan-genome study reveals species diversity and gene families that may be of special interest, e.g. because of their role in bacterial survival or their ability to discriminate strains. We present an R package for the study of prokaryotic pan-genomes. The R computing environment harbors endless possibilities with respect to statistical analyses and graphics. External free software is used for the heavy computations involved, and the R package provides functions for building a computational pipeline. We demonstrate parts of the package on a data set for the Gram-positive bacterium Enterococcus faecalis. The package is free to download and install from The Comprehensive R Archive Network.
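
The definition above, and the remark that a pan-genome can be much larger than any single genome, can be made concrete with a presence/absence matrix of genomes by gene families. The following is a minimal Python sketch and not the package's own R code; the toy data and the function name build_presence_matrix are hypothetical, and gene family assignments are assumed to be given.

```python
def build_presence_matrix(genomes):
    """Return sorted strain names, sorted family names, and a 0/1 matrix (strains x families)."""
    families = sorted(set().union(*genomes.values()))
    strains = sorted(genomes)
    matrix = [
        [1 if family in genomes[strain] else 0 for family in families]
        for strain in strains
    ]
    return strains, families, matrix


# Hypothetical toy input: strain name -> set of gene family identifiers
genomes = {
    "strain_A": {"fam1", "fam2"},
    "strain_B": {"fam1", "fam3"},
    "strain_C": {"fam1", "fam4"},
}

strains, families, matrix = build_presence_matrix(genomes)

pan_size = len(families)  # one column per unique gene family
mean_genome_size = sum(sum(row) for row in matrix) / len(matrix)
ratio = pan_size / mean_genome_size

print(pan_size, mean_genome_size, ratio)  # 4 2.0 2.0
```

In this toy case each strain carries two families while the pan-genome holds four, so the pan-genome is twice the size of a single genome.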

Comparative genomic analyses of the Moraxella catarrhalis serosensitive and seroresistant lineages demonstrate their independent evolution.

The bacterial species Moraxella catarrhalis has been hypothesized as being composed of two distinct lineages (referred to as the seroresistant [SR] and serosensitive [SS]) with separate evolutionary histories based on several molecular typing methods, whereas 16S ribotyping has suggested an additional split within the SS lineage. Previously, we characterized whole-genome sequences of 12 SR-lineage isolates, which revealed a relatively small supragenome when compared with other opportunistic nasopharyngeal pathogens, suggestive of a relatively short evolutionary history. Here, we performed whole-genome sequencing on 18 strains from both ribotypes of the SS lineage, an additional SR strain, as well as four previously identified highly divergent strains based on multilocus sequence typing analyses. All 35 strains were subjected to a battery of comparative genomic analyses which clearly show that there are three lineages: the SR, the SS, and the divergent. The SR and SS lineages are closely related, but distinct from each other based on three different methods of comparison: allelic differences observed among core genes; possession of lineage-specific sets of core and distributed genes; and an alignment of concatenated core sequences irrespective of gene annotation. All these methods show that the SS lineage has much longer interstrain branches than the SR lineage, indicating that this lineage has likely been evolving either longer or faster than the SR lineage. There is evidence of extensive horizontal gene transfer (HGT) within both of these lineages, and to a lesser degree between them. In particular, we identified very high rates of HGT between these two lineages for β-lactamase genes. The four divergent strains are sui generis, being much more distantly related to both the SR and SS groups than these other two groups are to each other.
Based on average nucleotide identities, gene content, GC content, and genome size, this group could be considered as a separate taxonomic group. The SR and SS lineages, although distinct, clearly form a single species based on multiple criteria including a large common core genome, average nucleotide identity values, GC content, and genome size. Although neither of these lineages arose from within the other based on phylogenetic analyses, the question of how and when these lineages split and then subsequently reunited in the human nasopharynx is explored. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Exploring structural variants in environmentally sensitive gene families.

Environmentally sensitive plant gene families, such as NBS-LRRs, receptor kinases, defensins and others, are known to be highly variable. However, most existing strategies for discovering and describing structural variation in complex gene families provide incomplete and imperfect results. The move to de novo genome assemblies for multiple accessions or individuals within a species is enabling more comprehensive and accurate insights about gene family variation. Earlier array-based genome hybridization and sequence-based read mapping methods were limited by their reliance on a reference genome and by misplacement of paralogous sequences. Variant discovery based on de novo genome assemblies overcomes the problems arising from a reference genome and reduces sequence misplacement. As de novo genome sequencing moves to the use of longer reads, artifacts will be minimized, intact tandem gene clusters will be constructed accurately, and insights into rapid evolution will become feasible. Copyright © 2016 Elsevier Ltd. All rights reserved.

