July 7, 2019

The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes.

Whole-genome sequences are now available for many microbial species and clades; however, existing whole-genome alignment methods are limited in their ability to compare multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.


July 7, 2019

Quality scores for 32,000 genomes.

More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October 2013). We have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
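The link between an anticodon beginning with ‘A’ and a codon ending with ‘U’ follows from antiparallel base pairing. The sketch below is illustrative only and is not taken from the paper: it assumes strict Watson-Crick pairing (no wobble, no modified bases), and the function name `codon_for_anticodon` is hypothetical.

```python
# Watson-Crick complements for RNA bases (assumption: no wobble pairing,
# no modified nucleotides such as inosine).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon: str) -> str:
    """Return the codon (5'->3') pairing with an anticodon written 5'->3'.

    Pairing is antiparallel, so the first anticodon base pairs with the
    third codon base: reverse the anticodon, then complement each base.
    """
    return "".join(COMPLEMENT[base] for base in reversed(anticodon.upper()))


if __name__ == "__main__":
    # The arginine exception named in the abstract: anticodon ACG reads codon CGU.
    print(codon_for_anticodon("ACG"))
```

Under these assumptions, any anticodon whose first base is A maps to a codon whose last base is U, which is the correspondence the abstract states in parentheses.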


July 7, 2019

Complete genome determination and analysis of Acholeplasma oculi strain 19L, highlighting the loss of basic genetic features in the Acholeplasmataceae.

BACKGROUND: Acholeplasma oculi belongs to the Acholeplasmataceae family, comprising the genera Acholeplasma and ‘Candidatus Phytoplasma’. Acholeplasmas are ubiquitous saprophytic bacteria. Several isolates are derived from plants or animals, whereas phytoplasmas are characterised as intracellular parasitic pathogens of plant phloem and depend on insect vectors for their spread. The complete genome sequences for eight strains of this family have been resolved so far, all of which were determined depending on clone-based sequencing. RESULTS: The A. oculi strain 19L chromosome was sequenced using two independent approaches. The first approach comprised sequencing by synthesis (Illumina) in combination with Sanger sequencing, while single molecule real time sequencing (PacBio) was used in the second. The genome was determined to be 1,587,120 bp in size. Sequencing by synthesis resulted in six large genome fragments, while the single molecule real time sequencing approach yielded one circular chromosome sequence. High-quality sequences were obtained by both strategies differing in six positions, which are interpreted as reliable variations present in the culture population. Our genome analysis revealed 1,471 protein-coding genes and highlighted the absence of the F1FO-type Na+ ATPase system and GroEL/ES chaperone. Comparison of the four available Acholeplasma sequences revealed a core-genome encoding 703 proteins and a pan-genome of 2,867 proteins. CONCLUSIONS: The application of two state-of-the-art sequencing technologies highlights the potential of single molecule real time sequencing for complete genome determination. Comparative genome analyses revealed that the process of losing particular basic genetic features during genome reduction occurs in both genera, as indicated for several phytoplasma strains and at least A. oculi. The loss of the F1FO-type Na+ ATPase system may separate Acholeplasmataceae from other Mollicutes, while the loss of those genes encoding the chaperone GroEL/ES is not a rare exception in this bacterial class.
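The core-genome and pan-genome counts quoted above can be read as a set intersection and a set union over the protein families found in each genome. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes proteins have already been clustered into families (the abstract does not say how), and the function name, genome labels, and family identifiers are all hypothetical.

```python
def core_and_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core, pan) protein-family sets for a collection of genomes.

    core: families present in every genome (intersection)
    pan:  families present in at least one genome (union)
    """
    if not genomes:
        return set(), set()
    family_sets = list(genomes.values())
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan


if __name__ == "__main__":
    # Toy data only; these are not real Acholeplasma families.
    toy = {
        "genome_1": {"famA", "famB", "famC"},
        "genome_2": {"famA", "famB", "famD"},
        "genome_3": {"famA", "famC", "famD"},
    }
    core, pan = core_and_pan(toy)
    print(len(core), len(pan))
```

With real data, the result depends heavily on how families are defined (identity and coverage thresholds), which is why the clustering step is left outside this sketch.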


July 7, 2019

Global phylogenomic analysis of nonencapsulated Streptococcus pneumoniae reveals a deep-branching classic lineage that is distinct from multiple sporadic lineages.

The surrounding capsule of Streptococcus pneumoniae has been identified as a major virulence factor and is targeted by pneumococcal conjugate vaccines (PCV). However, nonencapsulated S. pneumoniae (non-Ec-Sp) have also been isolated globally, mainly in carriage studies. It is unknown whether non-Ec-Sp evolve sporadically, and whether they have high antibiotic nonsusceptibility rates and a unique, specific gene content. Here, whole-genome sequencing of 131 non-Ec-Sp isolates sourced from 17 different locations around the world was performed. Results revealed a deep-branching classic lineage that is distinct from multiple sporadic lineages. The sporadic lineages clustered with a previously sequenced, global collection of encapsulated S. pneumoniae (Ec-Sp) isolates, while the classic lineage is composed mainly of the frequently identified multilocus sequence types (STs) ST344 (n = 39) and ST448 (n = 40). All ST344 and nine ST448 isolates had high nonsusceptibility rates to β-lactams and other antimicrobials. Analysis of the accessory genome reveals that the classic non-Ec-Sp contained a greater number of mobile elements than Ec-Sp and sporadic non-Ec-Sp. Adherence assays with human epithelial cells for selected classic and sporadic non-Ec-Sp revealed that the presence of an integrative conjugative element (ICE) results in increased adherence to human epithelial cells (P = 0.005). In contrast, sporadic non-Ec-Sp lacking the ICE had greater growth in vitro, possibly resulting in improved fitness. In conclusion, non-Ec-Sp isolates from the classic lineage have evolved separately. They have spread globally, are well adapted to nasopharyngeal carriage and are able to coexist with Ec-Sp. Due to continued use of PCV, non-Ec-Sp may become more prevalent. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019

Complete genome sequence of Bifidobacterium longum 105-A, a strain with high transformation efficiency.

Bifidobacterium longum 105-A shows high transformation efficiency and allows for the generation of gene knockout mutants through homologous recombination. Here, we report the complete genome sequence of strain 105-A. Genes encoding at least four putative restriction-modification systems were found in this genome, which might contribute to its transformation efficiency. Copyright © 2014 Kanesaki et al.


July 7, 2019

Draft genome sequence of a metabolically diverse Antarctic supraglacial stream organism, Polaromonas sp. strain CG9_12, determined using Pacific Biosciences Single-Molecule Real-Time Sequencing Technology.

Polaromonas species are found in a diversity of environments and are particularly common in icy ecosystems. Polaromonas sp. strain CG9_12 is an aerobic, Gram-negative, catalase-positive, white-pigmented bacterium of the Proteobacteria phylum. Here, we present the draft genome sequence of Polaromonas sp. strain CG9_12, isolated from an Antarctic supraglacial stream. Copyright © 2014 Smith et al.


July 7, 2019

Potential impact on kidney infection: a whole-genome analysis of Leptospira santarosai serovar Shermani.

Leptospira santarosai serovar Shermani is the most frequently encountered serovar, and it causes leptospirosis and tubulointerstitial nephritis in Taiwan. This study aims to complete the genome sequence of L. santarosai serovar Shermani and analyze the transcriptional responses of L. santarosai serovar Shermani to renal tubular cells. To assemble this highly repetitive genome, we combined reads generated from four next-generation sequencing platforms using hybrid assembly approaches, finishing contiguous sequences for both chromosomes without gaps and validating the data with optical restriction maps and Sanger sequencing. Whole-genome comparison studies revealed a 28-kb region containing genes that encode transposases and hypothetical proteins in L. santarosai serovar Shermani, but this region is absent in other pathogenic Leptospira spp. We found that lipoprotein gene expression in both L. santarosai serovar Shermani and L. interrogans serovar Copenhageni was upregulated upon interaction with renal tubular cells, and LSS19962, an L. santarosai serovar Shermani-specific gene within a 28-kb region that encodes hypothetical proteins, was upregulated in L. santarosai serovar Shermani-infected renal tubular cells. Lipoprotein expression during leptospiral infection might facilitate the interactions of leptospires within kidneys. The availability of the whole-genome sequence of L. santarosai serovar Shermani would make it the first completed sequence of this species, and its comparison with that of other Leptospira spp. may provide invaluable information for further studies in leptospiral pathogenesis.


July 7, 2019

Complete genome sequences of Bordetella pertussis isolates B1917 and B1920, representing two predominant global lineages.

Bordetella pertussis is the causative agent of pertussis, a disease which has resurged despite vaccination. We report the complete, annotated genomes of isolates B1917 and B1920, representing two lineages predominating globally in the last 50 years. The B1917 lineage has been associated with the resurgence of pertussis in the 1990s. Copyright © 2014 Bart et al.


July 7, 2019

Dissemination of cephalosporin resistance genes between Escherichia coli strains from farm animals and humans by specific plasmid lineages.

Third-generation cephalosporins are a class of β-lactam antibiotics that are often used for the treatment of human infections caused by Gram-negative bacteria, especially Escherichia coli. Worryingly, the incidence of human infections caused by third-generation cephalosporin-resistant E. coli is increasing worldwide. Recent studies have suggested that these E. coli strains, and their antibiotic resistance genes, can spread from food-producing animals, via the food-chain, to humans. However, these studies used traditional typing methods, which may not have provided sufficient resolution to reliably assess the relatedness of these strains. We therefore used whole-genome sequencing (WGS) to study the relatedness of cephalosporin-resistant E. coli from humans, chicken meat, poultry and pigs. One strain collection included pairs of human and poultry-associated strains that had previously been considered to be identical based on Multi-Locus Sequence Typing, plasmid typing and antibiotic resistance gene sequencing. The second collection included isolates from farmers and their pigs. WGS analysis revealed considerable heterogeneity between human and poultry-associated isolates. The most closely related pairs of strains from both sources carried 1263 Single-Nucleotide Polymorphisms (SNPs) per Mbp core genome. In contrast, epidemiologically linked strains from humans and pigs differed by only 1.8 SNPs per Mbp core genome. WGS-based plasmid reconstructions revealed three distinct plasmid lineages (IncI1- and IncK-type) that carried cephalosporin resistance genes of the Extended-Spectrum Beta-Lactamase (ESBL)- and AmpC-types. The plasmid backbones within each lineage were virtually identical and were shared by genetically unrelated human and animal isolates. Plasmid reconstructions from short-read sequencing data were validated by long-read DNA sequencing for two strains. Our findings failed to demonstrate evidence for recent clonal transmission of cephalosporin-resistant E. coli strains from poultry to humans, as has been suggested based on traditional, low-resolution typing methods. Instead, our data suggest that cephalosporin resistance genes are mainly disseminated in animals and humans via distinct plasmids.
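The "SNPs per Mbp core genome" figures are a density: the SNP count between two strains divided by the core-genome length in megabases. The sketch below shows only that arithmetic. The core-genome lengths and raw SNP counts are hypothetical values chosen for illustration, since the abstract reports neither, and `snps_per_mbp` is a made-up helper name.

```python
def snps_per_mbp(snp_count: int, core_genome_bp: int) -> float:
    """Normalize a pairwise SNP count by core-genome length in megabases."""
    if core_genome_bp <= 0:
        raise ValueError("core_genome_bp must be positive")
    return snp_count / (core_genome_bp / 1_000_000)


if __name__ == "__main__":
    # Hypothetical inputs: a 4 Mbp core genome with 5,052 SNPs between
    # two strains gives a density of 1263 SNPs per Mbp.
    print(snps_per_mbp(5052, 4_000_000))
    # Hypothetical inputs: a 5 Mbp core genome with 9 SNPs gives 1.8 per Mbp.
    print(snps_per_mbp(9, 5_000_000))
```

Normalizing by core-genome length makes pairs comparable even when the shared core differs in size between comparisons.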


July 7, 2019

Inconsistency of phenotypic and genomic characteristics of Campylobacter fetus subspecies requires reevaluation of current diagnostics.

Classifications of the Campylobacter fetus subspecies fetus and venerealis were first described in 1959 and were based on the source of isolation (intestinal versus genital) and the ability of the strains to proliferate in the genital tract of cows. Two phenotypic assays (1% glycine tolerance and H2S production) were described to differentiate the subspecies. Multiple molecular assays have been applied to differentiate the C. fetus subspecies, but none of these tests is consistent with the phenotypic identification methods. In this study, we defined the core genome and accessory genes of C. fetus, which are based on the closed genomes of five C. fetus strains. Phylogenetic analysis of the core genomes of 23 C. fetus strains of the two subspecies showed a division into two clusters. The phylogenetic core genome clusters were not consistent with the phenotypic classifications of the C. fetus subspecies. However, they were consistent with the molecular characteristics of the strains, which were determined by multilocus sequence typing, sap typing, and the presence/absence of insertion sequences and a type I restriction modification system. The similarity of the genome characteristics of three of the phenotypically defined C. fetus subsp. fetus strains to C. fetus subsp. venerealis strains, when considering the core genome and accessory genes, requires a critical evaluation of the clinical relevance of C. fetus subspecies identification by phenotypic assays. Copyright © 2014, American Society for Microbiology. All Rights Reserved.


July 7, 2019

The DDBJ Japanese Genotype-phenotype archive for genetic and phenotypic human data.

The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. Since October 2013, DDBJ Center has operated the Japanese Genotype-phenotype Archive (JGA) in collaboration with our partner institute, the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency. DDBJ Center provides the JGA database system which securely stores genotype and phenotype data collected from individuals whose consent agreements authorize data release only for specific research use. NBDC has established guidelines and policies for sharing human-derived data and reviews data submission and usage requests from researchers. In addition to the JGA project, DDBJ Center develops Semantic Web technologies for data integration and sharing in collaboration with the Database Center for Life Science. This paper describes the overview of the JGA project, updates to the DDBJ databases, and services for data retrieval, analysis and integration. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.


July 7, 2019

Comparative genomics of the Campylobacter lari group.

The Campylobacter lari group is a phylogenetic clade within the epsilon subdivision of the Proteobacteria and is part of the thermotolerant Campylobacter spp., a division within the genus that includes the human pathogen Campylobacter jejuni. The C. lari group is currently composed of five species (C. lari, Campylobacter insulaenigrae, Campylobacter volucris, Campylobacter subantarcticus, and Campylobacter peloridis), as well as a group of strains termed the urease-positive thermophilic Campylobacter (UPTC) and other C. lari-like strains. Here we present the complete genome sequences of 11 C. lari group strains, including the five C. lari group species, four UPTC strains, and a lari-like strain isolated in this study. The genome of C. lari subsp. lari strain RM2100 was described previously. Analysis of the C. lari group genomes indicates that this group is highly related at the genome level. Furthermore, these genomes are strongly syntenic with minor rearrangements occurring only in 4 of the 12 genomes studied. The C. lari group can be bifurcated, based on the flagella and flagellar modification genes. Genomic analysis of the UPTC strains indicated that these organisms are variable but highly similar, closely related to but distinct from C. lari. Additionally, the C. lari group contains multiple genes encoding hemagglutination domain proteins, which are either contingency genes or linked to conserved contingency genes. Many of the features identified in strain RM2100, such as major deficiencies in amino acid biosynthesis and energy metabolism, are conserved across all 12 genomes, suggesting that these common features may play a role in the association of the C. lari group with coastal environments and watersheds. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2014. This work is written by US Government employees and is in the public domain in the US.

