July 7, 2019

Discovery and genotyping of novel sequence insertions in many sequenced individuals

Motivation: Despite recent advances in algorithm design to characterize structural variation using high-throughput short read sequencing (HTS) data, characterization of novel sequence insertions longer than the average read length remains a challenging task. This is mainly due to both computational difficulties and the complexities imposed by genomic repeats in generating reliable assemblies to accurately detect both the sequence content and the exact location of such insertions. Additionally, de novo genome assembly algorithms typically require a very high depth of coverage, which may be a limiting factor for most genome studies. Therefore, characterization of novel sequence insertions is not a routine part of most sequencing projects. Only a handful of algorithms have been specifically developed for novel sequence insertion discovery that can bypass the need for whole-genome de novo assembly. Still, most such algorithms rely on high depth of coverage, and to our knowledge there is only one method (PopIns) that can use multi-sample data to “collectively” obtain a very high coverage dataset to accurately find insertions common in a given population. Results: Here, we present Pamir, a new algorithm to efficiently and accurately discover and genotype novel sequence insertions using either single or multiple genome sequencing datasets. Pamir is able to detect breakpoint locations of the insertions and calculate their zygosity (i.e. heterozygous versus homozygous) by analyzing multiple sequence signatures, matching one-end-anchored sequences to small-scale de novo assemblies of unmapped reads, and conducting strand-aware local assembly. We test the efficacy of Pamir on both simulated and real data, and demonstrate its potential use in accurate and routine identification of novel sequence insertions in genome projects. Availability and implementation: Pamir is available at https://github.com/vpc-ccg/pamir.
Contact: fhach@sfu.ca, prostatecentre.com or calkan@cs.bilkent.edu.tr. Supplementary information: Supplementary data are available at Bioinformatics online.


July 7, 2019

Evidence for contemporary switching of the O-antigen gene cluster between Shiga toxin-producing Escherichia coli strains colonizing cattle.

Shiga toxin-producing Escherichia coli (STEC) comprise a group of zoonotic enteric pathogens with ruminants, especially cattle, as the main reservoir. O-antigens are instrumental for host colonization and bacterial niche adaptation. They are highly immunogenic and, therefore, targeted by the adaptive immune system. The O-antigen is one of the most diverse bacterial cell constituents and variation not only exists between different bacterial species, but also between individual isolates/strains within a single species. We recently identified STEC persistently infecting cattle and belonging to the different serotypes O156:H25 (n = 21) and O182:H25 (n = 15) that were of the MLST sequence types ST300 or ST688. These STs differ by a single nucleotide in purA only. Fitness-, virulence-associated genome regions, and CRISPR/CAS (clustered regularly interspaced short palindromic repeats/CRISPR associated sequence) arrays of these STEC O156:H25 and O182:H25 isolates were highly similar, and identical genomic integration sites for the stx converting bacteriophages and the core LEE, identical Shiga toxin converting bacteriophage genes for stx1a, identical complete LEE loci, and identical sets of chemotaxis and flagellar genes were identified. In contrast to this genomic similarity, the nucleotide sequences of the O-antigen gene cluster (O-AGC) regions between galF and gnd and very few flanking genes differed fundamentally and were specific for the respective serotype. Sporadic aEPEC O156:H8 isolates (n = 5) were isolated in temporal and spatial proximity. While the O-AGC and the corresponding 5′ and 3′ flanking regions of these aEPEC isolates were identical to the respective region in the STEC O156:H25 isolates, the core genome, the virulence associated genome regions and the CRISPR/CAS elements differed profoundly. Our cumulative epidemiological and molecular data suggests a recent switch of the O-AGC between isolates with O156:H8 strains having served as DNA donors. 
Such O-antigen switches can affect the evaluation of a strain’s pathogenic and virulence potential, suggesting that NGS methods might lead to a more reliable risk assessment.


July 7, 2019

Comparative genomic analysis reveals genetic features related to the virulence of Bacillus cereus FORC_013.

Bacillus cereus is well known as a gastrointestinal pathogen that causes food-borne illness. In the present study, we sequenced the complete genome of B. cereus FORC_013 isolated from fried eel in South Korea. To extend our understanding of the genomic characteristics of FORC_013, we conducted a comparative analysis with the published genomes of other B. cereus strains. We fully assembled the single circular chromosome (5,418,913 bp) and one plasmid (259,749 bp); 5511 open reading frames (ORFs) and 283 ORFs were predicted for the chromosome and plasmid, respectively. Moreover, we found that the enterotoxins (NHE, HBL, CytK) induce food-borne illness with diarrheal symptoms, and that the pleiotropic regulator, along with other virulence factors, plays a role in survival and biofilm formation. Through comparative analysis using the complete genome sequence of B. cereus FORC_013, we identified both positively selected genes related to virulence regulation and 224 strain-specific genes of FORC_013. Through genome analysis of B. cereus FORC_013, we identified multiple virulence factors that may contribute to pathogenicity. These results will provide insight for further studies of the B. cereus pathogenesis mechanism at the genomic level.


July 7, 2019

Evidence for the evolutionary steps leading to mecA-mediated ß-lactam resistance in staphylococci.

The epidemiologically most important mechanism of antibiotic resistance in Staphylococcus aureus is associated with mecA, an acquired gene encoding an extra penicillin-binding protein (PBP2a) with low affinity to virtually all ß-lactams. The introduction of mecA into the S. aureus chromosome has led to the emergence of methicillin-resistant S. aureus (MRSA) pandemics, responsible for high rates of mortality worldwide. Nonetheless, little is known regarding the origin and evolution of mecA. Different mecA homologues have been identified in species belonging to the Staphylococcus sciuri group representing the most primitive staphylococci. In this study we aimed to identify evolutionary steps linking these mecA precursors to the ß-lactam resistance gene mecA and the resistance phenotype. We sequenced genomes of 106 S. sciuri, S. vitulinus and S. fleurettii strains and determined their oxacillin susceptibility profiles. Single-nucleotide polymorphism (SNP) analysis of the core genome was performed to assess the genetic relatedness of the isolates. Phylogenetic analysis of the mecA gene homologues and promoters was achieved through nucleotide/amino acid sequence alignments and mutation rates were estimated using a Bayesian analysis. Furthermore, the predicted structures of mecA homologue-encoded PBPs of oxacillin-susceptible and -resistant strains were compared. We showed for the first time that oxacillin resistance in the S. sciuri group has emerged multiple times and by a variety of different mechanisms. Development of resistance occurred through several steps including structural diversification of the non-binding domain of native PBPs; changes in the promoters of mecA homologues; acquisition of SCCmec and adaptation of the bacterial genetic background. Moreover, our results suggest that it was exposure to ß-lactams in human-created environments that has driven evolution of native PBPs towards a resistance determinant.
The evolution of ß-lactam resistance in staphylococci highlights the numerous resources available to bacteria to adapt to the selective pressure of antibiotics.


July 7, 2019

Staphylococcus aureus CC395 harbours a novel composite staphylococcal cassette chromosome mec element.

CoNS species are likely reservoirs of the staphylococcal cassette chromosome mec (SCCmec) in Staphylococcus aureus. S. aureus CC395 is unique as it is capable of exchanging DNA with CoNS via bacteriophages, which are also known to mediate transfer of SCCmec. This study aimed to analyse the structure and putative origin of the SCCmec element in S. aureus CC395. The only MRSA CC395 strain described in the literature, JS395, was subjected to WGS, and its SCCmec element was compared with those found in CoNS species and other S. aureus strains. JS395 was found to carry an unusually large 88 kb composite SCCmec element. The 33 kb region downstream of orfX harboured a type V SCCmec element and a CRISPR locus, which was most similar to those found in the CoNS species Staphylococcus capitis and Staphylococcus schleiferi. A 55 kb SCC element was identified downstream of the type V SCCmec element and contained a mercury resistance region found in the composite SCC element of some Staphylococcus epidermidis and S. aureus strains, an integrated S. aureus plasmid containing genes for the detoxification of cadmium and arsenic, and a stretch of genes that was partially similar to the type IVg SCCmec element found in a bovine S. aureus strain. The size and complexity of the SCCmec element support the idea that CC395 is highly prone to DNA uptake from CoNS. Thus CC395 may serve as an entry point for SCCmec and SCC structures into S. aureus.


July 7, 2019

Chromosomal 16S ribosomal RNA methyltransferase RmtE1 in Escherichia coli sequence type 448.

We identified rmtE1, an uncommon 16S ribosomal methyltransferase gene, in an aminoglycoside- and cephalosporin-resistant Escherichia coli sequence type 448 clinical strain co-harboring blaCMY-2. Long-read sequencing revealed insertion of a 101,257-bp fragment carrying both resistance genes into the chromosome. Our findings underscore E. coli sequence type 448 as a potential high-risk multidrug-resistant clone.


July 7, 2019

Virulence and genomic feature of a virulent Klebsiella pneumoniae sequence type 14 strain of serotype K2 harboring blaNDM-5 in China.

The objective of this study was to reveal the molecular mechanism involved in carbapenem resistance and virulence of a K2 Klebsiella pneumoniae clinical isolate 24835. The virulence of the strain was determined by in vitro and in vivo methods. De novo whole-genome sequencing and molecular biology methods were used to analyze the genomic features associated with the carbapenem resistance and virulence of K. pneumoniae 24835. Strain 24835 was highly resistant to carbapenems, belonged to ST14, and exhibited a hypermucoviscous and unique K2-aerobactin-kfu-rmpA-positive phenotype. As the only carbapenemase gene in strain 24835, blaNDM-5 was located on a 46-kb IncX3 self-transmissible plasmid, which is closely related to pNDM-MGR194 from India. The genetic context of blaNDM-5 in strain 24835 was closely related to those on IncX3 plasmids in various Enterobacteriaceae species in China. Multiple virulence genes may work together to confer the relatively high virulence of K. pneumoniae 24835. Significantly increased resistance to serum killing and increased mouse mortality were found for the virulent New Delhi metallo-ß-lactamase (NDM)-producing K. pneumoniae strain compared to the other NDM-producing K. pneumoniae strain. Our study provides basic information on the phenotypic and genomic features of K. pneumoniae 24835, a strain displaying carbapenem resistance and a relatively high level of virulence. These findings raise concern about the potential of NDM-like genes to disseminate among virulent K. pneumoniae isolates.


July 7, 2019

CLOVE: classification of genomic fusions into structural variation events.

A precise understanding of structural variants (SVs) in DNA is important in the study of cancer and population diversity. Many methods have been designed to identify SVs from DNA sequencing data. However, the problem remains challenging because existing approaches suffer from low sensitivity, precision, and positional accuracy. Furthermore, many existing tools only identify breakpoints, and do not collect related breakpoints and classify them as a particular type of SV. Due to the rapidly increasing usage of high-throughput sequencing technologies in this area, there is an urgent need for algorithms that can accurately classify complex genomic rearrangements (involving more than one breakpoint or fusion). We present CLOVE, an algorithm for integrating the results of multiple breakpoint or SV callers and classifying the results as a particular SV. CLOVE is based on a graph data structure that is created from the breakpoint information. The algorithm looks for patterns in the graph that are characteristic of more complex rearrangement types. CLOVE is able to integrate the results of multiple callers, producing a consensus call. We demonstrate using simulated and real data that re-classified SV calls produced by CLOVE improve on the raw call set of existing SV algorithms, particularly in terms of accuracy. CLOVE is freely available from http://www.github.com/PapenfussLab .
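
To make the idea of a consensus call across several callers concrete, here is a minimal Python sketch. It is not CLOVE's graph-based method: it swaps in a much simpler position-clustering rule, and the `Breakpoint` record, the `tolerance` and `min_callers` parameters, and the caller names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical record for one breakpoint reported by one caller."""
    chrom: str
    pos: int
    caller: str


def _flush(cluster, consensus, min_callers):
    """Emit one consensus entry if enough distinct callers support the cluster."""
    if not cluster:
        return
    callers = {bp.caller for bp in cluster}
    if len(callers) >= min_callers:
        positions = sorted(bp.pos for bp in cluster)
        # Upper median of the clustered positions stands in for the consensus site.
        consensus.append((cluster[0].chrom, positions[len(positions) // 2], sorted(callers)))


def consensus_breakpoints(calls, tolerance=50, min_callers=2):
    """Group calls lying within `tolerance` bp of the previous call on the same chromosome."""
    consensus, cluster = [], []
    for bp in sorted(calls, key=lambda b: (b.chrom, b.pos)):
        if cluster and (bp.chrom != cluster[-1].chrom or bp.pos - cluster[-1].pos > tolerance):
            _flush(cluster, consensus, min_callers)
            cluster = []
        cluster.append(bp)
    _flush(cluster, consensus, min_callers)
    return consensus


calls = [
    Breakpoint("chr1", 1000, "callerA"),
    Breakpoint("chr1", 1010, "callerB"),
    Breakpoint("chr1", 5000, "callerA"),  # unsupported by a second caller
    Breakpoint("chr2", 1005, "callerB"),  # different chromosome, single caller
]
print(consensus_breakpoints(calls))
# [('chr1', 1010, ['callerA', 'callerB'])]
```

Only the site reported by both hypothetical callers survives; classifying the surviving breakpoints into SV types would be the separate, graph-based step the abstract describes.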


July 7, 2019

Genome graphs

There is increasing recognition that a single, monoploid reference genome is a poor universal reference structure for human genetics, because it represents only a tiny fraction of human variation. Adding this missing variation results in a structure that can be described as a mathematical graph: a genome graph. We demonstrate that, in comparison to the existing reference genome (GRCh38), genome graphs can substantially improve the fractions of reads that map uniquely and perfectly. Furthermore, we show that this fundamental simplification of read mapping transforms the variant calling problem from one in which many non-reference variants must be discovered de-novo to one in which the vast majority of variants are simply re-identified within the graph. Using standard benchmarks as well as a novel reference-free evaluation, we show that a simplistic variant calling procedure on a genome graph can already call variants at least as well as, and in many cases better than, a state-of-the-art method on the linear human reference genome. We anticipate that graph-based references will supplant linear references in humans and in other applications where cohorts of sequenced individuals are available.
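
As a rough illustration of why adding known variation helps reads map perfectly, the following Python sketch builds a toy genome graph with a single SNP "bubble". The node sequences, the read, and the helper names are invented for the example, and exact substring search is used as a stand-in for real read mapping.

```python
def enumerate_haplotypes(nodes, edges, start, end):
    """Spell out every path from `start` to `end` in a small acyclic graph."""
    paths = []

    def walk(node, seq):
        seq = seq + nodes[node]
        if node == end:
            paths.append(seq)
            return
        for nxt in edges.get(node, []):
            walk(nxt, seq)

    walk(start, "")
    return paths


def maps_perfectly(read, haplotypes):
    """True if the read matches some haplotype exactly (toy stand-in for mapping)."""
    return any(read in hap for hap in haplotypes)


# Toy graph: a shared prefix, a reference allele (A) and an alternate allele (G),
# then a shared suffix. All sequences are made up.
nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}
edges = {1: [2, 3], 2: [4], 3: [4]}

haplotypes = enumerate_haplotypes(nodes, edges, start=1, end=4)
linear_reference = haplotypes[0]  # path through the reference allele

read = "GTGTT"  # carries the alternate allele
print(maps_perfectly(read, [linear_reference]))  # False
print(maps_perfectly(read, haplotypes))          # True
```

The read fails to match the single linear path but matches once the alternate allele is part of the reference structure, so the variant is re-identified rather than discovered de novo.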


July 7, 2019

The MHC locus and genetic susceptibility to autoimmune and infectious diseases.

In the past 50 years, variants in the major histocompatibility complex (MHC) locus, also known as the human leukocyte antigen (HLA), have been reported as major risk factors for complex diseases. Recent advances, including large genetic screens, imputation, and analyses of non-additive and epistatic effects, have contributed to a better understanding of the shared and specific roles of MHC variants in different diseases. We review these advances and discuss the relationships between MHC variants involved in autoimmune and infectious diseases. Further work in this area will help to distinguish between alternative hypotheses for the role of pathogens in autoimmune disease development.


July 7, 2019

Complete genome sequence and bioinformatics analyses of Bacillus thuringiensis strain BM-BT15426.

This study aimed to investigate the genetic characteristics of Bacillus thuringiensis strain BM-BT15426. The B. thuringiensis strain was identified by sequencing the PCR product of the 16S rRNA gene using an ABI Prism 377 DNA Sequencer. The genome was sequenced using PacBio RS II sequencers and assembled de novo using HGAP. Further genome annotation was also performed. The genome of B. thuringiensis strain BM-BT15426 has a length of 5,246,329 bp and contains 5409 predicted genes with an average G + C content of 35.40%. Three genes were involved in the “Infectious diseases: Amoebiasis” pathway. A total of 21 virulence factors and 9 antibiotic resistance genes were identified. The major pathogenic factors of B. thuringiensis strain BM-BT15426 were identified through complete genome sequencing and bioinformatics analyses, which will support further study of the pathogenic mechanism and phenotype of B. thuringiensis. Copyright © 2017 Elsevier Ltd. All rights reserved.
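
For readers unfamiliar with the G + C statistic quoted above, the following Python sketch shows how such a percentage is commonly computed. The function name and the toy sequence are illustrative assumptions, not part of the study's pipeline.

```python
def gc_content(sequence):
    """Return the G + C percentage of a DNA sequence, ignoring non-ACGT symbols."""
    sequence = sequence.upper()
    counted = sum(sequence.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (sequence.count("G") + sequence.count("C")) / counted


toy_sequence = "GGCCAATTAT"  # made-up 10-base example: 4 of 10 bases are G or C
print(gc_content(toy_sequence))  # 40.0
```

Ambiguity codes such as N are excluded from the denominator here; that is one possible convention, and other tools may count them differently.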


July 7, 2019

Comparative genomics of all three Campylobacter sputorum biovars and a novel cattle-associated C. sputorum clade.

Campylobacter sputorum is a non-thermotolerant campylobacter that is primarily isolated from food animals such as cattle and sheep. C. sputorum is also infrequently associated with human illness. Based on catalase and urease activity, three biovars are currently recognized within C. sputorum: bv. sputorum (catalase negative, urease negative), bv. fecalis (catalase positive, urease negative), and bv. paraureolyticus (catalase negative, urease positive). A multi-locus sequence typing (MLST) method was recently constructed for C. sputorum. MLST typing of several cattle-associated C. sputorum isolates suggested that they are members of a divergent C. sputorum clade. Although catalase positive, and thus technically bv. fecalis, the taxonomic position of these strains could not be determined solely by MLST. To further characterize C. sputorum, the genomes of four strains, representing all three biovars and the divergent clade, were sequenced to completion. Here we present a comparative genomic analysis of the four C. sputorum genomes. This analysis indicates that the three biovars and the cattle-associated strains are highly-related at the genome level with similarities in gene content. Furthermore, the four genomes are strongly syntenic with one or two minor inversions. However, substantial differences in gene content were observed among the three biovars. Finally, although the strain representing the cattle-associated isolates was shown to be C. sputorum, it is possible that this strain is a member of a novel C. sputorum subspecies; thus, these cattle-associated strains may form a second taxon within C. sputorum. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.
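
The biovar definitions given above amount to a two-test lookup, sketched here in Python. The function name is hypothetical, and the catalase-positive, urease-positive combination is returned as `None` because the abstract does not assign it to any biovar.

```python
def classify_biovar(catalase_positive, urease_positive):
    """Map catalase and urease test results to a C. sputorum biovar name."""
    biovars = {
        (False, False): "sputorum",
        (True, False): "fecalis",
        (False, True): "paraureolyticus",
    }
    # Returns None for a combination not covered by the three recognized biovars.
    return biovars.get((catalase_positive, urease_positive))


print(classify_biovar(catalase_positive=True, urease_positive=False))  # fecalis
```

As the abstract notes, such phenotype-based assignment cannot settle taxonomic position by itself: the divergent cattle-associated strains are catalase positive, and thus technically bv. fecalis, yet required genome sequencing to place them.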


July 7, 2019

Whole genome characterization of a naturally occurring vancomycin-dependent Enterococcus faecium from a patient with bacteremia.

Vancomycin-dependent enterococci are a relatively uncommon phenotype recovered in the clinical laboratory. Recognition and recovery of these isolates are important for providing accurate identification and susceptibility information to treating physicians. Herein, we describe the recovery of vancomycin-dependent and revertant E. faecium isolates harboring the vanB operon from a patient with bacteremia. Using whole genome sequencing, we found a unique single nucleotide polymorphism (S186N) in the D-Ala-D-Ala ligase (ddl) conferring vancomycin dependency. Additionally, we found that a majority of in vitro revertants mutated outside ddl, with some strains harboring mutations in vanS, while others likely contain novel mechanisms of reversion. Copyright © 2017 Elsevier B.V. All rights reserved.


July 7, 2019

Comparison of pseudorabies virus China reference strain with emerging variants reveals independent virus evolution within specific geographic regions.

Pseudorabies virus (PRV) China reference strain Ea is genetically closely related to newly emerged variants; however, there is limited information about PRV Ea. Here, we compared PRV Ea with new variant strains by growth kinetics, genome sequencing, and protein expression analysis. Growth analysis showed that strain Ea forms smaller plaques than strain HNX. The full-length genome sequence of Ea revealed that it is clustered in the same subgroup as HNX. Ea and HNX strains exhibited similar extracellular virion protein polymorphisms, whereas strain Bartha expressed less VP26 and more GAPDH. In infected cells, strain Ea expressed high levels of IE180 protein, and Ea and HNX produced higher levels of UL21 protein than strain Bartha. These findings provide evidence that PRV China reference strain Ea is genetically closely related to the newly emerged variant strains, indicating that strain PRV China may have evolved independently leading to the emergence of a variant strain. Copyright © 2017 Elsevier Inc. All rights reserved.

