July 7, 2019

The hidden perils of read mapping as a quality assessment tool in genome sequencing.

This article provides a comparative analysis of genome sequencing methods, focusing on verification of assembly quality. The results of a comparative assessment of various de novo assembly tools, as well as sequencing technologies, are presented using a recently completed sequence of the genome of Lactobacillus fermentum 3872. In particular, the quality of assemblies is assessed using CLC Genomics Workbench read mapping and optical mapping developed by OpGen. Over-extension of contigs without prior knowledge of contig location can lead to misassembled contigs, even when commonly used quality indicators such as read mapping suggest that a contig is well assembled. Precautions must also be taken when using long-read sequencing technology, which may also lead to misassembled contigs.


July 7, 2019

Genomic analysis of ST88 community-acquired methicillin resistant Staphylococcus aureus in Ghana.

The emergence and evolution of community-acquired methicillin resistant Staphylococcus aureus (CA-MRSA) strains in Africa is poorly understood. However, one particular MRSA lineage called ST88 appears to be rapidly establishing itself as an “African” CA-MRSA clone. In this study, we employed whole genome sequencing to provide more information on the genetic background of ST88 CA-MRSA isolates from Ghana and to describe in detail ST88 CA-MRSA isolates in comparison with other MRSA lineages worldwide. We first established a complete ST88 reference genome (AUS0325) using PacBio SMRT sequencing. We then used comparative genomics to assess relatedness among 17 ST88 CA-MRSA isolates recovered from patients attending Buruli ulcer treatment centres in Ghana, three non-African ST88s and 15 other MRSA lineages. We show that Ghanaian ST88 forms a discrete MRSA lineage (harbouring SCCmec-IV [2B]). Gene content analysis identified five distinct genomic regions enriched among ST88 isolates compared with the other S. aureus lineages. The Ghanaian ST88 isolates had only 658 core genome SNPs and there was no correlation between phylogeny and geography, suggesting the recent spread of this clone. The lineage was also resistant to multiple classes of antibiotics including β-lactams, tetracycline and chloramphenicol. This study reveals that S. aureus ST88-IV is a recently emerging and rapidly spreading CA-MRSA clone in Ghana. The study highlights the capacity of small snapshot genomic studies to provide actionable public health information in resource limited settings. To our knowledge this is the first genomic assessment of the ST88 CA-MRSA clone.


July 7, 2019

AidP, a novel N-Acyl homoserine lactonase gene from Antarctic Planococcus sp.

Planococcus is a Gram-positive halotolerant bacterial genus in the phylum Firmicutes, commonly found in various habitats in Antarctica. Quorum quenching (QQ) is the disruption of bacterial cell-to-cell communication (known as quorum sensing), which has previously been described in mesophilic bacteria. This study demonstrated the QQ activity of a psychrotolerant strain, Planococcus versutus strain L10.15(T), isolated from a soil sample obtained near an elephant seal wallow in Antarctica. Whole genome analysis of this bacterial strain revealed the presence of an N-acyl homoserine lactonase, an enzyme that hydrolyzes the ester bond of the homoserine lactone ring of N-acyl homoserine lactones (AHLs). Heterologous gene expression in E. coli confirmed its function in the hydrolysis of AHLs, and the gene was designated aidP (autoinducer degrading gene from Planococcus sp.). The low temperature activity of this enzyme suggested that it belongs to a novel and uncharacterized class of AHL lactonase. This study is the first report on QQ activity of bacteria isolated from the polar regions.


July 7, 2019

A spontaneous mutation in kdsD, a biosynthesis gene for 3-Deoxy-D-manno-Octulosonic Acid, occurred in a ciprofloxacin resistant strain of Francisella tularensis and caused a high level of attenuation in murine models of tularemia.

Francisella tularensis, a gram-negative facultative intracellular bacterial pathogen, is the causative agent of tularemia and is able to infect many mammalian species, including humans. Because of its ability to cause a lethal infection, low infectious dose, and aerosolizable nature, F. tularensis subspecies tularensis is considered a potential biowarfare agent. Due to its in vitro efficacy, ciprofloxacin is one of the antibiotics recommended for post-exposure prophylaxis of tularemia. In order to identify therapeutics that will be efficacious against infections caused by drug resistant select-agents and to better understand the threat, we sought to characterize an existing ciprofloxacin resistant (CipR) mutant in the Schu S4 strain of F. tularensis by determining its phenotypic characteristics and sequencing the chromosome to identify additional genetic alterations that may have occurred during the selection process. In addition to the previously described genetic alterations, the sequence of the CipR mutant strain revealed several additional mutations. Of particular interest was a frameshift mutation within kdsD, which encodes an enzyme necessary for the production of 3-Deoxy-D-manno-Octulosonic Acid (KDO), an integral component of the lipopolysaccharide (LPS). A kdsD mutant was constructed in the Schu S4 strain. Although it was not resistant to ciprofloxacin, the kdsD mutant shared many phenotypic characteristics with the CipR mutant, including growth defects under different conditions, sensitivity to hydrophobic agents, altered LPS profiles, and attenuation in multiple models of murine tularemia. This study demonstrates that the KdsD enzyme is essential for Francisella virulence and may be an attractive therapeutic target for developing novel medical countermeasures.


July 7, 2019

Genome sequencing and analysis of Talaromyces pinophilus provide insights into biotechnological applications.

Species from the genus Talaromyces produce useful biomass-degrading enzymes and secondary metabolites. However, these enzymes and secondary metabolites are still poorly understood and have not been explored in depth because of a lack of comprehensive genetic information. Here, we report a 36.51-megabase genome assembly of Talaromyces pinophilus strain 1-95, with coverage of nine scaffolds of eight chromosomes with telomeric repeats at their ends and circular mitochondrial DNA. In total, 13,472 protein-coding genes were predicted. Of these, 803 were annotated to encode enzymes that act on carbohydrates, including 39 cellulose-degrading and 24 starch-degrading enzymes. In addition, 68 secondary metabolism gene clusters were identified, mainly including T1 polyketide synthase genes and nonribosomal peptide synthase genes. Comparative genomic analyses revealed that T. pinophilus 1-95 harbors more biomass-degrading enzymes and secondary metabolites than other related filamentous fungi. The prediction of the T. pinophilus 1-95 secretome indicated that approximately 50% of the biomass-degrading enzymes are secreted into the extracellular environment. These results expanded our genetic knowledge of the biomass-degrading enzyme system of T. pinophilus and its biosynthesis of secondary metabolites, facilitating the cultivation of T. pinophilus for high production of useful products.


July 7, 2019

Complete genome sequence and comparative genomics of the probiotic yeast Saccharomyces boulardii.

The probiotic yeast Saccharomyces boulardii (Sb) is known to be effective against many gastrointestinal disorders and antibiotic-associated diarrhea. To understand the molecular basis of the probiotic properties ascribed to Sb, we determined the complete genomes of two strains of Sb, i.e., Biocodex and unique28, and the draft genomes of three other Sb strains that are marketed as probiotics in India. We compared these genomes with 145 strains of S. cerevisiae (Sc) to understand genome-level similarities and differences between these yeasts. A feature that distinguishes Sb from other Sc is the absence of the Ty elements Ty1, Ty3, Ty4 and associated LTRs. However, we could identify complete Ty2 and Ty5 elements in Sb. The genes for the hexose transporters HXT11 and HXT9, and for asparagine utilization, are absent in all Sb strains. We find differences in repeat periods and copy numbers of repeats in flocculin genes that are likely related to the differential adhesion of Sb as compared to Sc. Core-proteome based taxonomy places Sb strains along with wine strains of Sc. We find the introgression of five genes from Z. bailii into chromosome IV of Sb and wine strains of Sc. Intriguingly, genes involved in conferring known probiotic properties to Sb are conserved in most Sc strains.


July 7, 2019

Fungal volatile compounds induce production of the secondary metabolite Sodorifen in Serratia plymuthica PRI-2C.

The ability of bacteria and fungi to communicate with each other is a remarkable aspect of the microbial world. It is recognized that volatile organic compounds (VOCs) act as communication signals; however, the molecular responses of bacteria to fungal VOCs remain unknown. Here we perform transcriptomics and proteomics analyses of Serratia plymuthica PRI-2C exposed to VOCs emitted by the fungal pathogen Fusarium culmorum. We find that the bacterium responds to fungal VOCs with changes in gene and protein expression related to motility, signal transduction, energy metabolism, cell envelope biogenesis, and secondary metabolite production. Metabolomic analysis of the bacterium exposed to the fungal VOCs, gene cluster comparison, and heterologous co-expression of a terpene synthase and a methyltransferase revealed the production of the unusual terpene sodorifen in response to fungal VOCs. These results strongly suggest that VOCs are not merely metabolic waste but important compounds in the long-distance communication between fungi and bacteria.


July 7, 2019

Phenotypic diversity and genotypic flexibility of Burkholderia cenocepacia during long-term chronic infection of cystic fibrosis lungs.

Chronic bacterial infections of the lung are the leading cause of morbidity and mortality in cystic fibrosis patients. Tracking bacterial evolution during chronic infections can provide insights into how host selection pressures, including immune responses and therapeutic interventions, shape bacterial genomes. We carried out genomic and phenotypic analyses of 215 serially collected Burkholderia cenocepacia isolates from 16 cystic fibrosis patients, spanning a period of 2-20 yr and a broad range of epidemic lineages. Systematic phenotypic tests identified longitudinal bacterial series that manifested progressive changes in liquid media growth, motility, biofilm formation, and acute insect virulence, but not in mucoidy. The results suggest that distinct lineages follow distinct evolutionary trajectories during lung infection. Pan-genome analysis identified 10,110 homologous gene clusters present only in a subset of strains, including genes restricted to different molecular types. Our phylogenetic analysis based on 2148 orthologous gene clusters from all isolates is consistent with patient-specific clades. This suggests that initial colonization of patients was likely by individual strains, followed by subsequent diversification. Evidence of clonal lineages shared by some patients was observed, suggesting inter-patient transmission. We observed recurrent gene losses in multiple independent longitudinal series, including complete loss of Chromosome III and deletions on other chromosomes. Recurrently observed loss-of-function mutations were associated with decreases in motility and biofilm formation. Together, our study provides the first comprehensive genome-phenome analyses of B. cenocepacia infection in cystic fibrosis lungs and serves as a valuable resource for understanding the genomic and phenotypic underpinnings of bacterial evolution. © 2017 Lee et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Fast and accurate de novo genome assembly from long uncorrected reads.

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment-based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster. © 2017 Vaser et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

HINGE: long-read assembly achieves optimal repeat resolution.

Long-read sequencing technologies have the potential to produce gold-standard de novo genome assemblies, but fully exploiting error-prone reads to resolve repeats remains a challenge. Aggressive approaches to repeat resolution often produce misassemblies, and conservative approaches lead to unnecessary fragmentation. We present HINGE, an assembler that seeks to achieve optimal repeat resolution by distinguishing repeats that can be resolved given the data from those that cannot. This is accomplished by adding “hinges” to reads for constructing an overlap graph where only unresolvable repeats are merged. As a result, HINGE combines the error resilience of overlap-based assemblers with repeat-resolution capabilities of de Bruijn graph assemblers. HINGE was evaluated on the long-read bacterial data sets from the NCTC project. HINGE produces more finished assemblies than Miniasm and the manual pipeline of NCTC based on the HGAP assembler and Circlator. HINGE also allows us to identify 40 data sets where unresolvable repeats prevent the reliable construction of a unique finished assembly. In these cases, HINGE outputs a visually interpretable assembly graph that encodes all possible finished assemblies consistent with the reads, while other approaches such as the NCTC pipeline and FALCON either fragment the assembly or resolve the ambiguity arbitrarily. © 2017 Kamath et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.

Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here, we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and extremely repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 nucleotides. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy. © 2017 Zimin et al.; Published by Cold Spring Harbor Laboratory Press.
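The N50 contig size quoted in the abstract above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The following is a minimal sketch of that calculation, not the authors' code; the function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (hypothetical helper).

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if running * 2 >= total:
            return length


# Made-up contig lengths for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, a high value says nothing about correctness, which is why the abstract pairs it with optical maps and BAC-based assemblies.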


July 7, 2019

Genome sequence of Plasmopara viticola and insight into the pathogenic mechanism.

Plasmopara viticola causes downy mildew disease of grapevine, which is one of the most devastating diseases of viticulture worldwide. Here we report a 101.3 Mb whole genome sequence of P. viticola isolate ‘JL-7-2’ obtained by a combination of Illumina and PacBio sequencing technologies. The P. viticola genome contains 17,014 putative protein-coding genes and has ~26% repetitive sequences. A total of 1,301 putative secreted proteins, including 100 putative RXLR effectors and 90 CRN effectors, were identified in this genome. In the secretome, 261 potential pathogenicity genes and 95 carbohydrate-active enzymes were predicted. Transcriptional analysis revealed that most of the RXLR effectors, pathogenicity genes and carbohydrate-active enzymes were significantly up-regulated during infection. Comparative genomic analysis revealed that P. viticola evolved independently from the Arabidopsis downy mildew pathogen Hyaloperonospora arabidopsidis. The availability of the P. viticola genome not only provides a valuable resource for comparative genomic analysis and evolutionary studies among oomycetes, but also enhances our knowledge of the mechanism of interactions between this biotrophic pathogen and its host.


July 7, 2019

The recent emergence in hospitals of multidrug-resistant community-associated sequence type 1 and spa type t127 methicillin-resistant Staphylococcus aureus investigated by whole-genome sequencing: Implications for screening.

Community-associated spa type t127/t922 methicillin-resistant Staphylococcus aureus (MRSA) prevalence increased from 1% to 7% in Ireland between 2010 and 2015. This study tracked the spread of 89 such isolates from June 2013 to June 2016. These included 78 healthcare-associated and 11 community-associated MRSA isolates from a prolonged hospital outbreak (H1) (n = 46), 16 other hospitals (n = 28), four other healthcare facilities (n = 4) and community-associated sources (n = 11). Isolates underwent antimicrobial susceptibility testing, DNA microarray profiling and whole-genome sequencing. Minimum spanning trees were generated following core-genome multilocus sequence typing and pairwise single nucleotide variation (SNV) analysis was performed. All isolates were sequence type 1 MRSA staphylococcal cassette chromosome mec type IV (ST1-MRSA-IV) and 76/89 were multidrug-resistant. Fifty isolates, including 40/46 from H1, were high-level mupirocin-resistant, carrying a conjugative 39 kb iles2-encoding plasmid. Two closely related ST1-MRSA-IV strains (I and II) and multiple sporadic strains were identified. Strain I isolates (57/89), including 43/46 H1 and all high-level mupirocin-resistant isolates, exhibited ≤80 SNVs. Two strain I isolates from separate H1 healthcare workers differed from other H1/strain I isolates by 7-47 and 12-53 SNVs, respectively, indicating healthcare worker involvement in this outbreak. Strain II isolates (19/89), including the remaining H1 isolates, exhibited ≤127 SNVs. For each strain, the pairwise SNVs exhibited by healthcare-associated and community-associated isolates indicated recent transmission of ST1-MRSA-IV within and between multiple hospitals, healthcare facilities and communities in Ireland. Given the interchange between healthcare-associated and community-associated isolates in hospitals, the risk factors that inform screening for MRSA require revision.
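The pairwise SNV analysis mentioned above comes down to counting the positions at which two aligned genomes carry different bases. The sketch below shows one simple way to do that for equal-length, pre-aligned sequences. It is not the pipeline used in the study; the helper names, the decision to skip gaps and ambiguous bases, and the toy isolate sequences are all assumptions made for illustration.

```python
from itertools import combinations

UNAMBIGUOUS = set("ACGT")


def snv_distance(seq_a, seq_b):
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are skipped, which is one common
    but not universal convention.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in UNAMBIGUOUS and b in UNAMBIGUOUS
    )


def pairwise_snvs(aligned):
    """Return {(name1, name2): SNV count} for every pair of isolates."""
    return {
        (n1, n2): snv_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Toy alignment with hypothetical isolate names.
toy = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "TCGTACGA"}
for pair, count in pairwise_snvs(toy).items():
    print(pair, count)
```

A matrix of such counts can then be compared against a threshold to group isolates into putative strains, as the abstract does with its per-strain SNV limits.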


July 7, 2019

Elucidation of quantitative structural diversity of remarkable rearrangement regions, shufflons, in IncI2 plasmids.

A multiple DNA inversion system, the shufflon, exists in incompatibility (Inc) I1 and I2 plasmids. The shufflon generates variants of the PilV protein, a minor component of the thin pilus. The shufflon is one of the most difficult regions for de novo genome assembly because of its structural diversity even in an isolated bacterial clone. We determined complete genome sequences, including those of IncI2 plasmids carrying mcr-1, of three Escherichia coli strains using single-molecule, real-time (SMRT) sequencing and Illumina sequencing. The sequences assembled using only SMRT sequencing contained misassembled regions in the shufflon. A hybrid analysis using SMRT and Illumina sequencing resolved the misassembled region and revealed that the three IncI2 plasmids, excluding the shufflon region, were highly conserved. Moreover, the abundance ratio of whole-shufflon structures could be determined by quantitative structural variation analysis of the SMRT data, suggesting that a remarkable heterogeneity of whole-shufflon structural variations exists in IncI2 plasmids. These findings indicate that remarkable rearrangement regions should be validated using both long-read and short-read sequencing data and that the structural variation of PilV in the shufflon might be closely related to phenotypic heterogeneity of plasmid-mediated transconjugation involved in horizontal gene transfer even in bacterial clonal populations.


July 7, 2019

Complex routes of nosocomial vancomycin-resistant Enterococcus faecium transmission revealed by genome sequencing.

Vancomycin-resistant Enterococcus faecium (VREfm) is a leading cause of nosocomial infection. Here, we describe the utility of whole-genome sequencing in defining nosocomial VREfm transmission. A retrospective study at a single hospital in the United Kingdom identified 342 patients with E. faecium bloodstream infection over 7 years. Of these, 293 patients had a stored isolate and formed the basis for the study. The first stored isolate from each case was sequenced (200 VREfm [197 vanA, 2 vanB, and 1 isolate containing both vanA and vanB], 93 vancomycin-susceptible E. faecium) and epidemiological data were collected. Genomes were also available for E. faecium associated with bloodstream infections in 15 patients in neighboring hospitals, and 456 patients across the United Kingdom and Ireland. The majority of infections in the 293 patients were hospital-acquired (n = 249) or healthcare-associated (n = 42). Phylogenetic analysis showed that 291 of 293 isolates resided in a hospital-associated clade that contained numerous discrete clusters of closely related isolates, indicative of multiple introductions into the hospital followed by clonal expansion associated with transmission. Fine-scale analysis of 6 exemplar phylogenetic clusters containing isolates from 93 patients (32%) identified complex transmission routes that spanned numerous wards and years, extending beyond the detection of conventional infection control. These contained both vancomycin-resistant and -susceptible isolates. We also identified closely related isolates from patients at Cambridge University Hospitals NHS Foundation Trust and regional and national hospitals, suggesting interhospital transmission. These findings provide important insights for infection control practice and signpost areas for interventions. We conclude that sequencing represents a powerful tool for the enhanced surveillance and control of nosocomial E. faecium transmission and infection.

