September 22, 2019

PBHoover and CigarRoller: a method for confident haploid variant calling on Pacific Biosciences data and its application to heterogeneous population analysis

Motivation: Single Molecule Real-Time (SMRT) sequencing has important and underutilized advantages that amplification-based platforms lack, notably the absence of systematic error (e.g. GC bias) and complete de novo assembly (including large repetitive regions) without scaffolding. SMRT sequencing, however, suffers from a high random error rate and low sequencing depth (older chemistries). Here, we introduce PBHoover, software that uses a heuristic calling algorithm to make base calls with high certainty in low-coverage regions. This software is also capable of mixed population detection with high sensitivity. PBHoover's CigarRoller attachment improves sequencing depth in low-coverage regions through CIGAR-string correction. Results: We tested both modules on 348 M. tuberculosis clinical isolates sequenced on C1 or C2 chemistries. On average, CigarRoller improved the percentage of usable reads from 68.9% to 99.98% in C1 runs and from 50% to 99% in C2 runs. Using the greater depth provided by CigarRoller, PBHoover was able to make base and variant calls 99.95% concordant with Sanger calls (QV33). PBHoover also detected antibiotic-resistant subpopulations that went undetected by Sanger. Using C1 chemistry, subpopulations as small as 9% of the total colony can be detected by PBHoover. This provides the most sensitive amplification-free molecular method for heterogeneity analysis and is in line with phenotypic methods' sensitivity. This sensitivity significantly improves with the greater depth and lower error rate of the newer chemistries. Availability and Implementation: Executables are freely available under GNU GPL v3+ at http://www.gitlab.com/LPCDRP/pbhoover and http://www.gitlab.com/LPCDRP/CigarRoller. PBHoover is also available on bioconda: https://anaconda.org/bioconda/pbhoover.
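
The abstract pairs 99.95% concordance with QV33. As a minimal sketch, assuming the standard Phred definition QV = -10 * log10(error rate) is what the authors mean (the abstract does not state the formula, and the function name `phred_qv` is hypothetical), the correspondence can be checked as follows:

```python
import math

def phred_qv(accuracy):
    """Convert a fractional accuracy (e.g. 0.9995) to a Phred-scaled quality value.

    Assumes the standard Phred definition: QV = -10 * log10(error_rate).
    """
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)

# 99.95% concordance corresponds to an error rate of 0.05%.
qv = phred_qv(0.9995)
print(round(qv))  # 33
```

Under this definition each 10-point increase in QV is a tenfold drop in error rate, so QV33 corresponds to roughly one discordant call in 2,000.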


September 22, 2019

Potential survival and pathogenesis of a novel strain, Vibrio parahaemolyticus FORC_022, isolated from a soy sauce marinated crab by genome and transcriptome analyses.

Vibrio parahaemolyticus can cause gastrointestinal illness through consumption of seafood. Despite frequent food-borne outbreaks of V. parahaemolyticus, only 19 strains have been subjected to complete whole-genome analysis. In this study, a novel strain of V. parahaemolyticus, designated FORC_022 (Food-borne pathogen Omics Research Center_022), was isolated from soy sauce marinated crabs, and its genome and transcriptome were analyzed to elucidate the pathogenic mechanisms. FORC_022 did not include the major virulence factors thermostable direct hemolysin (tdh) and TDH-related hemolysin (trh). However, FORC_022 showed high cytotoxicity and had several V. parahaemolyticus islands (VPaIs) and other virulence factors, such as various secretion systems (types I, II, III, IV, and VI), in comparative genome analysis with CDC_K4557 (the most similar strain) and RIMD2210633 (genome island marker strain). FORC_022 harbored additional virulence genes, including accessory cholera enterotoxin, zona occludens toxin, and the tight adhesion (tad) locus, compared with CDC_K4557. In addition, an O3 serotype-specific gene and the marker gene of the pandemic O3:K6 serotype (toxRS) were detected in FORC_022. The expression levels of genes involved in adherence and carbohydrate transport were high, whereas those of genes involved in motility, arginine biosynthesis, and proline metabolism were low after exposure to crabs. Moreover, the virulence factors of the type III secretion system, tad locus, and thermolabile hemolysin were overexpressed. Therefore, the risk of foodborne illness may be high following consumption of FORC_022-contaminated crab. These results provided molecular information regarding the survival and pathogenesis of V. parahaemolyticus FORC_022 strain in contaminated crab and may have applications in food safety.


September 22, 2019

Sequencing of Panax notoginseng genome reveals genes involved in disease resistance and ginsenoside biosynthesis

Background: Panax notoginseng is a traditional Chinese herb with high medicinal and economic value. There has been considerable research on the pharmacological activities of ginsenosides contained in Panax spp.; however, very little is known about the ginsenoside biosynthetic pathway. Results: We reported the first de novo genome of 2.36 Gb of sequences from P. notoginseng with 35,451 protein-encoding genes. Compared to other plants, we found notable gene family contraction of disease-resistance genes in P. notoginseng, but notable expansion for several ATP-binding cassette (ABC) transporter subfamilies, such as the Gpdr subfamily, indicating that ABCs might be an additional mechanism for the plant to cope with biotic stress. Combining eight transcriptomes of roots and aerial parts, we identified several key genes, their transcription factor binding sites and all their family members involved in the synthesis pathway of ginsenosides in P. notoginseng, including dammarenediol synthase, CYP716 and UGT71. Conclusions: The complete genome analysis of P. notoginseng, the first in genus Panax, will serve as an important reference sequence for improving breeding and cultivation of this important nutraceutical and medicinal but vulnerable plant species.


September 22, 2019

Complete genome sequence of Enterococcus durans KLDS6.0933, a potential probiotic strain with high cholesterol removal ability

Enterococci are commensal bacteria in the mammalian gastrointestinal tract which play an important role in the production of various fermented foods. Thus, certain enterococcal strains are commonly used as probiotics to confer health benefits to humans and animals. Enterococcus durans KLDS6.0933 is a potential probiotic strain with high cholesterol removal ability, which was isolated from traditional naturally fermented cream in Inner Mongolia of China. To better understand the genetic basis of the probiotic properties of this strain, whole-genome sequencing was performed using the PacBio RSII platform.


September 22, 2019

The draft genomes of Elizabethkingia anophelis of equine origin are genetically similar to three isolates from human clinical specimens.

We report the isolation and characterization of two Elizabethkingia anophelis strains (OSUVM-1 and OSUVM-2) isolated from sources associated with horses in Oklahoma. Both strains appeared susceptible to fluoroquinolones and demonstrated high MICs to all cell wall active antimicrobials including vancomycin, along with aminoglycosides, fusidic acid, chloramphenicol, and tetracycline. Typical of the Elizabethkingia, both draft genomes contained multiple copies of β-lactamase genes as well as genes predicted to function in antimicrobial efflux. Phylogenetic analysis of the draft genomes revealed that OSUVM-1 and OSUVM-2 differ by only 6 SNPs and are in a clade with 3 strains of Elizabethkingia anophelis that were responsible for human infections. These findings therefore raise the possibility that Elizabethkingia might have the potential to move between humans and animals in a manner similar to known zoonotic pathogens.


September 22, 2019

Integrating long-range connectivity information into de Bruijn graphs.

The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input data. We present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both our de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterize the genomic context of drug-resistance genes. Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, which is available under the MIT license at https://github.com/mcveanlab/mccortex. Supplementary data are available at Bioinformatics online.
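
To illustrate why a plain de Bruijn graph loses long-range information, here is a minimal sketch of a textbook construction. It is not McCortex's implementation: the function name `build_de_bruijn`, the toy reads, and the convention of using (k-1)-mers as nodes are all assumptions made for illustration.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph with (k-1)-mers as nodes.

    Each k-mer in a read contributes one edge from its (k-1)-length prefix
    to its (k-1)-length suffix. Returns {node: set of successor nodes}.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# Two hypothetical reads that share the 3-mer CTG but differ on both sides.
reads = ["ACTGA", "GCTGT"]
graph = build_de_bruijn(reads, k=3)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy graph the node `TG` has two successors (`GA` and `GT`), and nothing records that the path entering through `AC` should leave through `GA`. That pairing is the kind of read-level connectivity the abstract says the LdBG annotations are meant to preserve.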


September 22, 2019

Periodic variation of mutation rates in bacterial genomes associated with replication timing

The causes and consequences of spatiotemporal variation in mutation rates remain to be explored in nearly all organisms. Here we examine relationships between local mutation rates and replication timing in three bacterial species whose genomes have multiple chromosomes: Vibrio fischeri, Vibrio cholerae, and Burkholderia cenocepacia. Following five mutation accumulation experiments with these bacteria conducted in the near absence of natural selection, the genomes of clones from each lineage were sequenced and analyzed to identify variation in mutation rates and spectra. In lineages lacking mismatch repair, base substitution mutation rates vary in a mirrored wave-like pattern on opposing replichores of the large chromosomes of V. fischeri and V. cholerae, where concurrently replicated regions experience similar base substitution mutation rates. The base substitution mutation rates on the small chromosome are less variable in both species but occur at similar rates to those in the concurrently replicated regions of the large chromosome. Neither nucleotide composition nor frequency of nucleotide motifs differed among regions experiencing high and low base substitution rates, which along with the inferred ~800-kb wave period suggests that the source of the periodicity is not sequence specific but rather a systematic process related to the cell cycle. These results support the notion that base substitution mutation rates are likely to vary systematically across many bacterial genomes, which exposes certain genes to elevated deleterious mutational load. IMPORTANCE: That mutation rates vary within bacterial genomes is well known, but the detailed study of these biases has been made possible only recently with contemporary sequencing methods.
We applied these methods to understand how bacterial genomes with multiple chromosomes, like those of Vibrio and Burkholderia, might experience heterogeneous mutation rates because of their unusual replication and the greater genetic diversity found on smaller chromosomes. This study captured thousands of mutations and revealed wave-like rate variation that is synchronized with replication timing and not explained by sequence context. The scale of this rate variation over hundreds of kilobases of DNA strongly suggests that a temporally regulated cellular process may generate wave-like variation in mutation risk. These findings add to our understanding of how mutation risk is distributed across bacterial and likely also eukaryotic genomes, owing to their highly conserved replication and repair machinery. Copyright © 2018 Dillon et al.


September 22, 2019

Identification of the DNA methyltransferases establishing the methylome of the cyanobacterium Synechocystis sp. PCC 6803.

DNA methylation in bacteria is important for defense against foreign DNA, but is also involved in DNA repair, replication, chromosome partitioning, and regulatory processes. Thus, characterization of the underlying DNA methyltransferases in genetically tractable bacteria is of paramount importance. Here, we characterized the methylome and orphan methyltransferases in the model cyanobacterium Synechocystis sp. PCC 6803. Single molecule real-time (SMRT) sequencing revealed four DNA methylation recognition sequences in addition to the previously known motif m5CGATCG, which is recognized by M.Ssp6803I. For three of the new recognition sequences, we identified the responsible methyltransferases. M.Ssp6803II, encoded by the sll0729 gene, modifies GGm4CC; M.Ssp6803III, encoded by slr1803, represents the cyanobacterial dam-like methyltransferase modifying Gm6ATC; and M.Ssp6803V, encoded by slr6095 on plasmid pSYSX, transfers methyl groups to the bipartite motif GGm6AN7TTGG/CCAm6AN7TCC. The remaining methylation recognition sequence GAm6AGGC is probably recognized by methyltransferase M.Ssp6803IV encoded by slr6050. M.Ssp6803III and M.Ssp6803IV were essential for the viability of Synechocystis, while the strains lacking M.Ssp6803I and M.Ssp6803V showed growth similar to the wild type. In contrast, growth of the Δsll0729 mutant lacking M.Ssp6803II was strongly diminished. These data provide the basis for systematic studies on the molecular mechanisms impacted by these methyltransferases.


September 22, 2019

A reference genome of the Chinese hamster based on a hybrid assembly strategy.

Accurate and complete genome sequences are essential in biotechnology to facilitate genome-based cell engineering efforts. The current genome assemblies for Cricetulus griseus, the Chinese hamster, are fragmented and replete with gap sequences and misassemblies, consistent with most short-read-based assemblies. Here, we completely resequenced C. griseus using single molecule real time sequencing and merged this with Illumina-based assemblies. This generated a more contiguous and complete genome assembly than either technology alone, reducing the number of scaffolds by >28-fold, with 90% of the sequence in the 122 longest scaffolds. Most genes are now found in single scaffolds, including up- and downstream regulatory elements, enabling improved study of noncoding regions. With >95% of the gap sequence filled, important Chinese hamster ovary cell mutations have been detected in draft assembly gaps. This new assembly will be an invaluable resource for continued basic and pharmaceutical research. © 2018 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals, Inc.


September 22, 2019

Sequencing of pT5282-CTXM, p13190-KPC and p30860-NR, and comparative genomics analysis of IncX8 plasmids.

This study proposes a replicon-based scheme for typing IncX plasmids into nine separately clustering subgroups, including IncX1α, IncX1β and IncX2-8. The complete nucleotide sequences of three IncX8 plasmids, namely pT5282-CTXM and p30860-NR from Enterobacter cloacae and p13190-KPC from Klebsiella pneumoniae, were determined and were compared with two other previously sequenced IncX8 plasmids (pCAV1043-58 and pCAV1741-16). These five plasmids possessed conserved IncX8 backbones with limited genetic variation with respect to gene content and organisation, and each of them carried one or three accessory modules that harboured resistance markers and metabolic gene clusters as well as transposons, insertion sequence (IS)-based transposition units and miniature inverted repeat transposable elements (MITEs), indicating that the relatively small IncX8 backbones were able to integrate various foreign genetic contents. The resistance genes blaCTX-M-3 and blaTEM-1 (β-lactam resistance), blaKPC-2 (carbapenem resistance) and ΔblaTEM-1, and tet(A) (tetracycline resistance) and mph(E) (macrolide resistance) were found in pT5282-CTXM, p13190-KPC and pCAV1741-16, respectively, whilst p30860-NR and pCAV1043-58 carried no resistance genes. The data presented here provide an insight into the diversification and evolution history of IncX8 plasmids. Copyright © 2018 Elsevier B.V. and International Society of Chemotherapy. All rights reserved.


September 22, 2019

Tracing genomic divergence of Vibrio bacteria in the Harveyi clade.

The mechanism of bacterial speciation remains a topic of tremendous interest. To understand the ecological and evolutionary mechanisms of speciation in Vibrio bacteria, we analyzed the genomic dissimilarities between three closely related species in the so-called Harveyi clade of the genus Vibrio: V. campbellii, V. jasicida, and V. hyugaensis. The analysis focused on strains isolated from diverse geographic locations over a long period of time. The results of phylogenetic analyses and calculations of average nucleotide identity (ANI) supported the classification of V. jasicida and V. hyugaensis into two species. These analyses also identified two well-supported clades in V. campbellii; however, strains from both clades were classified as members of the same species. Comparative analyses of the complete genome sequences of representative strains from the three species identified higher syntenic coverage between genomes of V. jasicida and V. hyugaensis than that between the genomes from the two V. campbellii clades. The results from comparative analyses of gene content between bacteria from the three species did not support the hypothesis that gene gain and/or loss contributed to their speciation. We also did not find support for the hypothesis that ecological diversification toward associations with marine animals contributed to the speciation of V. jasicida and V. hyugaensis. Overall, based on the results obtained in this study, we propose that speciation in Harveyi clade species is a result of stochastic diversification of local populations, which was influenced by multiple evolutionary processes, followed by extinction events. IMPORTANCE: To investigate the mechanisms underlying speciation in the genus Vibrio, we provided a well-assembled reference of genomes and performed systematic genomic comparisons among three evolutionarily closely related species. We resolved taxonomic ambiguities and identified genomic features separating the three species.
Based on the study results, we propose a hypothesis explaining how species in the Harveyi clade of Vibrio bacteria diversified. Copyright © 2018 American Society for Microbiology.


September 22, 2019

Analysis of the draft genome of the red seaweed Gracilariopsis chorda provides insights into genome size evolution in Rhodophyta.

Red algae (Rhodophyta) underwent two phases of large-scale genome reduction during their early evolution. The red seaweeds did not attain genome sizes or gene inventories typical of other multicellular eukaryotes. We generated a high-quality 92.1 Mb draft genome assembly from the red seaweed Gracilariopsis chorda, including methylation and small (s)RNA data. We analyzed these and other Archaeplastida genomes to address three questions: 1) What is the role of repeats and transposable elements (TEs) in explaining Rhodophyta genome size variation, 2) what is the history of genome duplication and gene family expansion/reduction in these taxa, and 3) is there evidence for TE suppression in red algae? We find that the number of predicted genes in red algae is relatively small (4,803-13,125 genes), particularly when compared with land plants, with no evidence of polyploidization. Genome size variation is primarily explained by TE expansion with the red seaweeds having the largest genomes. Long terminal repeat elements and DNA repeats are the major contributors to genome size growth. About 8.3% of the G. chorda genome undergoes cytosine methylation among gene bodies, promoters, and TEs, and 71.5% of TEs contain methylated-DNA with 57% of these regions associated with sRNAs. These latter results suggest a role for TE-associated sRNAs in RNA-dependent DNA methylation to facilitate silencing. We postulate that the evolution of genome size in red algae is the result of the combined action of TE spread and the concomitant emergence of its epigenetic suppression, together with other important factors such as changes in population size.


September 22, 2019

Identification and characterization of conjugative plasmids that encode ciprofloxacin resistance in Salmonella.

This study aimed to characterize novel conjugative plasmids that encode transferable ciprofloxacin resistance in Salmonella. In this study, 157 non-duplicated Salmonella isolates were recovered from food products, 55 of which were found to be resistant to ciprofloxacin. Interestingly, 37 of the 55 (67%) CipR Salmonella isolates did not harbor any mutations in the quinolone resistance-determining regions (QRDR). Notably, six Salmonella isolates were shown to carry two novel types of conjugative plasmids that could transfer the ciprofloxacin resistance phenotype to E. coli J53 (AziR). The first type was an ~110 kb IncFIB-type conjugative plasmid carrying qnrB-bearing and aac(6′)-Ib-cr-bearing mobile elements. Transfer of the plasmid between E. coli or Salmonella could confer a CIP MIC of 1 to 2 µg/ml. The second type of conjugative plasmid belonged to ~240 kb IncH1/IncF plasmids carrying a single PMQR gene, qnrS. Importantly, this type of conjugative ciprofloxacin resistance plasmid could be detected in clinical isolates of Salmonella. Dissemination of these conjugative plasmids that confer ciprofloxacin resistance poses a serious challenge to public health and Salmonella infection control. Copyright © 2018 American Society for Microbiology.


September 22, 2019

The plant growth-promoting rhizobacterium Variovorax boronicumulans CGMCC 4969 regulates the level of indole-3-acetic acid synthesized from indole-3-acetonitrile.

Variovorax is a metabolically diverse genus of plant growth-promoting rhizobacteria (PGPR) that engages in mutually beneficial interactions between plants and microbes. Unlike most PGPR, Variovorax cannot synthesize the phytohormone indole-3-acetic acid (IAA) via tryptophan. However, we found that V. boronicumulans strain CGMCC 4969 could produce IAA using indole-3-acetonitrile (IAN) as the precursor. Thus, in the present study, the IAA synthesis mechanism of V. boronicumulans CGMCC 4969 was investigated. V. boronicumulans CGMCC 4969 metabolized IAN to IAA through both a nitrilase-dependent pathway and a nitrile hydratase (NHase) and amidase-dependent pathway. Cobalt enhanced the metabolic flux via the NHase/amidase, by which IAN was rapidly converted to indole-3-acetamide (IAM) and in turn to IAA. IAN stimulated the metabolic flux via the nitrilase, by which IAN was rapidly converted to IAA. Subsequently, the IAA was degraded. V. boronicumulans CGMCC 4969 could use IAN as the sole carbon and nitrogen source for growth. Genome sequencing confirmed the IAA synthesis pathways. Gene cloning and overexpression in Escherichia coli indicated that NitA has nitrilase activity and IamA has amidase activity, transforming IAN and IAM, respectively, to IAA. Interestingly, NitA showed a close genetic relationship with the nitrilase of the phytopathogen Pseudomonas syringae. Quantitative PCR analysis indicated that the NHase/amidase system is constitutively expressed, whereas the nitrilase is inducible. The present study helps our understanding of the versatile functions of Variovorax nitrile-converting enzymes that mediate IAA synthesis and the interactions between plants and these bacteria. IMPORTANCE: We demonstrated that Variovorax boronicumulans CGMCC 4969 has two enzymatic systems, nitrilase and nitrile hydratase/amidase, that convert indole-3-acetonitrile (IAN) to the important plant hormone indole-3-acetic acid (IAA).
The two IAA synthesis systems have very different regulatory mechanisms, affecting the IAA synthesis rate and duration. The nitrilase was induced by IAN, which was rapidly converted to IAA; subsequently IAA was rapidly consumed for cell growth. The NHase and amidase system was constitutively expressed and slowly but continuously synthesized IAA. In addition to synthesizing IAA from IAN, CGMCC 4969 has a rapid IAA degradation system, which would be helpful for a host plant to eliminate redundant IAA. This study indicates that the plant growth-promoting rhizobacterium V. boronicumulans CGMCC 4969 has the potential to be used by host plants to regulate the IAA level. Copyright © 2018 American Society for Microbiology.


September 22, 2019

Emergence of an XDR and carbapenemase-producing hypervirulent Klebsiella pneumoniae strain in Taiwan.

Carbapenemase-producing Klebsiella pneumoniae causes high mortality owing to the limited therapeutic options available. Here, we investigated an emergent carbapenem-resistant K. pneumoniae strain with hypervirulence found among KPC-2-producing strains in Taiwan. KPC-producing K. pneumoniae strains were collected consecutively from clinical specimens at the Taipei Veterans General Hospital between January 2012 and December 2014. Capsular types and the presence of rmpA/rmpA2 were analysed, and PFGE and MLST performed using these strains. The strain positive for rmpA/rmpA2 was tested in an in vivo mouse lethality study to verify its virulence and subjected to WGS to delineate its genomic features. A total of 62 KPC-2-producing K. pneumoniae strains were identified; all of these belonged to ST11 and capsular genotype K47. One strain isolated from a fatal case with intra-abdominal abscess (TVGHCRE225) harboured rmpA and rmpA2 genes. This strain was resistant to tigecycline and colistin, in addition to carbapenems, and did not belong to the major cluster in PFGE. TVGHCRE225 exhibited high in vivo virulence in the mouse lethality experiment. WGS showed that TVGHCRE225 acquired a novel hybrid virulence plasmid harbouring a set of virulence genes (iroBCDN, iucABCD, rmpA and rmpA2, and iutA) compared with the classic ST11 KPC-2-producing strain. We identified an XDR ST11 KPC-2-producing K. pneumoniae strain carrying a hybrid virulence plasmid in Taiwan. Active surveillance focusing on carbapenem-resistant hypervirulent K. pneumoniae strains is necessary, as the threat to human health is imminent.

