September 22, 2019

PBHoover and CigarRoller: a method for confident haploid variant calling on Pacific Biosciences data and its application to heterogeneous population analysis

Motivation: Single Molecule Real-Time (SMRT) sequencing has important and underutilized advantages that amplification-based platforms lack. These include the lack of systematic error (e.g. GC-bias) and complete de novo assembly (including large repetitive regions) without scaffolding. SMRT sequencing, however, suffers from a high random error rate and low sequencing depth (older chemistries). Here, we introduce PBHoover, software that uses a heuristic calling algorithm to make base calls with high certainty in low-coverage regions. This software is also capable of mixed population detection with high sensitivity. PBHoover's CigarRoller attachment improves sequencing depth in low-coverage regions through CIGAR-string correction. Results: We tested both modules on 348 M. tuberculosis clinical isolates sequenced on C1 or C2 chemistries. On average, CigarRoller improved the percentage of usable reads from 68.9% to 99.98% in C1 runs and from 50% to 99% in C2 runs. Using the greater depth provided by CigarRoller, PBHoover was able to make base and variant calls 99.95% concordant with Sanger calls (QV33). PBHoover also detected antibiotic-resistant subpopulations that went undetected by Sanger. Using C1 chemistry, subpopulations as small as 9% of the total colony can be detected by PBHoover. This provides the most sensitive amplification-free molecular method for heterogeneity analysis and is in line with phenotypic methods' sensitivity. This sensitivity significantly improves with the greater depth and lower error rate of the newer chemistries. Availability and Implementation: Executables are freely available under GNU GPL v3+ at http://www.gitlab.com/LPCDRP/pbhoover and http://www.gitlab.com/LPCDRP/CigarRoller. PBHoover is also available on bioconda: https://anaconda.org/bioconda/pbhoover.
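
For readers unfamiliar with CIGAR strings, the following is a minimal sketch of how one can be parsed into alignment operations, using the operation codes defined in the SAM format. It does not reproduce CigarRoller's correction, which the abstract does not describe; the function names `parse_cigar` and `reference_span` are hypothetical and chosen only for illustration.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

def reference_span(cigar):
    """Count reference bases covered; M, D, N, = and X consume the reference."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")

# Toy alignment: 5 matches, 1 inserted base, 3 matches, 2 deleted bases, 4 matches.
ops = parse_cigar("5M1I3M2D4M")
span = reference_span("5M1I3M2D4M")
```

Here the insertion adds a read base without advancing along the reference, while the deletion advances along the reference without consuming read bases, which is why the two lengths differ.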


September 22, 2019

Potential survival and pathogenesis of a novel strain, Vibrio parahaemolyticus FORC_022, isolated from a soy sauce marinated crab by genome and transcriptome analyses.

Vibrio parahaemolyticus can cause gastrointestinal illness through consumption of seafood. Despite frequent food-borne outbreaks of V. parahaemolyticus, only 19 strains have been subjected to complete whole-genome analysis. In this study, a novel strain of V. parahaemolyticus, designated FORC_022 (Food-borne pathogen Omics Research Center_022), was isolated from soy sauce marinated crabs, and its genome and transcriptome were analyzed to elucidate the pathogenic mechanisms. FORC_022 did not include the major virulence factors thermostable direct hemolysin (tdh) and TDH-related hemolysin (trh). However, FORC_022 showed high cytotoxicity and had several V. parahaemolyticus islands (VPaIs) and other virulence factors, such as various secretion systems (types I, II, III, IV, and VI), in comparative genome analysis with CDC_K4557 (the most similar strain) and RIMD2210633 (genome island marker strain). FORC_022 harbored additional virulence genes, including accessory cholera enterotoxin, zona occludens toxin, and the tight adhesion (tad) locus, compared with CDC_K4557. In addition, an O3 serotype-specific gene and the marker gene of the pandemic O3:K6 serotype (toxRS) were detected in FORC_022. The expression levels of genes involved in adherence and carbohydrate transport were high, whereas those of genes involved in motility, arginine biosynthesis, and proline metabolism were low after exposure to crabs. Moreover, the virulence factors of the type III secretion system, tad locus, and thermolabile hemolysin were overexpressed. Therefore, the risk of foodborne illness may be high following consumption of FORC_022-contaminated crab. These results provide molecular information regarding the survival and pathogenesis of the V. parahaemolyticus FORC_022 strain in contaminated crab and may have applications in food safety.


September 22, 2019

Sequencing of Panax notoginseng genome reveals genes involved in disease resistance and ginsenoside biosynthesis

Background: Panax notoginseng is a traditional Chinese herb with high medicinal and economic value. There has been considerable research on the pharmacological activities of ginsenosides contained in Panax spp.; however, very little is known about the ginsenoside biosynthetic pathway. Results: We reported the first de novo genome of 2.36 Gb of sequences from P. notoginseng with 35,451 protein-encoding genes. Compared to other plants, we found notable gene family contraction of disease-resistance genes in P. notoginseng, but notable expansion for several ATP-binding cassette (ABC) transporter subfamilies, such as the Gpdr subfamily, indicating that ABCs might be an additional mechanism for the plant to cope with biotic stress. Combining eight transcriptomes of roots and aerial parts, we identified several key genes, their transcription factor binding sites and all their family members involved in the synthesis pathway of ginsenosides in P. notoginseng, including dammarenediol synthase, CYP716 and UGT71. Conclusions: The complete genome analysis of P. notoginseng, the first in genus Panax, will serve as an important reference sequence for improving breeding and cultivation of this important nutraceutical and medicinal but vulnerable plant species.


September 22, 2019

Comparative genomics of Escherichia coli sequence type 219 clones from the same patient: Evolution of the IncI1 blaCMY-carrying plasmid in vivo.

This study investigates the evolution of an Escherichia coli sequence type 219 clone in a patient with recurrent urinary tract infection, comparing isolate EC974 obtained prior to antibiotic treatment and isolate EC1515 recovered after exposure to several β-lactam antibiotics (ceftriaxone, cefixime, and imipenem). EC974 had a smooth colony morphology, while EC1515 had a rough colony morphology on sheep blood agar. RAPD-PCR analysis suggested that both isolates belonged to the same clone. Antimicrobial susceptibility tests showed that EC1515 was more resistant to piperacillin/tazobactam, cefepime, cefpirome, and ertapenem than EC974. Comparative genomic analysis was used to investigate the genetic changes of EC974 and EC1515 within the host, and showed three plasmids with replicons IncI1, P0111, and IncFII in both isolates. P0111-type plasmids pEC974-2 and pEC1515-2 contained the antibiotic resistance genes aadA2, tetA, and drfA12. IncFII-type plasmids pEC974-3 and pEC1515-3 contained the antibiotic resistance genes blaTEM-1, aadA1, aadA22, sul3, and inuF. Interestingly, blaCMY-111 and blaCMY-4 were found in very similar IncI1 plasmids that also contained aadA22 and aac(3)-IId, from isolates EC974 (pEC974-1) and EC1515 (pEC1515-1), respectively. The results showed in vivo amino acid substitutions converting blaCMY-111 to blaCMY-4 (R221W and A238V substitutions). Conjugation experiments showed a high frequency of IncI1 and IncFII plasmid co-transference. Transconjugants and DH5α cells harboring blaCMY-4 or blaCMY-111 showed higher levels of resistance to ampicillin, amoxicillin, cefazolin, cefuroxime, cefotaxime, cefixime, and ceftazidime, but not piperacillin/tazobactam, cefepime, or ertapenem. All known genes (outer membrane proteins and extended-spectrum AmpC β-lactamases) involved in ertapenem (ETP) resistance in E. coli were identical between EC974 and EC1515. This is the first study to identify the evolution of an IncI1 plasmid within the host, and to characterize blaCMY-111 in E. coli.


September 22, 2019

The integrative conjugative element clc (ICEclc) of Pseudomonas aeruginosa JB2.

Integrative conjugative elements (ICE) are a diverse group of chromosomally integrated, self-transmissible mobile genetic elements (MGE) that are active in shaping the functions of bacteria and bacterial communities. Each type of ICE carries a characteristic set of core genes encoding functions essential for maintenance and self-transmission, and cargo genes that endow hosts with phenotypes beneficial for niche adaptation. An important area to which ICE can contribute beneficial functions is the biodegradation of xenobiotic compounds. In the biodegradation realm, the best-characterized ICE is ICEclc, which carries cargo genes encoding ortho-cleavage of chlorocatechols (clc genes) and aminophenol metabolism (amn genes). The element was originally identified in the 3-chlorobenzoate degrader Pseudomonas knackmussii B13, and the closest relative is a nearly identical element in Burkholderia xenovorans LB400 (designated ICEclc-B13 and ICEclc-LB400, respectively). In the present report, genome sequencing of the o-chlorobenzoate degrader Pseudomonas aeruginosa JB2 was used to identify a new member of the ICEclc family, ICEclc-JB2. The cargo of ICEclc-JB2 differs from that of ICEclc-B13 and ICEclc-LB400 in consisting of a unique combination of genes that encode the utilization of o-halobenzoates and o-hydroxybenzoate as growth substrates (ohb genes and hyb genes, respectively) and which are duplicated in a tandem repeat. Also, ICEclc-JB2 lacks an operon of regulatory genes (tciR-marR-mfsR) that is present in the other two ICEclc, and which controls excision from the host. Thus, the mechanisms regulating intracellular behavior of ICEclc-JB2 may differ from those of its close relatives. The entire tandem repeat in ICEclc-JB2 can excise independently from the element in a process apparently involving transposases/insertion sequences associated with the repeats. Excision of the repeats removes important niche adaptation genes from ICEclc-JB2, rendering it less beneficial to the host. However, the reduced version of ICEclc-JB2 could now acquire new genes that might be beneficial to a future host and, consequently, to the survival of ICEclc-JB2. Collectively, the present identification and characterization of ICEclc-JB2 provides insights into roles of MGE in bacterial niche adaptation and the evolution of catabolic pathways for biodegradation of xenobiotic compounds.


September 22, 2019

Comparative genomics and genotype-phenotype associations in Bifidobacterium breve.

Bifidobacteria are common members of the gastro-intestinal microbiota of a broad range of animal hosts. Their successful adaptation to this particular niche is linked to their saccharolytic metabolism, which is supported by a wide range of glycosyl hydrolases. In the current study a large-scale gene-trait matching (GTM) effort was performed to explore glycan degradation capabilities in B. breve. By correlating the presence/absence of genes and associated genomic clusters with growth/no-growth patterns across a dataset of 20 Bifidobacterium breve strains and nearly 80 different potential growth substrates, we not only validated the approach for a number of previously characterized carbohydrate utilization clusters, but we were also able to discover novel genetic clusters linked to the metabolism of salicin and sucrose. Using GTM, genetic associations were also established for antibiotic resistance and exopolysaccharide production, thereby identifying (novel) bifidobacterial antibiotic resistance markers and showing that the GTM approach is applicable to a variety of phenotypes. Overall, the GTM findings clearly expand our knowledge on members of the B. breve species, in particular how their variable genetic features can be linked to specific phenotypes.
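
The gene-trait matching idea described above, correlating gene presence/absence with growth/no-growth across strains, can be sketched in a few lines. This is only an illustration under simple assumptions: the scoring rule (fraction of strains where presence agrees with growth), the function name `match_score`, and the toy strain and gene names are all hypothetical and are not taken from the study.

```python
def match_score(presence, growth):
    """Fraction of strains where gene presence agrees with growth on a substrate."""
    strains = presence.keys() & growth.keys()
    agree = sum(presence[s] == growth[s] for s in strains)
    return agree / len(strains)

# Hypothetical toy data: growth of four strains on one substrate.
growth = {"s1": True, "s2": True, "s3": False, "s4": False}

# Hypothetical presence/absence profiles for two candidate genes.
genes = {
    "geneA": {"s1": True, "s2": True, "s3": False, "s4": False},
    "geneB": {"s1": True, "s2": False, "s3": True, "s4": False},
}

scores = {gene: match_score(profile, growth) for gene, profile in genes.items()}
best = max(scores, key=scores.get)  # gene whose profile best tracks the phenotype
```

A gene whose profile matches the growth pattern in every strain is a candidate for involvement in metabolizing that substrate. A real analysis would need a statistical test and correction for the many gene-substrate pairs compared.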


September 22, 2019

Quorum-quenching bacteria isolated from Red Sea sediments reduce biofilm formation by Pseudomonas aeruginosa.

Quorum sensing (QS) is the process by which bacteria communicate with each other through small signaling molecules such as N-acylhomoserine lactones (AHLs). Certain bacteria can degrade AHL molecules by a process called quorum quenching (QQ); therefore, QQ can be used to control bacterial infections and biofilm formation. In this study, we aimed to identify new species of bacteria with QQ activity. Red Sea sediments were collected either from the close vicinity of seagrass or from areas with no vegetation. We isolated 72 bacterial strains, which were tested for their ability to degrade/inactivate AHL molecules. Chromobacterium violaceum CV026-based bioassay was used for the initial screening of isolates with QQ activity. QQ activity was further quantified using high-performance liquid chromatography-tandem mass spectrometry. We found that these isolates could degrade AHL molecules of different acyl chain lengths as well as modifications. 16S-rRNA sequencing of positive QQ isolates showed that they belonged to three different genera. Specifically, two isolates belonged to the genus Erythrobacter; four, Labrenzia; and one, Bacterioplanes. The genome of one representative isolate from each genus was sequenced, and potential QQ enzymes, namely, lactonases and acylases, were identified. The ability of these isolates to degrade the 3OXOC12-AHLs produced by Pseudomonas aeruginosa PAO1 and hence inhibit biofilm formation was investigated. Our results showed that the isolate VG12 (genus Labrenzia) is better than other isolates at controlling biofilm formation by PAO1 and degradation of different AHL molecules. Time-course experiments to study AHL degradation showed that VG1 (genus Erythrobacter) could degrade AHLs faster than other isolates. Thus, QQ bacteria or enzymes can be used in combination with an antibacterial to overcome antibiotic resistance.


September 22, 2019

Complete genome sequence of Enterococcus durans KLDS6.0933, a potential probiotic strain with high cholesterol removal ability

Enterococci are commensal bacteria in the mammalian gastrointestinal tract which play an important role in the production of various fermented foods. Thus, certain enterococcal strains are commonly used as probiotics to confer health benefits to humans and animals. Enterococcus durans KLDS6.0933 is a potential probiotic strain with high cholesterol removal ability, which was isolated from traditional naturally fermented cream in Inner Mongolia, China. To better understand the genetic basis of the probiotic properties of this strain, whole-genome sequencing was performed using the PacBio RSII platform.


September 22, 2019

The draft genomes of Elizabethkingia anophelis of equine origin are genetically similar to three isolates from human clinical specimens.

We report the isolation and characterization of two Elizabethkingia anophelis strains (OSUVM-1 and OSUVM-2) isolated from sources associated with horses in Oklahoma. Both strains appeared susceptible to fluoroquinolones and demonstrated high MICs to all cell wall active antimicrobials including vancomycin, along with aminoglycosides, fusidic acid, chloramphenicol, and tetracycline. Typical of the Elizabethkingia, both draft genomes contained multiple copies of β-lactamase genes as well as genes predicted to function in antimicrobial efflux. Phylogenetic analysis of the draft genomes revealed that OSUVM-1 and OSUVM-2 differ by only 6 SNPs and are in a clade with 3 strains of Elizabethkingia anophelis that were responsible for human infections. These findings therefore raise the possibility that Elizabethkingia might have the potential to move between humans and animals in a manner similar to known zoonotic pathogens.


September 22, 2019

Fusarium species complex causing Pokkah Boeng in China

Sugarcane is one of the most important crops for sugar production in sugarcane-growing areas. Many biotic and abiotic stresses affect sugarcane production, leading to severe losses. Pokkah boeng is now of major concern because of the economic threat it poses. Currently, the occurrence and severity of pokkah boeng disease have spread rapidly across major sugarcane-growing countries. Pokkah boeng is a fungal disease that can cause serious yield losses in susceptible varieties. Infection is caused either by spores or ascospores. It may cause serious yield losses in commercial plantings. However, there have been many reported outbreaks of the disease which have looked spectacular but have caused trade and industry loss. Fusarium species complex is the major causal agent of this disease around the world, but some researchers have documented the increased importance of Fusarium. Three Fusarium species have been identified as causing sugarcane pokkah boeng disease in China. Moreover, Fusarium has been studied in China with respect to its mycotoxin production, genomic sequencing, and association with nitrogen application. Many studies on disease investigations, breeding of disease-resistant varieties, and disease control strategies have also been carried out in China.


September 22, 2019

A chromosome scale assembly of the model desiccation tolerant grass Oropetium thomaeum

Oropetium thomaeum is an emerging model for desiccation tolerance and genome size evolution in grasses. A high-quality draft genome of Oropetium was recently sequenced, but the lack of a chromosome scale assembly has hindered comparative analyses and downstream functional genomics. Here, we reassembled Oropetium, and anchored the genome into ten chromosomes using Hi-C based chromatin interactions. A combination of high-resolution RNAseq data and homology-based gene prediction identified thousands of new, conserved gene models that were absent from the V1 assembly. This includes thousands of new genes with high expression across a desiccation timecourse. The sorghum and Oropetium genomes have a surprising degree of chromosome-level collinearity, and several chromosome pairs have near perfect synteny. Other chromosomes are collinear in the gene rich chromosome arms but have experienced pericentric translocations. Together, these resources will be useful for the grass comparative genomic community and further establish Oropetium as a model resurrection plant.


September 22, 2019

Integrating long-range connectivity information into de Bruijn graphs.

The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input data. We present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both our de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterize the genomic context of drug-resistance genes. Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, which is available under the MIT license at https://github.com/mcveanlab/mccortex. Supplementary data are available at Bioinformatics online.
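
To make the loss of long range information concrete, here is a minimal sketch of a plain de Bruijn graph built from two toy reads. It is not the McCortex implementation and does not model the link annotations of the LdBG; the function names `build_de_bruijn` and `junctions` and the reads themselves are hypothetical.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def junctions(graph):
    """Nodes with more than one successor, where a walk through the graph is ambiguous."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)

# Two toy reads that share the k-mer CGT but differ on both sides of it.
reads = ["ACGTC", "TCGTA"]
graph = build_de_bruijn(reads, k=3)
ambiguous = junctions(graph)
```

Both reads pass through the node GT, which has two successors. The graph alone no longer records that the read starting with AC continued to TC rather than TA. That is the kind of read-level connectivity the abstract says the LdBG stores as annotations.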


September 22, 2019

Periodic variation of mutation rates in bacterial genomes associated with replication timing

The causes and consequences of spatiotemporal variation in mutation rates remain to be explored in nearly all organisms. Here we examine relationships between local mutation rates and replication timing in three bacterial species whose genomes have multiple chromosomes: Vibrio fischeri, Vibrio cholerae, and Burkholderia cenocepacia. Following five mutation accumulation experiments with these bacteria conducted in the near absence of natural selection, the genomes of clones from each lineage were sequenced and analyzed to identify variation in mutation rates and spectra. In lineages lacking mismatch repair, base substitution mutation rates vary in a mirrored wave-like pattern on opposing replichores of the large chromosomes of V. fischeri and V. cholerae, where concurrently replicated regions experience similar base substitution mutation rates. The base substitution mutation rates on the small chromosome are less variable in both species but occur at similar rates to those in the concurrently replicated regions of the large chromosome. Neither nucleotide composition nor frequency of nucleotide motifs differed among regions experiencing high and low base substitution rates, which along with the inferred ~800-kb wave period suggests that the source of the periodicity is not sequence specific but rather a systematic process related to the cell cycle. These results support the notion that base substitution mutation rates are likely to vary systematically across many bacterial genomes, which exposes certain genes to elevated deleterious mutational load. IMPORTANCE: That mutation rates vary within bacterial genomes is well known, but the detailed study of these biases has been made possible only recently with contemporary sequencing methods. We applied these methods to understand how bacterial genomes with multiple chromosomes, like those of Vibrio and Burkholderia, might experience heterogeneous mutation rates because of their unusual replication and the greater genetic diversity found on smaller chromosomes. This study captured thousands of mutations and revealed wave-like rate variation that is synchronized with replication timing and not explained by sequence context. The scale of this rate variation over hundreds of kilobases of DNA strongly suggests that a temporally regulated cellular process may generate wave-like variation in mutation risk. These findings add to our understanding of how mutation risk is distributed across bacterial and likely also eukaryotic genomes, owing to their highly conserved replication and repair machinery. Copyright © 2018 Dillon et al.


September 22, 2019

Plasmodium vivax-like genome sequences shed new insights into Plasmodium vivax biology and evolution.

Although Plasmodium vivax is responsible for the majority of malaria infections outside Africa, little is known about its evolution and pathway to humans. Its closest genetic relative, P. vivax-like, was discovered in African great apes and is hypothesized to have given rise to P. vivax in humans. To unravel the evolutionary history and adaptation of P. vivax to different host environments, we generated using long- and short-read sequence technologies 2 new P. vivax-like reference genomes and 9 additional P. vivax-like genotypes. Analyses show that the genomes of P. vivax and P. vivax-like are highly similar and colinear within the core regions. Phylogenetic analyses clearly show that P. vivax-like parasites form a genetically distinct clade from P. vivax. Concerning the relative divergence dating, we show that the evolution of P. vivax in humans did not occur at the same time as the other agents of human malaria, thus suggesting that the transfer of Plasmodium parasites to humans happened several times independently over the history of the Homo genus. We further identify several key genes that exhibit signatures of positive selection exclusively in the human P. vivax parasites. Two of these genes have been identified to also be under positive selection in the other main human malaria agent, P. falciparum, thus suggesting their key role in the evolution of the ability of these parasites to infect humans or their anthropophilic vectors. Finally, we demonstrate that some gene families important for red blood cell (RBC) invasion (a key step of the life cycle of these parasites) have undergone lineage-specific evolution in the human parasite (e.g., reticulocyte-binding proteins [RBPs]).


September 22, 2019

Identification of the DNA methyltransferases establishing the methylome of the cyanobacterium Synechocystis sp. PCC 6803.

DNA methylation in bacteria is important for defense against foreign DNA, but is also involved in DNA repair, replication, chromosome partitioning, and regulatory processes. Thus, characterization of the underlying DNA methyltransferases in genetically tractable bacteria is of paramount importance. Here, we characterized the methylome and orphan methyltransferases in the model cyanobacterium Synechocystis sp. PCC 6803. Single molecule real-time (SMRT) sequencing revealed four DNA methylation recognition sequences in addition to the previously known motif m5CGATCG, which is recognized by M.Ssp6803I. For three of the new recognition sequences, we identified the responsible methyltransferases. M.Ssp6803II, encoded by the sll0729 gene, modifies GGm4CC; M.Ssp6803III, encoded by slr1803, represents the cyanobacterial dam-like methyltransferase modifying Gm6ATC; and M.Ssp6803V, encoded by slr6095 on plasmid pSYSX, transfers methyl groups to the bipartite motif GGm6AN7TTGG/CCAm6AN7TCC. The remaining methylation recognition sequence GAm6AGGC is probably recognized by methyltransferase M.Ssp6803IV encoded by slr6050. M.Ssp6803III and M.Ssp6803IV were essential for the viability of Synechocystis, while the strains lacking M.Ssp6803I and M.Ssp6803V showed growth similar to the wild type. In contrast, growth of the Δsll0729 mutant lacking M.Ssp6803II was strongly diminished. These data provide the basis for systematic studies on the molecular mechanisms impacted by these methyltransferases.

