September 22, 2019

NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model.

The PacBio sequencing platform offers longer read lengths than second-generation sequencing technologies. It has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. Because of its extremely wide range of applications, fast, high-fidelity sequencing simulators are in great demand to facilitate the development and comparison of downstream analysis tools. Although several simulators (e.g., PBSIM, SimLoRD and FASTQSim) specifically target the generation of PacBio libraries, the error rate of their simulated sequences is not well matched to the quality values of raw PacBio datasets, especially for PacBio’s continuous long reads (CLR). By analyzing the characteristic features of CLR data from PacBio SMRT (single molecule real time) sequencing, we developed a new PacBio sequencing simulator (called NPBSS) for producing CLR reads. NPBSS first samples the read sequences according to a log-normal read length distribution and chooses different base quality values in different proportions. Then, NPBSS computes the overall error probability of each base in the read sequence with an empirical model, and derives the deletion, substitution and insertion probabilities from the overall error probability to generate the PacBio CLR reads. Alignment results demonstrate that NPBSS fits the error rate of PacBio CLR reads better than PBSIM and FASTQSim. In addition, the assembly results show that the simulated sequences of NPBSS are more like real PacBio CLR data. NPBSS is convenient to use, with efficient computation and flexible parameter settings. The PacBio CLR reads it generates are more like real PacBio datasets.
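
The sampling steps described in this abstract can be sketched roughly as follows. This is an illustrative Python sketch, not the NPBSS implementation: the log-normal parameters, the quality-value proportions and the split of the error probability across error types are hypothetical placeholders, and the standard Phred relation p = 10^(-Q/10) stands in for the empirical model, which the abstract does not specify.

```python
import random

# Hypothetical parameters, chosen only for illustration (not NPBSS defaults).
LOG_MEAN, LOG_SIGMA = 8.5, 0.6             # log-normal read-length parameters
QV_CHOICES = [5, 8, 10, 12, 15]            # candidate base quality values
QV_WEIGHTS = [0.1, 0.2, 0.3, 0.25, 0.15]   # proportion of each quality value
# Assumed split of the overall error probability across error types.
ERROR_SPLIT = {"insertion": 0.6, "deletion": 0.3, "substitution": 0.1}


def phred_error(qv):
    """Stand-in for the empirical model: the standard Phred relation."""
    return 10 ** (-qv / 10)


def simulate_read(reference, rng):
    """Sample one CLR-like read (and per-base quality values) from reference."""
    sampled = int(rng.lognormvariate(LOG_MEAN, LOG_SIGMA))
    length = min(len(reference), max(1, sampled))
    start = rng.randrange(len(reference) - length + 1)
    read, quals = [], []
    for base in reference[start:start + length]:
        qv = rng.choices(QV_CHOICES, weights=QV_WEIGHTS)[0]
        p_err = phred_error(qv)
        r = rng.random()
        if r < p_err * ERROR_SPLIT["deletion"]:
            continue  # deletion: drop the reference base
        if r < p_err * (ERROR_SPLIT["deletion"] + ERROR_SPLIT["substitution"]):
            # substitution: replace the base with a different one
            base = rng.choice([b for b in "ACGT" if b != base])
        elif r < p_err:
            # insertion: emit an extra random base before the reference base
            read.append(rng.choice("ACGT"))
            quals.append(qv)
        read.append(base)
        quals.append(qv)
    return "".join(read), quals
```

Calling `simulate_read("ACGT" * 5000, random.Random(0))` returns a read string together with one quality value per emitted base. Swapping `phred_error` for a fitted function is where an empirical model would plug in.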


September 22, 2019

In vivo evolution of drug-resistant Mycobacterium tuberculosis in patients during long-term treatment.

Drug-resistant tuberculosis is currently a significant challenge in the control of tuberculosis worldwide. To investigate the in vivo evolution of drug-resistant M. tuberculosis, the present study sequenced the draft genomes of 18 serial isolates from four pre-extensively drug-resistant (pre-XDR) tuberculosis patients to track continuous genetic alterations. All of the isolates harbored between 1303 and 1309 single nucleotide polymorphisms (SNPs) relative to the M. tuberculosis H37Rv reference. SNPs ranged from 0 to 12 within patients. The evolution rates were higher than the reported SNPs of 0.5 in the four patients. All the isolates exhibited mutations at sites of known drug targets, while some contained mutations in uncertain drug targets including folC, proZ, and pyrG. The compensatory substitutions for rescuing these deleterious mutations during evolution were only found in RpoC I491T in one patient. Many loci with microheterogeneity showed transient mutations in different isolates. Ninety-three SNPs exhibited significant association with refractory pre-XDR TB isolates. Our results showed evolutionary changes in the serial genetic characteristics of the pre-XDR TB patients due to the accumulation of fixed drug-resistance-related mutations and of transient mutations under continuous antibiotic pressure over several years.


September 22, 2019

Involvement of Burkholderiaceae and sulfurous volatiles in disease-suppressive soils.

Disease-suppressive soils are ecosystems in which plants suffer less from root infections due to the activities of specific microbial consortia. The characteristics of soils suppressive to specific fungal root pathogens are comparable to those of adaptive immunity in animals, as reported by Raaijmakers and Mazzola (Science 352:1392-3, 2016), but the mechanisms and microbial species involved in the soil suppressiveness are largely unknown. Previous taxonomic and metatranscriptome analyses of a soil suppressive to the fungal root pathogen Rhizoctonia solani revealed that members of the Burkholderiaceae family were more abundant and more active in suppressive than in non-suppressive soils. Here, isolation, phylogeny, and soil bioassays revealed a significant disease-suppressive activity for representative isolates of Burkholderia pyrrocinia, Paraburkholderia caledonica, P. graminis, P. hospita, and P. terricola. In vitro antifungal activity was only observed for P. graminis. Comparative genomics and metabolite profiling further showed that the antifungal activity of P. graminis PHS1 was associated with the production of sulfurous volatile compounds encoded by genes not found in the other four genera. Site-directed mutagenesis of two of these genes, encoding a dimethyl sulfoxide reductase and a cysteine desulfurase, resulted in a loss of antifungal activity both in vitro and in situ. These results indicate that specific members of the Burkholderiaceae family contribute to soil suppressiveness via the production of sulfurous volatile compounds.


September 22, 2019

Spread of carbapenem resistance by transposition and conjugation among Pseudomonas aeruginosa.

The emergence of carbapenem-resistant Pseudomonas aeruginosa represents a worldwide problem. To understand the carbapenem-resistance mechanisms and their spreading among P. aeruginosa strains, whole genome sequences were determined of two extensively drug-resistant strains that are endemic in Dutch hospitals. Strain Carb01 63 is of O-antigen serotype O12 and of sequence type ST111, whilst S04 90 is a serotype O11 strain of ST446. Both strains carry a gene for metallo-β-lactamase VIM-2 flanked by two aacA29 genes encoding aminoglycoside acetyltransferases on a class 1 integron. The integron is located on the chromosome in strain Carb01 63 and on a plasmid in strain S04 90. The backbone of the 159-kb plasmid, designated pS04 90, is similar to a previously described plasmid, pND6-2, from Pseudomonas putida. Analysis of the context of the integron showed that it is present in both strains on a ~30-kb mosaic DNA segment composed of four different transposons that can presumably act together as a novel, active, composite transposon. Apart from the presence of a 1237-bp insertion sequence element in the composite transposon on pS04 90, these transposons show > 99% sequence identity indicating that transposition between plasmid and chromosome could have occurred only very recently. The pS04 90 plasmid could be transferred by conjugation to a susceptible P. aeruginosa strain. A second class 1 integron containing a gene for a CARB-2 β-lactamase flanked by an aacA4′-8 and an aadA2 gene, encoding an aminoglycoside acetyltransferase and adenylyltransferase, respectively, was present only in strain Carb01 63. This integron is located also on a composite transposon that is inserted in an integrative and conjugative element on the chromosome. Additionally, this strain contains a frameshift mutation in the oprD gene encoding a porin involved in the transport of carbapenems across the outer membrane.
Together, the results demonstrate that integron-encoded carbapenem and carbapenicillin resistance can easily be disseminated by transposition and conjugation among Pseudomonas aeruginosa strains.


September 22, 2019

Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable to a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (single instruction multiple data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we (a) distribute many independent alignments on multiple threads and (b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2 and AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. Supplementary data are available at Bioinformatics online.
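
The inter-sequence layout mentioned in this abstract (many alignments advancing through the same dynamic programming cell at once) can be illustrated with a small Python sketch. It is not SeqAn code: plain lists stand in for SIMD registers, the linear-gap scoring values are assumptions, and the function name is hypothetical.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # assumed linear-gap scoring scheme


def batch_global_scores(queries, targets):
    """Needleman-Wunsch scores for many sequence pairs computed in lock-step.

    All queries share one length and all targets share one length, so each
    DP cell can be stored as a vector with one lane per alignment. Plain
    lists stand in for the SIMD registers a real implementation would use.
    """
    lanes = len(queries)
    assert lanes == len(targets) and lanes > 0
    m, n = len(queries[0]), len(targets[0])
    assert all(len(q) == m for q in queries)
    assert all(len(t) == n for t in targets)
    # prev[j] is the vector of scores for row i-1, column j across all lanes.
    prev = [[j * GAP] * lanes for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [[i * GAP] * lanes]
        for j in range(1, n + 1):
            cell = []
            for k in range(lanes):  # conceptually a single SIMD instruction
                same = queries[k][i - 1] == targets[k][j - 1]
                sub = MATCH if same else MISMATCH
                cell.append(max(prev[j - 1][k] + sub,   # diagonal move
                                prev[j][k] + GAP,       # gap in the target
                                cur[j - 1][k] + GAP))   # gap in the query
            cur.append(cell)
        prev = cur
    return prev[n]
```

For example, `batch_global_scores(["ACGT", "AAAA"], ["ACGT", "TTTT"])` scores both pairs in a single pass over the matrix. The inner loop over `k` is the part that vector instructions would execute in one step.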


September 22, 2019

Characterization of the antimonite- and arsenite-oxidizing bacterium Bosea sp. AS-1 and its potential application in arsenic removal.

Arsenic (As) and antimony (Sb) usually coexist in natural environments, where both pollute soils and water. Microorganisms that oxidize arsenite [As(III)] and tolerate Sb have great potential in As and Sb bioremediation. In this study, a Gram-negative bacterial strain, Bosea sp. AS-1, was isolated from a mine slag sample collected in the Xikuangshan Sb mine in China. AS-1 could tolerate 120 mM of As(III) and 50 mM of antimonite [Sb(III)]. It could also oxidize 2 mM of As(III) or Sb(III) completely under heterotrophic and aerobic conditions. Interestingly, strain AS-1 preferred to oxidize As(III) with yeast extract as the carbon source, whereas Sb(III) oxidation was favored with lactate in the medium. Genomic analysis of AS-1 confirmed the presence of several gene islands for As resistance and oxidation. Notably, a system of AS-1 and goethite was found to be able to remove 99% of the As at initial concentrations of 500 µg/L As(III) and 500 µg/L Sb(III), which suggests the potential of this approach for As removal in environments, especially those with high Sb levels. Copyright © 2018 Elsevier B.V. All rights reserved.


September 22, 2019

Prevalence, antimicrobial resistance and phylogenetic characterization of Yersinia enterocolitica in retail poultry meat and swine feces in parts of China

Yersinia enterocolitica is an enteropathogen transmitted by contaminated food. In this study, a total of 500 retail poultry meat samples from 4 provinces and 145 swine feces samples from 12 provinces in China were tested for Y. enterocolitica, and 26 isolates were obtained for further bio-serotyping, antimicrobial susceptibility testing against a panel of antimicrobial compounds, and genetic characterization based on whole genome sequencing. A higher prevalence of Y. enterocolitica contamination was observed in retail poultry meat (4.8%) than in swine feces (2.76%). No difference in bio-serotypes, multilocus sequence typing (MLST) and virulence gene distribution between isolates of swine and poultry origin was found. All isolates were resistant to ampicillin, amoxicillin/clavulanic acid, and cefazolin and were multi-drug resistant (MDR). The most predominant drug-resistance profile was AMP-CFZ-AMC-FOX (42.31%). A pathogenic isolate with bio-serotype 3/O:3 and ST135 was cultured from retail fresh chicken meat for the first time in China. Based on the whole-genome single nucleotide polymorphism (SNP) tree analysis, pathogenic isolates clustered closely, while nonpathogenic isolates exhibited high genetic heterogeneity. This indicated that pathogenic isolates were conserved at the genetic level. The whole-genome SNP tree also revealed that Y. enterocolitica of swine, chicken and duck origin may share a common ancestor. The findings highlight the emergence of drug-resistant pathogenic Y. enterocolitica in retail poultry meat in China.


September 22, 2019

Construction of stable fluorescent laboratory control strains for several food safety relevant Enterobacteriaceae.

Using naturally occurring bacterial strains as positive controls in testing protocols is typically feared due to the risk of cross-contaminating samples. We have developed a collection of strains that express Green Fluorescent Protein (GFP) at a high level, permitting rapid screening of the following species on selective or non-selective plates: Escherichia coli O157:H7, Shigella sonnei, S. flexneri, Salmonella enterica subsp. enterica serovar Gaminara, S. Mbandaka, S. Tennessee, S. Minnesota, S. Senftenberg and S. Typhimurium. These new strains fluoresce when irradiated with UV light and maintain this phenotype in the absence of antibiotic selection. Recombinants were phenotypically equivalent to the parent strain, except for S. Tennessee Sal66, which appeared Lac- on Xylose Lysine Deoxycholate (XLD) agar plates and Lac+ on MacConkey and Hektoen Enteric agar plates. Analysis of closed whole genome sequences revealed that Sal66 had lost one lactose operon; slower rates of lactose metabolism may affect lactose fermentation on XLD agar. These fluorescent enteric control strains were challenging to develop and should provide an easy and effective means of identifying cross-contamination. Published by Elsevier Ltd.


September 22, 2019

Development of New Tools to Detect Colistin-Resistance among Enterobacteriaceae Strains.

The recent discovery of the plasmid-mediated mcr-1 gene conferring resistance to colistin is of clinical concern. The worldwide screening of this resistance mechanism among samples of different origins has highlighted the urgent need to improve the detection of colistin-resistant isolates in clinical microbiology laboratories. Currently, phenotypic methods used to detect colistin resistance are not necessarily suitable as the main characteristic of the mcr genes is the low level of resistance that they confer, close to the clinical breakpoint recommended jointly by the CLSI and EUCAST expert systems (S ≤ 2 mg/L and R > 2 mg/L). In this context, susceptibility testing recommendations for polymyxins have evolved and are becoming difficult to implement in routine laboratory work. The large number of mechanisms and genes involved in colistin resistance limits the access to rapid detection by molecular biology. It is therefore necessary to implement well-defined protocols using specific tools to detect all colistin-resistant bacteria. This review aims to summarize the current clinical microbiology diagnosis techniques and their ability to detect all colistin resistance mechanisms and describe new tools specifically developed to assess plasmid-mediated colistin resistance. Phenotyping, susceptibility testing, and genotyping methods are presented, including an update on recent studies related to the development of specific techniques.


September 21, 2019

Toward complete bacterial genome sequencing through the combined use of multiple next-generation sequencing platforms.

PacBio’s long-read sequencing technologies can be successfully used for a complete bacterial genome assembly using recently developed non-hybrid assemblers in the absence of second-generation, high-quality short reads. However, standardized procedures that take into account multiple pre-existing second-generation sequencing platforms are scarce. In addition to Illumina HiSeq and Ion Torrent PGM-based genome sequencing results derived from previous studies, we generated further sequencing data, including from the PacBio RS II platform, and applied various bioinformatics tools to obtain complete genome assemblies for five bacterial strains. Our approach revealed that the hierarchical genome assembly process (HGAP) non-hybrid assembler resulted in nearly complete assemblies at a moderate coverage of ~75x, but that different versions produced non-compatible results requiring post-processing. The other two platforms further improved the PacBio assembly through scaffolding and a final error correction.


September 21, 2019

in silico Whole Genome Sequencer & Analyzer (iWGS): a computational pipeline to guide the design and analysis of de novo genome sequencing studies.

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in non-model organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS. Copyright © 2016 Author et al.


September 21, 2019

Recent advances in bioinformatics for fish genomics

In the past few years, we have contributed to ~1/5 of the reported fish genomes. Based on this experience, here we outline recent advances in bioinformatics for fish genomics, with an emphasis on the development of software for genome assembly, genome annotation and evolutionary analysis. This review will be helpful for newcomers to genome analysis of both animals and plants. In the past decade, whole genome sequences of approximately 50 fish species have been reported [1]. We have been involved in ~1/5 of these international works from 2014 to 2017, such as mudskippers (2014) [2], Chinese large yellow croaker [3], Chinese barbel fishes [4], Asian arowana [5,6], Channel catfish [7], seahorses [8], Japanese flounder [9], Chinese clearhead icefish [10] and Northern snakehead [11]. We are also in charge of the China Aquatic 10-100-1,000 Genomics Program [12], in which ~100 fish genomes are sequencing targets for the next 3 to 5 years. Based on our previous experience with fish genomic studies, here we outline recent advances in related bioinformatics for fish genomics to share with public readers. Since the basic informatics includes genome assembly, genome annotation and evolutionary analysis, we discuss them one by one in this order.


September 21, 2019

Comparative genomics of enterohemorrhagic Escherichia coli O145:H28 demonstrates a common evolutionary lineage with Escherichia coli O157:H7.

Although serotype O157:H7 is the predominant enterohemorrhagic Escherichia coli (EHEC), outbreaks of non-O157 EHEC that cause severe foodborne illness, including hemolytic uremic syndrome, have increased worldwide. In fact, non-O157 serotypes are now estimated to cause over half of all the Shiga toxin-producing Escherichia coli (STEC) cases, and outbreaks of non-O157 EHEC infections are frequently associated with serotypes O26, O45, O103, O111, O121, and O145. Currently, there are no complete genomes for O145 in public databases. We determined the complete genome sequences of two O145 strains (EcO145), one linked to a US lettuce-associated outbreak (RM13514) and one to a Belgium ice-cream-associated outbreak (RM13516). Both strains contain one chromosome and two large plasmids, with genome sizes of 5,737,294 bp for RM13514 and 5,559,008 bp for RM13516. Comparative analysis of the two EcO145 genomes revealed a large core (5,173 genes) and a considerable number of strain-specific genes. Additionally, the two EcO145 genomes display distinct chromosomal architecture, virulence gene profile, phylogenetic origin of Stx2a prophage, and methylation profile (methylome). Comparative analysis of EcO145 genomes to other completely sequenced STEC and other E. coli and Shigella genomes revealed that, unlike any other known non-O157 EHEC strain, EcO145 ascended from a common lineage with EcO157/EcO55. This evolutionary relationship was further supported by the pangenome analysis of the 10 EHEC strains. Of the 4,192 EHEC core genes, EcO145 shares more genes with EcO157 than with any other non-O157 EHEC strains. Our data provide evidence that EcO145 and EcO157 evolved from a common lineage, but ultimately each serotype evolves via a lineage-independent nature to EHEC by acquisition of the core set of EHEC virulence factors, including the genes encoding Shiga toxin and the large virulence plasmid.
The large variation between the two EcO145 genomes suggests a distinctive evolutionary path between the two outbreak strains. The distinct methylome between the two EcO145 strains is likely due to the presence of a BsuBI/PstI methyltransferase gene cassette in the Stx2a prophage of the strain RM13514, suggesting a role of horizontal gene transfer-mediated epigenetic alteration in the evolution of individual EHEC strains.
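
The core and strain-specific gene counts reported in this abstract come down to set operations over per-genome gene inventories. The Python sketch below shows the idea on toy data. The gene identifiers are invented placeholders, and a real analysis would first cluster genes into ortholog groups, a step this sketch assumes has already been done.

```python
def core_and_specific(gene_sets):
    """Split per-strain gene sets into a shared core and strain-specific genes.

    gene_sets maps a strain name to the set of ortholog-group identifiers
    found in that strain's genome.
    """
    core = set.intersection(*gene_sets.values())
    specific = {}
    for name, genes in gene_sets.items():
        # Genes present in any other strain are not specific to this one.
        others = set().union(
            *(g for other, g in gene_sets.items() if other != name)
        )
        specific[name] = genes - others
    return core, specific


# Toy inventories with placeholder identifiers (not real annotations).
toy = {
    "strain_1": {"g1", "g2", "g3", "g4"},
    "strain_2": {"g1", "g2", "g5"},
}
core, specific = core_and_specific(toy)
```

With more than two genomes the same function yields the pangenome core (genes present in every strain), which is the quantity the 10-strain comparison refers to.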


September 21, 2019

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
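
The first step described in this abstract, using the longest reads as seeds, can be sketched as choosing a read-length cutoff. The Python sketch below is an illustration rather than the HGAP implementation: the function name is hypothetical, and the default seed coverage of 30x is an assumed value that the abstract does not state.

```python
def select_seed_reads(read_lengths, genome_size, seed_coverage=30):
    """Pick the longest reads until they cover the genome seed_coverage times.

    Returns the length cutoff (the shortest selected seed) and the list of
    selected seed lengths. seed_coverage=30 is an assumed default.
    """
    target_bases = genome_size * seed_coverage
    seeds, total = [], 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target_bases:
            break
        seeds.append(length)
        total += length
    cutoff = seeds[-1] if seeds else 0
    return cutoff, seeds
```

Reads shorter than the cutoff are not discarded in this scheme: they are the ones recruited to the seeds to build the preassembled consensus reads.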

