July 7, 2019  |  

Recent “omics” advances in Helicobacter pylori.

The development of high-throughput whole genome sequencing (WGS) technologies is changing the face of microbiology, facilitating the comparison of large numbers of genomes from different lineages of the same organism. Our aim was to review the main advances in Helicobacter pylori “omics” and to understand how these are improving our knowledge of the biology, diversity and pathogenesis of H. pylori. Since the first H. pylori isolate was sequenced in 1997, 510 genomes have been deposited in the NCBI archive, providing a basis for improved understanding of the epidemiology and evolution of this important pathogen. This review focuses on work published between April 2015 and March 2016. Helicobacter “omics” is already making an impact and is a growing research field. Ultimately these advances will be translated into a routine clinical laboratory setting in order to improve public health. © 2016 John Wiley & Sons Ltd.


July 7, 2019  |  

Persistence of a dominant bovine lineage of group B Streptococcus reveals genomic signatures of host adaptation.

Group B Streptococcus (GBS) is a host-generalist species, most notably causing disease in humans and cattle. However, the differential adaptation of GBS to its two main hosts, and the risk of animal-to-human infection, remain poorly understood. Despite improvements in control measures across Europe, GBS is still one of the main causative agents of bovine mastitis in Portugal. Here, by whole-genome analysis of 150 bovine GBS isolates, we discovered that a single CC61 clone has been spreading throughout Portuguese herds since at least the early 1990s, having virtually replaced the previous GBS population. Mutations within an iron/manganese transporter were independently acquired by all of the CC61 isolates, underlining a key adaptive strategy to persist in the bovine host. Lateral transfer of bacteriocin production and antibiotic resistance genes also underscored the contribution of the microbial ecology and genetic pool within the bovine udder environment to the success of this clone. Compared to strains of human origin, GBS evolves twice as fast in bovines and undergoes recurrent pseudogenizations of human-adapted traits. Our work provides new insights into the potentially irreversible adaptation of GBS to the bovine environment. © 2016 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.


July 7, 2019  |  

An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes.

Human genomes are routinely compared against a universal reference. However, this strategy could miss population-specific and personal genomic variations, which may be detected more efficiently using an ethnically relevant or personal reference. Here we report a hybrid assembly of a Korean reference genome (KOREF) for constructing personal and ethnic references by combining sequencing and mapping methods. We also build its consensus variome reference, providing information on millions of variants from 40 additional ethnically homogeneous genomes from the Korean Personal Genome Project. We find that the ethnically relevant consensus reference can be beneficial for efficient variant detection. Systematic comparison of human assemblies shows the importance of assembly quality, suggesting the necessity of new technologies to comprehensively map ethnic and personal genomic structure variations. In the era of large-scale population genome projects, the leveraging of ethnicity-specific genome assemblies as well as the human reference genome will accelerate mapping all human genome diversity.


July 7, 2019  |  

Complete genome sequence of Edwardsiella piscicida isolate S11-285 recovered from channel catfish (Ictalurus punctatus) in Mississippi, USA.

Edwardsiella piscicida is a recently described Gram-negative facultative anaerobe and an important pathogen to many wild and cultured fish species worldwide. Here, we report the complete and annotated genome of E. piscicida isolate S11-285 recovered from channel catfish (Ictalurus punctatus), consisting of a chromosome of 3,923,603 bp and 1 plasmid. Copyright © 2016 Reichley et al.


July 7, 2019  |  

Complete genome of Vibrio parahaemolyticus FORC014 isolated from the toothfish.

Foodborne illness can occur due to various pathogenic bacteria such as Staphylococcus aureus, Escherichia coli and Vibrio parahaemolyticus, and can cause severe gastroenteritis symptoms. In this study, we completed the genome sequence of a foodborne pathogen, V. parahaemolyticus FORC_014, which was isolated from suspected contaminated toothfish from South Korea. Additionally, we extended our knowledge of the genomic characteristics of the FORC_014 strain through comparative analysis with other V. parahaemolyticus strains whose complete genomes have previously been reported. The complete genome sequence of V. parahaemolyticus FORC_014 was generated using the PacBio RS platform with single molecule, real-time (SMRT) sequencing. The FORC_014 genome consists of two circular chromosomes (3,241,330 bp for chromosome 1 and 1,997,247 bp for chromosome 2), one plasmid (51,383 bp), and one putative phage sequence (96,896 bp). The genome contains a total of 4274 putative protein coding sequences, 126 tRNA genes and 34 rRNA genes. Furthermore, we found 33 type III secretion system 1 (T3SS1) related proteins and 15 type III secretion system 2 (T3SS2) related proteins on chromosome 1. This is the first reported result of type III secretion system 2 located on chromosome 1 of V. parahaemolyticus without thermostable direct hemolysin (tdh) and thermostable direct hemolysin-related hemolysin (trh). Through investigation of the complete genome sequence of V. parahaemolyticus FORC_014, which differs from previously reported strains, we revealed two type III secretion systems (T3SS1, T3SS2) located on chromosome 1 which do not include the tdh and trh genes. We also identified several virulence factors carried by our strain, including an iron uptake system, hemolysin and secretion systems. This result suggests that the FORC_014 strain may be a pathogen responsible for foodborne illness outbreaks. Our results provide significant genomic clues which will assist in future understanding of virulence at the genomic level and help distinguish between clinical and non-clinical isolates.


July 7, 2019  |  

Complete, closed genome sequences of 10 Salmonella enterica subsp. enterica serovar Typhimurium strains isolated from human and bovine sources.

Salmonella enterica is a leading cause of enterocolitis for humans and animals. S. enterica subsp. enterica serovar Typhimurium infects a broad range of hosts. To facilitate genomic comparisons among isolates from different sources, we present the complete genome sequences of 10 S. Typhimurium strains, 5 each isolated from human and bovine sources. Copyright © 2016 Nguyen et al.


July 7, 2019  |  

WhatsHap: fast and accurate read-based phasing

Read-based phasing allows the haplotype structure of a sample to be reconstructed purely from sequencing reads. While phasing is a required step for answering questions about population genetics and compound heterozygosity, and for aiding clinical decision making, there has been a lack of accurate, usable and standards-based software. WhatsHap is a production-ready tool for highly accurate read-based phasing. It was designed from the beginning to leverage third-generation sequencing technologies, whose long reads can span many variants and are therefore ideal for phasing. WhatsHap also works well with second-generation data, is easy to use and will phase not only SNVs, but also indels and other variants. It is unique in its ability to combine read-based with genetic phasing, which further improves accuracy if multiple related samples are provided.
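
To make the idea of read-based phasing concrete, the following is a toy sketch and not WhatsHap's algorithm. It assumes every variant is heterozygous and biallelic, so the second haplotype is the complement of the first, and it brute-forces the haplotype that needs the fewest allele corrections across reads (a minimum-error-correction style objective, chosen here only for illustration). The function name `phase` and the read encoding (a dict mapping variant index to observed allele) are hypothetical.

```python
from itertools import product

def phase(reads, n_variants):
    """Brute-force phasing of heterozygous variants (toy sizes only).

    reads: list of dicts {variant_index: allele}, alleles are 0 or 1.
    Returns (cost, haplotype): the haplotype (tuple of 0/1) that needs
    the fewest allele flips for every read to match it or its complement.
    """
    best = None
    # Fix the first allele to 0: a haplotype and its complement
    # describe the same phasing, so only half the space is searched.
    for rest in product((0, 1), repeat=n_variants - 1):
        hap = (0,) + rest
        cost = 0
        for read in reads:
            mismatches = sum(1 for pos, allele in read.items() if hap[pos] != allele)
            # A read comes from either haplotype; mismatches against the
            # complement equal the number of covered sites minus mismatches.
            cost += min(mismatches, len(read) - mismatches)
        if best is None or cost < best[0]:
            best = (cost, hap)
    return best

# Five hypothetical reads, each covering two of three variants.
reads = [{0: 0, 1: 1}, {1: 1, 2: 0}, {0: 1, 1: 0}, {1: 0, 2: 1}, {0: 0, 2: 0}]
cost, haplotype = phase(reads, 3)
print(cost, haplotype)
```

The search is exponential in the number of variants, so this only illustrates the objective; longer reads help because each read links more variants at once.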


July 7, 2019  |  

Improved assembly of noisy long reads by k-mer validation.

Genome assembly depends critically on read length. Two recent technologies, from Pacific Biosciences (PacBio) and Oxford Nanopore, produce read lengths >20 kb, which yield de novo genome assemblies with vastly greater contiguity than those based on Sanger, Illumina, or other technologies. However, the very high error rates of these two new technologies (~15% per base) make assembly imprecise at repeats longer than the read length and computationally expensive. Here we show that the contiguity and quality of the assembly of these noisy long reads can be significantly improved at a minimal cost, by leveraging the low error rate and low cost of Illumina short reads. Namely, k-mers from the PacBio raw reads that are not present in Illumina reads (which account for ~95% of the distinct k-mers) are deemed sequencing errors and ignored at the seed alignment step. By focusing on the ~5% of k-mers that are error free, read overlap sensitivity is dramatically increased. Of equal importance, the validation procedure can be extended to exclude repetitive k-mers, which prevents read miscorrection at repeats and further improves the resulting assemblies. We tested the k-mer validation procedure using one long-read technology (PacBio) and one assembler (MHAP/Celera Assembler), but it is very likely to yield analogous improvements with alternative long-read technologies and assemblers, such as Oxford Nanopore and BLASR/DALIGNER/Falcon, respectively. © 2016 Carvalho et al.; Published by Cold Spring Harbor Laboratory Press.
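
The k-mer validation step described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the authors' implementation: it ignores reverse complements and the repeat-filtering extension, and the function names (`kmers`, `trusted_kmer_set`, `validated_seeds`) and toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Return every k-mer of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def trusted_kmer_set(short_reads, k):
    """Collect all k-mers seen in the accurate short reads."""
    trusted = set()
    for read in short_reads:
        trusted.update(kmers(read, k))
    return trusted

def validated_seeds(long_read, trusted, k):
    """Keep only the (position, k-mer) pairs of the long read whose
    k-mer also occurs in the short reads; the rest are treated as
    sequencing errors and would be ignored when seeding overlaps."""
    return [(i, km) for i, km in enumerate(kmers(long_read, k)) if km in trusted]

short_reads = ["ACGTAC", "GTACGG"]   # stand-ins for Illumina reads
long_read = "ACGTTACGG"              # noisy read with one inserted T
trusted = trusted_kmer_set(short_reads, 4)
seeds = validated_seeds(long_read, trusted, 4)
print(seeds)
```

In this toy case three of the six k-mers in the long read survive validation; the three that overlap the inserted base are discarded.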


July 7, 2019  |  

Jabba: hybrid error correction for long sequencing reads.

Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented. Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.
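
As a rough illustration of what a maximal exact match (MEM) seed is, the sketch below finds MEMs between a noisy read and a single linear sequence. This is a deliberate simplification: Jabba maps reads onto a de Bruijn graph, whereas here one string stands in for a path through that graph, and the quadratic scan is only suitable for toy inputs. The function name and sequences are hypothetical.

```python
def maximal_exact_matches(read, ref, min_len):
    """Return (read_pos, ref_pos, length) for every exact match that
    cannot be extended to the left or right and is at least min_len long."""
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Not left-maximal: the same match was already found
            # starting one position earlier on both sequences.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right for as long as the bases agree.
            length = 0
            while (i + length < len(read) and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems

ref = "GATTACAGG"    # stand-in for a path in the de Bruijn graph
read = "GATTTCAGG"   # long read with one substitution error
seeds = maximal_exact_matches(read, ref, 4)
print(seeds)
```

The two seeds flank the erroneous base; a seed-and-extend method would then chain such seeds and correct the region between them.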


July 7, 2019  |  

Improve homology search sensitivity of PacBio data by correcting frameshifts.

Single-molecule, real-time sequencing (SMRT) developed by Pacific Biosciences produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data has a high sequencing error rate and most of the errors are insertion or deletion errors. During alignment-based homology search, insertion or deletion errors in genes will cause frameshifts and may only lead to marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments and the ambiguity will incur errors in structural and functional annotation. Existing frameshift correction tools are designed for data with much lower error rate and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, we can correct more errors when compared with a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/. Contact: yannisun@msu.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
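
The effect of a single insertion error on translation can be shown with a short example. This sketch only illustrates the frameshift problem the abstract describes; it does not implement Frame-Pro's correction method. It uses the standard genetic code, and the toy gene sequence is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate dna in frame 0; trailing bases that do not fill a codon
    are dropped and stop codons are shown as '*'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

gene = "ATGGCTGAAACC"              # hypothetical error-free coding sequence
noisy = gene[:3] + "A" + gene[3:]  # same sequence with one inserted base

print(translate(gene))   # protein from the true sequence
print(translate(noisy))  # every codon after the insertion is shifted
```

Everything downstream of the inserted base is read in the wrong frame, here even introducing a premature stop codon, which is why protein-level alignments of uncorrected reads tend to be short and low-scoring.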


July 7, 2019  |  

Information-optimal genome assembly via sparse read-overlap graphs.

In the context of third-generation long-read sequencing technologies, read-overlap-based approaches are expected to play a central role in the assembly step. A fundamental challenge in assembling from a read-overlap graph is that the true sequence corresponds to a Hamiltonian path on the graph, and, under most formulations, the assembly problem becomes NP-hard, restricting practical approaches to heuristics. In this work, we avoid this seemingly fundamental barrier by first setting the computational complexity issue aside, and seeking an algorithm that targets information limits. In particular, we consider a basic feasibility question: when does the set of reads contain enough information to allow unambiguous reconstruction of the true sequence? Based on insights from this information feasibility question, we present an algorithm, the Not-So-Greedy algorithm, to construct a sparse read-overlap graph. Unlike most other assembly algorithms, Not-So-Greedy comes with a performance guarantee: whenever information feasibility conditions are satisfied, the algorithm reduces the assembly problem to an Eulerian path problem on the resulting graph, and can thus be solved in linear time. In practice, this theoretical guarantee translates into assemblies of higher quality. Evaluations on both simulated reads from real genomes and a PacBio Escherichia coli K12 dataset demonstrate that Not-So-Greedy compares favorably with standard string graph approaches in terms of accuracy of the resulting read-overlap graph and contig N50. Available at github.com/samhykim/nsg. Contact: courtade@eecs.berkeley.edu or dntse@stanford.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
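
The abstract contrasts the NP-hard Hamiltonian path formulation with an Eulerian path problem that can be solved in linear time. The sketch below shows one standard linear-time method for the latter, Hierholzer's algorithm on a directed graph. It is not the Not-So-Greedy algorithm itself, which concerns how the sparse graph is built; the node labels are hypothetical.

```python
from collections import defaultdict

def eulerian_path(edges):
    """Return a list of nodes that traverses every directed edge exactly
    once, or None if no such path exists. Runs in time linear in the
    number of edges (Hierholzer's algorithm)."""
    if not edges:
        return []
    adj = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for u, v in edges:
        adj[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    starts = [n for n, b in balance.items() if b == 1]
    ends = [n for n, b in balance.items() if b == -1]
    if any(abs(b) > 1 for b in balance.values()) or len(starts) > 1 or len(ends) > 1:
        return None
    start = starts[0] if starts else edges[0][0]
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if adj[node]:
            stack.append(adj[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())       # dead end: emit node
    path.reverse()
    # If some edges were never reached, the graph is disconnected.
    return path if len(path) == len(edges) + 1 else None

edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
print(eulerian_path(edges))
```

Each edge is pushed and popped once, which is where the linear running time comes from.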


July 7, 2019  |  

CoLoRMap: Correcting Long Reads by Mapping short reads.

Second generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. The recent developments in long-read sequencing methods offer a promising way to address this issue. However, so far long reads are characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads. We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read, and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods. The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap. Contact: ehaghshe@sfu.ca or cedric.chauve@sfu.ca. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
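
The first idea, a shortest path through overlapping short reads, can be sketched as follows. This is an illustration under stated assumptions rather than CoLoRMap's actual graph construction: each mapped short read is reduced to a hypothetical `(start, end, edit_cost)` tuple on the long read, an edge joins two reads when the second overlaps and extends the first, and Dijkstra's algorithm finds the chain with the lowest total edit cost.

```python
import heapq

def best_chain(reads):
    """reads: list of (start, end, edit_cost) for short reads mapped onto
    one long read. Returns (total_cost, indices) of the cheapest chain of
    overlapping reads from the leftmost read to the read reaching furthest
    right, or None if the reads do not form a connected chain."""
    n = len(reads)
    source = min(range(n), key=lambda i: reads[i][0])
    target = max(range(n), key=lambda i: reads[i][1])
    dist = {source: reads[source][2]}
    prev = {}
    heap = [(reads[source][2], source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist.get(i, float("inf")):
            continue  # stale heap entry
        if i == target:
            break
        s_i, e_i, _ = reads[i]
        for j, (s_j, e_j, c_j) in enumerate(reads):
            # Edge i -> j: j starts inside i and extends past its end.
            if s_i < s_j <= e_i and e_j > e_i:
                nd = d + c_j
                if nd < dist.get(j, float("inf")):
                    dist[j] = nd
                    prev[j] = i
                    heapq.heappush(heap, (nd, j))
    if target not in dist:
        return None
    chain = [target]
    while chain[-1] != source:
        chain.append(prev[chain[-1]])
    return dist[target], chain[::-1]

# Hypothetical mappings: (start, end, edit distance to the long read).
reads = [(0, 10, 1), (5, 15, 4), (8, 18, 1), (12, 25, 2), (16, 26, 0)]
print(best_chain(reads))
```

The selected chain skips the poorly matching read at positions 5 to 15 in favour of reads that agree more closely with the long read while still tiling it end to end.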

