July 7, 2019

WhatsHap: fast and accurate read-based phasing

Read-based phasing reconstructs the haplotype structure of a sample purely from sequencing reads. While phasing is a required step for answering questions about population genetics and compound heterozygosity, and for aiding clinical decision making, accurate, usable, and standards-based software has been lacking. WhatsHap is a production-ready tool for highly accurate read-based phasing. It was designed from the beginning to leverage third-generation sequencing technologies, whose long reads can span many variants and are therefore ideal for phasing. WhatsHap also works well with second-generation data, is easy to use, and phases not only SNVs but also indels and other variants. It is unique in its ability to combine read-based with genetic phasing, which further improves accuracy when multiple related samples are provided.


July 7, 2019

Listeria monocytogenes in stone fruits linked to a multistate outbreak: enumeration of cells and whole-genome sequencing.

In 2014, the identification of stone fruits contaminated with Listeria monocytogenes led to the subsequent identification of a multistate outbreak. Simultaneous detection and enumeration of L. monocytogenes were performed on 105 fruits, each weighing 127 to 145 g, collected from 7 contaminated lots. The results showed that 53.3% of the fruits yielded L. monocytogenes (lower limit of detection, 5 CFU/fruit), and the levels ranged from 5 to 2,850 CFU/fruit, with a geometric mean of 11.3 CFU/fruit (0.1 CFU/g of fruit). Two serotypes, IVb-v1 and 1/2b, were identified by a combination of PCR- and antiserum-based serotyping among isolates from fruits and their packing environment; certain fruits contained a mixture of both serotypes. Single nucleotide polymorphism (SNP)-based whole-genome sequencing (WGS) analysis clustered isolates from two case-patients with the serotype IVb-v1 isolates and distinguished outbreak-associated isolates from pulsed-field gel electrophoresis (PFGE)-matched, but epidemiologically unrelated, clinical isolates. The outbreak-associated isolates differed by up to 42 SNPs. All but one serotype 1/2b isolate formed another WGS cluster and differed by up to 17 SNPs. Fully closed genomes of isolates from the stone fruits were used as references to maximize the resolution and to increase our confidence in prophage analysis. Putative prophages were conserved among isolates of each WGS cluster. All serotype IVb-v1 isolates belonged to singleton sequence type 382 (ST382); all but one serotype 1/2b isolate belonged to clonal complex 5. WGS proved to be an excellent tool to assist in the epidemiologic investigation of listeriosis outbreaks. The comparison at the genome level contributed to our understanding of the genetic diversity and variations among isolates involved in an outbreak or isolates associated with food and environmental samples from one facility. Fully closed genomes increased our confidence in the identification and comparison of accessory genomes. The diversity among the outbreak-associated isolates and the inclusion of PFGE-matched, but epidemiologically unrelated, isolates demonstrate the high resolution of WGS. The prevalence and enumeration data could contribute to our further understanding of the risk associated with Listeria monocytogenes contamination, especially among high-risk populations. Copyright © 2016 Chen et al.
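
The per-gram figure quoted in the abstract can be sanity-checked from the per-fruit geometric mean and the stated fruit weights. The snippet below is only an illustrative back-of-the-envelope check; the function name is hypothetical and the abstract does not say which fruit weight the authors used for their conversion.

```python
# Figures quoted in the abstract above.
geometric_mean_cfu_per_fruit = 11.3
fruit_weight_range_g = (127, 145)  # lightest and heaviest fruit, in grams


def cfu_per_gram(cfu_per_fruit: float, weight_g: float) -> float:
    """Convert a per-fruit count into a per-gram concentration."""
    return cfu_per_fruit / weight_g


# Evaluate the conversion at both ends of the weight range (assumption:
# the true conversion weight lies somewhere inside this range).
concentrations = [
    cfu_per_gram(geometric_mean_cfu_per_fruit, w) for w in fruit_weight_range_g
]
rounded = [round(c, 1) for c in concentrations]
print(concentrations, rounded)
```

At either end of the weight range the value rounds to 0.1 CFU/g, consistent with the figure reported in the abstract.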


July 7, 2019

Genomewide Dam methylation in Escherichia coli during long-term stationary phase.

DNA methylation in prokaryotes is widespread. The most common modification of the genome is the methylation of adenine at the N-6 position. In Escherichia coli K-12 and many gammaproteobacteria, this modification is catalyzed by DNA adenine methyltransferase (Dam) at the GATC consensus sequence and is known to modulate cellular processes including transcriptional regulation of gene expression, initiation of chromosomal replication, and DNA mismatch repair. While studies thus far have focused on the motifs associated with methylated adenine (meA), the frequency of meA across the genome, and temporal dynamics during early periods of incubation, here we conduct the first study on the temporal dynamics of adenine methylation in E. coli by Dam throughout all five phases of the bacterial life cycle in the laboratory. Using single-molecule real-time sequencing, we show that virtually all GATC sites are significantly methylated over time; nearly complete methylation of the chromosome was confirmed by mass spectroscopy analysis. However, we also detect 66 sites whose methylation patterns change significantly over time within a population, including three sites associated with sialic acid transport and catabolism, suggesting a potential role for Dam regulation of these genes; differential expression of this subset of genes was confirmed by quantitative real-time PCR. Further, we show significant growth defects of the dam mutant during long-term stationary phase (LTSP). Together these data suggest that the cell places a high premium on fully methylating the chromosome and that alterations in methylation patterns may have significant impact on patterns of transcription, maintenance of genetic fidelity, and cell survival. IMPORTANCE While it has been shown that methylation remains relatively constant into early stationary phase of E. coli, this study goes further through death phase and long-term stationary phase, a unique time in the bacterial life cycle due to nutrient limitation and strong selection for mutants with increased fitness. The absence of methylation at GATC sites can influence the mutation frequency within a population due to aberrant mismatch repair. Therefore, it is important to investigate the methylation status of GATC sites in an environment where cells may not prioritize methylation of the chromosome. This study demonstrates that chromosome methylation remains a priority even under conditions of nutrient limitation, indicating that continuous methylation at GATC sites could be under positive selection.


July 7, 2019

Cell cycle constraints and environmental control of local DNA hypomethylation in α-proteobacteria.

Heritable DNA methylation imprints are ubiquitous and underlie genetic variability from bacteria to humans. In microbial genomes, DNA methylation has been implicated in gene transcription, DNA replication and repair, nucleoid segregation, transposition and virulence of pathogenic strains. Despite the importance of local (hypo)methylation at specific loci, how and when these patterns are established during the cell cycle remains poorly characterized. Taking advantage of the small genomes and the synchronizability of α-proteobacteria, we discovered that conserved determinants of the cell cycle transcriptional circuitry establish specific hypomethylation patterns in the cell cycle model system Caulobacter crescentus. We used genome-wide methyl-N6-adenine (m6A-) analyses by restriction-enzyme-cleavage sequencing (REC-Seq) and single-molecule real-time (SMRT) sequencing to show that MucR, a transcriptional regulator that represses virulence and cell cycle genes in S-phase but no longer in G1-phase, occludes 5′-GANTC-3′ sequence motifs that are methylated by the DNA adenine methyltransferase CcrM. Constitutive expression of CcrM or heterologous methylases in at least two different α-proteobacteria homogenizes m6A patterns even when MucR is present and affects promoter activity. Environmental stress (phosphate limitation) can override and reconfigure local hypomethylation patterns imposed by the cell cycle circuitry that dictate when and where local hypomethylation is instated.
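
To make the 5′-GANTC-3′ motif concrete, the sketch below locates candidate CcrM target sites in a nucleotide string, treating N as any of the four bases. It is only a toy illustration: the function name and the example sequence are made up, and it says nothing about how the study itself called methylated or hypomethylated sites.

```python
import re

# N in the 5'-GANTC-3' motif stands for any base, so it becomes a
# character class in the regular expression.
GANTC_PATTERN = re.compile(r"GA[ACGT]TC")


def find_gantc_sites(sequence: str) -> list[int]:
    """Return the 0-based start position of every GANTC occurrence."""
    return [match.start() for match in GANTC_PATTERN.finditer(sequence.upper())]


# Hypothetical toy sequence containing three motif instances.
toy_sequence = "ttGACTCaaGATTCGAGTCgg"
sites = find_gantc_sites(toy_sequence)
print(sites)
```

Because the motif is its own reverse complement, scanning one strand is enough to list the sites on both strands.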


July 7, 2019

Improved assembly of noisy long reads by k-mer validation.

Genome assembly depends critically on read length. Two recent technologies, from Pacific Biosciences (PacBio) and Oxford Nanopore, produce read lengths >20 kb, which yield de novo genome assemblies with vastly greater contiguity than those based on Sanger, Illumina, or other technologies. However, the very high error rates of these two new technologies (~15% per base) make assembly imprecise at repeats longer than the read length and computationally expensive. Here we show that the contiguity and quality of the assembly of these noisy long reads can be significantly improved at a minimal cost by leveraging the low error rate and low cost of Illumina short reads. Namely, k-mers from the PacBio raw reads that are not present in Illumina reads (which account for ~95% of the distinct k-mers) are deemed sequencing errors and ignored at the seed alignment step. By focusing on the ~5% of k-mers that are error free, read overlap sensitivity is dramatically increased. Of equal importance, the validation procedure can be extended to exclude repetitive k-mers, which prevents read miscorrection at repeats and further improves the resulting assemblies. We tested the k-mer validation procedure using one long-read technology (PacBio) and one assembler (MHAP/Celera Assembler), but it is very likely to yield analogous improvements with alternative long-read technologies and assemblers, such as Oxford Nanopore and BLASR/DALIGNER/Falcon, respectively. © 2016 Carvalho et al.; Published by Cold Spring Harbor Laboratory Press.
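
The core idea of k-mer validation described above can be sketched in a few lines: collect the k-mers seen in accurate short reads, then keep only the long-read k-mers that appear in that set as seeds. This is a simplified illustration under stated assumptions, not the authors' implementation; the function names, the tiny k, and the toy reads are hypothetical, and reverse complements, k-mer counts, and repeat filtering are left out.

```python
from typing import Iterable, Iterator


def kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every k-mer of a sequence, left to right."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_trusted_set(short_reads: Iterable[str], k: int) -> set[str]:
    """Collect all k-mers observed in the (assumed accurate) short reads."""
    trusted: set[str] = set()
    for read in short_reads:
        trusted.update(kmers(read, k))
    return trusted


def validated_seed_positions(long_read: str, trusted: set[str], k: int) -> list[int]:
    """Return start positions of long-read k-mers that the short reads confirm."""
    return [i for i, kmer in enumerate(kmers(long_read, k)) if kmer in trusted]


# Toy data: the long read carries one substitution (A -> T at position 4)
# relative to the sequence the short reads were drawn from.
short_reads = ["ACGTACGGA", "CGGATTCA"]
long_read = "ACGTTCGGATTCA"
trusted = build_trusted_set(short_reads, k=4)
seeds = validated_seed_positions(long_read, trusted, k=4)
print(seeds)
```

In this toy case every k-mer that overlaps the erroneous base is rejected, so only error-free k-mers remain available as seeds for overlap detection.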


July 7, 2019

A simple thermoplastic substrate containing hierarchical silica lamellae for high-molecular-weight DNA extraction.

An inexpensive, magnetic thermoplastic nanomaterial is developed utilizing a hierarchical layering of micro- and nanoscale silica lamellae to create a high-surface-area and low-shear substrate capable of capturing vast amounts of ultrahigh-molecular-weight DNA. Extraction is performed via a simple 45 min process and is capable of achieving binding capacities up to 1 000 000 times greater than silica microparticles. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.


July 7, 2019

Improve homology search sensitivity of PacBio data by correcting frameshifts.

Single-molecule, real-time (SMRT) sequencing developed by Pacific Biosciences produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data have a high sequencing error rate, and most of the errors are insertions or deletions. During alignment-based homology search, insertion or deletion errors in genes cause frameshifts and may lead to only marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments, and the ambiguity incurs errors in structural and functional annotation. Existing frameshift correction tools are designed for data with a much lower error rate and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, it corrects more errors than a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/. Contact: yannisun@msu.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
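
The effect of an insertion error on the reading frame can be shown with a small example. The sketch below only splits sequences into codons to show how one inserted base scrambles everything downstream; it does not reproduce Frame-Pro's profile-based correction, and the sequences and function name are invented for illustration.

```python
def codons(sequence: str, frame: int = 0) -> list[str]:
    """Split a nucleotide sequence into complete codons starting at `frame`."""
    return [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]


# Hypothetical reference coding sequence and a read with one inserted base.
reference = "ATGGCCAAGTTTGGA"
read_with_insertion = reference[:5] + "T" + reference[5:]

reference_codons = codons(reference)
shifted_codons = codons(read_with_insertion)
# Reading the erroneous read one base later restores the downstream codons.
recovered_codons = codons(read_with_insertion, frame=1)

print(reference_codons)
print(shifted_codons)
print(recovered_codons)
```

Only the first codon survives in the original frame, while the downstream codons reappear in frame 1. An aligner that stays in a single frame therefore sees only a short, low-scoring match, which is the problem frameshift-aware homology search is meant to address.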


July 7, 2019

CoLoRMap: Correcting Long Reads by Mapping short reads.

Second-generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. Recent developments in long-read sequencing methods offer a promising way to address this issue. However, long reads have so far been characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads. We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read, and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods. The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap. Contact: ehaghshe@sfu.ca or cedric.chauve@sfu.ca. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
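
The first idea, chaining overlapping short reads with a shortest-path search, might look roughly like the sketch below. It is a loose illustration and not CoLoRMap's actual graph construction: the `MappedRead` type, the overlap rule, the use of each read's edit cost as the edge weight, and the toy coordinates are all assumptions made for the example.

```python
import heapq
from typing import NamedTuple, Optional


class MappedRead(NamedTuple):
    start: int      # first long-read coordinate covered (inclusive)
    end: int        # last long-read coordinate covered (inclusive)
    edit_cost: int  # assumed edit distance to the covered long-read segment


def overlaps_forward(a: MappedRead, b: MappedRead) -> bool:
    """True if b starts inside a and extends past the end of a."""
    return a.start < b.start <= a.end < b.end


def cheapest_chain(
    reads: list[MappedRead], source: int, sink: int
) -> Optional[tuple[int, list[int]]]:
    """Dijkstra over reads; returns (total edit cost, read indices) or None."""
    best = {source: reads[source].edit_cost}
    previous: dict[int, int] = {}
    heap = [(reads[source].edit_cost, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best.get(u, float("inf")):
            continue  # stale queue entry
        if u == sink:
            break
        for v, candidate in enumerate(reads):
            if overlaps_forward(reads[u], candidate):
                new_cost = cost + candidate.edit_cost
                if new_cost < best.get(v, float("inf")):
                    best[v] = new_cost
                    previous[v] = u
                    heapq.heappush(heap, (new_cost, v))
    if sink not in best:
        return None
    path = [sink]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[sink], path[::-1]


# Hypothetical short reads mapped onto one long read.
reads = [
    MappedRead(0, 99, 2),
    MappedRead(50, 149, 9),
    MappedRead(60, 159, 1),
    MappedRead(120, 219, 3),
    MappedRead(150, 249, 2),
]
result = cheapest_chain(reads, source=0, sink=4)
print(result)
```

Here the search prefers the chain through the low-cost read at 60-159 over the noisier alternative, which is the intuition behind picking the sequence of short reads that best agrees with the long read.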


July 7, 2019

A comparison of single-molecule emission in aluminum and gold zero-mode waveguides.

The effect of gold and aluminum zero-mode waveguides (ZMWs) on the brightness of immobilized single emitters was characterized by probing fluorophores that absorb in the green and red regions of the visible spectrum. Aluminum ZMWs enhance the emission of Atto565 fluorophores upon green excitation, but they do not enhance the emission of Atto647N fluorophores upon red excitation. Gold ZMWs increase emission of both fluorophores with Atto647N showing enhancement that is threefold higher than that observed for Atto565. This work indicates that 200 nm gold ZMWs are better suited for single-molecule fluorescence studies in the red region of the visible spectrum, while aluminum appears more suited for the green region of the visible spectrum.


July 7, 2019

Whole-genome sequence of Mycoplasma bovis strain Ningxia-1.

The genome sequence of the Mycoplasma bovis Ningxia-1 strain was determined using Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing technology. The strain was isolated from a lesioned calf lung in 2013 in Pengyang, Ningxia, China. The single circular chromosome of 1,033,629 bp differs from other complete Mycoplasma bovis genomes in insertion-like sequences (ISs), integrative conjugative elements (ICEs), lipoproteins (LPs), variable surface lipoproteins (VSPs), pathogenicity islands (PAIs), etc. Copyright © 2018 Sun et al.


July 7, 2019

Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations.

Mutations, the fuel of evolution, are first manifested as rare DNA changes within a population of cells. Although next-generation sequencing (NGS) technologies have revolutionized the study of genomic variation between species and individual organisms, most have limited ability to accurately detect and quantify rare variants among the different genome copies in heterogeneous mixtures of cells or molecules. We describe the technical challenges in characterizing subclonal variants using conventional NGS protocols and the recent development of error correction strategies, both computational and experimental, including consensus sequencing of single DNA molecules. We also highlight major applications for low-frequency mutation detection in science and medicine, describe emerging methodologies and provide our vision for the future of DNA sequencing.


July 7, 2019

Completed genome sequences of strains from 36 serotypes of Salmonella.

We report here the completed closed genome sequences of strains representing 36 serotypes of Salmonella. These genome sequences will provide useful references for understanding the genetic variation between serotypes, particularly as references for mapping of raw reads or to create assemblies of higher quality, as well as to aid in studies of comparative genomics of Salmonella. © Crown copyright 2018.


July 7, 2019

Complete genome sequence of Lactococcus lactis subsp. lactis G50 with immunostimulating activity, isolated from Napier grass.

Lactococcus lactis subsp. lactis G50 is a strain with immunostimulating activity, isolated from Napier grass (Pennisetum purpureum). We determined the complete genome sequence of this strain using the PacBio RS II platform. The single circular chromosome consists of 2,346,663 bp, with 35.03% G+C content and no plasmids. Copyright © 2018 Nakano et al.
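
A G+C content figure like the one reported above is simply the share of G and C bases in the sequence. The sketch below shows the calculation on a made-up toy sequence; the function name is hypothetical, and ignoring ambiguous bases is an assumption of this example rather than a statement about how the authors computed their value.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


# Hypothetical toy sequence: 4 of its 10 bases are G or C.
toy_gc = gc_content("ATGCGCATTA")
print(toy_gc)
```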

