Menu
July 7, 2019

Haplotype assembly in polyploid genomes and identical by descent shared tracts.

Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (i) do not consider individuals sharing haplotypes jointly, which reduces the size and accuracy of assembled haplotypes, and (ii) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). Polyploid organisms are increasingly becoming the target of many research groups interested in the genomics of disease, phylogenetics, botany and evolution but there is an absence of theory and methods for polyploid haplotype reconstruction.In this work, we present a number of results, extensions and generalizations of compass graphs and our HapCompass framework. We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. Furthermore, we present graph theory-based algorithms for the problem of haplotype assembly using our previously developed HapCompass framework for (i) novel implementations of haplotype assembly optimizations (minimum error correction), (ii) assembly of a pair of individuals sharing a haplotype tract identical by descent and (iii) assembly of polyploid genomes. We evaluate our methods on 1000 Genomes Project, Pacific Biosciences and simulated sequence data.HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/.Supplementary data are available at Bioinformatics online.


July 7, 2019

Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation.

DNA methylation serves as an important epigenetic mark in both eukaryotic and prokaryotic organisms. In eukaryotes, the most common epigenetic mark is 5-methylcytosine, whereas prokaryotes can have 6-methyladenine, 4-methylcytosine, or 5-methylcytosine. Single-molecule, real-time sequencing is capable of directly detecting all three types of modified bases. However, the kinetic signature of 5-methylcytosine is subtle, which presents a challenge for detection. We investigated whether conversion of 5-methylcytosine to 5-carboxylcytosine using the enzyme Tet1 would enhance the kinetic signature, thereby improving detection.We characterized the kinetic signatures of various cytosine modifications, demonstrating that 5-carboxylcytosine has a larger impact on the local polymerase rate than 5-methylcytosine. Using Tet1-mediated conversion, we show improved detection of 5-methylcytosine using in vitro methylated templates and apply the method to the characterization of 5-methylcytosine sites in the genomes of Escherichia coli MG1655 and Bacillus halodurans C-125.We have developed a method for the enhancement of directly detecting 5-methylcytosine during single-molecule, real-time sequencing. Using Tet1 to convert 5-methylcytosine to 5-carboxylcytosine improves the detection rate of this important epigenetic marker, thereby complementing the set of readily detectable microbial base modifications, and enhancing the ability to interrogate eukaryotic epigenetic markers.


July 7, 2019

Complete genome sequence of a multidrug-resistant Salmonella enterica serovar Typhimurium var. 5- strain isolated from chicken breast.

Salmonella enterica subsp. enterica serovar Typhimurium is a leading cause of salmonellosis. Here, we report a closed genome sequence, including sequences of 3 plasmids, of Salmonella serovar Typhimurium var. 5- CFSAN001921 (National Antimicrobial Resistance Monitoring System [NARMS] strain ID N30688), which was isolated from chicken breast meat and shows resistance to 10 different antimicrobials. Whole-genome and plasmid sequence analyses of this isolate will help enhance our understanding of this pathogenic multidrug-resistant serovar.


July 7, 2019

Precise breakpoint localization of large genomic deletions using PacBio and Illumina next-generation sequencers.

Herein we present the applicability of single-molecule (PacBio RS) and second-generation sequencing technology (Illumina) to the characterization of large genomic deletions. By testing samples previously characterized using a Sanger approach, our methods determined that both next-generation sequencing platforms were able to identify the position of deletion breakpoints. Our results point out various advantages of next-generation sequencing platforms when characterizing genomic deletions; however, special attention must be dedicated to identical sequences flanking the breakpoints, such as poly(N) motifs.


July 7, 2019

Comparing the genomes of Helicobacter pylori clinical strain UM032 and mice-adapted derivatives.

Helicobacter pylori is a Gram-negative bacterium that persistently infects the human stomach inducing chronic inflammation. The exact mechanisms of pathogenesis are still not completely understood. Although not a natural host for H. pylori, mouse infection models play an important role in establishing the immunology and pathogenicity of H. pylori. In this study, for the first time, the genome sequences of clinical H. pylori strain UM032 and mice-adapted derivatives, 298 and 299, were sequenced using the PacBio Single Molecule, Real-Time (SMRT) technology.Here, we described the single contig which was achieved for UM032 (1,599,441 bp), 298 (1,604,216 bp) and 299 (1,601,149 bp). Preliminary analysis suggested that methylation of H. pylori genome through its restriction modification system may be determinative of its host specificity and adaptation.Availability of these genomic sequences will aid in enhancing our current level of understanding the host specificity of H. pylori.


July 7, 2019

Cancer genomics: technology, discovery, and translation.

In recent years, the increasing awareness that somatic mutations and other genetic aberrations drive human malignancies has led us within reach of personalized cancer medicine (PCM). The implementation of PCM is based on the following premises: genetic aberrations exist in human malignancies; a subset of these aberrations drive oncogenesis and tumor biology; these aberrations are actionable (defined as having the potential to affect management recommendations based on diagnostic, prognostic, and/or predictive implications); and there are highly specific anticancer agents available that effectively modulate these targets. This article highlights the technology underlying cancer genomics and examines the early results of genome sequencing and the challenges met in the discovery of new genetic aberrations. Finally, drawing from experiences gained in a feasibility study of somatic mutation genotyping and targeted exome sequencing led by Princess Margaret Hospital-University Health Network and the Ontario Institute for Cancer Research, the processes, challenges, and issues involved in the translation of cancer genomics to the clinic are discussed.


July 7, 2019

Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory.

Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing.We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective.The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mapability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.


July 7, 2019

Direct sequencing of small genomes on the Pacific Biosciences RS without library preparation.

We have developed a sequencing method on the Pacific Biosciences RS sequencer (the PacBio) for small DNA molecules that avoids the need for a standard library preparation. To date this approach has been applied toward sequencing single-stranded and double-stranded viral genomes, bacterial plasmids, plasmid vector models for DNA-modification analysis, and linear DNA fragments covering an entire bacterial genome. Using direct sequencing it is possible to generate sequence data from as little as 1 ng of DNA, offering a significant advantage over current protocols which typically require 400-500 ng of sheared DNA for the library preparation.


July 7, 2019

Sensitive and specific single-molecule sequencing of 5-hydroxymethylcytosine.

We describe strand-specific, base-resolution detection of 5-hydroxymethylcytosine (5-hmC) in genomic DNA with single-molecule sensitivity, combining a bioorthogonal, selective chemical labeling method of 5-hmC with single-molecule, real-time (SMRT) DNA sequencing. The chemical labeling not only allows affinity enrichment of 5-hmC-containing DNA fragments but also enhances the kinetic signal of 5-hmC during SMRT sequencing. We applied the approach to sequence 5-hmC in a genomic DNA sample with high confidence.


July 7, 2019

Direct detection and sequencing of damaged DNA bases.

Products of various forms of DNA damage have been implicated in a variety of important biological processes, such as aging, neurodegenerative diseases, and cancer. Therefore, there exists great interest to develop methods for interrogating damaged DNA in the context of sequencing. Here, we demonstrate that single-molecule, real-time (SMRT®) DNA sequencing can directly detect damaged DNA bases in the DNA template – as a by-product of the sequencing method – through an analysis of the DNA polymerase kinetics that are altered by the presence of a modified base. We demonstrate the sequencing of several DNA templates containing products of DNA damage, including 8-oxoguanine, 8-oxoadenine, O6-methylguanine, 1-methyladenine, O4-methylthymine, 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, or thymine dimers, and show that these base modifications can be readily detected with single-modification resolution and DNA strand specificity. We characterize the distinct kinetic signatures generated by these DNA base modifications.


July 7, 2019

Computational solutions to large-scale data management and analysis.

Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist – such as cloud and heterogeneous computing – to successfully tackle our big data problems.


July 7, 2019

Implementation and data analysis of Tn-seq, whole genome resequencing, and single-molecule real time sequencing for bacterial genetics.

Few discoveries have been more transformative to the biological sciences than the development of DNA sequencing technologies. The rapid advancement of sequencing and bioinformatics tools has revolutionized bacterial genetics, deepening our understanding of model and clinically relevant organisms. Although application of newer sequencing technologies to studies in bacterial genetics is increasing, the implementation of DNA sequencing technologies and development of the bioinformatics tools required for analyzing the large data sets generated remains a challenge for many. In this minireview, we have chosen to summarize three sequencing approaches that are particularly useful for bacterial genetics. We provide resources for scientists new to and interested in their application. Herein, we discuss the analysis of Tn-seq data to determine gene disruptions differentially represented in a mutant population, Illumina sequencing for identification of suppressor or other mutations, and we summarize single-molecule real time (SMRT) sequencing for de novo genome assembly and the use of the output data for detection of DNA base modifications. Copyright © 2016, American Society for Microbiology. All Rights Reserved.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.