July 19, 2019

Single haplotype assembly of the human genome from a hydatidiform mole.

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. © 2014 Steinberg et al.; Published by Cold Spring Harbor Laboratory Press.


July 19, 2019

A random six-phase switch regulates pneumococcal virulence via global epigenetic changes.

Streptococcus pneumoniae (the pneumococcus) is the world’s foremost bacterial pathogen in both morbidity and mortality. Switching between phenotypic forms (or ‘phases’) that favour asymptomatic carriage or invasive disease was first reported in 1933. Here, we show that the underlying mechanism for such phase variation consists of genetic rearrangements in a Type I restriction-modification system (SpnD39III). The rearrangements generate six alternative specificities with distinct methylation patterns, as defined by single-molecule, real-time (SMRT) methylomics. The SpnD39III variants have distinct gene expression profiles. We demonstrate distinct virulence in experimental infection and in vivo selection for switching between SpnD39III variants. SpnD39III is ubiquitous in pneumococci, indicating an essential role in its biology. Future studies must recognize the potential for switching between these heretofore undetectable, differentiated pneumococcal subpopulations in vitro and in vivo. Similar systems exist in other bacterial genera, indicating the potential for broad exploitation of epigenetic gene regulation.


July 19, 2019

One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly.

Like a jigsaw puzzle with large pieces, a genome sequenced with long reads is easier to assemble. However, recent sequencing technologies have favored lowering per-base cost at the expense of read length. This has dramatically reduced sequencing cost, but resulted in fragmented assemblies, which negatively affect downstream analyses and hinder the creation of finished (gapless, high-quality) genomes. In contrast, emerging long-read sequencing technologies can now produce reads tens of kilobases in length, enabling the automated finishing of microbial genomes for under $1000. This promises to improve the quality of reference databases and facilitate new studies of chromosomal structure and variation. We present an overview of these new technologies and the methods used to assemble long reads into complete genomes. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.


July 19, 2019

Resolving complex tandem repeats with long reads.

Resolving tandemly repeated genomic sequences is a necessary step in improving our understanding of the human genome. Short tandem repeats (TRs), or microsatellites, are often used as molecular markers in genetics, and clinically, variation in microsatellites can lead to genetic disorders like Huntington’s disease. Accurately resolving repeats, and in particular TRs, remains a challenging task in genome alignment, assembly and variation calling. Though tools have been developed for detecting microsatellites in short-read sequencing data, these are limited in the size and types of events they can resolve. Single-molecule sequencing technologies may potentially resolve a broader spectrum of TRs given their increased length, but require new approaches given their significantly higher raw error profiles. However, due to the inherent error profiles of the single-molecule technologies, these reads present a unique challenge in terms of accurately identifying and estimating the TRs. Here we present PacmonSTR, a reference-based probabilistic approach to identify the TR region and estimate the number of these TR elements in long DNA reads. We present a multistep approach that requires as input a reference region and the reference TR element. Initially, the TR region is identified from the long DNA reads via a 3-stage modified Smith-Waterman approach and then the expected number of TR elements is calculated using a pair-hidden Markov model-based method. Finally, TR-based genotype selection (or clustering: homozygous/heterozygous) is performed with Gaussian mixture models, using the Akaike information criterion and coverage expectations. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
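
As a rough illustration of the local-alignment step mentioned in the abstract above, the following sketch implements plain Smith-Waterman scoring with a linear gap penalty. It is not the 3-stage modified variant used by PacmonSTR, whose details the abstract does not give; the function name and the scoring parameters (match, mismatch, gap) are assumptions chosen for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Plain Smith-Waterman with a linear gap penalty (illustrative only).

    Returns (best_score, end_i, end_j), where end_i and end_j are the
    1-based positions in a and b at which the best local alignment ends.
    """
    cols = len(b) + 1
    prev = [0] * cols          # previous row of the scoring matrix
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best


# Locate two copies of a hypothetical "CAG" repeat unit inside a read.
score, end_read, end_ref = smith_waterman("TTTCAGCAGTTT", "CAGCAG")
```

Keeping only two rows of the matrix is enough to report the score and end position; recovering the aligned region itself would require a full matrix or traceback pointers.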


July 19, 2019

BREX is a novel phage resistance system widespread in microbial genomes.

The perpetual arms race between bacteria and phage has resulted in the evolution of efficient resistance systems that protect bacteria from phage infection. Such systems, which include the CRISPR-Cas and restriction-modification systems, have proven to be invaluable in the biotechnology and dairy industries. Here, we report on a six-gene cassette in Bacillus cereus which, when integrated into the Bacillus subtilis genome, confers resistance to a broad range of phages, including both virulent and temperate ones. This cassette includes a putative Lon-like protease, an alkaline phosphatase domain protein, a putative RNA-binding protein, a DNA methylase, an ATPase-domain protein, and a protein of unknown function. We denote this novel defense system BREX (Bacteriophage Exclusion) and show that it allows phage adsorption but blocks phage DNA replication. Furthermore, our results suggest that methylation on non-palindromic TAGGAG motifs in the bacterial genome guides self/non-self discrimination and is essential for the defensive function of the BREX system. However, unlike restriction-modification systems, phage DNA does not appear to be cleaved or degraded by BREX, suggesting a novel mechanism of defense. Pan genomic analysis revealed that BREX and BREX-like systems, including the distantly related Pgl system described in Streptomyces coelicolor, are widely distributed in ~10% of all sequenced microbial genomes and can be divided into six coherent subtypes in which the gene composition and order are conserved. Finally, we detected a phage family that evades the BREX defense, implying that anti-BREX mechanisms may have evolved in some phages as part of their arms race with bacteria. © 2014 The Authors.


July 19, 2019

ModM DNA methyltransferase methylome analysis reveals a potential role for Moraxella catarrhalis phasevarions in otitis media.

Moraxella catarrhalis is a significant cause of otitis media and exacerbations of chronic obstructive pulmonary disease. Here, we characterize a phase-variable DNA methyltransferase (ModM), which contains 5′-CAAC-3′ repeats in its open reading frame that mediate high-frequency mutation resulting in reversible on/off switching of ModM expression. Three modM alleles have been identified (modM1-3), with modM2 being the most commonly found allele. Using single-molecule, real-time (SMRT) genome sequencing and methylome analysis, we have determined that the ModM2 methylation target is 5′-GAR(m6)AC-3′, and 100% of these sites are methylated in the genome of the M. catarrhalis 25239 ModM2 on strain. Proteomic analysis of ModM2 on and off variants revealed that ModM2 regulates expression of multiple genes that have potential roles in colonization, infection, and protection against host defenses. Investigation of the distribution of modM alleles in a panel of M. catarrhalis strains, isolated from the nasopharynx of healthy children or middle ear effusions from patients with otitis media, revealed a statistically significant association of modM3 with otitis media isolates. The modulation of gene expression via the ModM phase-variable regulon (phasevarion), and the significant association of the modM3 allele with otitis media, suggests a key role for ModM phasevarions in the pathogenesis of this organism. Blakeway, L. V., Power, P. M., Jen, F. E.-C., Worboys, S. R., Boitano, M., Clark, T. A., Korlach, J., Bakaletz, L. O., Jennings, M. P., Peak, I. R., Seib, K. L. ModM DNA methyltransferase methylome analysis reveals a potential role for Moraxella catarrhalis phasevarions in otitis media. © FASEB.


July 19, 2019

Comparison of genome sequencing technology and assembly methods for the analysis of a GC-rich bacterial genome.

Improvements in technology and decreases in price have made de novo bacterial genomic sequencing a reality for many researchers, but it has created a need to evaluate the methods for generating a complete and accurate genome assembly. We sequenced the GC-rich Caulobacter henricii genome using the Illumina MiSeq, Roche 454, and Pacific Biosciences RS II sequencing systems. To generate a complete genome sequence, we performed assemblies using eight readily available programs and found that builds using the Illumina MiSeq and the Roche 454 data produced accurate yet numerous contigs. SPAdes performed the best followed by PANDAseq. In contrast, the Celera assembler produced a single genomic contig using the Pacific Biosciences data after error correction with the Illumina MiSeq data. In addition, we duplicated this build using the Pacific Biosciences data with HGAP2.0. The accuracy of these builds was verified by pulsed-field gel electrophoresis of genomic DNA cut with restriction enzymes.


July 19, 2019

Efficient local alignment discovery amongst noisy long reads.

Long read sequencers portend the possibility of producing reference quality genomes not only because the reads are long, but also because sequencing errors and read sampling are almost perfectly random. However, the error rates are as high as 15%, necessitating an efficient algorithm for finding local alignments between reads at a 30% difference rate, a level that current algorithm designs either cannot handle or handle only inefficiently. In this paper we present a very efficient yet highly sensitive, threaded filter, based on a novel sort and merge paradigm, that proposes seed points between pairs of reads that are likely to have a significant local alignment passing through them. We also present a linear expected-time heuristic based on the classic O(nd) difference algorithm [1] that finds a local alignment passing through a seed point that is exceedingly sensitive, failing but once every billion base pairs. These two results have been combined into a software program we call DALIGN that realizes the fastest program to date for finding overlaps and local alignments in very noisy long read DNA sequencing data sets and is thus a prelude to de novo long read assembly.
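
To make the sort-and-merge idea in the abstract above concrete, here is a deliberately simplified sketch: each read's k-mers are sorted, and the two sorted lists are merged to find positions where the reads share a k-mer. This is an illustration of the general paradigm only, not DALIGN's actual filter, which is threaded and far more elaborate. The function names and the choice of k are assumptions made for the example.

```python
def kmer_list(seq, k):
    """Return the sorted list of (k-mer, start position) pairs in seq."""
    return sorted((seq[i:i + k], i) for i in range(len(seq) - k + 1))


def shared_seeds(a, b, k):
    """Merge the sorted k-mer lists of a and b (illustrative only).

    Returns sorted (pos_in_a, pos_in_b) pairs where both reads carry the
    same k-mer; such pairs are candidate seed points for local alignment.
    """
    la, lb = kmer_list(a, k), kmer_list(b, k)
    i = j = 0
    seeds = []
    while i < len(la) and j < len(lb):
        ka, kb = la[i][0], lb[j][0]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Find the run of identical k-mers in each list, then pair them.
            i_end = i
            while i_end < len(la) and la[i_end][0] == ka:
                i_end += 1
            j_end = j
            while j_end < len(lb) and lb[j_end][0] == ka:
                j_end += 1
            for x in range(i, i_end):
                for y in range(j, j_end):
                    seeds.append((la[x][1], lb[y][1]))
            i, j = i_end, j_end
    return sorted(seeds)


seeds = shared_seeds("ACGTACGT", "GTAC", 3)
```

Seeds that fall on the same diagonal (equal difference between the two positions) hint at a shared region; a real filter would count hits per diagonal band before proposing a seed point.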


July 19, 2019

Exploring bacterial epigenomics in the next-generation sequencing era: a new approach for an emerging frontier.

Epigenetics has an important role in the success of foodborne pathogen persistence in diverse host niches. Substantial challenges exist in linking DNA methylation to situation-specific phenotypic traits. DNA modification, mediated by restriction-modification systems, functions as an immune response against antagonistic external DNA, and bacteriophage-acquired methyltransferases (MTase) and orphan MTases – those lacking the cognate restriction endonuclease – facilitate evolution of new phenotypes via gene expression modulation via DNA and RNA modifications, including methylation and phosphorothioation. Recent establishment of large-scale genome sequencing projects will result in a significant increase in genome availability that will lead to new demands for data analysis including new predictive bioinformatics approaches that can be verified with traditional scientific rigor. Coupling sequencing technologies that detect modification with mass spectrometry to discover new adducts is a powerful tactic to study bacterial epigenetics, which is poised to make novel and far-reaching discoveries that link biological significance and the bacterial epigenome. Copyright © 2014 Elsevier Ltd. All rights reserved.


July 19, 2019

A comparative analysis of methylome profiles of Campylobacter jejuni sheep abortion isolate and gastroenteric strains using PacBio data.

Campylobacter jejuni is a leading cause of human gastrointestinal disease and small ruminant abortions in the United States. The recent emergence of a highly virulent, tetracycline-resistant C. jejuni subsp. jejuni sheep abortion clone (clone SA) in the United States, and that strain’s association with human disease, has resulted in a heightened awareness of the zoonotic potential of this organism. Pacific Biosciences’ Single Molecule, Real-Time sequencing technology was used to explore the variation in the genome-wide methylation patterns of the abortifacient clone SA (IA3902) and phenotypically distinct gastrointestinal-specific C. jejuni strains (NCTC 11168 and 81-176). Several notable differences were discovered that distinguished the methylome of IA3902 from that of 11168 and 81-176: identification of motifs novel to IA3902, genome-specific hypo- and hypermethylated regions, strain level variability in genes methylated, and differences in the types of methylation motifs present in each strain. These observations suggest a possible role of methylation in the contrasting disease presentations of these three C. jejuni strains. In addition, the methylation profiles between IA3902 and a luxS mutant were explored to determine if variations in methylation patterns could be identified that might explain the role of LuxS-dependent methyl recycling in IA3902 abortifacient potential.


July 19, 2019

Long-read, whole-genome shotgun sequence data for five model organisms.

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.


July 19, 2019

Evolution of hypervirulence by a MRSA clone through acquisition of a transposable element.

Staphylococcus aureus has evolved as a pathogen that causes a range of diseases in humans. There are two dominant modes of evolution thought to explain most of the virulence differences between strains. First, virulence genes may be acquired from other organisms. Second, mutations may cause changes in the regulation and expression of genes. Here we describe an evolutionary event in which transposition of an IS element has a direct impact on virulence gene regulation resulting in hypervirulence. Whole-genome analysis of a methicillin-resistant S. aureus (MRSA) strain USA500 revealed acquisition of a transposable element (IS256) that is absent from close relatives of this strain. Of the multiple copies of IS256 found in the USA500 genome, one was inserted in the promoter sequence of repressor of toxins (Rot), a master transcriptional regulator responsible for the expression of virulence factors in S. aureus. We show that insertion into the rot promoter by IS256 results in the derepression of cytotoxin expression and increased virulence. Taken together, this work provides new insight into evolutionary strategies by which S. aureus is able to modify its virulence properties and demonstrates a novel mechanism by which horizontal gene transfer directly impacts virulence through altering toxin regulation. © 2014 John Wiley & Sons Ltd.


July 19, 2019

Analysis of the Campylobacter jejuni genome by SMRT DNA Sequencing identifies restriction-modification motifs.

Campylobacter jejuni is a leading bacterial cause of human gastroenteritis. The goal of this study was to analyze the C. jejuni F38011 strain, recovered from an individual with severe enteritis, at a genomic and proteomic level to gain insight into microbial processes. The C. jejuni F38011 genome is comprised of 1,691,939 bp, with a mol.% (G+C) content of 30.5%. PacBio sequencing coupled with REBASE analysis was used to predict C. jejuni F38011 genomic sites and enzymes that may be involved in DNA restriction-modification. A total of five putative methylation motifs were identified as well as the C. jejuni enzymes that could be responsible for the modifications. Peptides corresponding to the deduced amino acid sequence of the C. jejuni enzymes were identified using proteomics. This work sets the stage for studies to dissect the precise functions of the C. jejuni putative restriction-modification enzymes. Taken together, the data generated in this study contribute to our knowledge of the genomic content, methylation profile, and encoding capacity of C. jejuni.


July 19, 2019

Completing bacterial genome assemblies: strategy and performance comparisons.

Determining the genomic sequences of microorganisms is the basis and prerequisite for understanding their biology and functional characterization. While the advent of low-cost, extremely high-throughput second-generation sequencing technologies and the parallel development of assembly algorithms have generated rapid and cost-effective genome assemblies, such assemblies are often unfinished, fragmented draft genomes as a result of short read lengths and long repeats present in multiple copies. Third-generation, PacBio sequencing technologies circumvented this problem by greatly increasing read length. Hybrid approaches including ALLPATHS-LG, PacBio corrected reads pipeline, SPAdes, and SSPACE-LongRead, and non-hybrid approaches, namely the hierarchical genome-assembly process (HGAP) and the PacBio corrected reads pipeline via self-correction, have therefore been proposed to utilize the PacBio long reads that can span many thousands of bases to facilitate the assembly of complete microbial genomes. However, standardized procedures that aim at evaluating and comparing these approaches are currently insufficient. To address the issue, we herein provide a comprehensive comparison by collecting datasets for the comparative assessment on the above-mentioned five assemblers. In addition to offering explicit and beneficial recommendations to practitioners, this study aims to aid in the design of a paradigm positioned to complete bacterial genome assembly.

