With Single Molecule, Real-Time (SMRT) Sequencing and the Sequel Systems, you can affordably assemble reference-quality microbial genomes that are >99.999% (Q50) accurate.
Comparative genomics of Shiga toxin-producing Escherichia coli O145:H28 strains associated with the 2007 Belgium and 2010 US outbreaks.
Shiga toxin-producing Escherichia coli (STEC) is an emerging pathogen. Recently there has been a global increase in the number of outbreaks caused by non-O157 STECs, typically involving six serogroups: O26, O45, O103, O111, O121, and O145. STEC O145:H28 has been associated with severe human disease including hemolytic-uremic syndrome (HUS), as demonstrated by the 2007 Belgian ice-cream-associated outbreak and the 2010 US lettuce-associated outbreak, with over 10% of patients developing HUS in each. The goal of this work was to perform comparative genomics of clinical and environmental strains to investigate genome diversity and virulence evolution of this important foodborne pathogen.
Understanding the genetic basis of infectious diseases is critical to enacting effective treatments, and several large-scale sequencing initiatives are underway to collect this information. Sequencing bacterial samples is typically performed by mapping sequence reads against genomes of known reference strains. While such resequencing informs on the spectrum of single-nucleotide differences relative to the chosen reference, it can miss numerous other forms of variation known to influence pathogenicity: structural variations (duplications, inversions), acquisition of mobile elements (phages, plasmids), homonucleotide length variation causing phase variation, and epigenetic marks (methylation, phosphorothioation) that influence gene expression to switch bacteria from non-pathogenic to pathogenic states. Therefore, sequencing methods that provide complete, de novo genome assemblies and epigenomes are necessary to fully characterize infectious disease agents in an unbiased, hypothesis-free manner. Hybrid assembly methods have been described that combine long sequence reads from SMRT DNA Sequencing with short reads (SMRT CCS (circular consensus) or second-generation reads), wherein the short reads are used to error-correct the long reads, which are then used for assembly. We have developed a new paradigm for microbial de novo assemblies in which SMRT sequencing reads from a single long insert library are used exclusively to close the genome through a hierarchical genome assembly process, thereby obviating the need for a second sample preparation, sequencing run, and data set. We have applied this method to achieve closed de novo genomes with accuracies exceeding QV50 (>99.999%) for numerous disease outbreak samples, including E. coli, Salmonella, Campylobacter, Listeria, Neisseria, and H. pylori. The kinetic information from the same SMRT Sequencing reads is utilized to determine epigenomes.
Approximately 70% of all methyltransferase specificities we have determined to date represent previously unknown bacterial epigenetic signatures. With relatively short sequencing run times and automated analysis pipelines, it is possible to go from an unknown DNA sample to its complete de novo genome and epigenome in about a day.
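The relationship between "QV50" and ">99.999%" accuracy quoted above follows from the standard Phred scale, QV = -10 · log10(error rate). The sketch below only illustrates that arithmetic; the function names and the 5 Mb genome size are hypothetical and are not taken from the abstract.

```python
import math


def accuracy_to_qv(accuracy: float) -> float:
    """Convert a consensus accuracy (0..1) to a Phred-scaled quality value."""
    error = 1.0 - accuracy
    if error <= 0.0:
        raise ValueError("accuracy must be below 1.0 for a finite QV")
    return -10.0 * math.log10(error)


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value back to an accuracy (0..1)."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def expected_errors(genome_size_bp: int, qv: float) -> float:
    """Expected number of erroneous bases in a genome of the given size."""
    return genome_size_bp * 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    print(round(accuracy_to_qv(0.99999), 3))   # QV for 99.999% accuracy
    print(qv_to_accuracy(50))                  # accuracy at QV50
    # Hypothetical 5 Mb bacterial genome, chosen only for illustration.
    print(expected_errors(5_000_000, 50))
```

Read this way, QV50 corresponds to roughly one expected error per 100,000 bases, so an illustrative 5 Mb genome would carry on the order of 50 erroneous bases.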
2015 SMRT Informatics Developers Conference Presentation Slides: Shinichi Morishita of the University of Tokyo presented on how his team has been using SMRT Sequencing to better understand methylomes, metagenomes and structural variation of various eukaryotic genomes.
Functional genomics reveals extensive diversity in Staphylococcus epidermidis restriction modification systems compared to Staphylococcus aureus
Staphylococcus epidermidis is a significant opportunistic pathogen of humans. Molecular studies in this species have been hampered by the presence of restriction-modification (RM) systems that limit introduction of foreign DNA. Here we establish the complete genomes and methylomes for seven clinically significant, genetically diverse S. epidermidis isolates and perform the first systematic genomic analyses of the type I RM systems within both S. epidermidis and Staphylococcus aureus. Our analyses revealed marked differences in the gene arrangement, chromosomal location and movement of type I RM systems between the two species. Unlike S. aureus, S. epidermidis type I RM systems demonstrate extensive diversity even within a single genetic lineage. This is contrary to current assumptions and has important implications for approaching the genetic manipulation of S. epidermidis. Using Escherichia coli plasmid artificial modification (PAM) to express S. epidermidis hsdMS, we readily overcame restriction barriers in S. epidermidis, and achieved transformation efficiencies equivalent to those of modification deficient mutants. With these functional experiments we demonstrate how genomic data can be used to predict both the functionality of type I RM systems and the potential for a strain to be transformation proficient. We outline an efficient approach for the genetic manipulation of S. epidermidis from diverse genetic backgrounds, including those that have hitherto been intractable. Additionally, we identified S. epidermidis BPH0736, a naturally restriction defective, clinically significant, multidrug-resistant ST2 isolate as an ideal candidate for molecular studies.
Genome Sequences and Methylation Patterns of Natrinema versiforme BOL5-4 and Natrinema pallidum BOL6-1, Two Extremely Halophilic Archaea from a Bolivian Salt Mine.
Two extremely halophilic archaea, namely, Natrinema versiforme BOL5-4 and Natrinema pallidum BOL6-1, were isolated from a Bolivian salt mine and their genomes sequenced using single-molecule real-time sequencing. The GC-rich genomes of BOL5-4 and BOL6-1 were 4.6 and 3.8 Mbp, respectively, with large chromosomes and multiple megaplasmids. Genome annotation was incorporated into HaloWeb and methylation patterns incorporated into REBASE. Copyright © 2019 DasSarma et al.
Here, we report the complete genome sequence and full methylome analysis of a newly isolated, aerobic, thermophilic, Gram-positive actinomycete, a strain of Thermoactinomyces vulgaris designated strain 2H. Copyright © 2019 Mankai et al.
Comparison of DNA Methylation in Vibrio vulnificus Cells Grown in Human Serum with Those Grown in Seawater.
The chromosomal methylation statuses of the highly virulent Vibrio vulnificus strain CMCP6 grown in human serum and in seawater are compared here. Growth in seawater resulted in ~4 times as much methylation as that in human serum, primarily N4-methylcytosines. Copyright © 2019 Conrad and Harwood.
Methylomes of Two Extremely Halophilic Archaea Species, Haloarcula marismortui and Haloferax mediterranei.
The genomes of two extremely halophilic Archaea species, Haloarcula marismortui and Haloferax mediterranei, were sequenced using single-molecule real-time sequencing. The ~4-Mbp genomes are GC rich with multiple large plasmids and two 4-methyl-cytosine patterns. Methyltransferases were incorporated into the Restriction Enzymes Database (REBASE), and gene annotation was incorporated into the Haloarchaeal Genomes Database (HaloWeb). Copyright © 2019 DasSarma et al.
Complete Genome Sequence and Methylome Analysis of Micrococcus luteus SA211, a Halophilic, Lithium-Tolerant Actinobacterium from Argentina.
Micrococcus luteus has been found in a wide range of habitats. We report the complete genome sequence and methylome analysis of strain SA211 isolated from a hypersaline, lithium-rich, high-altitude salt flat in Argentina with single-molecule real-time sequencing.
Prokaryotic DNA contains three types of methylation: N6-methyladenine, N4-methylcytosine and 5-methylcytosine. The lack of tools to analyse the frequency and distribution of methylated residues in bacterial genomes has prevented a full understanding of their functions. Now, advances in DNA sequencing technology, including single-molecule, real-time sequencing and nanopore-based sequencing, have provided new opportunities for systematic detection of all three forms of methylated DNA at a genome-wide scale and offer unprecedented opportunities for achieving a more complete understanding of bacterial epigenomes. Indeed, as the number of mapped bacterial methylomes approaches 2,000, increasing evidence supports roles for methylation in regulation of gene expression, virulence and pathogen-host interactions.
The complete genome sequence of Ethanoligenens harbinense reveals the metabolic pathway of acetate-ethanol fermentation: A novel understanding of the principles of anaerobic biotechnology.
Ethanol-type fermentation is one of three main fermentation types in the acidogenesis of anaerobic treatment systems. Non-spore-forming Ethanoligenens is a typical genus capable of ethanol-type fermentation in mixed culture (i.e., acetate-ethanol fermentation). This genus can produce ethanol, acetate, CO2, and H2 using carbohydrates, and has application potential in anaerobic bioprocesses. Here, the complete genome sequences and methylome of Ethanoligenens harbinense strains with different autoaggregative and coaggregative abilities were obtained using the PacBio single-molecule real-time sequencing platform. The genome size of E. harbinense strains was about 2.97-3.10 Mb with 55.5% G+C content. Between 3020 and 3153 genes were annotated, most of which were methylated at specific sites or motifs. The methylation types included 6mA, 4mC, and unknown types. Comparative genomic analysis demonstrated low levels of genetic similarity between E. harbinense and other well-known hydrogen-producing bacteria (i.e., Clostridium and Thermoanaerobacter) in phylogenesis. Hydrogen production of E. harbinense was catalyzed by genes that encode [FeFe]-hydrogenases and that were synthesized by three maturases of [FeFe]-H2ase. The metabolic mechanism of H2-ethanol co-production fermentation, catalyzed by pyruvate ferredoxin oxidoreductase, was proposed. This study provides genetic and evolutionary information of a model genus for the further investigation of the metabolic pathway and regulatory network of ethanol-type fermentation and anaerobic bioprocesses for waste or wastewater treatment. Copyright © 2019. Published by Elsevier Ltd.
Genome-wide analysis of DNA methylation patterns using single molecule real-time DNA sequencing has boosted the number of publicly available methylomes. However, there is a lack of tools coupling methylation patterns and the corresponding methyltransferase genes. Here we demonstrate a high-throughput method for coupling methyltransferases with their respective motifs, using automated cloning and analysing the methyltransferases in vectors carrying a strain-specific cassette containing all potential target sites. To validate the method, we analyse the genomes of the thermophile Moorella thermoacetica and the mesophile Acetobacterium woodii, two acetogenic bacteria having substantially modified genomes with 12 methylation motifs and a total of 23 methyltransferase genes. Using our method, we characterize the 23 methyltransferases, assign motifs to the respective enzymes and verify activity for 11 of the 12 motifs.
The complete genome of Cordyceps militaris was sequenced using single-molecule real-time (SMRT) sequencing technology at a coverage over 300×. The genome size was 32.57 Mb, and 14 contigs ranging from 0.35 to 4.58 Mb with an N50 of 2.86 Mb were assembled, including 4 contigs with telomeric sequences on both ends and an additional 8 contigs with telomeric sequences on either the 5′ or 3′ end. A methylome database of the genome was constructed using SMRT sequencing; m4C and m6A methylated nucleotides and many unknown modification types were identified. The major m6A methylation motifs are GA and GGAG, and the major m4C methylation motif is GC or CG/GC. In the C. militaris genomic DNA, there were four types of methylated nucleotides that we confirmed using high-resolution LCMS-IT-TOF. Using PacBio Iso-Seq, a total of 31,133 complete cDNA sequences were obtained in the fruiting body. The conserved domains of the nontranscribed regions of the genome include TATA boxes, which are the initial regions of genome replication. There were 406 structural variants between the HN and CM01 strains, and there were 1,114 structural variants between the HN and ATCC strains.
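The N50 statistic quoted above is the contig length at which the longest contigs, taken in descending order, first cover at least half of the total assembly length. The following is a minimal sketch of that definition; the function name and the toy contig lengths are illustrative and are not the C. militaris contigs.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running sum reaches at least half of the total assembly length; the
    length of the contig that crosses that threshold is the N50.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy contig lengths, chosen only to illustrate the calculation.
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_contigs))
```

A larger N50 for a fixed genome size indicates that more of the assembly sits in long contigs, which is why the value is reported alongside the contig count.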
Background: Helicobacter pylori is a Gram-negative bacterium which mainly causes peptic ulcer disease in humans, but is also the predominant cause of stomach cancer. It has been coevolving with humans for 120,000 years and, according to multi-locus sequence typing (MLST), H. pylori can be classified into seven major population types, namely, hpAfrica1, hpAfrica2, hpNEAfrica, hpEastAsia, hpAsia2, hpEurope and hpSahul. Helicobacter pylori harbours a large number of restriction-modification (R-M) systems. The methyltransferase (MTase) unit plays a significant role in gene regulation and also possibly modulates pathogenicity. The diversity in MTases can act as a geomarker to correlate strains with their phylogeographic origins. This paper describes the complete genome sequence and methylome of a gastric pathogen H. pylori strain belonging to the population hpNEAfrica. Results: In this paper, we present the complete genome sequence and the methylome profile of H. pylori hpNEAfrica strain HP14039, isolated from a patient who was born in Somalia and was likely infected locally during early childhood prior to migration. The genome of HP14039 consists of 1,678,260 bp with 1574 coding genes and 38.7% GC content. The sequence analysis showed that this strain lacks the cag pathogenicity island. The vacA gene is of S2M2 type. We have also identified 15 methylation motifs, including WCANHNNNNTG and CTANNNNNNNTAYG, that were not previously described. Conclusions: We have described the complete genome of H. pylori strain HP14039. The information regarding phylogeography, methylome and associated metadata will help the scientific community to study the hpNEAfrica population type further.
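Methylation motifs such as WCANHNNNNTG are written with IUPAC degenerate nucleotide codes (for example W = A or T, H = A, C or T, Y = C or T, N = any base). As a rough sketch of how such a motif can be located in a sequence, the code below expands a motif into a regular expression and reports forward-strand match positions. The function names and the test sequence are hypothetical; this is not the motif-calling method used in the study, which infers motifs from sequencing kinetics.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Expand an IUPAC motif into an equivalent regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence: str, motif: str) -> list:
    """Return 0-based start positions of all (possibly overlapping)
    forward-strand occurrences of the motif in the sequence."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up sequence containing one occurrence of the motif.
    toy_sequence = "GGACATAGGGGTGCC"
    print(find_motif(toy_sequence, "WCANHNNNNTG"))
```

Only the forward strand is scanned here; a fuller search would also scan the reverse complement, since methylation motifs are frequently asymmetric.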