June 1, 2021

Automated, non-hybrid de novo genome assemblies and epigenomes of bacterial pathogens.

Understanding the genetic basis of infectious diseases is critical to enacting effective treatments, and several large-scale sequencing initiatives are underway to collect this information. Sequencing of bacterial samples is typically performed by mapping sequence reads against genomes of known reference strains. While such resequencing reveals the spectrum of single-nucleotide differences relative to the chosen reference, it can miss numerous other forms of variation known to influence pathogenicity: structural variations (duplications, inversions), acquisition of mobile elements (phages, plasmids), homonucleotide length variation causing phase variation, and epigenetic marks (methylation, phosphorothioation) that influence gene expression to switch bacteria from non-pathogenic to pathogenic states. Therefore, sequencing methods that provide complete, de novo genome assemblies and epigenomes are necessary to fully characterize infectious disease agents in an unbiased, hypothesis-free manner. Hybrid assembly methods have been described that combine long sequence reads from SMRT DNA Sequencing with short reads (SMRT CCS (circular consensus) or second-generation reads), wherein the short reads are used to error-correct the long reads, which are then used for assembly. We have developed a new paradigm for microbial de novo assemblies in which SMRT sequencing reads from a single long insert library are used exclusively to close the genome through a hierarchical genome assembly process, thereby obviating the need for a second sample preparation, sequencing run, and data set. We have applied this method to achieve closed de novo genomes with accuracies exceeding QV50 (>99.999%) for numerous disease outbreak samples, including E. coli, Salmonella, Campylobacter, Listeria, Neisseria, and H. pylori. The kinetic information from the same SMRT Sequencing reads is utilized to determine epigenomes. Approximately 70% of all methyltransferase specificities we have determined to date represent previously unknown bacterial epigenetic signatures. With relatively short sequencing run times and automated analysis pipelines, it is possible to go from an unknown DNA sample to its complete de novo genome and epigenome in about a day.
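
The QV50 figure quoted above is a Phred-scaled quality value, where QV = -10 · log10(error probability). As a minimal sketch of that conversion (the function names and the 5 Mb genome length are illustrative assumptions, not taken from the abstract):

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to per-base accuracy."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (strictly below 1.0) to a Phred-scaled quality value."""
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(qv: float, genome_length: int) -> float:
    """Expected number of erroneous bases in a genome of the given length."""
    return genome_length * 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    # QV50 corresponds to one expected error per 100,000 bases.
    print(f"QV50 accuracy: {qv_to_accuracy(50):.5%}")
    # Hypothetical 5 Mb bacterial genome, used only for illustration.
    print(f"Expected errors in 5 Mb at QV50: {expected_errors(50, 5_000_000):.1f}")
```

Under these assumptions, a genome of a few megabases assembled at QV50 would be expected to contain on the order of tens of erroneous bases.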


June 1, 2021

New discoveries from closing Salmonella genomes using Pacific Biosciences continuous long reads.

The newer hierarchical genome assembly process (HGAP) performs de novo assembly using data from a single PacBio long insert library. To assess the benefits of this method, DNA from several Salmonella enterica serovars was isolated from a pure culture. Genome sequencing was performed using Pacific Biosciences RS sequencing technology. The HGAP process enabled us to close sixteen Salmonella enterica subsp. enterica genomes and their associated mobile elements. The ten serotypes represented are Salmonella enterica subsp. enterica serovar Enteritidis (S. Enteritidis), S. Bareilly, S. Heidelberg, S. Cubana, S. Javiana, S. Typhimurium, S. Newport, S. Montevideo, S. Agona, and S. Tennessee. In addition, we were able to detect novel methyltransferases (MTases) by using the Pacific Biosciences kinetic score distributions, which showed that each serovar appears to have a novel methylation pattern. For example, while all Salmonella serovars examined so far have methyltransferase activity specific for 5′-GATC-3′/3′-CTAG-5′ and 5′-CAGAG-3′/3′-GTCTC-5′ (underlined base indicates a modification), S. Heidelberg is uniquely specific for 5′-ACCANCC-3′/3′-TGGTNGG-5′, while S. Typhimurium is uniquely specific for 5′-GATCAG-3′/3′-CTAGTC-5′ sites, for the samples examined so far. We believe that this may be due to the unique environments and phages that these serotypes have been exposed to. Furthermore, our analysis identified and closed a variety of plasmids, such as mobilization plasmids, antimicrobial resistance plasmids, and IncX plasmids carrying a Type IV secretion system (T4SS). The VirB/D4 T4SS apparatus is important in that it assists with rapid dissemination of antibiotic resistance and virulence determinants. Presently, only limited information exists regarding the genotypic characterization of drug resistance in S. Heidelberg isolates derived from various host species. Here, we characterize two S. Heidelberg outbreak isolates from two different outbreaks. Both isolates contain an IncX plasmid of approximately 35 kb that carries the genes virB1, virB2, virB3/4, virB5, virB6, virB7, virB8, virB9, virB10, virB11, virD2, and virD4, which are associated with the T4SS. In addition, the outbreak isolate associated with ground turkey carries a 4,473 bp mobilization plasmid and an incompatibility group (Inc) I1 antimicrobial resistance plasmid encoding resistance to gentamicin (aacC2), beta-lactam (bl2b_tem), streptomycin (aadAI) and tetracycline (tetA, tetR), while the outbreak isolate associated with chicken breast carries the IncI1 plasmid encoding resistance to gentamicin (aacC2), streptomycin (aadAI) and sulfisoxazole (sul1). Using this new technology, we explored the genetic elements present in resistant pathogens, which will help achieve a better understanding of the evolution of Salmonella.


June 1, 2021

Integrative biology of a fungus: Using PacBio SMRT Sequencing to interrogate the genome, epigenome, and transcriptome of Neurospora crassa.

PacBio SMRT Sequencing has the unique ability to directly detect base modifications in addition to the nucleotide sequence of DNA. Because eukaryotes use base modifications to regulate gene expression, the absence or presence of epigenetic events relative to the location of genes is critical to elucidate the function of the modification. Therefore, an integrated approach that combines multiple omic-scale assays is necessary to study complex organisms. Here, we present an integrated analysis of three sequencing experiments: 1) DNA sequencing, 2) base-modification detection, and 3) Iso-Seq analysis, in Neurospora crassa, a filamentous fungus that has been used to make many landmark discoveries in biochemistry and genetics. We show that de novo assembly of a new strain yields complete assemblies of entire chromosomes and additionally contains entire centromeric sequences. Base-modification analyses reveal candidate sites of increased interpulse duration (IPD) ratio that may signify regions of 5mC, 5hmC, or 6mA base modifications. The Iso-Seq method provides full-length transcript evidence for comprehensive gene annotation, as well as context for the base modifications in the newly assembled genome. Projects that integrate multiple genome-wide assays could become common practice for identifying genomic elements and understanding their function in new strains and organisms.


April 21, 2020

A Novel Bacteriophage Exclusion (BREX) System Encoded by the pglX Gene in Lactobacillus casei Zhang.

The bacteriophage exclusion (BREX) system is a novel prokaryotic defense system against bacteriophages. To our knowledge, no study has systematically characterized the function of the BREX system in lactic acid bacteria. Lactobacillus casei Zhang is a probiotic bacterium originating from koumiss. By using single-molecule real-time sequencing, we previously identified N6-methyladenine (m6A) signatures in the genome of L. casei Zhang and a putative methyltransferase (MTase), namely, pglX. This work further analyzed the genomic locus near the pglX gene and identified it as a component of the BREX system. To decipher the biological role of pglX, an L. casei Zhang pglX mutant (ΔpglX) was constructed. Interestingly, m6A methylation of the 5′-ACRCAG-3′ motif was eliminated in the ΔpglX mutant. The wild-type and mutant strains exhibited no significant difference in morphology or growth performance in de Man-Rogosa-Sharpe (MRS) medium. A significantly higher plasmid acquisition capacity was observed for the ΔpglX mutant than for the wild type if the transformed plasmids contained pglX recognition sites (i.e., 5′-ACRCAG-3′). In contrast, no significant difference was observed in plasmid transformation efficiency between the two strains when plasmids lacking pglX recognition sites were tested. Moreover, the ΔpglX mutant had a lower capacity to retain the plasmids than the wild type, suggesting a decrease in genetic stability. Since the REBASE database predicted that the L. casei PglX protein was bifunctional, as both an MTase and a restriction endonuclease, the PglX protein was heterologously expressed and purified but failed to show restriction endonuclease activity. Taken together, the results show that the L. casei Zhang pglX gene encodes a functional adenine MTase that belongs to the BREX system.

IMPORTANCE: Lactobacillus casei Zhang is a probiotic that confers beneficial effects on the host, and it is thus increasingly used in the dairy industry. The possession of an effective bacterial immune system that can defend against invasion of phages and exogenous DNA is a desirable feature for industrial bacterial strains. The bacteriophage exclusion (BREX) system is a recently described phage resistance system in prokaryotes. This work confirmed the function of the BREX system in L. casei and that the methyltransferase (pglX) is an indispensable part of the system. Overall, our study characterizes a BREX system component gene in lactic acid bacteria. Copyright © 2019 American Society for Microbiology.
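
Whether a plasmid contains pglX recognition sites depends on matching the degenerate motif 5′-ACRCAG-3′ (R = A or G) on either strand. The following is a minimal sketch of such a scan using standard IUPAC nucleotide codes; the function names and the example sequence are hypothetical and are not taken from the study.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also handles degenerate codes (e.g. R <-> Y).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile an IUPAC motif; the lookahead lets overlapping sites match."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_sites(sequence: str, motif: str) -> list:
    """Return sorted (0-based start, strand) pairs for motif hits on both strands.

    Minus-strand hits are reported at their start coordinate on the plus strand.
    """
    seq = sequence.upper()
    sites = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in motif_to_regex(pattern).finditer(seq):
            sites.append((match.start(), strand))
    return sorted(sites)


if __name__ == "__main__":
    # Hypothetical fragment: one plus-strand site and one minus-strand site.
    fragment = "TTACACAGTTCTGCGTAA"
    print(find_sites(fragment, "ACRCAG"))
```

A plasmid for which `find_sites` returns an empty list would fall into the "lacking pglX recognition sites" category described above, under the assumption that only exact matches to the motif count.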


April 21, 2020

De novo genome assembly of the stress tolerant forest species Casuarina equisetifolia provides insight into secondary growth.

Casuarina equisetifolia (C. equisetifolia), a conifer-like angiosperm with resistance to typhoon and stress tolerance, is mainly cultivated in the coastal areas of Australasia. These traits make C. equisetifolia a valuable model to study secondary growth-associated genes and stress-tolerance traits. However, the genome sequence is unavailable, and therefore wood-associated growth rate and stress resistance at the molecular level are largely unexplored. We therefore constructed a high-quality draft genome sequence of C. equisetifolia by a combination of Illumina second-generation sequencing reads and Pacific Biosciences single-molecule real-time (SMRT) long reads to advance the investigation of this species. Here, we report the genome assembly, which contains approximately 300 megabases (Mb) and has a scaffold N50 of 1.06 Mb. Additionally, gene annotation, assisted by a combination of prediction and RNA-seq data, generated 29,827 annotated protein-coding genes and 1,983 non-coding genes. Furthermore, we found that repetitive sequences account for one-third of the genome assembly. Here we also construct a genome-wide map of DNA modifications, including the two novel forms N6-methyladenine (6mA) and N4-methylcytosine (4mC), at single-nucleotide resolution using SMRT sequencing. Interestingly, we found that 17% of 6mA-modified genes and 15% of 4mC-modified genes also included alternative splicing events. Finally, we investigated cellulose, hemicellulose, and lignin-related genes, which were associated with secondary growth and contained different DNA modifications. The high-quality genome sequence and annotation of C. equisetifolia in this study provide a valuable resource to strengthen our understanding of the diverse traits of trees. © 2018 The Authors The Plant Journal © 2018 John Wiley & Sons Ltd.
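
The scaffold N50 reported above is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name and the example lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are taken from longest to shortest until their cumulative
    length reaches at least half of the total assembly size; the length
    of the last scaffold taken is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical scaffold lengths (arbitrary units), for illustration only.
    example = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(n50(example))
```

Because N50 depends only on the length distribution, it describes contiguity and says nothing about whether the scaffolds are correctly assembled.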


April 21, 2020

RADAR-seq: A RAre DAmage and Repair sequencing method for detecting DNA damage on a genome-wide scale.

RAre DAmage and Repair sequencing (RADAR-seq) is a highly adaptable sequencing method that enables the identification and detection of rare DNA damage events for a wide variety of DNA lesions at single-molecule resolution on a genome-wide scale. In RADAR-seq, DNA lesions are replaced with a patch of modified bases that can be directly detected by Pacific Biosciences Single Molecule Real-Time (SMRT) sequencing. RADAR-seq enables dynamic detection over a wide range of DNA damage frequencies, including low physiological levels. Furthermore, without the need for DNA amplification and enrichment steps, RADAR-seq provides sequencing coverage of damaged and undamaged DNA across an entire genome. Here, we use RADAR-seq to measure the frequency and map the location of ribonucleotides in wild-type and RNaseH2-deficient Escherichia coli (E. coli) and Thermococcus kodakarensis strains. Additionally, by tracking ribonucleotides incorporated during in vivo lagging strand DNA synthesis, we determined the replication initiation point in E. coli and its relation to the origin of replication (oriC). RADAR-seq was also used to map cyclobutane pyrimidine dimers (CPDs) in E. coli genomic DNA exposed to UV radiation. On a broader scale, RADAR-seq can be applied to understand formation and repair of DNA damage, the correlation between DNA damage and disease initiation and progression, and complex biological pathways, including DNA replication. Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.


April 21, 2020

Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense.

Allotetraploid cotton species (Gossypium hirsutum and Gossypium barbadense) have long been cultivated worldwide for natural renewable textile fibers. The draft genome sequences of both species are available but they are highly fragmented and incomplete [1-4]. Here we report reference-grade genome assemblies and annotations for G. hirsutum accession Texas Marker-1 (TM-1) and G. barbadense accession 3-79 by integrating single-molecule real-time sequencing, BioNano optical mapping and high-throughput chromosome conformation capture techniques. Compared with previously assembled draft genomes [1,3], these genome sequences show considerable improvements in contiguity and completeness for regions with high content of repeats such as centromeres. Comparative genomics analyses identify extensive structural variations that probably occurred after polyploidization, highlighted by large paracentric/pericentric inversions in 14 chromosomes. We constructed an introgression line population to introduce favorable chromosome segments from G. barbadense to G. hirsutum, allowing us to identify 13 quantitative trait loci associated with superior fiber quality. These resources will accelerate evolutionary and functional genomic studies in cotton and inform future breeding programs for fiber improvement.


April 21, 2020

Deciphering bacterial epigenomes using modern sequencing technologies.

Prokaryotic DNA contains three types of methylation: N6-methyladenine, N4-methylcytosine and 5-methylcytosine. The lack of tools to analyse the frequency and distribution of methylated residues in bacterial genomes has prevented a full understanding of their functions. Now, advances in DNA sequencing technology, including single-molecule, real-time sequencing and nanopore-based sequencing, have provided new opportunities for systematic detection of all three forms of methylated DNA at a genome-wide scale and offer unprecedented opportunities for achieving a more complete understanding of bacterial epigenomes. Indeed, as the number of mapped bacterial methylomes approaches 2,000, increasing evidence supports roles for methylation in regulation of gene expression, virulence and pathogen-host interactions.


April 21, 2020

The complete genome sequence of Ethanoligenens harbinense reveals the metabolic pathway of acetate-ethanol fermentation: A novel understanding of the principles of anaerobic biotechnology.

Ethanol-type fermentation is one of three main fermentation types in the acidogenesis of anaerobic treatment systems. Non-spore-forming Ethanoligenens is a typical genus capable of ethanol-type fermentation in mixed culture (i.e., acetate-ethanol fermentation). This genus can produce ethanol, acetate, CO2, and H2 using carbohydrates, and has application potential in anaerobic bioprocesses. Here, the complete genome sequences and methylome of Ethanoligenens harbinense strains with different autoaggregative and coaggregative abilities were obtained using the PacBio single-molecule real-time sequencing platform. The genome size of E. harbinense strains was about 2.97-3.10 Mb with 55.5% G+C content. A total of 3020-3153 genes were annotated, most of which were methylated at specific sites or motifs. The methylation types included 6mA, 4mC, and unknown types. Comparative genomic analysis demonstrated low levels of genetic similarity between E. harbinense and other well-known hydrogen-producing bacteria (i.e., Clostridium and Thermoanaerobacter) in phylogenesis. Hydrogen production of E. harbinense was catalyzed by genes that encode [FeFe]-hydrogenases and that were synthesized by three maturases of [FeFe]-H2ase. The metabolic mechanism of H2-ethanol co-production fermentation, catalyzed by pyruvate ferredoxin oxidoreductase, was proposed. This study provides genetic and evolutionary information of a model genus for the further investigation of the metabolic pathway and regulatory network of ethanol-type fermentation and anaerobic bioprocesses for waste or wastewater treatment. Copyright © 2019. Published by Elsevier Ltd.


April 21, 2020

Single-molecule sequencing detection of N6-methyladenine in microbial reference materials.

The DNA base modification N6-methyladenine (m6A) is involved in many pathways related to the survival of bacteria and their interactions with hosts. Nanopore sequencing offers a new, portable method to detect base modifications. Here, we show that a neural network can improve m6A detection at trained sequence contexts compared to previously published methods using deviations between measured and expected current values as each adenine travels through a pore. The model, implemented as the mCaller software package, can be extended to detect known or confirm suspected methyltransferase target motifs based on predictions of methylation at untrained contexts. We use PacBio, Oxford Nanopore, methylated DNA immunoprecipitation sequencing (MeDIP-seq), and whole-genome bisulfite sequencing data to generate and orthogonally validate methylomes for eight microbial reference species. These well-characterized microbial references can serve as controls in the development and evaluation of future methods for the identification of base modifications from single-molecule sequencing data.

