July 19, 2019

Completing bacterial genome assemblies: strategy and performance comparisons.

Determining the genomic sequences of microorganisms is the basis and prerequisite for understanding their biology and functional characterization. While the advent of low-cost, extremely high-throughput second-generation sequencing technologies and the parallel development of assembly algorithms have generated rapid and cost-effective genome assemblies, such assemblies are often unfinished, fragmented draft genomes as a result of short read lengths and long repeats present in multiple copies. Third-generation PacBio sequencing technologies circumvented this problem by greatly increasing read length. Hybrid approaches, including ALLPATHS-LG, PacBio corrected reads pipeline, SPAdes, and SSPACE-LongRead, and non-hybrid approaches, namely the hierarchical genome-assembly process (HGAP) and the PacBio corrected reads pipeline via self-correction, have therefore been proposed to utilize the PacBio long reads that can span many thousands of bases to facilitate the assembly of complete microbial genomes. However, standardized procedures that aim at evaluating and comparing these approaches are currently insufficient. To address the issue, we herein provide a comprehensive comparison by collecting datasets for the comparative assessment on the above-mentioned five assemblers. In addition to offering explicit and beneficial recommendations to practitioners, this study aims to aid in the design of a paradigm positioned to complete bacterial genome assembly.


July 19, 2019

Complete genome sequence and analysis of Lactobacillus hokkaidonensis LOOC260(T), a psychrotrophic lactic acid bacterium isolated from silage.

Lactobacillus hokkaidonensis is an obligate heterofermentative lactic acid bacterium, which was isolated from Timothy grass silage in Hokkaido, a subarctic region of Japan. This bacterium is expected to be useful as a silage starter culture in cold regions because of its remarkable psychrotolerance; it can grow at temperatures as low as 4°C. To elucidate its genetic background, particularly in relation to the source of psychrotolerance, we constructed the complete genome sequence of L. hokkaidonensis LOOC260(T) using PacBio single-molecule real-time sequencing technology. The genome of LOOC260(T) comprises one circular chromosome (2.28 Mbp) and two circular plasmids: pLOOC260-1 (81.6 kbp) and pLOOC260-2 (41.0 kbp). We identified diverse mobile genetic elements, such as prophages, integrated and conjugative elements, and conjugative plasmids, which may reflect adaptation to plant-associated niches. Comparative genome analysis also detected unique genomic features, such as genes involved in pentose assimilation and NADPH generation. This is the first complete genome in the L. vaccinostercus group, which is poorly characterized, so the genomic information obtained in this study provides insight into the genetics and evolution of this group. We also found several factors that may contribute to the ability of L. hokkaidonensis to grow at cold temperatures. The results of this study will facilitate further investigation of the cold-tolerance mechanism of L. hokkaidonensis.


July 19, 2019

Genome-wide DNA methylation analysis of Haloferax volcanii H26 and identification of DNA methyltransferase related PD-(D/E)XK nuclease family protein HVO_A0006.

Restriction-modification (RM) systems have evolved to protect the cell from invading DNAs and are composed of two enzymes: a DNA methyltransferase and a restriction endonuclease. Although RM systems are present in both archaeal and bacterial genomes, DNA methylation in archaea has not been well defined. In order to characterize the function of RM systems in archaeal species, we have made use of the model haloarchaeon Haloferax volcanii. A genomic DNA methylation analysis of H. volcanii strain H26 was performed using PacBio single molecule real-time (SMRT) sequencing. This analysis was also performed on a strain of H. volcanii in which an annotated DNA methyltransferase gene HVO_A0006 was deleted from the genome. Sequence analysis of H26 revealed two motifs which are modified in the genome: C(m4)TAG and GCA(m6)BN6VTGC. Analysis of the ΔHVO_A0006 strain indicated that it exhibited reduced adenine methylation compared to the parental strain and altered the detected adenine motif. However, protein domain architecture analysis and amino acid alignments revealed that HVO_A0006 is homologous only to the N-terminal endonuclease region of Type IIG RM proteins and contains a PD-(D/E)XK nuclease motif, suggesting that HVO_A0006 is a PD-(D/E)XK nuclease family protein. Further bioinformatic analysis of the HVO_A0006 gene demonstrated that the gene is rare among the Halobacteria. It is surrounded by two transposition genes, suggesting that HVO_A0006 is a fragment of a Type IIG RM gene, which has likely been acquired through gene transfer, and affects restriction-modification activity by interacting with another RM system component(s). Here, we present the first genome-wide characterization of DNA methylation in an archaeal species and examine the function of a DNA methyltransferase related gene HVO_A0006.


July 19, 2019

Complete nucleotide sequences of bla(CTX-M)-harboring IncF plasmids from community-associated Escherichia coli strains in the United States.

Community-associated infections due to Escherichia coli producing CTX-M-type extended-spectrum β-lactamases are increasingly recognized in the United States. The bla(CTX-M) genes are frequently carried on IncF group plasmids. In this study, bla(CTX-M-15)-harboring plasmids pCA14 (sequence type 131 [ST131]) and pCA28 (ST44) and bla(CTX-M-14)-harboring plasmid pCA08 (ST131) were sequenced and characterized. The three plasmids were closely related to other IncFII plasmids from continents outside the United States in the conserved backbone region and multiresistance regions (MRRs). Each of the bla(CTX-M-15)-carrying plasmids pCA14 and pCA28 belonged to F31:A4:B1 (FAB [FII, FIA, FIB] formula) and showed a high level of similarity (92% coverage of pCA14 and 99% to 100% nucleotide identity), suggesting a possible common origin. The bla(CTX-M-14)-carrying plasmid pCA08 belonged to F2:A2:B20 and was highly similar to pKF3-140 from China (88% coverage of pCA08 and 99% to 100% nucleotide identity). All three plasmids carried multiple antimicrobial resistance genes and modules associated with virulence and biochemical pathways, which likely confer selective advantages for their host strains. The bla(CTX-M)-carrying IncFII-IA-IB plasmids implicated in community-associated infections in the United States shared key structural features with those identified from other continents, underscoring the global nature of this plasmid epidemic. Copyright © 2015, American Society for Microbiology. All Rights Reserved.


July 19, 2019

Sequence data for Clostridium autoethanogenum using three generations of sequencing technologies.

During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms, for example Illumina, which are characterized by low cost, high throughput and shorter read lengths. The emergence and development of so-called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches, and will encourage interest in the development of innovative experimental and computational methods for NGS data.


July 19, 2019

Genome-wide methylation patterns in Salmonella enterica subsp. enterica serovars.

The methylation of DNA bases plays an important role in numerous biological processes including development, gene expression, and DNA replication. Salmonella is an important foodborne pathogen, and methylation in Salmonella is implicated in virulence. Using single molecule real-time (SMRT) DNA-sequencing, we sequenced and assembled the complete genomes of eleven Salmonella enterica isolates from nine different serovars, and analysed the whole-genome methylation patterns of each genome. We describe 16 distinct N6-methyladenine (m6A) methylated motifs, one N4-methylcytosine (m4C) motif, and one combined m6A-m4C motif. Eight of these motifs are novel, i.e., they have not been previously described. We also identified the methyltransferases (MTases) associated with 13 of the motifs. Some motifs are conserved across all Salmonella serovars tested, while others were found only in a subset of serovars. Eight of the nine serovars contained a unique methylated motif that was not found in any other serovar (most of these motifs were part of Type I restriction modification systems), indicating the high diversity of methylation patterns present in Salmonella.


July 19, 2019

Targeted single molecule sequencing methodology for ovarian hyperstimulation syndrome.

One of the most significant issues surrounding next generation sequencing is the cost and the difficulty of assembling short read lengths. Targeted capture enrichment of longer fragments using single molecule sequencing (SMS) is expected to improve both sequence assembly and base-call accuracy but, at present, there are very few examples of successful application of these technologic advances in translational research and clinical testing. We developed a targeted single molecule sequencing (T-SMS) panel for genes implicated in ovarian response to controlled ovarian hyperstimulation (COH) for infertility. Target enrichment was carried out using droplet-based multiplex polymerase chain reaction (PCR) technology (RainDance®) designed to yield amplicons averaging 1 kb fragment size from 44 candidate loci (99.8% unique base-pair coverage). The total targeted sequence was 3.18 Mb per sample. SMS was carried out using single molecule, real-time DNA sequencing (SMRT®, Pacific Biosciences®; average raw read length = 1178 nucleotides, 5% of the amplicons >6000 nucleotides). After filtering with circular consensus (CCS) reads, the mean read length was 3200 nucleotides (97% CCS accuracy). Primary data analyses, alignment and filtering utilized the Pacific Biosciences® SMRT portal. Secondary analysis was conducted using the Genome Analysis Toolkit for SNP discovery and wANNOVAR for functional analysis of variants. Of the filtered functional variants, 18 of 19 (94.7%) were further confirmed using conventional Sanger sequencing. CCS reads were able to accurately detect zygosity. Coverage within GC rich regions (i.e., VEGFR; 72% GC rich) was achieved by capturing long genomic DNA (gDNA) fragments and reading into regions that flank the capture regions.
As proof of concept, a non-synonymous LHCGR variant was captured in two severe OHSS cases and verified by conventional sequencing. Combining emulsion PCR-generated 1 kb amplicons and SMRT DNA sequencing permitted greater depth of coverage for T-SMS and facilitated easier sequence assembly. To the best of our knowledge, this is the first report combining emulsion PCR and T-SMS for long reads using human DNA samples, and an NGS panel designed for biomarker discovery in OHSS.


July 19, 2019

Specificity of the ModA11, ModA12 and ModD1 epigenetic regulator N6-adenine DNA methyltransferases of Neisseria meningitidis.

Phase variation (random ON/OFF switching) of gene expression is a common feature of host-adapted pathogenic bacteria. Phase variably expressed N(6)-adenine DNA methyltransferases (Mod) alter global methylation patterns resulting in changes in gene expression. These systems constitute phase variable regulons called phasevarions. Neisseria meningitidis phasevarions regulate genes including virulence factors and vaccine candidates, and alter phenotypes including antibiotic resistance. The target site recognized by these Type III N(6)-adenine DNA methyltransferases is not known. Single molecule, real-time (SMRT) methylome analysis was used to identify the recognition site for three key N. meningitidis methyltransferases: ModA11 (exemplified by M.NmeMC58I) (5′-CGY(m6A)G-3′), ModA12 (exemplified by M.Nme77I, M.Nme18I and M.Nme579II) (5′-AC(m6A)CC-3′) and ModD1 (exemplified by M.Nme579I) (5′-CC(m6A)GC-3′). Restriction inhibition assays and mutagenesis confirmed the SMRT methylome analysis. The ModA11 site is complex and atypical and is dependent on the type of pyrimidine at the central position, in combination with the bases flanking the core recognition sequence 5′-CGY(m6A)G-3′. The observed efficiency of methylation in the modA11 strain (MC58) genome ranged from 4.6% at 5′-GCGC(m6A)GG-3′ sites, to 100% at 5′-ACGT(m6A)GG-3′ sites. Analysis of the distribution of modified sites in the respective genomes shows many cases of association with intergenic regions of genes with altered expression due to phasevarion switching. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.


July 19, 2019

Genome modification in Enterococcus faecalis OG1RF assessed by bisulfite sequencing and Single-Molecule Real-Time Sequencing.

Enterococcus faecalis is a Gram-positive bacterium that natively colonizes the human gastrointestinal tract and opportunistically causes life-threatening infections. Multidrug-resistant (MDR) E. faecalis strains have emerged, reducing treatment options for these infections. MDR E. faecalis strains have large genomes containing mobile genetic elements (MGEs) that harbor genes for antibiotic resistance and virulence determinants. Bacteria commonly possess genome defense mechanisms to block MGE acquisition, and we hypothesize that these mechanisms have been compromised in MDR E. faecalis. In restriction-modification (R-M) defense, the bacterial genome is methylated at cytosine (C) or adenine (A) residues by a methyltransferase (MTase), such that nonself DNA can be distinguished from self DNA. A cognate restriction endonuclease digests improperly modified nonself DNA. Little is known about R-M in E. faecalis. Here, we use genome resequencing to identify DNA modifications occurring in the oral isolate OG1RF. OG1RF has one of the smallest E. faecalis genomes sequenced to date and possesses few MGEs. Single-molecule real-time (SMRT) and bisulfite sequencing revealed that OG1RF has global 5-methylcytosine (m5C) methylation at 5′-GCWGC-3′ motifs. A type II R-M system confers the m5C modification, and disruption of this system impacts OG1RF electrotransformability and conjugative transfer of an antibiotic resistance plasmid. A second DNA MTase was poorly expressed under laboratory conditions but conferred global N(4)-methylcytosine (m4C) methylation at 5′-CCGG-3′ motifs when expressed in Escherichia coli. Based on our results, we conclude that R-M can act as a barrier to MGE acquisition and likely influences antibiotic resistance gene dissemination in the E. faecalis species. The horizontal transfer of antibiotic resistance genes among bacteria is a critical public health concern.
Enterococcus faecalis is an opportunistic pathogen that causes life-threatening infections in humans. Multidrug resistance acquired by horizontal gene transfer limits treatment options for these infections. In this study, we used innovative DNA sequencing methodologies to investigate how a model strain of E. faecalis discriminates its own DNA from foreign DNA, i.e., self versus nonself discrimination. We also assess the role of an E. faecalis genome modification system in modulating conjugative transfer of an antibiotic resistance plasmid. These results are significant because they demonstrate that differential genome modification impacts horizontal gene transfer frequencies in E. faecalis. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
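
As an illustration of how degenerate recognition motifs such as 5′-GCWGC-3′ (W = A or T) and 5′-CCGG-3′ can be located in a sequence, the following is a minimal sketch. It is not the analysis used in the study. The function names and the example sequence are hypothetical, and only the forward strand is scanned, which is sufficient for these two motifs because both are palindromic.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif (hypothetical helper) into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_motif_sites(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Made-up example sequence containing one GCAGC and one GCTGC site.
example = "AAGCAGCTTGCTGCGG"
sites = find_motif_sites(example, "GCWGC")
```

The same helper accepts any IUPAC motif, so `find_motif_sites(seq, "CCGG")` would list candidate m4C sites in the same way.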


July 19, 2019

CGGBP1 mitigates cytosine methylation at repetitive DNA sequences.

CGGBP1 is a repetitive DNA-binding transcription regulator with target sites at CpG-rich sequences such as CGG repeats, Alu-SINEs and L1-LINEs. The role of CGGBP1 as a possible mediator of CpG methylation, however, remains unknown. At CpG-rich sequences, cytosine methylation is a major mechanism of transcriptional repression. Concordantly, gene-rich regions typically carry lower levels of CpG methylation than the repetitive elements. It is well known that at the interspersed repeats Alu-SINEs and L1-LINEs, high levels of CpG methylation constitute a transcriptional silencing and retrotransposon inactivating mechanism. Here, we have studied genome-wide CpG methylation with or without CGGBP1-depletion. By high throughput sequencing of bisulfite-treated genomic DNA we have identified CGGBP1 to be a negative regulator of CpG methylation at repetitive DNA sequences. In addition, we have studied CpG methylation alterations on Alu and L1 retrotransposons in CGGBP1-depleted cells using a novel bisulfite-treatment and high throughput sequencing approach. The results clearly show that CGGBP1 is a possible bidirectional regulator of CpG methylation at Alus, and acts as a repressor of methylation at L1 retrotransposons.


July 19, 2019

HLA typing for the next generation.

Allele-level resolution data at primary HLA typing is the ideal for most histocompatibility testing laboratories. Many high-throughput molecular HLA typing approaches are unable to determine the phase of observed DNA sequence polymorphisms, leading to ambiguous results. The use of higher resolution methods is often restricted due to cost and time limitations. Here we report on the feasibility of using Pacific Biosciences’ Single Molecule Real-Time (SMRT) DNA sequencing technology for high-resolution and high-throughput HLA typing. Seven DNA samples were typed for HLA-A, -B and -C. The results showed that SMRT DNA sequencing technology was able to generate sequences that spanned entire HLA Class I genes, which allowed for accurate allele calling. Eight novel genomic HLA class I sequences were identified: four were novel alleles, three were confirmed as genomic sequence extensions and one corrected an existing genomic reference sequence. This method has the potential to revolutionize the field of HLA typing. The clinical impact of achieving this level of resolution in HLA typing data is likely to be considerable, particularly in applications such as organ and blood stem cell transplantation where matching donors and recipients for their HLA is of utmost importance.


July 19, 2019

Identification of a common risk haplotype for canine idiopathic epilepsy in the ADAM23 gene.

Idiopathic epilepsy is a common neurological disease in humans and domestic dogs, but relatively few risk genes have been identified to date. The seizure characteristics, including focal and generalised seizures, are similar between the two species, with gene discovery facilitated by the reduced genetic heterogeneity of purebred dogs. We have recently identified a risk locus for idiopathic epilepsy in the Belgian Shepherd breed on a 4.4 megabase region on CFA37. We have expanded a previous study replicating the association with a combined analysis of 157 cases and 179 controls in three additional breeds: Schipperke, Finnish Spitz and Beagle (pc = 2.9e-07, pGWAS = 1.74E-02). A targeted resequencing of the 4.4 megabase region in twelve Belgian Shepherd cases and twelve controls with opposite haplotypes identified 37 case-specific variants within the ADAM23 gene. Twenty-seven variants were validated in 285 cases and 355 controls from four breeds, resulting in a strong replication of the ADAM23 locus (praw = 2.76e-15) and the identification of a common 28 kb risk haplotype in all four breeds. The risk haplotype was present at frequencies of 0.49-0.7 in the breeds, suggesting that ADAM23 is a low penetrance risk gene for canine epilepsy. These results implicate ADAM23 in common canine idiopathic epilepsy, although the causative variant remains to be identified. ADAM23 plays a role in synaptic transmission and interacts with known epilepsy genes, LGI1 and LGI2, and should be considered as a candidate gene for human epilepsies.


July 19, 2019

Multiplexed highly-accurate DNA sequencing of closely-related HIV-1 variants using continuous long reads from single molecule, real-time sequencing.

Single Molecule, Real-Time (SMRT®) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
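
To put the ">QV50" figure in context, a quality value on the Phred scale relates to an error probability p by QV = -10 log10(p), so QV50 corresponds to one expected error per 100,000 bases. The following is a minimal sketch of that conversion. The function names are hypothetical, and the 9,000 bp length is only an illustrative stand-in for the ">9 kb" genomes mentioned above.

```python
import math

def qv_to_error_probability(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)

def error_probability_to_qv(p: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled value."""
    return -10 * math.log10(p)

def expected_errors(qv: float, length_bp: int) -> float:
    """Expected number of erroneous bases in a sequence of the given length."""
    return qv_to_error_probability(qv) * length_bp

# Illustrative values only: QV50 consensus over an assumed 9,000 bp genome.
p_qv50 = qv_to_error_probability(50)
errors_9kb = expected_errors(50, 9_000)
```

Under these assumptions, a QV50 consensus of a 9 kb genome carries well under one expected error in total.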


July 19, 2019

Characterizing and overriding the structural mechanism of the Quizartinib-resistant FLT3 “gatekeeper” F691L mutation with PLX3397.

Tyrosine kinase domain mutations are a common cause of acquired clinical resistance to tyrosine kinase inhibitors (TKI) used to treat cancer, including the FLT3 inhibitor quizartinib. Mutation of kinase “gatekeeper” residues, which control access to an allosteric pocket adjacent to the ATP-binding site, has been frequently implicated in TKI resistance. The molecular underpinnings of gatekeeper mutation-mediated resistance are incompletely understood. We report the first cocrystal structure of FLT3 with the TKI quizartinib, which demonstrates that quizartinib binding relies on essential edge-to-face aromatic interactions with the gatekeeper F691 residue, and F830 within the highly conserved Asp-Phe-Gly motif in the activation loop. This reliance makes quizartinib critically vulnerable to gatekeeper and activation loop substitutions while minimizing the impact of mutations elsewhere. Moreover, we identify PLX3397, a novel FLT3 inhibitor that retains activity against the F691L mutant due to a binding mode that depends less vitally on specific interactions with the gatekeeper position. We report the first cocrystal structure of FLT3 with a kinase inhibitor, elucidating the structural mechanism of resistance due to the gatekeeper F691L mutation. PLX3397 is a novel FLT3 inhibitor with in vitro activity against this mutation but is vulnerable to kinase domain mutations in the FLT3 activation loop. Cancer Discov; 5(6); 668-79. ©2015 AACR. This article is highlighted in the In This Issue feature, p. 565. ©2015 American Association for Cancer Research.


July 19, 2019

Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes.

Detection of somatic mutations in human leukocyte antigen (HLA) genes using whole-exome sequencing (WES) is hampered by the high polymorphism of the HLA loci, which prevents alignment of sequencing reads to the human reference genome. We describe a computational pipeline that enables accurate inference of germline alleles of class I HLA-A, B and C genes and subsequent detection of mutations in these genes using the inferred alleles as a reference. Analysis of WES data from 7,930 pairs of tumor and healthy tissue from the same patient revealed 298 nonsilent HLA mutations in tumors from 266 patients. These 298 mutations are enriched for likely functional mutations, including putative loss-of-function events. Recurrence of mutations suggested that these ‘hotspot’ sites were positively selected. Cancers with recurrent somatic HLA mutations were associated with upregulation of signatures of cytolytic activity characteristic of tumor infiltration by effector lymphocytes, supporting immune evasion by altered HLA function as a contributory mechanism in cancer.

