July 19, 2019

Genome sequencing and comparative genomics provides insights on the evolutionary dynamics and pathogenic potential of different H-serotypes of Shiga toxin-producing Escherichia coli O104.

Various H-serotypes of the Shiga toxin-producing Escherichia coli (STEC) O104, including H4, H7, H21, and H¯, have been associated with sporadic cases of illness and have caused food-borne outbreaks globally. In the U.S., STEC O104:H21 caused an outbreak associated with milk in 1994. However, there is little known on the evolutionary origins of STEC O104 strains, and how genotypic diversity contributes to pathogenic potential of various O104 H-antigen serotypes isolated from different ecological niches and/or geographical regions. Two STEC O104:H21 (milk outbreak strain) and O104:H7 (cattle isolate) strains were shot-gun sequenced, and the genomes were closed. The intimin (eae) gene, involved in the attaching-effacing phenotype of diarrheagenic E. coli, was not found in either strain. Examining various O104 genome sequences, we found that two “complete” left and right end portions of the locus of enterocyte effacement (LEE) pathogenicity island were present in 13 O104 strains; however, the central portion of LEE was missing, where the eae gene is located. In O104:H4 strains, the missing central portion of the LEE locus was replaced by a pathogenicity island carrying the aidA (adhesin involved in diffuse adherence) gene and antibiotic resistance genes commonly carried on plasmids. Enteroaggregative E. coli-specific virulence genes and European outbreak O104:H4-specific stx2-encoding Escherichia P13374 or Escherichia TL-2011c bacteriophages were missing in some of the O104:H4 genome sequences available from public databases. Most of the genomic variations in the strains examined were due to the presence of different mobile genetic elements, including prophages and genomic island regions. The presence of plasmids carrying virulence-associated genes may play a role in the pathogenic potential of O104 strains. The two strains sequenced in this study (O104:H21 and O104:H7) are genetically more similar to each other than to the O104:H4 strains that caused an outbreak in Germany in 2011 and strains found in Central Africa. A hypothesis on strain evolution and pathogenic potential of various H-serotypes of E. coli O104 strains is proposed.


July 19, 2019

Complete genome sequence and analysis of Lactobacillus hokkaidonensis LOOC260(T), a psychrotrophic lactic acid bacterium isolated from silage.

Lactobacillus hokkaidonensis is an obligate heterofermentative lactic acid bacterium, which was isolated from Timothy grass silage in Hokkaido, a subarctic region of Japan. This bacterium is expected to be useful as a silage starter culture in cold regions because of its remarkable psychrotolerance; it can grow at temperatures as low as 4°C. To elucidate its genetic background, particularly in relation to the source of psychrotolerance, we constructed the complete genome sequence of L. hokkaidonensis LOOC260(T) using PacBio single-molecule real-time sequencing technology. The genome of LOOC260(T) comprises one circular chromosome (2.28 Mbp) and two circular plasmids: pLOOC260-1 (81.6 kbp) and pLOOC260-2 (41.0 kbp). We identified diverse mobile genetic elements, such as prophages, integrated and conjugative elements, and conjugative plasmids, which may reflect adaptation to plant-associated niches. Comparative genome analysis also detected unique genomic features, such as genes involved in pentose assimilation and NADPH generation. This is the first complete genome in the L. vaccinostercus group, which is poorly characterized, so the genomic information obtained in this study provides insight into the genetics and evolution of this group. We also found several factors that may contribute to the ability of L. hokkaidonensis to grow at cold temperatures. The results of this study will facilitate further investigation of the cold-tolerance mechanism of L. hokkaidonensis.


July 19, 2019

PacBio-LITS: a large-insert targeted sequencing method for characterization of human disease-associated chromosomal structural variations.

Generation of long (>5 Kb) DNA sequencing reads provides an approach for interrogation of complex regions in the human genome. Currently, large-insert whole genome sequencing (WGS) technologies from Pacific Biosciences (PacBio) enable analysis of chromosomal structural variations (SVs), but the cost to achieve the required sequence coverage across the entire human genome is high. We developed a method (termed PacBio-LITS) that combines oligonucleotide-based DNA target-capture enrichment technologies with PacBio large-insert library preparation to facilitate SV studies at specific chromosomal regions. PacBio-LITS provides deep sequence coverage at the specified sites at substantially reduced cost compared with PacBio WGS. The efficacy of PacBio-LITS is illustrated by delineating the breakpoint junctions of low copy repeat (LCR)-associated complex structural rearrangements on chr17p11.2 in patients diagnosed with Potocki-Lupski syndrome (PTLS; MIM#610883). We successfully identified previously determined breakpoint junctions in three PTLS cases, and also were able to discover novel junctions in repetitive sequences, including LCR-mediated breakpoints. The new information has enabled us to propose mechanisms for formation of these structural variants. The new method leverages the cost efficiency of targeted capture-sequencing as well as the mappability and scaffolding capabilities of long sequencing reads generated by the PacBio platform. It is therefore suitable for studying complex SVs, especially those involving LCRs, inversions, and the generation of chimeric Alu elements at the breakpoints. Other genomic research applications, such as haplotype phasing and small insertion and deletion validation could also benefit from this technology.


July 19, 2019

Intrahost dynamics of antiviral resistance in influenza A virus reflect complex patterns of segment linkage, reassortment, and natural selection.

Resistance following antiviral therapy is commonly observed in human influenza viruses. Although this evolutionary process is initiated within individual hosts, little is known about the pattern, dynamics, and drivers of antiviral resistance at this scale, including the role played by reassortment. In addition, the short duration of human influenza virus infections limits the available time window in which to examine intrahost evolution. Using single-molecule sequencing, we mapped, in detail, the mutational spectrum of an H3N2 influenza A virus population sampled from an immunocompromised patient who shed virus over a 21-month period. In this unique natural experiment, we were able to document the complex dynamics underlying the evolution of antiviral resistance. Individual resistance mutations appeared weeks before they became dominant, evolved independently on cocirculating lineages, led to a genome-wide reduction in genetic diversity through a selective sweep, and were placed into new combinations by reassortment. Notably, despite frequent reassortment, phylogenetic analysis also provided evidence for specific patterns of segment linkage, with a strong association between the hemagglutinin (HA)- and matrix (M)-encoding segments that matches that previously observed at the epidemiological scale. In sum, we were able to reveal, for the first time, the complex interaction between multiple evolutionary processes as they occur within an individual host. Understanding the evolutionary forces that shape the genetic diversity of influenza virus is crucial for predicting the emergence of drug-resistant strains but remains challenging because multiple processes occur concurrently. We characterized the evolution of antiviral resistance in a single persistent influenza virus infection, representing the first case in which reassortment and the complex patterns of drug resistance emergence and evolution have been determined within an individual host. Deep-sequence data from multiple time points revealed that the evolution of antiviral resistance reflects a combination of frequent mutation, natural selection, and a complex pattern of segment linkage and reassortment. In sum, these data show how immunocompromised hosts may help reveal the drivers of strain emergence. Copyright © 2015 Rogers et al.


July 19, 2019

Genome-wide DNA methylation analysis of Haloferax volcanii H26 and identification of DNA methyltransferase related PD-(D/E)XK nuclease family protein HVO_A0006.

Restriction-modification (RM) systems have evolved to protect the cell from invading DNAs and are composed of two enzymes: a DNA methyltransferase and a restriction endonuclease. Although RM systems are present in both archaeal and bacterial genomes, DNA methylation in archaea has not been well defined. In order to characterize the function of RM systems in archaeal species, we have made use of the model haloarchaeon Haloferax volcanii. A genomic DNA methylation analysis of H. volcanii strain H26 was performed using PacBio single molecule real-time (SMRT) sequencing. This analysis was also performed on a strain of H. volcanii in which an annotated DNA methyltransferase gene HVO_A0006 was deleted from the genome. Sequence analysis of H26 revealed two motifs which are modified in the genome: C(m4)TAG and GCA(m6)BN6VTGC. Analysis of the ΔHVO_A0006 strain indicated that it exhibited reduced adenine methylation compared to the parental strain and altered the detected adenine motif. However, protein domain architecture analysis and amino acid alignments revealed that HVO_A0006 is homologous only to the N-terminal endonuclease region of Type IIG RM proteins and contains a PD-(D/E)XK nuclease motif, suggesting that HVO_A0006 is a PD-(D/E)XK nuclease family protein. Further bioinformatic analysis of the HVO_A0006 gene demonstrated that the gene is rare among the Halobacteria. It is surrounded by two transposition genes suggesting that HVO_A0006 is a fragment of a Type IIG RM gene, which has likely been acquired through gene transfer, and affects restriction-modification activity by interacting with another RM system component(s). Here, we present the first genome-wide characterization of DNA methylation in an archaeal species and examine the function of a DNA methyltransferase related gene HVO_A0006.
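
The two motifs reported in this abstract are written with IUPAC degenerate base codes (B = C, G, or T; V = A, C, or G; N = any base; N6 = six consecutive N positions). As an illustration only, and not the authors' pipeline, the following minimal sketch shows how such a motif could be expanded into a regular expression to locate candidate sites on one strand of a sequence. The function names and the toy sequence are hypothetical; only the motif strings come from the abstract.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Compile a degenerate motif into a regex (hypothetical helper).

    The lookahead wrapper lets overlapping occurrences be reported too.
    """
    pattern = "".join(IUPAC[base] for base in motif)
    return re.compile("(?=(" + pattern + "))")

def find_sites(sequence, motif):
    """Return 0-based start positions of the motif on the given strand."""
    return [m.start() for m in motif_to_regex(motif).finditer(sequence.upper())]

# Motifs as named in the abstract; the modified base sits at offset 0 (C)
# of CTAG and at offset 2 (A) of GCABNNNNNNVTGC.
CYTOSINE_MOTIF = "CTAG"
ADENINE_MOTIF = "GCAB" + "N" * 6 + "VTGC"

# Toy sequence invented for this example (not real H. volcanii sequence).
toy = "AACTAG" + "GCATACGTACATGC" + "CTAGTT"

print(find_sites(toy, CYTOSINE_MOTIF))  # [2, 20]
print(find_sites(toy, ADENINE_MOTIF))   # [6]
```

This sketch scans a single strand; a real analysis would also scan the reverse complement and would rely on the modification calls from the sequencing data rather than on sequence matches alone.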


July 19, 2019

Molecular analysis of asymptomatic bacteriuria Escherichia coli strain VR50 reveals adaptation to the urinary tract by gene acquisition.

Urinary tract infections (UTIs) are among the most common infectious diseases of humans, with Escherichia coli responsible for >80% of all cases. One extreme of UTI is asymptomatic bacteriuria (ABU), which occurs as an asymptomatic carrier state that resembles commensalism. To understand the evolution and molecular mechanisms that underpin ABU, the genome of the ABU E. coli strain VR50 was sequenced. Analysis of the complete genome indicated that it most resembles E. coli K-12, with the addition of a 94-kb genomic island (GI-VR50-pheV), eight prophages, and multiple plasmids. GI-VR50-pheV has a mosaic structure and contains genes encoding a number of UTI-associated virulence factors, namely, Afa (afimbrial adhesin), two autotransporter proteins (Ag43 and Sat), and aerobactin. We demonstrated that the presence of this island in VR50 confers its ability to colonize the murine bladder, as a VR50 mutant with GI-VR50-pheV deleted was attenuated in a mouse model of UTI in vivo. We established that Afa is the island-encoded factor responsible for this phenotype using two independent deletion (Afa operon and AfaE adhesin) mutants. E. coli VR50afa and VR50afaE displayed significantly decreased ability to adhere to human bladder epithelial cells. In the mouse model of UTI, VR50afa and VR50afaE displayed reduced bladder colonization compared to wild-type VR50, similar to the colonization level of the GI-VR50-pheV mutant. Our study suggests that E. coli VR50 is a commensal-like strain that has acquired fitness factors that facilitate colonization of the human bladder. Copyright © 2015, American Society for Microbiology. All Rights Reserved.


July 19, 2019

Complete nucleotide sequences of bla(CTX-M)-harboring IncF plasmids from community-associated Escherichia coli strains in the United States.

Community-associated infections due to Escherichia coli producing CTX-M-type extended-spectrum β-lactamases are increasingly recognized in the United States. The bla(CTX-M) genes are frequently carried on IncF group plasmids. In this study, bla(CTX-M-15)-harboring plasmids pCA14 (sequence type 131 [ST131]) and pCA28 (ST44) and bla(CTX-M-14)-harboring plasmid pCA08 (ST131) were sequenced and characterized. The three plasmids were closely related to other IncFII plasmids from continents outside the United States in the conserved backbone region and multiresistance regions (MRRs). Each of the bla(CTX-M-15)-carrying plasmids pCA14 and pCA28 belonged to F31:A4:B1 (FAB [FII, FIA, FIB] formula) and showed a high level of similarity (92% coverage of pCA14 and 99% to 100% nucleotide identity), suggesting a possible common origin. The bla(CTX-M-14)-carrying plasmid pCA08 belonged to F2:A2:B20 and was highly similar to pKF3-140 from China (88% coverage of pCA08 and 99% to 100% nucleotide identity). All three plasmids carried multiple antimicrobial resistance genes and modules associated with virulence and biochemical pathways, which likely confer selective advantages for their host strains. The bla(CTX-M)-carrying IncFII-IA-IB plasmids implicated in community-associated infections in the United States shared key structural features with those identified from other continents, underscoring the global nature of this plasmid epidemic. Copyright © 2015, American Society for Microbiology. All Rights Reserved.


July 19, 2019

Sequence data for Clostridium autoethanogenum using three generations of sequencing technologies.

During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms (for example, Illumina), which are characterized by low cost, high throughput and shorter read lengths. The emergence and development of so-called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data.


July 19, 2019

Genome-wide methylation patterns in Salmonella enterica subsp. enterica serovars.

The methylation of DNA bases plays an important role in numerous biological processes including development, gene expression, and DNA replication. Salmonella is an important foodborne pathogen, and methylation in Salmonella is implicated in virulence. Using single molecule real-time (SMRT) DNA-sequencing, we sequenced and assembled the complete genomes of eleven Salmonella enterica isolates from nine different serovars, and analysed the whole-genome methylation patterns of each genome. We describe 16 distinct N6-methyladenine (m6A) methylated motifs, one N4-methylcytosine (m4C) motif, and one combined m6A-m4C motif. Eight of these motifs are novel, i.e., they have not been previously described. We also identified the methyltransferases (MTases) associated with 13 of the motifs. Some motifs are conserved across all Salmonella serovars tested, while others were found only in a subset of serovars. Eight of the nine serovars contained a unique methylated motif that was not found in any other serovar (most of these motifs were part of Type I restriction modification systems), indicating the high diversity of methylation patterns present in Salmonella.


July 19, 2019

Assessing structural variation in a personal genome-towards a human reference diploid genome.

Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods. We demonstrate Parliament’s efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus. HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.


July 19, 2019

Long-read single molecule sequencing to resolve tandem gene copies: The Mst77Y region on the Drosophila melanogaster Y chromosome.

The autosomal gene Mst77F of Drosophila melanogaster is essential for male fertility. In 2010, Krsticevic et al. (Genetics 184: 295-307) found 18 Y-linked copies of Mst77F (“Mst77Y”), which collectively account for 20% of the functional Mst77F-like mRNA. The Mst77Y genes were severely misassembled in the then-available genome assembly and were identified by cloning and sequencing polymerase chain reaction products. The genomic structure of the Mst77Y region and the possible existence of additional copies remained unknown. The recent publication of two long-read assemblies of D. melanogaster prompted us to reinvestigate this challenging region of the Y chromosome. We found that the Illumina Synthetic Long Reads assembly failed in the Mst77Y region, most likely because of its tandem duplication structure. The PacBio MHAP assembly of the Mst77Y region seems to be very accurate, as revealed by comparisons with the previously found Mst77Y genes, a bacterial artificial chromosome sequence, and Illumina reads of the same strain. We found that the Mst77Y region spans 96 kb and originated from a 3.4-kb transposition from chromosome 3L to the Y chromosome, followed by tandem duplications inside the Y chromosome and invasion of transposable elements, which account for 48% of its length. Twelve of the 18 Mst77Y genes found in 2010 were confirmed in the PacBio assembly, the remaining six being polymerase chain reaction-induced artifacts. There are several identical copies of some Mst77Y genes, coincidentally bringing the total copy number to 18. Besides providing a detailed picture of the Mst77Y region, our results highlight the utility of PacBio technology in assembling difficult genomic regions such as tandemly repeated genes. Copyright © 2015 Krsticevic et al.


July 19, 2019

Targeted single molecule sequencing methodology for ovarian hyperstimulation syndrome.

One of the most significant issues surrounding next generation sequencing is the cost and the difficulty assembling short read lengths. Targeted capture enrichment of longer fragments using single molecule sequencing (SMS) is expected to improve both sequence assembly and base-call accuracy but, at present, there are very few examples of successful application of these technologic advances in translational research and clinical testing. We developed a targeted single molecule sequencing (T-SMS) panel for genes implicated in ovarian response to controlled ovarian hyperstimulation (COH) for infertility. Target enrichment was carried out using droplet-based multiplex polymerase chain reaction (PCR) technology (RainDance®) designed to yield amplicons averaging 1 kb fragment size from 44 candidate loci (99.8% unique base-pair coverage). The total targeted sequence was 3.18 Mb per sample. SMS was carried out using single molecule, real-time DNA sequencing (SMRT®, Pacific Biosciences®; average raw read length = 1178 nucleotides, 5% of the amplicons >6000 nucleotides). After filtering with circular consensus (CCS) reads, the mean read length was 3200 nucleotides (97% CCS accuracy). Primary data analyses, alignment and filtering utilized the Pacific Biosciences® SMRT portal. Secondary analysis was conducted using the Genome Analysis Toolkit for SNP discovery and wANNOVAR for functional analysis of variants. Of the filtered functional variants, 18 of 19 (94.7%) were further confirmed using conventional Sanger sequencing. CCS reads were able to accurately detect zygosity. Coverage within GC rich regions (i.e., VEGFR; 72% GC rich) was achieved by capturing long genomic DNA (gDNA) fragments and reading into regions that flank the capture regions. As proof of concept, a non-synonymous LHCGR variant was captured in two severe OHSS cases and verified by conventional sequencing. Combining emulsion PCR-generated 1 kb amplicons and SMRT DNA sequencing permitted greater depth of coverage for T-SMS and facilitated easier sequence assembly. To the best of our knowledge, this is the first report combining emulsion PCR and T-SMS for long reads using human DNA samples, and an NGS panel designed for biomarker discovery in OHSS.


July 19, 2019

Specificity of the ModA11, ModA12 and ModD1 epigenetic regulator N6-adenine DNA methyltransferases of Neisseria meningitidis.

Phase variation (random ON/OFF switching) of gene expression is a common feature of host-adapted pathogenic bacteria. Phase variably expressed N(6)-adenine DNA methyltransferases (Mod) alter global methylation patterns resulting in changes in gene expression. These systems constitute phase variable regulons called phasevarions. Neisseria meningitidis phasevarions regulate genes including virulence factors and vaccine candidates, and alter phenotypes including antibiotic resistance. The target site recognized by these Type III N(6)-adenine DNA methyltransferases is not known. Single molecule, real-time (SMRT) methylome analysis was used to identify the recognition site for three key N. meningitidis methyltransferases: ModA11 (exemplified by M.NmeMC58I) (5′-CGY(m6A)G-3′), ModA12 (exemplified by M.Nme77I, M.Nme18I and M.Nme579II) (5′-AC(m6A)CC-3′) and ModD1 (exemplified by M.Nme579I) (5′-CC(m6A)GC-3′). Restriction inhibition assays and mutagenesis confirmed the SMRT methylome analysis. The ModA11 site is complex and atypical and is dependent on the type of pyrimidine at the central position, in combination with the bases flanking the core recognition sequence 5′-CGY(m6A)G-3′. The observed efficiency of methylation in the modA11 strain (MC58) genome ranged from 4.6% at 5′-GCGC(m6A)GG-3′ sites, to 100% at 5′-ACGT(m6A)GG-3′ sites. Analysis of the distribution of modified sites in the respective genomes shows many cases of association with intergenic regions of genes with altered expression due to phasevarion switching. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.


July 19, 2019

An adenine code for DNA: A second life for N6-methyladenine.

DNA N6-methyladenine (6mA) protects against restriction enzymes in bacteria. However, isolated reports have suggested additional activities and its presence in other organisms, such as unicellular eukaryotes. New data now find that 6mA may have a gene regulatory function in green alga, worm, and fly, suggesting m6A as a potential “epigenetic” mark. Copyright © 2015 Elsevier Inc. All rights reserved.


July 19, 2019

DNA methylation on N6-adenine in C. elegans.

In mammalian cells, DNA methylation on the fifth position of cytosine (5mC) plays an important role as an epigenetic mark. However, DNA methylation was considered to be absent in C. elegans because of the lack of detectable 5mC, as well as homologs of the cytosine DNA methyltransferases. Here, using multiple approaches, we demonstrate the presence of adenine N(6)-methylation (6mA) in C. elegans DNA. We further demonstrate that this modification increases trans-generationally in a paradigm of epigenetic inheritance. Importantly, we identify a DNA demethylase, NMAD-1, and a potential DNA methyltransferase, DAMT-1, which regulate 6mA levels and crosstalk between methylations of histone H3K4 and adenines and control the epigenetic inheritance of phenotypes associated with the loss of the H3K4me2 demethylase spr-5. Together, these data identify a DNA modification in C. elegans and raise the exciting possibility that 6mA may be a carrier of heritable epigenetic information in eukaryotes. Copyright © 2015 Elsevier Inc. All rights reserved.

