Plasmodium knowlesi, a common parasite of macaques, is recognised as a significant cause of human malaria in Malaysia. The P. knowlesi A1-H.1 line has been adapted to continuous culture in human erythrocytes, providing an in vitro model for studying the parasite. We have assembled a reference genome for the PkA1-H.1 line by combining PacBio long-read and Illumina short-read sequence data. Compared with the H-strain reference, the new reference has improved genome coverage and provides a novel description of methylation sites. The PkA1-H.1 reference will enhance the capabilities of the in vitro model and improve understanding of P. knowlesi infection in humans. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
A sub-population of group A Streptococcus elicits a population-wide production of bacteriocins to establish dominance in the host.
Bacteria use quorum sensing (QS) to regulate gene expression. We identified a group A Streptococcus (GAS) strain possessing the QS system sil, which produces functional bacteriocins through a sequential signaling pathway that integrates host and bacterial signals. Host cells infected by GAS release asparagine (ASN), which the bacteria sense to alter their gene expression and rate of proliferation. We show that upon sensing ASN, GAS upregulates expression of the QS autoinducer peptide SilCR. Initial SilCR expression activates the autoinduction cycle for further SilCR production. The autoinduction process propagates throughout the GAS population, resulting in bacteriocin production. Subcutaneous co-injection of mice with a bacteriocin-producing strain and the globally disseminated M1T1 GAS clone results in killing of M1T1 within soft tissue. Thus, by sensing host signals, a fraction of a bacterial population can trigger a QS-mediated autoinduction mechanism that acts on the entire bacterial community, allowing it to outcompete other bacteria within the infection. Copyright © 2018 Elsevier Inc. All rights reserved.
Long-read whole genome sequencing and comparative analysis of six strains of the human pathogen Orientia tsutsugamushi.
Orientia tsutsugamushi is a clinically important but neglected obligate intracellular bacterial pathogen of the Rickettsiaceae family that causes the potentially life-threatening human disease scrub typhus. In contrast to the genome reduction seen in many obligate intracellular bacteria, early genetic studies of Orientia have revealed one of the most repetitive bacterial genomes sequenced to date. The dramatic expansion of mobile elements has hampered efforts to generate complete genome sequences using short read sequencing methodologies, and consequently there have been few studies of the comparative genomics of this neglected species. We report new high-quality genomes of O. tsutsugamushi, generated using PacBio single molecule long read sequencing, for six strains: Karp, Kato, Gilliam, TA686, UT76 and UT176. In comparative genomics analyses of these strains together with existing reference genomes from Ikeda and Boryong strains, we identify a relatively small core genome of 657 genes, grouped into core gene islands and separated by repeat regions, and use the core genes to infer the first whole-genome phylogeny of Orientia. Complete assemblies of multiple Orientia genomes verify initial suggestions that these are remarkable organisms. They have larger genomes compared with most other Rickettsiaceae, with widespread amplification of repeat elements and massive chromosomal rearrangements between strains. At the gene level, Orientia has a relatively small set of universally conserved genes, similar to other obligate intracellular bacteria, and the relative expansion in genome size can be accounted for by gene duplication and repeat amplification. Our study demonstrates the utility of long read sequencing to investigate complex bacterial genomes and characterise genomic variation.
Comprehensive mutagenesis of the fimS promoter regulatory switch reveals novel regulation of type 1 pili in uropathogenic Escherichia coli.
Type 1 pili (T1P) are major virulence factors for uropathogenic Escherichia coli (UPEC), which cause both acute and recurrent urinary tract infections. T1P expression is therefore directly relevant to disease. T1P are phase variable (both piliated and nonpiliated bacteria exist in a clonal population) and are controlled by an invertible DNA switch (fimS), which contains the promoter for the fim operon encoding T1P. Inversion of fimS is stochastic but may be biased by environmental conditions and other signals that ultimately converge at fimS itself. Previous studies of fimS sequences important for T1P phase variation have focused on laboratory-adapted E. coli strains and have been limited in the number of mutations or by alteration of the fimS genomic context. We surmounted these limitations by using saturating genomic mutagenesis of fimS coupled with accurate sequencing to detect both mutations and phase status simultaneously. In addition to the sequences known to be important for biasing fimS inversion, our method identified a previously unknown pair of 5′ UTR inverted repeats that control phase variation by altering relative fimA levels. Thus we have uncovered an additional layer of T1P regulation potentially impacting virulence and the coordinate expression of multiple pilus systems.
Methylation in Mycobacterium tuberculosis is lineage specific with associated mutations present globally.
DNA methylation is an epigenetic modification of the genome involved in regulating crucial cellular processes, including transcription and chromosome stability. Advances in PacBio sequencing technology make it possible to detect methylation sites robustly. The methylome of the Mycobacterium tuberculosis complex is poorly understood but may be involved in virulence, hypoxic survival and the emergence of drug resistance. In the most extensive study to date, we characterise the methylome across the four major lineages of M. tuberculosis and two lineages of M. africanum, the leading causes of tuberculosis disease in humans. We reveal lineage-specific methylated motifs and strain-specific mutations that are abundant globally and likely to explain loss of function in the respective methyltransferases. Our work provides a set of sixteen new complete reference genomes for the Mycobacterium tuberculosis complex, including complete lineage 5 genomes. Insights into lineage-specific methylomes will further elucidate underlying biological mechanisms and other important phenotypes of the epigenome.
Global genetic diversity of var2csa in Plasmodium falciparum with implications for malaria in pregnancy and vaccine development.
Malaria infection during pregnancy, caused by the sequestration of Plasmodium falciparum parasites in the placenta, leads to high infant mortality and maternal morbidity. Parasite adherence to the placenta is mediated by the VAR2CSA protein, a target of naturally occurring immunity. Current vaccine development is based on its ID1-DBL2Xb domain; however, little is known about the global genetic diversity of the encoding var2csa gene, which could influence vaccine efficacy. In a comprehensive analysis of the var2csa gene in >2,000 P. falciparum field isolates across 23 countries, we found that var2csa is duplicated at high prevalence (>25%), that African and Oceanian populations harbour much higher diversity than other regions, and that insertions/deletions are abundant, leading to an underestimation of the diversity of the locus. Further, ID1-DBL2Xb haplotypes associated with adverse birth outcomes are present globally, and African-specific haplotypes exist, which should be incorporated into vaccine design.
Escherichia coli is the primary etiological agent of urinary tract infections, one of the most common infections in humans. We report here the complete genome sequence of uropathogenic Escherichia coli strain CI5, a clinical pyelonephritis isolate used for studying pathogenesis. Copyright © 2015 Mehershahi et al.
Complete genome sequence of Streptococcus agalactiae serotype III, multilocus sequence type 283 strain SG-M1.
Streptococcus agalactiae (group B Streptococcus) is a common commensal bacterium of the human gastrointestinal tract that can also cause invasive disease in humans and other animals. We report here the complete genome sequence of S. agalactiae SG-M1, a serotype III, multilocus sequence type 283 strain, isolated from a Singaporean patient suffering from meningitis. Copyright © 2015 Mehershahi et al.
Escherichia coli is the best-studied bacterium and a common colonizer of the lower mammalian gastrointestinal tract. We report here the complete genome sequence of the original Escherichia coli isolate, strain NCTC86, which was described by Theodor Escherich, for whom the genus is named. Copyright © 2017 Khetrapal et al.
Escherichia coli is the most common bacterium causing urinary tract infections in humans. We report here the complete genome sequence of the uropathogenic Escherichia coli strain NU14, a clinical pyelonephritis isolate used for studying pathogenesis. Copyright © 2017 Mehershahi and Chen.
2015 epidemic of severe Streptococcus agalactiae sequence type 283 infections in Singapore associated with the consumption of raw freshwater fish: a detailed analysis of clinical, epidemiological, and bacterial sequencing data.
Streptococcus agalactiae (group B Streptococcus [GBS]) has not been described as a foodborne pathogen. However, in 2015, a large outbreak of severe invasive sequence type (ST) 283 GBS infections in adults epidemiologically linked to the consumption of raw freshwater fish occurred in Singapore. We attempted to determine the scale of the outbreak, define the clinical spectrum of disease, and link the outbreak to contaminated fish. Time-series analysis was performed on microbiology laboratory data. Food handlers and fishmongers were screened for enteric carriage of GBS. A retrospective cohort study was conducted to assess differences in demographic and clinical characteristics of patients with invasive ST283 and non-ST283 infections. Whole-genome sequencing was performed on human and fish ST283 isolates from Singapore, Thailand, and Hong Kong. The outbreak was estimated to have started in late January 2015. Within the study cohort of 408 patients, ST283 accounted for 35.8% of cases. Patients with ST283 infection were younger and had fewer comorbidities but were more likely to develop meningoencephalitis, septic arthritis, and spinal infection. Of 82 food handlers and fishmongers screened, none carried ST283. Culture of 43 fish samples yielded 13 ST283-positive samples. Phylogenomic analysis of 161 ST283 isolates from humans and fish revealed they formed a tight clade distinguished by 93 single-nucleotide polymorphisms. ST283 is a zoonotic GBS clone associated with farmed freshwater fish, capable of causing severe disease in humans. It caused a large foodborne outbreak in Singapore and poses both a regional and potentially more widespread threat.
The mycalesine butterfly Bicyclus anynana, the ‘Squinting bush brown’, is a model organism in the study of lepidopteran ecology, development and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology. A total of 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (~260X assembly coverage). Contigs were scaffolded using mate-pair, transcriptome and PacBio data into 10,800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). Repetitive elements make up 26% of the genome, which encodes a total of 22,642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html).
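The coverage and N50 figures above follow standard assembly definitions. The sketch below is a minimal illustration, assuming coverage is approximated as filtered bases divided by assembly size, and N50 as the length of the sequence at which the cumulative length (sorted longest first) first reaches half the total. The scaffold lengths in the example are hypothetical toy values, not B. anynana data.

```python
def assembly_coverage(filtered_bases: float, assembly_size: float) -> float:
    """Approximate fold coverage as total filtered bases divided by assembly size."""
    return filtered_bases / assembly_size


def n50(lengths: list[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # at least half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Figures reported in the abstract: 124 Gb filtered data, 475 Mb assembly.
    cov = assembly_coverage(124e9, 475e6)
    print(f"approx. coverage: {cov:.0f}X")  # prints "approx. coverage: 261X"
    # Hypothetical toy scaffold lengths (kb), for illustration only.
    print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```

The 124 Gb / 475 Mb ratio gives roughly 261X, consistent with the "~260X" stated in the abstract.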
Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis.
Complete and accurate genome assembly and annotation are a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear chromosomes and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here are of a quality achieved for only a few eukaryotic organisms, and constitute an important reference for future host-microbe interaction studies. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
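The 14% and 9% figures read as relative increases, each measured against the preceding gene count. A quick check of that arithmetic, assuming this is how the percentages were computed (the abstract does not state the baseline explicitly):

```python
def percent_increase(before: int, after: int) -> float:
    """Relative increase from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (after - before) / before


# Protein-coding gene counts reported in the abstract.
transcriptomics_only = 3612
with_proteomics = 4113
after_curation = 4493

print(f"{percent_increase(transcriptomics_only, with_proteomics):.1f}%")  # prints 13.9%
print(f"{percent_increase(with_proteomics, after_curation):.1f}%")        # prints 9.2%
```

Both values round to the 14% and 9% given in the text.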
Single-virion sequencing of lamivudine-treated HBV populations reveals population evolution dynamics and demographic history.
Viral populations are complex, dynamic, and fast evolving. A group of closely related viruses evolving in a competitive environment is termed a quasispecies. To fully understand the role that quasispecies play in viral evolution, characterizing the trajectories of viral genotypes in an evolving population is key. In particular, long-range haplotype information for thousands of individual viruses is critical, yet generating this information is non-trivial. Popular deep sequencing methods generate relatively short reads that do not preserve linkage information, while third-generation sequencing methods have higher error rates that make detection of low-frequency mutations a bioinformatics challenge. Here we applied BAsE-Seq, an Illumina-based single-virion sequencing technology, to eight samples from four chronic hepatitis B (CHB) patients, sampled once before antiviral treatment and once after viral rebound due to resistance. With single-virion sequencing, we obtained 248-8796 single-virion sequences per sample, which allowed us to find evidence for both hard and soft selective sweeps. We were able to reconstruct population demographic history that was independently verified by clinically collected data. We further verified four of the samples independently through PacBio SMRT and Illumina Pooled deep sequencing. Overall, we showed that single-virion sequencing yields insight into viral evolution and population dynamics in an efficient and high-throughput manner. We believe that single-virion sequencing is widely applicable to the study of viral evolution in the context of drug resistance and host adaptation, allows differentiation between soft and hard selective sweeps, and may be useful in the reconstruction of intra-host viral population demographic history.
OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees.
The assembly of large, repeat-rich eukaryotic genomes represents a significant challenge in genomics. While long-read technologies have made the high-quality assembly of small, microbial genomes increasingly feasible, data generation can be expensive for larger genomes. OPERA-LG is a scalable, exact algorithm for the scaffold assembly of large, repeat-rich genomes, outperforming state-of-the-art programs for scaffold correctness and contiguity. It provides a rigorous framework for scaffolding of repetitive sequences and a systematic approach for combining data from different second-generation and third-generation sequencing technologies. OPERA-LG provides an avenue for systematic augmentation and improvement of thousands of existing draft eukaryotic genome assemblies.