Menu
July 7, 2019

Genomes of “Spiribacter”, a streamlined, successful halophilic bacterium.

Thalassosaline waters produced by the concentration of seawater are widespread and common extreme aquatic habitats. Their salinity varies from that of sea water (ca. 3.5%) to saturation for NaCl (ca. 37%). Obviously the microbiota varies dramatically throughout this range. Recent metagenomic analysis of intermediate salinity waters (19%) indicated the presence of an abundant and yet undescribed gamma-proteobacterium. Two strains belonging to this group have been isolated from saltern ponds of intermediate salinity in two Spanish salterns and were named “Spiribacter”.The genomes of two isolates of “Spiribacter” have been fully sequenced and assembled. The analysis of metagenomic datasets indicates that microbes of this genus are widespread worldwide in medium salinity habitats representing the first ecologically defined moderate halophile. The genomes indicate that the two isolates belong to different species within the same genus. Both genomes are streamlined with high coding densities, have few regulatory mechanisms and no motility or chemotactic behavior. Metabolically they are heterotrophs with a subgroup II xanthorhodopsin as an additional energy source when light is available.This is the first bacterium that has been proven by culture independent approaches to be prevalent in hypersaline habitats of intermediate salinity (half a way between the sea and NaCl saturation). Predictions from the proteome and analysis of transporter genes, together with a complete ectoine biosynthesis gene cluster are consistent with these microbes having the salt-out-organic-compatible solutes type of osmoregulation. All these features are also consistent with a well-adapted fully planktonic microbe while other halophiles with more complex genomes such as Salinibacter ruber might have particle associated microniches.


July 7, 2019

Comparing the genomes of Helicobacter pylori clinical strain UM032 and mice-adapted derivatives.

Helicobacter pylori is a Gram-negative bacterium that persistently infects the human stomach inducing chronic inflammation. The exact mechanisms of pathogenesis are still not completely understood. Although not a natural host for H. pylori, mouse infection models play an important role in establishing the immunology and pathogenicity of H. pylori. In this study, for the first time, the genome sequences of clinical H. pylori strain UM032 and mice-adapted derivatives, 298 and 299, were sequenced using the PacBio Single Molecule, Real-Time (SMRT) technology.Here, we described the single contig which was achieved for UM032 (1,599,441 bp), 298 (1,604,216 bp) and 299 (1,601,149 bp). Preliminary analysis suggested that methylation of H. pylori genome through its restriction modification system may be determinative of its host specificity and adaptation.Availability of these genomic sequences will aid in enhancing our current level of understanding the host specificity of H. pylori.


July 7, 2019

Genome sequence of the phage-gene rich marine Phaeobacter arcticus type strain DSM 23566(T.).

Phaeobacter arcticus Zhang et al. 2008 belongs to the marine Roseobacter clade whose members are phylogenetically and physiologically diverse. In contrast to the type species of this genus, Phaeobacter gallaeciensis, which is well characterized, relatively little is known about the characteristics of P. arcticus. Here, we describe the features of this organism including the annotated high-quality draft genome sequence and highlight some particular traits. The 5,049,232 bp long genome with its 4,828 protein-coding and 81 RNA genes consists of one chromosome and five extrachromosomal elements. Prophage sequences identified via PHAST constitute nearly 5% of the bacterial chromosome and included a potential Mu-like phage as well as a gene-transfer agent (GTA). In addition, the genome of strain DSM 23566(T) encodes all of the genes necessary for assimilatory nitrate reduction. Phylogenetic analysis and intergenomic distances indicate that the classification of the species might need to be reconsidered.


July 7, 2019

The genome of the anaerobic fungus Orpinomyces sp. strain C1A reveals the unique evolutionary history of a remarkable plant biomass degrader.

Anaerobic gut fungi represent a distinct early-branching fungal phylum (Neocallimastigomycota) and reside in the rumen, hindgut, and feces of ruminant and nonruminant herbivores. The genome of an anaerobic fungal isolate, Orpinomyces sp. strain C1A, was sequenced using a combination of Illumina and PacBio single-molecule real-time (SMRT) technologies. The large genome (100.95 Mb, 16,347 genes) displayed extremely low G+C content (17.0%), large noncoding intergenic regions (73.1%), proliferation of microsatellite repeats (4.9%), and multiple gene duplications. Comparative genomic analysis identified multiple genes and pathways that are absent in Dikarya genomes but present in early-branching fungal lineages and/or nonfungal Opisthokonta. These included genes for posttranslational fucosylation, the production of specific intramembrane proteases and extracellular protease inhibitors, the formation of a complete axoneme and intraflagellar trafficking machinery, and a near-complete focal adhesion machinery. Analysis of the lignocellulolytic machinery in the C1A genome revealed an extremely rich repertoire, with evidence of horizontal gene acquisition from multiple bacterial lineages. Experimental analysis indicated that strain C1A is a remarkable biomass degrader, capable of simultaneous saccharification and fermentation of the cellulosic and hemicellulosic fractions in multiple untreated grasses and crop residues examined, with the process significantly enhanced by mild pretreatments. This capability, acquired during its separate evolutionary trajectory in the rumen, along with its resilience and invasiveness compared to prokaryotic anaerobes, renders anaerobic fungi promising agents for consolidated bioprocessing schemes in biofuels production.


July 7, 2019

StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics.

Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. “provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month”. The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS system or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages.


July 7, 2019

Cerulean: A hybrid assembly using high throughput short and long reads

Genome assembly using high throughput data with short reads, arguably, remains an unresolvable task in repetitive genomes, since when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third generation sequencing (Pacific Biosciences) with long reads enables the opportunity to resolve complicated repeats that could not be resolved by the short read data. However, these long reads have high error rate and it is an uphill task to assemble the genome without using additional high quality short reads. Recently, Koren et al. 2012 proposed an approach to use high quality short reads data to correct these long reads and, thus, make the assembly from long reads possible. However, due to the large size of both dataset (short and long reads), error-correction of these long reads requires excessively high computational resources, even on small bacterial genomes. In this work, instead of error correction of long reads, we first assemble the short reads and later map these long reads on the assembly graph to resolve repeats.


July 7, 2019

Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans.

Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitochondrial genomes from ancient human remains. We then compare this ‘real-time’ genetic data with cultural changes taking place between the Early Neolithic (~5450 BC) and Bronze Age (~2200 BC) in Central Europe. Our results reveal that the current diversity and distribution of haplogroup H were largely established by the Mid Neolithic (~4000 BC), but with substantial genetic contributions from subsequent pan-European cultures such as the Bell Beakers expanding out of Iberia in the Late Neolithic (~2800 BC). Dated haplogroup H genomes allow us to reconstruct the recent evolutionary history of haplogroup H and reveal a mutation rate 45% higher than current estimates for human mitochondria.


July 7, 2019

Use of four next-generation sequencing platforms to determine HIV-1 coreceptor tropism.

HIV-1 coreceptor tropism assays are required to rule out the presence of CXCR4-tropic (non-R5) viruses prior treatment with CCR5 antagonists. Phenotypic (e.g., Trofile™, Monogram Biosciences) and genotypic (e.g., population sequencing linked to bioinformatic algorithms) assays are the most widely used. Although several next-generation sequencing (NGS) platforms are available, to date all published deep sequencing HIV-1 tropism studies have used the 454™ Life Sciences/Roche platform. In this study, HIV-1 co-receptor usage was predicted for twelve patients scheduled to start a maraviroc-based antiretroviral regimen. The V3 region of the HIV-1 env gene was sequenced using four NGS platforms: 454™, PacBio® RS (Pacific Biosciences), Illumina®, and Ion Torrent™ (Life Technologies). Cross-platform variation was evaluated, including number of reads, read length and error rates. HIV-1 tropism was inferred using Geno2Pheno, Web PSSM, and the 11/24/25 rule and compared with Trofile™ and virologic response to antiretroviral therapy. Error rates related to insertions/deletions (indels) and nucleotide substitutions introduced by the four NGS platforms were low compared to the actual HIV-1 sequence variation. Each platform detected all major virus variants within the HIV-1 population with similar frequencies. Identification of non-R5 viruses was comparable among the four platforms, with minor differences attributable to the algorithms used to infer HIV-1 tropism. All NGS platforms showed similar concordance with virologic response to the maraviroc-based regimen (75% to 80% range depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). In conclusion, all four NGS platforms were able to detect minority non-R5 variants at comparable levels suggesting that any NGS-based method can be used to predict HIV-1 coreceptor usage.


July 7, 2019

Structure of the type IV secretion system in different strains of Anaplasma phagocytophilum.

Anaplasma phagocytophilum is an intracellular organism in the Order Rickettsiales that infects diverse animal species and is causing an emerging disease in humans, dogs and horses. Different strains have very different cell tropisms and virulence. For example, in the U.S., strains have been described that infect ruminants but not dogs or rodents. An intriguing question is how the strains of A. phagocytophilum differ and what different genome loci are involved in cell tropisms and/or virulence. Type IV secretion systems (T4SS) are responsible for translocation of substrates across the cell membrane by mechanisms that require contact with the recipient cell. They are especially important in organisms such as the Rickettsiales which require T4SS to aid colonization and survival within both mammalian and tick vector cells. We determined the structure of the T4SS in 7 strains from the U.S. and Europe and revised the sequence of the repetitive virB6 locus of the human HZ strain.Although in all strains the T4SS conforms to the previously described split loci for vir genes, there is great diversity within these loci among strains. This is particularly evident in the virB2 and virB6 which are postulated to encode the secretion channel and proteins exposed on the bacterial surface. VirB6-4 has an unusual highly repetitive structure and can have a molecular weight greater than 500,000. For many of the virs, phylogenetic trees position A. phagocytophilum strains infecting ruminants in the U.S. and Europe distant from strains infecting humans and dogs in the U.S.Our study reveals evidence of gene duplication and considerable diversity of T4SS components in strains infecting different animals. The diversity in virB2 is in both the total number of copies, which varied from 8 to 15 in the herein characterized strains, and in the sequence of each copy. The diversity in virB6 is in the sequence of each of the 4 copies in the single locus and the presence of varying numbers of repetitive units in virB6-3 and virB6-4. These data suggest that the T4SS should be investigated further for a potential role in strain virulence of A. phagocytophilum.


July 7, 2019

A hybrid approach for the automated finishing of bacterial genomes.

Advances in DNA sequencing technology have improved our ability to characterize most genomic diversity. However, accurate resolution of large structural events is challenging because of the short read lengths of second-generation technologies. Third-generation sequencing technologies, which can yield longer multikilobase reads, have the potential to address limitations associated with genome assembly. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy. Complex regions with clinically relevant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 cholera reference strain, we obtained 14 scaffolds of greater than 1 kb for the experimental data and 8 scaffolds of greater than 1 kb for the simulated data, which allowed us to correct several errors in contigs assembled from the short-read data alone. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly.


July 7, 2019

Cancer genomics: technology, discovery, and translation.

In recent years, the increasing awareness that somatic mutations and other genetic aberrations drive human malignancies has led us within reach of personalized cancer medicine (PCM). The implementation of PCM is based on the following premises: genetic aberrations exist in human malignancies; a subset of these aberrations drive oncogenesis and tumor biology; these aberrations are actionable (defined as having the potential to affect management recommendations based on diagnostic, prognostic, and/or predictive implications); and there are highly specific anticancer agents available that effectively modulate these targets. This article highlights the technology underlying cancer genomics and examines the early results of genome sequencing and the challenges met in the discovery of new genetic aberrations. Finally, drawing from experiences gained in a feasibility study of somatic mutation genotyping and targeted exome sequencing led by Princess Margaret Hospital-University Health Network and the Ontario Institute for Cancer Research, the processes, challenges, and issues involved in the translation of cancer genomics to the clinic are discussed.


July 7, 2019

Complete genome sequence of Liberibacter crescens BT-1.

Liberibacter crescens BT-1, a Gram-negative, rod-shaped bacterial isolate, was previously recovered from mountain papaya to gain insight on Huanglongbing (HLB) and Zebra Chip (ZC) diseases. The genome of BT-1 was sequenced at the Interdisciplinary Center for Biotechnology Research (ICBR) at the University of Florida. A finished assembly and annotation yielded one chromosome with a length of 1,504,659 bp and a G+C content of 35.4%. Comparison to other species in the Liberibacter genus, L. crescens has many more genes in thiamine and essential amino acid biosynthesis. This likely explains why L. crescens BT-1 is culturable while the known Liberibacter strains have not yet been cultured. Similar to Candidatus L. asiaticus psy62, the L. crescens BT-1 genome contains two prophage regions.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.