Menu
July 7, 2019

The genome of the intracellular bacterium of the coastal bivalve, Solemya velum: a blueprint for thriving in and out of symbiosis

BACKGROUND:Symbioses between chemoautotrophic bacteria and marine invertebrates are rare examples of living systems that are virtually independent of photosynthetic primary production. These associations have evolved multiple times in marine habitats, such as deep-sea hydrothermal vents and reducing sediments, characterized by steep gradients of oxygen and reduced chemicals. Due to difficulties associated with maintaining these symbioses in the laboratory and culturing the symbiotic bacteria, studies of chemosynthetic symbioses rely heavily on culture independent methods. The symbiosis between the coastal bivalve, Solemya velum, and its intracellular symbiont is a model for chemosynthetic symbioses given its accessibility in intertidal environments and the ability to maintain it under laboratory conditions. To better understand this symbiosis, the genome of the S. velum endosymbiont was sequenced.RESULTS:Relative to the genomes of obligate symbiotic bacteria, which commonly undergo erosion and reduction, the S. velum symbiont genome was large (2.7Mb), GC-rich (51%), and contained a large number (78) of mobile genetic elements. Comparative genomics identified sets of genes specific to the chemosynthetic lifestyle and necessary to sustain the symbiosis. In addition, a number of inferred metabolic pathways and cellular processes, including heterotrophy, branched electron transport, and motility, suggested that besides the ability to function as an endosymbiont, the bacterium may have the capacity to live outside the host.CONCLUSIONS:The physiological dexterity indicated by the genome substantially improves our understanding of the genetic and metabolic capabilities of the S. velum symbiont and the breadth of niches the partners may inhabit during their lifecycle.


July 7, 2019

Diversification of bacterial genome content through distinct mechanisms over different timescales.

Bacterial populations often consist of multiple co-circulating lineages. Determining how such population structures arise requires understanding what drives bacterial diversification. Using 616 systematically sampled genomes, we show that Streptococcus pneumoniae lineages are typically characterized by combinations of infrequently transferred stable genomic islands: those moving primarily through transformation, along with integrative and conjugative elements and phage-related chromosomal islands. The only lineage containing extensive unique sequence corresponds to a set of atypical unencapsulated isolates that may represent a distinct species. However, prophage content is highly variable even within lineages, suggesting frequent horizontal transmission that would necessitate rapidly diversifying anti-phage mechanisms to prevent these viruses sweeping through populations. Correspondingly, two loci encoding Type I restriction-modification systems able to change their specificity over short timescales through intragenomic recombination are ubiquitous across the collection. Hence short-term pneumococcal variation is characterized by movement of phage and intragenomic rearrangements, with the slower transfer of stable loci distinguishing lineages.


July 7, 2019

A precise chloroplast genome of Nelumbo nucifera (Nelumbonaceae) evaluated with Sanger, Illumina MiSeq, and PacBio RS II sequencing platforms: insight into the plastid evolution of basal eudicots.

BackgroundThe chloroplast genome is important for plant development and plant evolution. Nelumbo nucifera is one member of relict plants surviving from the late Cretaceous. Recently, a new sequencing platform PacBio RS II, known as `SMRT (Single Molecule, Real-Time) sequencing¿, has been developed. Using the SMRT sequencing to investigate the chloroplast genome of N. nucifera will help to elucidate the plastid evolution of basal eudicots.ResultsThe sizes of the de novo assembled complete chloroplast genome of N. nucifera were 163,307 bp, 163,747 bp and 163,600 bp with average depths of coverage of 7×, 712× and 105× sequenced by Sanger, Illumina MiSeq and PacBio RS II, respectively. The precise chloroplast genome of N. nucifera was obtained from PacBio RS II data proofread by Illumina MiSeq reads, with a quadripartite structure containing a large single copy region (91,846 bp) and a small single copy region (19,626 bp) separated by two inverted repeat regions (26,064 bp). The genome contains 113 different genes, including four distinct rRNAs, 30 distinct tRNAs and 79 distinct peptide-coding genes. A phylogenetic analysis of 133 taxa from 56 orders indicated that Nelumbo with an age of 177 million years is a sister clade to Platanus, which belongs to the basal eudicots. Basal eudicots began to emerge during the early Jurassic with estimated divergence times at 197 million years using MCMCTree. IR expansions/contractions within the basal eudicots seem to have occurred independently.ConclusionsBecause of long reads and lack of bias in coverage of AT-rich regions, PacBio RS II showed a great promise for highly accurate `finished¿ genomes, especially for a de novo assembly of genomes. N. nucifera is one member of basal eudicots, however, evolutionary analyses of IR structural variations of N. nucifera and other basal eudicots suggested that IR expansions/contractions occurred independently in these basal eudicots or were caused by independent insertions and deletions. The precise chloroplast genome of N. nucifera will present new information for structural variation of chloroplast genomes and provide new insight into the evolution of basal eudicots at the primary sequence and structural level.


July 7, 2019

Potential impact on kidney infection: a whole-genome analysis of Leptospira santarosai serovar Shermani.

Leptospira santarosai serovar Shermani is the most frequently encountered serovar, and it causes leptospirosis and tubulointerstitial nephritis in Taiwan. This study aims to complete the genome sequence of L. santarosai serovar Shermani and analyze the transcriptional responses of L. santarosai serovar Shermani to renal tubular cells. To assemble this highly repetitive genome, we combined reads that were generated from four next-generation sequencing platforms by using hybrid assembly approaches to finish two-chromosome contiguous sequences without gaps by validating the data with optical restriction maps and Sanger sequencing. Whole-genome comparison studies revealed a 28-kb region containing genes that encode transposases and hypothetical proteins in L. santarosai serovar Shermani, but this region is absent in other pathogenic Leptospira spp. We found that lipoprotein gene expression in both L. santarosai serovar Shermani and L. interrogans serovar Copenhageni were upregulated upon interaction with renal tubular cells, and LSS19962, a L. santarosai serovar Shermani-specific gene within a 28-kb region that encodes hypothetical proteins, was upregulated in L. santarosai serovar Shermani-infected renal tubular cells. Lipoprotein expression during leptospiral infection might facilitate the interactions of leptospires within kidneys. The availability of the whole-genome sequence of L. santarosai serovar Shermani would make it the first completed sequence of this species, and its comparison with that of other Leptospira spp. may provide invaluable information for further studies in leptospiral pathogenesis.


July 7, 2019

Dissemination of cephalosporin resistance genes between Escherichia coli strains from farm animals and humans by specific plasmid lineages.

Third-generation cephalosporins are a class of ß-lactam antibiotics that are often used for the treatment of human infections caused by Gram-negative bacteria, especially Escherichia coli. Worryingly, the incidence of human infections caused by third-generation cephalosporin-resistant E. coli is increasing worldwide. Recent studies have suggested that these E. coli strains, and their antibiotic resistance genes, can spread from food-producing animals, via the food-chain, to humans. However, these studies used traditional typing methods, which may not have provided sufficient resolution to reliably assess the relatedness of these strains. We therefore used whole-genome sequencing (WGS) to study the relatedness of cephalosporin-resistant E. coli from humans, chicken meat, poultry and pigs. One strain collection included pairs of human and poultry-associated strains that had previously been considered to be identical based on Multi-Locus Sequence Typing, plasmid typing and antibiotic resistance gene sequencing. The second collection included isolates from farmers and their pigs. WGS analysis revealed considerable heterogeneity between human and poultry-associated isolates. The most closely related pairs of strains from both sources carried 1263 Single-Nucleotide Polymorphisms (SNPs) per Mbp core genome. In contrast, epidemiologically linked strains from humans and pigs differed by only 1.8 SNPs per Mbp core genome. WGS-based plasmid reconstructions revealed three distinct plasmid lineages (IncI1- and IncK-type) that carried cephalosporin resistance genes of the Extended-Spectrum Beta-Lactamase (ESBL)- and AmpC-types. The plasmid backbones within each lineage were virtually identical and were shared by genetically unrelated human and animal isolates. Plasmid reconstructions from short-read sequencing data were validated by long-read DNA sequencing for two strains. Our findings failed to demonstrate evidence for recent clonal transmission of cephalosporin-resistant E. coli strains from poultry to humans, as has been suggested based on traditional, low-resolution typing methods. Instead, our data suggest that cephalosporin resistance genes are mainly disseminated in animals and humans via distinct plasmids.


July 7, 2019

Comparative genomics of the Campylobacter lari group.

The Campylobacter lari group is a phylogenetic clade within the epsilon subdivision of the Proteobacteria and is part of the thermotolerant Campylobacter spp., a division within the genus that includes the human pathogen Campylobacter jejuni. The C. lari group is currently composed of five species (C. lari, Campylobacter insulaenigrae, Campylobacter volucris, Campylobacter subantarcticus, and Campylobacter peloridis), as well as a group of strains termed the urease-positive thermophilic Campylobacter (UPTC) and other C. lari-like strains. Here we present the complete genome sequences of 11 C. lari group strains, including the five C. lari group species, four UPTC strains, and a lari-like strain isolated in this study. The genome of C. lari subsp. lari strain RM2100 was described previously. Analysis of the C. lari group genomes indicates that this group is highly related at the genome level. Furthermore, these genomes are strongly syntenic with minor rearrangements occurring only in 4 of the 12 genomes studied. The C. lari group can be bifurcated, based on the flagella and flagellar modification genes. Genomic analysis of the UPTC strains indicated that these organisms are variable but highly similar, closely related to but distinct from C. lari. Additionally, the C. lari group contains multiple genes encoding hemagglutination domain proteins, which are either contingency genes or linked to conserved contingency genes. Many of the features identified in strain RM2100, such as major deficiencies in amino acid biosynthesis and energy metabolism, are conserved across all 12 genomes, suggesting that these common features may play a role in the association of the C. lari group with coastal environments and watersheds. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2014. This work is written by US Government employees and is in the public domain in the US.


July 7, 2019

Haplotype assembly in polyploid genomes and identical by descent shared tracts.

Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (i) do not consider individuals sharing haplotypes jointly, which reduces the size and accuracy of assembled haplotypes, and (ii) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). Polyploid organisms are increasingly becoming the target of many research groups interested in the genomics of disease, phylogenetics, botany and evolution but there is an absence of theory and methods for polyploid haplotype reconstruction.In this work, we present a number of results, extensions and generalizations of compass graphs and our HapCompass framework. We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. Furthermore, we present graph theory-based algorithms for the problem of haplotype assembly using our previously developed HapCompass framework for (i) novel implementations of haplotype assembly optimizations (minimum error correction), (ii) assembly of a pair of individuals sharing a haplotype tract identical by descent and (iii) assembly of polyploid genomes. We evaluate our methods on 1000 Genomes Project, Pacific Biosciences and simulated sequence data.HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/.Supplementary data are available at Bioinformatics online.


July 7, 2019

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species.

The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly.In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies.Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.


July 7, 2019

Precise breakpoint localization of large genomic deletions using PacBio and Illumina next-generation sequencers.

Herein we present the applicability of single-molecule (PacBio RS) and second-generation sequencing technology (Illumina) to the characterization of large genomic deletions. By testing samples previously characterized using a Sanger approach, our methods determined that both next-generation sequencing platforms were able to identify the position of deletion breakpoints. Our results point out various advantages of next-generation sequencing platforms when characterizing genomic deletions; however, special attention must be dedicated to identical sequences flanking the breakpoints, such as poly(N) motifs.


July 7, 2019

Pandoraviruses: amoeba viruses with genomes up to 2.5 Mb reaching that of parasitic eukaryotes.

Ten years ago, the discovery of Mimivirus, a virus infecting Acanthamoeba, initiated a reappraisal of the upper limits of the viral world, both in terms of particle size (>0.7 micrometers) and genome complexity (>1000 genes), dimensions typical of parasitic bacteria. The diversity of these giant viruses (the Megaviridae) was assessed by sampling a variety of aquatic environments and their associated sediments worldwide. We report the isolation of two giant viruses, one off the coast of central Chile, the other from a freshwater pond near Melbourne (Australia), without morphological or genomic resemblance to any previously defined virus families. Their micrometer-sized ovoid particles contain DNA genomes of at least 2.5 and 1.9 megabases, respectively. These viruses are the first members of the proposed “Pandoravirus” genus, a term reflecting their lack of similarity with previously described microorganisms and the surprises expected from their future study.


July 7, 2019

A hybrid approach for the automated finishing of bacterial genomes.

Advances in DNA sequencing technology have improved our ability to characterize most genomic diversity. However, accurate resolution of large structural events is challenging because of the short read lengths of second-generation technologies. Third-generation sequencing technologies, which can yield longer multikilobase reads, have the potential to address limitations associated with genome assembly. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy. Complex regions with clinically relevant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 cholera reference strain, we obtained 14 scaffolds of greater than 1 kb for the experimental data and 8 scaffolds of greater than 1 kb for the simulated data, which allowed us to correct several errors in contigs assembled from the short-read data alone. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly.


July 7, 2019

Next generation sequencing technologies and the changing landscape of phage genomics.

The dawn of next generation sequencing technologies has opened up exciting possibilities for whole genome sequencing of a plethora of organisms. The 2nd and 3rd generation sequencing technologies, based on cloning-free, massively parallel sequencing, have enabled the generation of a deluge of genomic sequences of both prokaryotic and eukaryotic origin in the last seven years. However, whole genome sequencing of bacterial viruses has not kept pace with this revolution, despite the fact that their genomes are orders of magnitude smaller in size compared with bacteria and other organisms. Sequencing phage genomes poses several challenges; (1) obtaining pure phage genomic material, (2) PCR amplification biases and (3) complex nature of their genetic material due to features such as methylated bases and repeats that are inherently difficult to sequence and assemble. Here we describe conclusions drawn from our efforts in sequencing hundreds of bacteriophage genomes from a variety of Gram-positive and Gram-negative bacteria using Sanger, 454, Illumina and PacBio technologies. Based on our experience we propose several general considerations regarding sample quality, the choice of technology and a “blended approach” for generating reliable whole genome sequences of phages.


July 7, 2019

An Inv(16)(p13.3q24.3)-encoded CBFA2T3-GLIS2 fusion protein defines an aggressive subtype of pediatric acute megakaryoblastic leukemia.

To define the mutation spectrum in non-Down syndrome acute megakaryoblastic leukemia (non-DS-AMKL), we performed transcriptome sequencing on diagnostic blasts from 14 pediatric patients and validated our findings in a recurrency/validation cohort consisting of 34 pediatric and 28 adult AMKL samples. Our analysis identified a cryptic chromosome 16 inversion (inv(16)(p13.3q24.3)) in 27% of pediatric cases, which encodes a CBFA2T3-GLIS2 fusion protein. Expression of CBFA2T3-GLIS2 in Drosophila and murine hematopoietic cells induced bone morphogenic protein (BMP) signaling and resulted in a marked increase in the self-renewal capacity of hematopoietic progenitors. These data suggest that expression of CBFA2T3-GLIS2 directly contributes to leukemogenesis. Copyright © 2012 Elsevier Inc. All rights reserved.


July 7, 2019

Accurate selfcorrection of errors in long reads using de Bruijn graphs.

New long read sequencing technologies, like PacBio SMRT and Oxford NanoPore, can produce sequencing reads up to 50,000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent utilisation of the reads in, e.g., de novo genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second generation sequencing technologies to correct the long reads.We present an error correction method that uses long reads only. The method consists of two phases: first we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of k-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75x, the throughput of the new method is at least 20% higher.LoRMA is freely available at http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/ CONTACT: leena.salmela@cs.helsinki.fi. © The Author(s) 2016. Published by Oxford University Press.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.