July 7, 2019

GAML: genome assembly by maximum likelihood.

Resolution of repeats and scaffolding of shorter contigs are critical parts of genome assembly. Modern assemblers usually perform such steps by heuristics, often tailored to a particular technology for producing paired or long reads. We propose a new framework that allows systematic combination of diverse sequencing datasets into a single assembly. We achieve this by searching for an assembly with the maximum likelihood in a probabilistic model capturing error rate, insert lengths, and other characteristics of the sequencing technology used to produce each dataset. We have implemented a prototype genome assembler GAML that can use any combination of insert sizes with Illumina or 454 reads, as well as PacBio reads. Our experiments show that we can assemble short genomes with N50 sizes and error rates comparable to ALLPATHS-LG or Cerulean. While ALLPATHS-LG and Cerulean each require a specific combination of datasets, GAML works on any combination. We have introduced a new probabilistic approach to genome assembly and demonstrated that this approach can lead to superior results when used to combine a diverse set of datasets from different sequencing technologies. Data and software are available at http://compbio.fmph.uniba.sk/gaml.
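The abstract compares assemblies by N50 size. As a reminder of what that statistic measures, the following is a minimal sketch of the usual N50 definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length). The function name `n50` and the example lengths are illustrative assumptions and are not taken from GAML.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative length first reaches at least
    half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
example = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example))
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why the abstract reports error rates alongside it.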


July 7, 2019

Complete genome sequence of endophytic nitrogen-fixing Klebsiella variicola strain DX120E.

Klebsiella variicola strain DX120E (=CGMCC 1.14935) is an endophytic nitrogen-fixing bacterium that was isolated from sugarcane crops grown in Guangxi, China, and promotes sugarcane growth. Here we summarize the features of strain DX120E and describe its complete genome sequence. The genome comprises one circular chromosome and two plasmids, totaling 5,718,434 nucleotides with 57.1% GC content, and contains 5,172 protein-coding genes, 25 rRNA genes, 87 tRNA genes, 7 ncRNA genes, 25 pseudogenes, and 2 CRISPR repeats.
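The GC content reported above is the percentage of G and C bases among the called bases of the sequence. A minimal sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions, and ambiguous bases such as N are simply ignored here, which may differ from the convention the authors used.

```python
def gc_content(sequence):
    """Return the GC percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted in the denominator;
    case is ignored.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence, for illustration only.
print(gc_content("GGCCAATT"))
```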


July 7, 2019

Comparative analyses of clinical and environmental populations of Cryptococcus neoformans in Botswana.

Cryptococcus neoformans var. grubii (Cng) is the most common cause of fungal meningitis, and its prevalence is highest in sub-Saharan Africa. Patients become infected by inhaling airborne spores or desiccated yeast cells from the environment, where the fungus thrives in avian droppings, trees and soil. To investigate the prevalence and population structure of Cng in southern Africa, we analysed isolates from 77 environmental samples and 64 patients. We detected significant genetic diversity among isolates and strong evidence of geographic structure at the local level. High proportions of isolates with the rare MATa allele were observed in both clinical and environmental isolates; however, the mating-type alleles were unevenly distributed among different subpopulations. Nearly equal proportions of the MATa and MATα mating types were observed among all clinical isolates and in one environmental subpopulation from the eastern part of Botswana. As previously reported, there was evidence of both clonality and recombination in different geographic areas. These results provide a foundation for subsequent genomewide association studies to identify genes and genotypes linked to pathogenicity in humans. © 2015 The Authors. Molecular Ecology published by John Wiley & Sons Ltd.


July 7, 2019

Complete genome sequence of Burkholderia cepacia strain LO6.

Burkholderia cepacia strain LO6 is a betaproteobacterium that was isolated from a cystic fibrosis patient. Here we report the 6.4 Mb draft genome sequence assembled into 2 contigs. This genome sequence will aid the transcriptomic profiling of this bacterium and help us to better understand the mechanisms specific to pulmonary infections. Copyright © 2015 Belcaid et al.


July 7, 2019

Sequence type 1 group B Streptococcus, an emerging cause of invasive disease in adults, evolves by small genetic changes.

The molecular mechanisms underlying pathogen emergence in humans are a critical but poorly understood area of microbiologic investigation. Serotype V group B Streptococcus (GBS) was first isolated from humans in 1975, and rates of invasive serotype V GBS disease significantly increased starting in the early 1990s. We found that 210 of 229 serotype V GBS strains (92%) isolated from the bloodstream of nonpregnant adults in the United States and Canada between 1992 and 2013 were multilocus sequence type (ST) 1. Elucidation of the complete genome of a 1992 ST-1 strain revealed that this strain had the highest homology with a GBS strain causing cow mastitis and that the 1992 ST-1 strain differed from serotype V strains isolated in the late 1970s by acquisition of cell surface proteins and antimicrobial resistance determinants. Whole-genome comparison of 202 invasive ST-1 strains detected significant recombination in only eight strains. The remaining 194 strains differed by an average of 97 SNPs. Phylogenetic analysis revealed a temporally dependent mode of genetic diversification consistent with the emergence in the 1990s of ST-1 GBS as major agents of human disease. Thirty-one loci were identified as being under positive selective pressure, and mutations at loci encoding polysaccharide capsule production proteins, regulators of pilus expression, and two-component gene regulatory systems were shown to affect the bacterial phenotype. These data reveal that phenotypic diversity among ST-1 GBS is mainly driven by small genetic changes rather than extensive recombination, thereby extending knowledge of how pathogens adapt to humans.


July 7, 2019

Covalent modification of bacteriophage T4 DNA inhibits CRISPR-Cas9.

The genomic DNAs of tailed bacteriophages are commonly modified by the attachment of chemical groups. Some forms of DNA modification are known to protect phage DNA from cleavage by restriction enzymes, but others are of unknown function. Recently, the CRISPR-Cas nuclease complexes were shown to mediate bacterial adaptive immunity by RNA-guided target recognition, raising the question of whether phage DNA modifications may also block attack by CRISPR-Cas9. We investigated phage T4 as a model system, where cytosine is replaced with glucosyl-hydroxymethylcytosine (glc-HMC). We first quantified the extent and distribution of covalent modifications in T4 DNA by single-molecule DNA sequencing and enzymatic probing. We then designed CRISPR spacer sequences targeting T4 and found that wild-type T4 containing glc-HMC was insensitive to attack by CRISPR-Cas9 but mutants with unmodified cytosine were sensitive. Phage with HMC showed only intermediate sensitivity. While this work was in progress, another group reported examples of heavily engineered CRISPR-Cas9 complexes that could, in fact, overcome the effects of T4 DNA modification, indicating that modifications can inhibit but do not always fully block attack. Bacteria were recently found to have a form of adaptive immunity, the CRISPR-Cas systems, which use nucleic acid pairing to recognize and cleave genomic DNA of invaders such as bacteriophage. Historic work with tailed phages has shown that phage DNA is often modified by covalent attachment of large chemical groups. Here we demonstrate that DNA modification in phage T4 inhibits attack by the CRISPR-Cas9 system. This finding provides insight into mechanisms of host-virus competition and also a new set of tools that may be useful in modulating the activity of CRISPR-Cas9 in genome engineering applications. Copyright © 2015 Bryson et al.


July 7, 2019

BreakSeek: a breakpoint-based algorithm for full spectral range INDEL detection.

Although recently developed algorithms have integrated multiple signals to improve sensitivity for insertion and deletion (INDEL) detection, they are far from perfect and remain limited in detecting the full size range of INDELs. Here we present BreakSeek, a novel breakpoint-based algorithm that can detect both homozygous and heterozygous INDELs without bias and efficiently, ranging from several base pairs to over thousands of base pairs, with accurate breakpoint and heterozygosity rate estimations. Comprehensive evaluations on both simulated and real datasets revealed that BreakSeek outperformed other existing methods on both sensitivity and specificity in detecting both small and large INDELs, and uncovered a significant number of novel INDELs that were missed before. In addition, by incorporating sophisticated statistical models, we for the first time investigated and demonstrated the importance of handling false and conflicting signals for multi-signal integrated methods. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.


July 7, 2019

The Streptomyces leeuwenhoekii genome: de novo sequencing and assembly in single contigs of the chromosome, circular plasmid pSLE1 and linear plasmid pSLE2.

Next Generation DNA Sequencing (NGS) and genome mining of actinomycetes and other microorganisms is currently one of the most promising strategies for the discovery of novel bioactive natural products, potentially revealing novel chemistry and enzymology involved in their biosynthesis. This approach also allows rapid insights into the biosynthetic potential of microorganisms isolated from unexploited habitats and ecosystems, which in many cases may prove difficult to culture and manipulate in the laboratory. Streptomyces leeuwenhoekii (formerly Streptomyces sp. strain C34) was isolated from the hyper-arid high-altitude Atacama Desert in Chile and shown to produce novel polyketide antibiotics. Here we present the de novo sequencing of the S. leeuwenhoekii linear chromosome (8 Mb) and two extrachromosomal replicons, the circular pSLE1 (86 kb) and the linear pSLE2 (132 kb), all in single contigs, obtained by combining Pacific Biosciences SMRT (PacBio) and Illumina MiSeq technologies. We identified the biosynthetic gene clusters for chaxamycin, chaxalactin, hygromycin A and desferrioxamine E, metabolites all previously shown to be produced by this strain (J Nat Prod, 2011, 74:1965) and an additional 31 putative gene clusters for specialised metabolites. As well as gene clusters for polyketides and non-ribosomal peptides, we also identified three gene clusters encoding novel lasso-peptides. The S. leeuwenhoekii genome contains 35 gene clusters apparently encoding the biosynthesis of specialised metabolites, most of them completely novel and uncharacterised. This project has served to evaluate the current state of NGS for efficient and effective genome mining of high GC actinomycetes. The PacBio technology now permits the assembly of actinomycete replicons into single contigs with >99% accuracy. The assembled Illumina sequence permitted not only the correction of omissions found in GC homopolymers in the PacBio assembly (exacerbated by the high GC content of actinomycete DNA) but it also allowed us to obtain the sequences of the termini of the chromosome and of a linear plasmid that were not assembled by PacBio. We propose an experimental pipeline that uses the Illumina assembled contigs, in addition to just the reads, to complement the current limitations of the PacBio sequencing technology and assembly software.


July 7, 2019

Complete genome sequencing of a multidrug-resistant and human-invasive Salmonella enterica serovar Typhimurium strain of the emerging sequence type 213 genotype.

Salmonella enterica subsp. enterica serovar Typhimurium strain YU39 was isolated in 2005 in the state of Yucatán, Mexico, from a human systemic infection. The YU39 strain is representative of the multidrug-resistant emergent sequence type 213 (ST213) genotype. The YU39 complete genome is composed of a chromosome and seven plasmids. Copyright © 2015 Calva et al.


July 7, 2019

Complete genome sequence of Salmonella enterica subsp. enterica serovar Agona 460004 2-1, associated with a multistate outbreak in the United States.

Within the last several years, Salmonella enterica subsp. enterica serovar Agona has been among the 20 most frequently isolated serovars in clinical cases of salmonellosis. In this report, the complete genome of S. Agona strain 460004 2-1, isolated from unsweetened puffed-rice cereal during a multistate outbreak in 2008, was sequenced using single-molecule real-time DNA sequencing. Copyright © 2015 Hoffmann et al.


July 7, 2019

The mitochondrial genomes of a Myxozoan genus Kudoa are extremely divergent in Metazoa.

The Myxozoa are oligo-cellular parasites with alternate hosts (fish and annelid worms), and some myxozoan species harm farmed fish. The phylum Myxozoa, comprising 2,100 species, was difficult to position in the tree of life, due to its fast evolutionary rate. Recent phylogenomic studies utilizing an extensive number of nuclear-encoded genes have confirmed that Myxozoans belong to Cnidaria. Nevertheless, the evolution of parasitism and extreme body simplification in Myxozoa is not well understood, and no myxozoan mitochondrial DNA sequence has been reported to date. To further elucidate the evolution of Myxozoa, we sequenced the mitochondrial genomes of the myxozoan species Kudoa septempunctata, K. hexapunctata and K. iwatai and compared them with those of other metazoans. The Kudoa mitochondrial genomes code for ribosomal RNAs, transfer RNAs, eight proteins for oxidative phosphorylation and three proteins of unknown function, and they are among the metazoan mitochondrial genomes coding the fewest proteins. The mitochondrial-encoded proteins were extremely divergent, exhibiting the fastest evolutionary rate in Metazoa. Nevertheless, the dN/dS ratios of the protein genes in genus Kudoa were approximately 0.1 and similar to other cnidarians, indicating that the genes are under negative selection. Despite the divergent genetic content, active oxidative phosphorylation was indicated by the transcriptome, metabolism and structure of mitochondria in K. septempunctata. As possible causes, we attributed the divergence to the population genetic characteristics shared between the two most divergent clades, Ctenophora and Myxozoa, and to the parasitic lifestyle of Myxozoa. The fast-evolving, functional mitochondria of the genus Kudoa expanded our understanding of metazoan mitochondrial evolution.


July 7, 2019

Comparative Analysis of the Shared Sex-Determination Region (SDR) among Salmonid Fishes.

Salmonids present an excellent model for studying evolution of young sex-chromosomes. Within the genus, Oncorhynchus, at least six independent sex-chromosome pairs have evolved, many unique to individual species. This variation results from the movement of the sex-determining gene, sdY, throughout the salmonid genome. While sdY is known to define sexual differentiation in salmonids, the mechanism of its movement throughout the genome has remained elusive due to high frequencies of repetitive elements, rDNA sequences, and transposons surrounding the sex-determining regions (SDR). Despite these difficulties, bacterial artificial chromosome (BAC) library clones from both rainbow trout and Atlantic salmon containing the sdY region have been reported. Here, we report the sequences for these BACs as well as the extended sequence for the known SDR in Chinook gained through genome walking methods. Comparative analysis allowed us to study the overlapping SDRs from three unique salmonid Y chromosomes to define the specific content, size, and variation present between the species. We found approximately 4.1 kb of orthologous sequence common to all three species, which contains the genetic content necessary for masculinization. The regions contain transposable elements that may be responsible for the translocations of the SDR throughout salmonid genomes and we examine potential mechanistic roles of each one. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

