Genome assembly Archives - Page 112 of 196

July 7, 2019

Characterization of structural variants with single molecule and hybrid sequencing approaches.

Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent ‘third-generation’ sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates. We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single molecule sequencing data, paired read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019

Complete genome sequence of the caprolactam-degrading bacterium Pseudomonas mosselii SJ10 isolated from wastewater of a nylon 6 production plant.

Pseudomonas mosselii strain SJ10 is a caprolactam-degrading bacterium belonging to the class Gammaproteobacteria, which was isolated from wastewater of the nylon 6 producing Seongseo industrial complex in Daegu, Republic of Korea. Here, we report the complete genome sequence of the strain, providing genetic information for biodegradation of aromatic compounds.

July 7, 2019

The genome of the intracellular bacterium of the coastal bivalve, Solemya velum: a blueprint for thriving in and out of symbiosis

BACKGROUND: Symbioses between chemoautotrophic bacteria and marine invertebrates are rare examples of living systems that are virtually independent of photosynthetic primary production. These associations have evolved multiple times in marine habitats, such as deep-sea hydrothermal vents and reducing sediments, characterized by steep gradients of oxygen and reduced chemicals. Due to difficulties associated with maintaining these symbioses in the laboratory and culturing the symbiotic bacteria, studies of chemosynthetic symbioses rely heavily on culture-independent methods. The symbiosis between the coastal bivalve, Solemya velum, and its intracellular symbiont is a model for chemosynthetic symbioses given its accessibility in intertidal environments and the ability to maintain it under laboratory conditions. To better understand this symbiosis, the genome of the S. velum endosymbiont was sequenced. RESULTS: Relative to the genomes of obligate symbiotic bacteria, which commonly undergo erosion and reduction, the S. velum symbiont genome was large (2.7 Mb), GC-rich (51%), and contained a large number (78) of mobile genetic elements. Comparative genomics identified sets of genes specific to the chemosynthetic lifestyle and necessary to sustain the symbiosis. In addition, a number of inferred metabolic pathways and cellular processes, including heterotrophy, branched electron transport, and motility, suggested that besides the ability to function as an endosymbiont, the bacterium may have the capacity to live outside the host. CONCLUSIONS: The physiological dexterity indicated by the genome substantially improves our understanding of the genetic and metabolic capabilities of the S. velum symbiont and the breadth of niches the partners may inhabit during their lifecycle.

July 7, 2019

Complete genome sequence of the plant growth-promoting rhizobacterium Pseudomonas aurantiaca strain JD37.

Pseudomonas aurantiaca strain JD37, a Gram-negative bacterium isolated from potato rhizosphere soil (Shanghai, China), is a plant growth-promoting rhizobacterium. The JD37 genome consists of a single chromosome with no plasmids. Its genome contains genes involved in plant growth promotion, biological control, and other functions. Here, we present the complete genome sequence of P. aurantiaca JD37. As far as we know, this is the first whole-genome sequence of this species.

July 7, 2019

An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella.

Comparative genomics based on whole genome sequencing (WGS) is increasingly being applied to investigate questions within evolutionary and molecular biology, as well as questions concerning public health (e.g., pathogen outbreaks). Given the impact that conclusions derived from such analyses may have, we have evaluated the robustness of clustering individuals based on WGS data to three key factors: (1) next-generation sequencing (NGS) platform (HiSeq, MiSeq, IonTorrent, 454, and SOLiD), (2) algorithms used to construct a SNP (single nucleotide polymorphism) matrix (reference-based and reference-free), and (3) phylogenetic inference method (FastTreeMP, GARLI, and RAxML). We carried out these analyses on 194 whole genome sequences representing 107 unique Salmonella enterica subsp. enterica ser. Montevideo strains. Reference-based approaches for identifying SNPs produced trees that were significantly more similar to one another than those produced under the reference-free approach. Topologies inferred using a core matrix (i.e., no missing data) were significantly more discordant than those inferred using a non-core matrix that allows for some missing data. However, allowing for too much missing data likely results in a high false discovery rate of SNPs. When analyzing the same SNP matrix, we observed that the more thorough inference methods implemented in GARLI and RAxML produced more similar topologies than FastTreeMP. Our results also confirm that reproducibility varies among NGS platforms where the MiSeq had the lowest number of pairwise differences among replicate runs. Our investigation into the robustness of clustering patterns illustrates the importance of carefully considering how data from different platforms are combined and analyzed. We found clear differences in the topologies inferred, and certain methods performed significantly better than others for discriminating between the highly clonal organisms investigated here. The methods supported by our results represent a preliminary set of guidelines and a step towards developing validated standards for clustering based on whole genome sequence data.

July 7, 2019

Seeking the source of Pseudomonas aeruginosa infections in a recently opened hospital: an observational study using whole-genome sequencing.

Pseudomonas aeruginosa is a common nosocomial pathogen responsible for significant morbidity and mortality internationally. Patients may become colonised or infected with P. aeruginosa after exposure to contaminated sources within the hospital environment. The aim of this study was to determine whether whole-genome sequencing (WGS) can be used to determine the source in a cohort of burns patients at high risk of P. aeruginosa acquisition. An observational prospective cohort study. Burns care ward and critical care ward in the UK. Patients with >7% total burns by surface area were recruited into the study. All patients were screened for P. aeruginosa on admission and samples taken from their immediate environment, including water. Screened patients who subsequently developed a positive P. aeruginosa microbiology result were subject to enhanced environmental surveillance. All isolates of P. aeruginosa were genome sequenced. Sequence analysis looked at similarity and relatedness between isolates. WGS data for 141 P. aeruginosa isolates were obtained from patients, hospital water and the ward environment. Phylogenetic analysis revealed eight distinct clades, with a single clade representing the majority of environmental isolates in the burns unit. Isolates from three patients had identical genotypes compared with water isolates from the same room. There was clear clustering of water isolates by room and outlet, allowing the source of acquisitions to be unambiguously identified. Whole-genome shotgun sequencing of biofilm DNA extracted from a thermostatic mixer valve revealed this was the source of a P. aeruginosa subpopulation previously detected in water. In the remaining two cases there was no clear link to the hospital environment. This study reveals that WGS can be used for source tracking of P. aeruginosa in a hospital setting, and that acquisitions can be traced to a specific source within a hospital ward. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

July 7, 2019

Complete genome sequence of Cellulophaga lytica HI1 using PacBio Single-Molecule Real-Time Sequencing.

We report here the complete genome sequence of Cellulophaga lytica HI1 isolated from a seawater table located at the Kewalo Marine Laboratory (Honolulu, HI). This is the first complete de novo genome assembly of C. lytica HI1 using PacBio single-molecule real-time (SMRT) sequencing, which resulted in a single scaffold of 3.8 Mb. Copyright © 2014 Asahina and Hadfield.

July 7, 2019

De novo genome assembly of the economically important weed horseweed using integrated data from multiple sequencing platforms.

Horseweed (Conyza canadensis), a member of the Compositae (Asteraceae) family, was the first broadleaf weed to evolve resistance to glyphosate. Horseweed, one of the most problematic weeds in the world, is a true diploid (2n = 2x = 18), with the smallest genome of any known agricultural weed (335 Mb). Thus, it is an appropriate candidate to help us understand the genetic and genomic bases of weediness. We undertook a draft de novo genome assembly of horseweed by combining data from multiple sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS) using various libraries with different insertion sizes (approximately 350 bp, 600 bp, 3 kb, and 10 kb) of a Tennessee-accessed, glyphosate-resistant horseweed biotype. From 116.3 Gb (approximately 350× coverage) of data, the genome was assembled into 13,966 scaffolds, with 50% of the assembly contained in scaffolds of at least 33,561 bp (the scaffold N50). The assembly covered 92.3% of the genome, including the complete chloroplast genome (approximately 153 kb) and a nearly complete mitochondrial genome (approximately 450 kb in 120 scaffolds). The nuclear genome is composed of 44,592 protein-coding genes. Genome resequencing of seven additional horseweed biotypes was performed. These sequence data were assembled and used to analyze genome variation. Simple sequence repeats and single-nucleotide polymorphisms were surveyed. Genomic patterns were detected that were associated with glyphosate-resistant or -susceptible biotypes. The draft genome will be useful to better understand weediness and the evolution of herbicide resistance and to devise new management strategies. The genome will also be useful as another reference genome in the Compositae. To our knowledge, this article represents the first published draft genome of an agricultural weed. © 2014 American Society of Plant Biologists. All Rights Reserved.
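
The scaffold statistic quoted in the abstract above (the scaffold length at which 50% of the assembly is reached, commonly called N50) can be illustrated with a minimal sketch. The function name `n50` and the toy scaffold lengths are assumptions made for illustration only; they are not taken from the horseweed assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (not real data): total = 300 bp, half = 150 bp.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))  # 80, because 100 + 80 = 180 >= 150
```

Sorting from longest to shortest and accumulating until half the total is reached is the usual way to compute the statistic; other assembly tools may differ in how they treat gaps or minimum scaffold length.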

July 7, 2019

Genomics of wood-degrading fungi.

Woody plants convert the energy of the sun into lignocellulosic biomass, which is an abundant substrate for bioenergy production. Fungi, especially wood decayers from the class Agaricomycetes, have evolved ways to degrade lignocellulose into its monomeric constituents, and understanding this process may facilitate the development of biofuels. Over the past decade genomics has become a powerful tool to study the Agaricomycetes. In 2004 the first sequenced genome of the white rot fungus Phanerochaete chrysosporium revealed a rich catalog of lignocellulolytic enzymes. In the decade that followed the number of genomes of Agaricomycetes grew to more than 75 and revealed a diversity of wood-decaying strategies. New technologies for high-throughput functional genomics are now needed to further study these organisms. Copyright © 2014 Elsevier Inc. All rights reserved.

July 7, 2019

The odd one out: Bacillus ACT bacteriophage CP-51 exhibits unusual properties compared to related Spounavirinae W.Ph. and Bastille.

The Bacillus ACT group includes three important pathogenic species of Bacillus: anthracis, cereus and thuringiensis. We characterized three virulent bacteriophages, Bastille, W.Ph. and CP-51, that infect various strains of these three species. We have determined the complete genome sequences of CP-51, W.Ph. and Bastille, and their physical genome structures. The CP-51 genome sequence could only be obtained using a combination of conventional, second-generation and third-generation sequencing technologies, illustrating the problems associated with sequencing highly modified DNA. We present evidence that the generalized transduction facilitated by CP-51 is independent of a specific genome structure, but likely due to sporadic packaging errors of the terminase. There is a clear correlation between the genetic and morphological features of these phages, validating their placement in the Spounavirinae subfamily (SPO1-related phages) of the Myoviridae. This study also provides tools for the development of phage-based diagnostics/therapeutics for this group of pathogens. Copyright © 2014 Elsevier Inc. All rights reserved.

July 7, 2019

De novo assembly and characterization of the complete chloroplast genome of radish (Raphanus sativus L.).

Radish (Raphanus sativus L.) is an edible root vegetable crop that is cultivated worldwide and whose genome has been sequenced. Here we report the complete nucleotide sequence of the radish cultivar WK10039 chloroplast (cp) genome, along with a de novo assembly strategy using whole genome shotgun sequence reads obtained by next generation sequencing. The radish cp genome is 153,368 bp in length and has a typical quadripartite structure, composed of a pair of inverted repeat regions (26,217 bp each), a large single copy region (83,170 bp), and a small single copy region (17,764 bp). The radish cp genome contains 87 predicted protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Sequence analysis revealed the presence of 91 simple sequence repeats (SSRs) in the radish cp genome. Phylogenetic analysis of 62 protein-coding gene sequences from the 17 cp genomes of the Brassicaceae family suggested that the radish cp genome is most closely related to the cp genomes of Brassica rapa and Brassica napus. Comparisons with the B. rapa and B. napus cp genomes revealed highly divergent intergenic sequences and introns that can potentially be developed as diagnostic cp markers. Synonymous and nonsynonymous substitutions of cp genes suggested that nucleotide substitutions have occurred at similar rates in most genes. The complete sequence of the radish cp genome would serve as a valuable resource for the development of new molecular markers and the study of the phylogenetic relationships of Raphanus species in the Brassicaceae family. Copyright © 2014 Elsevier B.V. All rights reserved.
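
As a quick consistency check on the quadripartite structure described above, the region lengths given in the abstract can be summed and compared with the reported genome length. This is only an illustrative sketch; the helper name `quadripartite_length` is an assumption and not part of the study.

```python
# Region lengths in bp, as reported in the abstract above.
INVERTED_REPEAT = 26_217      # each of the two inverted repeat copies
LARGE_SINGLE_COPY = 83_170
SMALL_SINGLE_COPY = 17_764
REPORTED_TOTAL = 153_368


def quadripartite_length(lsc, ssc, ir):
    """Total length of a genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LARGE_SINGLE_COPY, SMALL_SINGLE_COPY, INVERTED_REPEAT)
print(total, total == REPORTED_TOTAL)  # 153368 True
```

The four regions account for the full reported length, so the figures in the abstract are internally consistent.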

July 7, 2019

Get your high-quality low-cost genome sequence.

The study of whole-genome sequences has become essential for almost all branches of biological research. Next-generation sequencing (NGS) has revolutionized the scalability, speed, and resolution of sequencing and brought genomic science within reach of academic laboratories that study non-model organisms. Here, we show that a high-quality draft genome of a eukaryote can be obtained at relatively low cost by exploiting a hybrid combination of sequencing strategies. Copyright © 2014 Elsevier Ltd. All rights reserved.

July 7, 2019

The characterization of goat genetic diversity: Towards a genomic approach

The investigation of genetic diversity at molecular level has been proposed as a valuable complement and sometimes proxy to phenotypic diversity of local breeds and is presently considered as one of the FAO priorities for breed characterization. By recommending a set of selected molecular markers for each of the main livestock species, FAO has promoted the meta-analysis of local datasets, to achieve a global view of molecular genetic diversity. Analysis within the EU Globaldiv project of two large goat microsatellite datasets produced by the Econogene Consortium and the IAEA CRP–Asia Consortium, respectively, has generated a picture of goat diversity across continents. This indicates a gradient of decreasing diversity from the domestication centre towards Europe and Asia, a clear phylogeographic structure at the continental and regional levels, and in Asia a limited genetic differentiation among local breeds. The development of SNP panels that assay thousands of markers and the whole genome sequencing of livestock permit an affordable use of genomic technologies in all livestock species, goats included. Preliminary data from the Italian Goat Consortium indicate that the SNP panel developed for this species is highly informative. The existing panel can be improved by integrating additional SNPs identified from the whole genome sequence alignment of goats adapted to extreme climates. Part of this effort is being achieved by international projects (e.g. EU FP7 NextGen and 3SR projects), but a fair representation of the global diversity in goats requires a large panel of samples (i.e. as in the recently launched 1000 cattle genomes initiative). Genomic technologies offer new strategies to investigate complex traits difficult to measure. For example, the comparison of patterns of diversity among the genomes in selected groups of animals (e.g. adapted to different environments) and the integration of genome-wide diversity with new GIScience-based methods are able to identify molecular markers associated with genomic regions of putative importance in adaptation and thus pave the way for the identification of causative genes. Goat breeds adapted to different production systems in extreme and harsh environments will play an important role in this process. The new sequencing technologies also permit the analysis of the entire mitochondrial genome at maximum resolution. The complete mtDNA sequence is now the common standard format for the investigation of human maternal lineages. A preliminary analysis of the complete goat mtDNA genome supports a single Neolithic origin of domestic goats rather than multiple domestication events in different geographic areas.

July 7, 2019

Genome sequencing of an extended series of NDM-producing Klebsiella pneumoniae isolates from neonatal infections in a Nepali hospital characterizes the extent of community- versus hospital-associated transmission in an endemic setting.

NDM-producing Klebsiella pneumoniae strains represent major clinical and infection control challenges, particularly in resource-limited settings with high rates of antimicrobial resistance. Determining whether transmission occurs at a gene, plasmid, or bacterial strain level and within hospital and/or the community has implications for monitoring and controlling spread. Whole-genome sequencing (WGS) is the highest-resolution typing method available for transmission epidemiology. We sequenced carbapenem-resistant K. pneumoniae isolates from 26 individuals involved in several infection case clusters in a Nepali neonatal unit and 68 other clinical Gram-negative isolates from a similar time frame, using Illumina and PacBio technologies. Within-outbreak chromosomal and closed-plasmid structures were generated and used as data set-specific references. Three temporally separated case clusters were caused by a single NDM K. pneumoniae strain with a conserved set of four plasmids, one being a 304,526-bp plasmid carrying blaNDM-1. The plasmids contained a large number of antimicrobial/heavy metal resistance and plasmid maintenance genes, which may have explained their persistence. No obvious environmental/human reservoir was found. There was no evidence of transmission of outbreak plasmids to other Gram-negative clinical isolates, although blaNDM variants were present in other isolates in different genetic contexts. WGS can effectively define complex antimicrobial resistance epidemiology. Wider sampling frames are required to contextualize outbreaks. Infection control may be effective in terminating outbreaks caused by particular strains, even in areas with widespread resistance, although this study could not demonstrate evidence supporting specific interventions. Larger, detailed studies are needed to characterize resistance genes, vectors, and host strains involved in disease, to enable effective intervention. Copyright © 2014 Stoesser et al.

July 7, 2019

Diversification of bacterial genome content through distinct mechanisms over different timescales.

Bacterial populations often consist of multiple co-circulating lineages. Determining how such population structures arise requires understanding what drives bacterial diversification. Using 616 systematically sampled genomes, we show that Streptococcus pneumoniae lineages are typically characterized by combinations of infrequently transferred stable genomic islands: those moving primarily through transformation, along with integrative and conjugative elements and phage-related chromosomal islands. The only lineage containing extensive unique sequence corresponds to a set of atypical unencapsulated isolates that may represent a distinct species. However, prophage content is highly variable even within lineages, suggesting frequent horizontal transmission that would necessitate rapidly diversifying anti-phage mechanisms to prevent these viruses sweeping through populations. Correspondingly, two loci encoding Type I restriction-modification systems able to change their specificity over short timescales through intragenomic recombination are ubiquitous across the collection. Hence short-term pneumococcal variation is characterized by movement of phage and intragenomic rearrangements, with the slower transfer of stable loci distinguishing lineages.
