Joint Genome Institute Archives - Page 14 of 16

July 7, 2019

Third-generation sequencing and the future of genomics

Third-generation long-range DNA sequencing and mapping technologies are creating a renaissance in high-quality genome sequencing. Unlike second-generation sequencing, which produces short reads a few hundred base-pairs long, third-generation single-molecule technologies generate over 10,000 bp reads or map over 100,000 bp molecules. We analyze how increased read lengths can be used to address long-standing problems in de novo genome assembly, structural variation analysis and haplotype phasing.

July 7, 2019

Complete genome sequence of antibiotic and anticancer agent violacein producing Massilia sp. strain NR 4-1.

Massilia sp. NR 4-1 is a violacein-producing strain newly isolated from topsoil under a nutmeg tree, Torreya nucifera, in the Korean national monument Bijarim Forest. Violacein, which is derived from L-tryptophan, belongs to a novel class of drugs exhibiting anticancer and antibiotic activities. Here, we present the complete genome of Massilia sp. strain NR 4-1, which comprises 6,361,416 bp and a total of 5285 coding sequences (CDSs), including a complete violacein biosynthesis pathway, vioABCDE. The genome sequence of Massilia sp. NR 4-1 will facilitate stable and efficient biotechnological applications of violacein production. Copyright © 2016 Elsevier B.V. All rights reserved.

July 7, 2019

Genome sequence and analysis of a stress-tolerant, wild-derived strain of Saccharomyces cerevisiae used in biofuels research

The genome sequences of more than 100 strains of the yeast Saccharomyces cerevisiae have been published. Unfortunately, most of these genome assemblies contain dozens to hundreds of gaps at repetitive sequences, including transposable elements, tRNAs, and subtelomeric regions, which is where novel genes generally reside. Relatively few strains have been chosen for genome sequencing based on their biofuel production potential, leaving an additional knowledge gap. Here, we describe the nearly complete genome sequence of GLBRCY22-3 (Y22-3), a strain of S. cerevisiae derived from the stress-tolerant wild strain NRRL YB-210 and subsequently engineered for xylose metabolism. After benchmarking several genome assembly approaches, we developed a pipeline to integrate Pacific Biosciences (PacBio) and Illumina sequencing data and achieved one of the highest quality genome assemblies for any S. cerevisiae strain. Specifically, the contig N50 is 693 kbp, and the sequences of most chromosomes, the mitochondrial genome, and the 2-micron plasmid are complete. Our annotation predicts 92 genes that are not present in the reference genome of the laboratory strain S288c, over 70% of which were expressed. We predicted functions for 43 of these genes, 28 of which were previously uncharacterized and unnamed. Remarkably, many of these genes are predicted to be involved in stress tolerance and carbon metabolism and are shared with a Brazilian bioethanol production strain, even though the strains differ dramatically at most genetic loci. The Y22-3 genome sequence provides an exceptionally high-quality resource for basic and applied research in bioenergy and genetics. Copyright © 2016 McIlwain et al.
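The contig N50 quoted above is a standard assembly statistic: the contig length at which contigs of that length or longer account for at least half of the total assembled bases. A minimal sketch of the calculation follows; the function name `contig_n50` and the toy contig lengths are illustrative assumptions, not data or code from the Y22-3 study.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the contig at which the running total,
    taken from the longest contig downwards, first reaches at
    least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in bp), for illustration only.
toy_lengths = [900, 700, 500, 300, 100]
print(contig_n50(toy_lengths))
```

Because N50 is weighted by length, a few long contigs dominate it, which is why long-read data tend to raise it sharply compared with short-read-only assemblies.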

July 7, 2019

Alpha-CENTAURI: assessing novel centromeric repeat sequence variation with long read sequencing.

Long arrays of near-identical tandem repeats are a common feature of centromeric and subtelomeric regions in complex genomes. These sequences present a source of repeat structure diversity that is commonly ignored by standard genomic tools. Unlike reads shorter than the underlying repeat structure, which rely on indirect inference methods such as assembly, long reads allow direct inference of satellite higher order repeat structure. To automate characterization of local centromeric tandem repeat sequence variation we have designed Alpha-CENTAURI (ALPHA satellite CENTromeric AUtomated Repeat Identification), which takes advantage of Pacific Biosciences long reads from whole-genome sequencing datasets. By operating on reads prior to assembly, our approach provides a more comprehensive set of repeat-structure variants and is not impacted by rearrangements or sequence underrepresentation due to misassembly. We demonstrate the utility of Alpha-CENTAURI in characterizing repeat structure for alpha satellite containing reads in the hydatidiform mole (CHM1, haploid-like) genome. The pipeline is designed to report local repeat organization summaries for each read, thereby monitoring rearrangements in repeat units, shifts in repeat orientation and sites of array transition into non-satellite DNA, typically defined by transposable element insertion. We validate the method by showing consistency with existing centromere high order repeat references. Alpha-CENTAURI can, in principle, run on any sequence data, offering a method to generate a sequence repeat resolution that could be readily performed using consensus sequences available for other satellite families in genomes without high-quality reference assemblies. Documentation and source code for Alpha-CENTAURI are freely available at http://github.com/volkansevim/alpha-CENTAURI. Contact: ali.bashir@mssm.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

July 7, 2019

Resurgence of less-studied smut fungi as models of phytopathogenesis in the -omics era.

The smut fungi form a large, diverse, and non-monophyletic group of plant pathogens that have long served both as important pests of human agriculture and as fertile organisms of scientific investigation. As modern techniques of molecular genetic analysis became available, many previously studied species that proved refractory to these techniques fell by the wayside and became neglected. Now, as the advent of rapid and affordable next-generation sequencing provides genomic and transcriptomic resources for even these “forgotten” fungi, several species are making a comeback and retaking prominent places in phytopathogenic research. In this review, we highlight several of these smut fungi, with special emphasis on Microbotryum lychnidis-dioicae, an anther smut, whose molecular genetic tools have finally begun to catch up with its historical importance in classical genetics and now provide mechanistic insights for ecological studies, evolution of host/pathogen interaction, and investigations of emerging infectious disease.

July 7, 2019

High-quality draft genomes from Thermus caliditerrae YIM 77777 and T. tengchongensis YIM 77401, isolates from Tengchong, China.

The draft genomes of Thermus tengchongensis YIM 77401 and T. caliditerrae YIM 77777 are 2,562,314 and 2,218,114 bp and encode 2,726 and 2,305 predicted genes, respectively. Gene content and growth experiments demonstrate broad metabolic capacity, including starch hydrolysis, thiosulfate oxidation, arsenite oxidation, incomplete denitrification, and polysulfide reduction. Copyright © 2016 Mefferd et al.

July 7, 2019

A roadmap for gene system development in Clostridium.

Clostridium species are both heroes and villains. Some cause serious human and animal diseases, those present in the gut microbiota generally contribute to health and wellbeing, while others represent useful industrial chassis for the production of chemicals and fuels. To understand, counter or exploit them, there is a fundamental requirement for effective systems that may be used for directed or random genome modifications. We have formulated a simple roadmap whereby the necessary gene systems may be developed and deployed. At its heart is the use of ‘pseudo-suicide’ vectors and the creation of a pyrE mutant (a uracil auxotroph), initially aided by ClosTron technology, but ultimately made using a special form of allelic exchange termed ACE (Allele-Coupled Exchange). All mutants, regardless of the mutagen employed, are made in this host. This is because through the use of ACE vectors, mutants can be rapidly complemented concomitant with correction of the pyrE allele and restoration of uracil prototrophy. This avoids the phenotypic effects frequently observed with high copy number plasmids and dispenses with the need to add antibiotic to ensure plasmid retention. Once available, the pyrE host may be used to stably insert all manner of application-specific modules. Examples include a sigma factor to allow deployment of a mariner transposon, hydrolases involved in biomass deconstruction and therapeutic genes in cancer delivery vehicles. To date, provided DNA transfer is obtained, we have not encountered any clostridial species where this technology cannot be applied. These include Clostridium difficile, Clostridium acetobutylicum, Clostridium beijerinckii, Clostridium botulinum, Clostridium perfringens, Clostridium sporogenes, Clostridium pasteurianum, Clostridium ljungdahlii, Clostridium autoethanogenum and even Geobacillus thermoglucosidasius. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

July 7, 2019

High-quality permanent draft genome sequence of Ensifer sp. PC2, isolated from a nitrogen-fixing root nodule of the legume tree (Khejri) native to the Thar Desert of India.

Ensifer sp. PC2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from a nitrogen-fixing nodule of the tree legume Prosopis cineraria (L.) Druce (Khejri), which is a keystone species that grows in arid and semi-arid regions of the Indian Thar Desert. Strain PC2 exists as a dominant saprophyte in alkaline soils of Western Rajasthan. It is fast growing, well-adapted to arid conditions and is able to form an effective symbiosis with several annual crop legumes as well as species of mimosoid trees and shrubs. Here we describe the features of Ensifer sp. PC2, together with genome sequence information and its annotation. The 8,458,965 bp high-quality permanent draft genome is arranged into 171 scaffolds of 171 contigs containing 8,344 protein-coding genes and 139 RNA-only encoding genes, and is one of the rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project proposal.

July 7, 2019

Complete genome of Nitrosospira briensis C-128, an ammonia-oxidizing bacterium from agricultural soil.

Nitrosospira briensis C-128 is an ammonia-oxidizing bacterium isolated from an acid agricultural soil. N. briensis C-128 was sequenced with PacBio RS technologies at the DOE-Joint Genome Institute through their Community Science Program (2010). The high-quality finished genome contains one chromosome of 3.21 Mb and no plasmids. We identified 3073 gene models, 3018 of which are protein coding. The two-way average nucleotide identity between the chromosomes of Nitrosospira multiformis ATCC 25196 and Nitrosospira briensis C-128 was found to be 77.2 %. Multiple copies of modules encoding chemolithotrophic metabolism were identified in their genomic context. The gene inventory supports chemolithotrophic metabolism with implications for function in soil environments.

July 7, 2019

SAR11 bacteria linked to ocean anoxia and nitrogen loss.

Bacteria of the SAR11 clade constitute up to one half of all microbial cells in the oxygen-rich surface ocean. SAR11 bacteria are also abundant in oxygen minimum zones (OMZs), where oxygen falls below detection and anaerobic microbes have vital roles in converting bioavailable nitrogen to N2 gas. Anaerobic metabolism has not yet been observed in SAR11, and it remains unknown how these bacteria contribute to OMZ biogeochemical cycling. Here, genomic analysis of single cells from the world’s largest OMZ revealed previously uncharacterized SAR11 lineages with adaptations for life without oxygen, including genes for respiratory nitrate reductases (Nar). SAR11 nar genes were experimentally verified to encode proteins catalysing the nitrite-producing first step of denitrification and constituted ~40% of OMZ nar transcripts, with transcription peaking in the anoxic zone of maximum nitrate reduction activity. These results link SAR11 to pathways of ocean nitrogen loss, redefining the ecological niche of Earth’s most abundant organismal group.

July 7, 2019

Genomic insight into the host-endosymbiont relationship of Endozoicomonas montiporae CL-33(T) with its coral host.

The bacterial genus Endozoicomonas was commonly detected in healthy corals in many coral-associated bacteria studies in the past decade. Although it is likely to be a core member of coral microbiota, little is known about its ecological roles. To decipher potential interactions between bacteria and their coral hosts, we sequenced and investigated the first culturable endozoicomonal bacterium from coral, E. montiporae CL-33(T). Its genome showed potential signs of ongoing genome erosion and gene exchange with its host. Testosterone degradation and type III secretion system are commonly present in Endozoicomonas and may have roles in recognizing hosts and delivering effectors to them. Moreover, genes of eukaryotic ephrin ligand B2 are present in its genome; presumably, this bacterium could move into coral cells via endocytosis after binding to coral’s Eph receptors. In addition, 7,8-dihydro-8-oxoguanine triphosphatase and isocitrate lyase are possible type III secretion effectors that might help coral to prevent mitochondrial dysfunction and promote gluconeogenesis, especially under stress conditions. Based on all these findings, we inferred that E. montiporae was a facultative endosymbiont that can recognize, translocate, communicate and modulate its coral host.

July 7, 2019

High quality draft genome sequence of the type strain of Pseudomonas lutea OK2(T), a phosphate-solubilizing rhizospheric bacterium.

Pseudomonas lutea OK2(T) (=LMG 21974(T), CECT 5822(T)) is the type strain of the species and was isolated from the rhizosphere of grass growing in Spain in 2003 based on its phosphate-solubilizing capacity. In order to identify the functional significance of phosphate solubilization in Pseudomonas plant growth-promoting rhizobacteria, we describe here the phenotypic characteristics of strain OK2(T) along with its high-quality draft genome sequence, its annotation, and analysis. The genome comprises 5,647,497 bp with 60.15% G+C content. The sequence includes 4,846 protein-coding genes and 95 RNA genes.
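The G+C content reported for this genome is the percentage of bases that are guanine or cytosine. As a minimal sketch of that arithmetic, assuming ambiguous bases such as N are simply excluded from the denominator (the function name `gc_percent` and the example sequence are hypothetical, not taken from the OK2(T) assembly):

```python
def gc_percent(sequence):
    """Return the G+C content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N, is ignored. This is an assumption of the sketch, not
    necessarily the convention used by a given annotation pipeline.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical toy sequence, for illustration only.
print(round(gc_percent("ATGCGCNNTA"), 2))
```

For a multi-contig draft genome, the same ratio would be computed over the pooled base counts of all contigs rather than by averaging per-contig percentages.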

July 7, 2019

Large-scale maps of variable infection efficiencies in aquatic Bacteroidetes phage-host model systems.

Microbes drive ecosystem functioning and their viruses modulate these impacts through mortality, gene transfer and metabolic reprogramming. Despite the importance of virus-host interactions and likely variable infection efficiencies of individual phages across hosts, such variability is seldom quantified. Here, we quantify infection efficiencies of 38 phages against 19 host strains in aquatic Cellulophaga (Bacteroidetes) phage-host model systems. Binary data revealed that some phages infected only one strain while others infected 17, whereas quantitative data revealed that efficiency of infection could vary by 10 orders of magnitude, even among phages within one population. This provides a baseline for understanding and modeling intrapopulation host range variation. Genus-specific host ranges were also informative. For example, the Cellulophaga Microviridae showed a markedly broader intra-species host range than previously observed in Escherichia coli systems. Further, one phage genus, Cba41, was examined to investigate nonheritable changes in plating efficiency and burst size that depended on which host strain it most recently infected. While consistent with host modification of phage DNA, no differences in nucleotide sequence or DNA modifications were detected, leaving the observation repeatable, but the mechanism unresolved. Overall, this study highlights the importance of quantitatively considering replication variations in studies of phage-host interactions. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.

July 7, 2019

Genome sequence of Arenibacter algicola strain TG409, a hydrocarbon-degrading bacterium associated with marine eukaryotic phytoplankton.

Arenibacter algicola strain TG409 was isolated from Skeletonema costatum and exhibits the ability to utilize polycyclic aromatic hydrocarbons as sole sources of carbon and energy. Here, we present the genome sequence of this strain, which is 5,550,230 bp with 4,722 genes and an average G+C content of 39.7%. Copyright © 2016 Gutierrez et al.

July 7, 2019

Permanent improved high-quality draft genome sequence of Nocardia casuarinae strain BMG51109, an endophyte of actinorhizal root nodules of Casuarina glauca.

Here, we report the first genome sequence of a Nocardia plant endophyte, N. casuarinae strain BMG51109, isolated from Casuarina glauca root nodules. The improved high-quality draft genome sequence contains 8,787,999 bp with a 68.90% GC content and 7,307 predicted protein-coding genes. Copyright © 2016 Ghodhbane-Gtari et al.
