Genome assembly Archives

July 7, 2019

Multiple genome sequences of Helicobacter pylori strains of diverse disease and antibiotic resistance backgrounds from Malaysia.

Helicobacter pylori causes human gastroduodenal diseases, including chronic gastritis and peptic ulcer disease. It is also a major microbial risk factor for the development of gastric adenocarcinoma and mucosa-associated lymphoid tissue (MALT) lymphoma. Twenty-one strains, isolated from patients of different ethnicities and disease states and differing in antimicrobial susceptibility, were sequenced using the Illumina HiSeq and PacBio RS platforms.

July 7, 2019

Complete genome sequence of Carnobacterium gilichinskyi strain WN1359T (DSM 27470T).

We report the complete genome sequence of Carnobacterium gilichinskyi strain WN1359, previously isolated from Siberian permafrost and capable of growth under cold (0°C), anoxic, CO2-dominated, low-pressure (0.7-kPa) conditions in a simulation of the Mars atmosphere.

July 7, 2019

Genome sequence of Phaeobacter inhibens type strain (T5(T)), a secondary metabolite producing representative of the marine Roseobacter clade, and emendation of the species description of Phaeobacter inhibens.

Strain T5(T) is the type strain of the species Phaeobacter inhibens Martens et al. 2006, a secondary metabolite-producing bacterium affiliated with the Roseobacter clade. Strain T5(T) was isolated from a water sample taken in the German Wadden Sea, southern North Sea. Here we describe the complete genome sequence and annotation of this bacterium, with a special focus on secondary metabolism, and compare it with the genomes of the Phaeobacter inhibens strains DSM 17395 and DSM 24588 (2.10), which were selected because of the close phylogenetic relationship of these three strains based on their 16S rRNA gene sequences. The genome of strain T5(T) comprises 4,130,897 bp with 3,923 protein-coding genes and shows high similarity in genetic and genomic characteristics to P. inhibens DSM 17395 and DSM 24588 (2.10). Besides the chromosome, strain T5(T) possesses four plasmids, three of which are highly similar to the plasmids of strains DSM 17395 and DSM 24588 (2.10). Analysis of the fourth plasmid suggested horizontal gene transfer. Most of the genes on this plasmid are not present in strains DSM 17395 and DSM 24588 (2.10), including a nitrous oxide reductase, which allows strain T5(T) a facultatively anaerobic lifestyle. The G+C content was calculated from the genome sequence and differs significantly from the previously published value, thus warranting an emendation of the species description.
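The emendation hinges on a G+C value computed from the finished sequence itself. A minimal sketch of that calculation follows, assuming each replicon (the chromosome and the four plasmids) is available as a plain string. The function names are illustrative and not taken from the paper, and ambiguous bases such as N are assumed to be excluded from the count.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted; N and other IUPAC
    ambiguity codes are left out of both numerator and denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def genome_gc_content(replicons: dict[str, str]) -> float:
    """G+C content over all replicons (e.g. chromosome plus plasmids) pooled together."""
    return gc_content("".join(replicons.values()))


if __name__ == "__main__":
    # Hypothetical toy replicons, not real Phaeobacter sequence
    toy = {"chromosome": "ATGCGCGCAT", "plasmid_1": "GGCCNNAT"}
    print(f"{genome_gc_content(toy):.1f}%")  # prints 62.5%
```

Pooling the replicons weights each one by its length; reporting the chromosome and each plasmid separately only needs `gc_content` on its own.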

July 7, 2019

Comparing the genomes of Helicobacter pylori clinical strain UM032 and mice-adapted derivatives.

Helicobacter pylori is a Gram-negative bacterium that persistently infects the human stomach, inducing chronic inflammation. The exact mechanisms of pathogenesis are still not completely understood. Although mice are not a natural host for H. pylori, mouse infection models play an important role in establishing the immunology and pathogenicity of H. pylori. In this study, for the first time, the genomes of clinical H. pylori strain UM032 and its mouse-adapted derivatives, 298 and 299, were sequenced using PacBio Single Molecule, Real-Time (SMRT) technology. Here, we describe the single contig obtained for each genome: UM032 (1,599,441 bp), 298 (1,604,216 bp), and 299 (1,601,149 bp). Preliminary analysis suggested that methylation of the H. pylori genome through its restriction-modification system may be a determinant of its host specificity and adaptation. The availability of these genome sequences will help improve our current understanding of the host specificity of H. pylori.

July 7, 2019

The genome of the anaerobic fungus Orpinomyces sp. strain C1A reveals the unique evolutionary history of a remarkable plant biomass degrader.

Anaerobic gut fungi represent a distinct early-branching fungal phylum (Neocallimastigomycota) and reside in the rumen, hindgut, and feces of ruminant and nonruminant herbivores. The genome of an anaerobic fungal isolate, Orpinomyces sp. strain C1A, was sequenced using a combination of Illumina and PacBio single-molecule real-time (SMRT) technologies. The large genome (100.95 Mb, 16,347 genes) displayed extremely low G+C content (17.0%), large noncoding intergenic regions (73.1%), proliferation of microsatellite repeats (4.9%), and multiple gene duplications. Comparative genomic analysis identified multiple genes and pathways that are absent in Dikarya genomes but present in early-branching fungal lineages and/or nonfungal Opisthokonta. These included genes for posttranslational fucosylation, the production of specific intramembrane proteases and extracellular protease inhibitors, the formation of a complete axoneme and intraflagellar trafficking machinery, and a near-complete focal adhesion machinery. Analysis of the lignocellulolytic machinery in the C1A genome revealed an extremely rich repertoire, with evidence of horizontal gene acquisition from multiple bacterial lineages. Experimental analysis indicated that strain C1A is a remarkable biomass degrader, capable of simultaneous saccharification and fermentation of the cellulosic and hemicellulosic fractions in multiple untreated grasses and crop residues examined, with the process significantly enhanced by mild pretreatments. This capability, acquired during its separate evolutionary trajectory in the rumen, along with its resilience and invasiveness compared to prokaryotic anaerobes, renders anaerobic fungi promising agents for consolidated bioprocessing schemes in biofuels production.

July 7, 2019

Cerulean: A hybrid assembly using high throughput short and long reads

Genome assembly using high-throughput data with short reads arguably remains an unresolvable task in repetitive genomes: when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third-generation sequencing (Pacific Biosciences) with long reads offers the opportunity to resolve complicated repeats that could not be resolved with short-read data. However, these long reads have a high error rate, and it is an uphill task to assemble the genome without using additional high-quality short reads. Recently, Koren et al. (2012) proposed an approach that uses high-quality short-read data to correct these long reads and thus make assembly from long reads possible. However, due to the large size of both datasets (short and long reads), error correction of these long reads requires excessively high computational resources, even for small bacterial genomes. In this work, instead of error-correcting the long reads, we first assemble the short reads and then map the long reads onto the assembly graph to resolve repeats.
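The core idea, that a long read spanning a repeat reveals which flanking contigs belong together, can be illustrated with a toy model. The sketch below is not Cerulean's published algorithm. It assumes each long read has already been reduced to the ordered list of short-read contigs it aligns to (ignoring orientation and alignment coordinates), and `resolve_repeat` and `min_support` are hypothetical names. A read counts as evidence only if it covers the contig before the repeat, the repeat itself, and the contig after it.

```python
from collections import Counter


def resolve_repeat(
    repeat: str,
    in_contigs: set[str],
    out_contigs: set[str],
    read_paths: list[list[str]],
    min_support: int = 2,
) -> dict[str, str]:
    """Pair each contig entering `repeat` with the contig that leaves it.

    Toy model: every long read is the ordered list of contig IDs it aligns to.
    Only reads containing X -> repeat -> Y (i.e. spanning the repeat) vote.
    """
    support: Counter = Counter()
    for path in read_paths:
        for i in range(1, len(path) - 1):
            before, middle, after = path[i - 1], path[i], path[i + 1]
            if middle == repeat and before in in_contigs and after in out_contigs:
                support[(before, after)] += 1

    pairs: dict[str, str] = {}
    used_out: set[str] = set()
    # Greedily accept the best-supported pairings first; each flank is used once
    for (before, after), votes in support.most_common():
        if votes < min_support:
            break
        if before not in pairs and after not in used_out:
            pairs[before] = after
            used_out.add(after)
    return pairs


if __name__ == "__main__":
    reads = [
        ["A", "R", "C"], ["A", "R", "C"],
        ["B", "R", "D"], ["B", "R", "D"],
        ["A", "R", "D"],  # a single conflicting read, below min_support
        ["A", "R"],       # does not span the repeat, so it carries no evidence
    ]
    print(resolve_repeat("R", {"A", "B"}, {"C", "D"}, reads))
    # prints {'A': 'C', 'B': 'D'}
```

The `min_support` threshold is one simple way to keep a few erroneous long reads from forcing a wrong join; a real tool would also weigh alignment quality and strand.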

July 7, 2019

Finished bacterial genomes from shotgun sequence data.

Exceptionally accurate genome reference sequences have proven to be of great value to microbial researchers. To date, about 1,800 bacterial genome assemblies have been “finished” at great expense with the aid of manual laboratory and computational processes that typically iterate over a period of months or even years. By applying a new laboratory design and a new assembly algorithm to 16 samples, we demonstrate that assemblies exceeding finished quality can be obtained from whole-genome shotgun data and automated computation. Cost and time requirements are thus dramatically reduced.

July 7, 2019

A hybrid approach for the automated finishing of bacterial genomes.

Advances in DNA sequencing technology have improved our ability to characterize most genomic diversity. However, accurate resolution of large structural events is challenging because of the short read lengths of second-generation technologies. Third-generation sequencing technologies, which can yield longer multikilobase reads, have the potential to address limitations associated with genome assembly. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy. Complex regions with clinically relevant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 cholera reference strain, we obtained 14 scaffolds of greater than 1 kb for the experimental data and 8 scaffolds of greater than 1 kb for the simulated data, which allowed us to correct several errors in contigs assembled from the short-read data alone. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly.
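Accuracy figures such as >99.9% are often also stated on the Phred scale, where QV = -10 · log10(error rate). Under that convention, 99.9% accuracy (one error per 1,000 bases) corresponds to QV30. A short conversion sketch follows; the function names are illustrative, and the abstract itself reports accuracy only as a percentage.

```python
import math


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (strictly between 0 and 1) to a Phred-scaled QV."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled QV back to per-base accuracy."""
    return 1.0 - 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    print(round(accuracy_to_qv(0.999), 1))
    print(round(qv_to_accuracy(40), 4))
```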

July 7, 2019

Cancer genomics: technology, discovery, and translation.

In recent years, the increasing awareness that somatic mutations and other genetic aberrations drive human malignancies has brought us within reach of personalized cancer medicine (PCM). The implementation of PCM is based on the following premises: genetic aberrations exist in human malignancies; a subset of these aberrations drive oncogenesis and tumor biology; these aberrations are actionable (defined as having the potential to affect management recommendations based on diagnostic, prognostic, and/or predictive implications); and there are highly specific anticancer agents available that effectively modulate these targets. This article highlights the technology underlying cancer genomics and examines the early results of genome sequencing and the challenges encountered in the discovery of new genetic aberrations. Finally, drawing from experiences gained in a feasibility study of somatic mutation genotyping and targeted exome sequencing led by Princess Margaret Hospital-University Health Network and the Ontario Institute for Cancer Research, the processes, challenges, and issues involved in the translation of cancer genomics to the clinic are discussed.

July 7, 2019

Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011.

The degree to which molecular epidemiology reveals information about the sources and transmission patterns of an outbreak depends on the resolution of the technology used and the samples studied. Isolates of Escherichia coli O104:H4 from the outbreak centered in Germany in May-July 2011, and the much smaller outbreak in southwest France in June 2011, were indistinguishable by standard tests. We report a molecular epidemiological analysis using multiplatform whole-genome sequencing and analysis of multiple isolates from the German and French outbreaks. Isolates from the German outbreak showed remarkably little diversity, with only two single nucleotide polymorphisms (SNPs) found in isolates from four individuals. Surprisingly, we found much greater diversity (19 SNPs) in isolates from seven individuals infected in the French outbreak. The German isolates form a clade within the more diverse French outbreak strains. Moreover, five isolates derived from a single infected individual from the French outbreak had extremely limited diversity. The striking difference in diversity between the German and French outbreak samples is consistent with several hypotheses, including a bottleneck that purged diversity in the German isolates, variation in mutation rates in the two E. coli outbreak populations, or uneven distribution of diversity in the seed populations that led to each outbreak.
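Counting SNPs among outbreak isolates amounts to comparing their genomes position by position. A minimal sketch follows, assuming the isolate sequences have already been aligned to a common reference so that equal indices correspond to the same genomic position. The helper names are hypothetical and this is not the study's multiplatform pipeline; ambiguous bases and gap characters are simply ignored.

```python
from itertools import combinations

UNAMBIGUOUS = set("ACGT")


def variable_sites(aligned: list[str]) -> int:
    """Count alignment columns where the isolates carry more than one unambiguous base."""
    seqs = [s.upper() for s in aligned]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("provide at least one sequence; all must share one aligned length")
    count = 0
    for column in zip(*seqs):
        bases = {b for b in column if b in UNAMBIGUOUS}
        if len(bases) > 1:
            count += 1
    return count


def pairwise_snp_distances(isolates: dict[str, str]) -> dict[tuple[str, str], int]:
    """SNP distance for every pair of isolates, skipping ambiguous or gap positions."""
    distances: dict[tuple[str, str], int] = {}
    for a, b in combinations(sorted(isolates), 2):
        sa, sb = isolates[a].upper(), isolates[b].upper()
        if len(sa) != len(sb):
            raise ValueError(f"{a} and {b} are not aligned to the same length")
        distances[(a, b)] = sum(
            1 for x, y in zip(sa, sb)
            if x != y and x in UNAMBIGUOUS and y in UNAMBIGUOUS
        )
    return distances


if __name__ == "__main__":
    # Hypothetical toy alignment, not outbreak data
    toy = {"iso1": "ACGTA", "iso2": "ACGTT", "iso3": "ACCTA"}
    print(variable_sites(list(toy.values())))
    print(pairwise_snp_distances(toy))
```

A low-diversity group of isolates shows few variable sites even across many samples, which is the kind of contrast the German and French comparison rests on.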

July 7, 2019

Complete genome sequence of Liberibacter crescens BT-1.

Liberibacter crescens BT-1, a Gram-negative, rod-shaped bacterial isolate, was previously recovered from mountain papaya to gain insight into Huanglongbing (HLB) and Zebra Chip (ZC) diseases. The genome of BT-1 was sequenced at the Interdisciplinary Center for Biotechnology Research (ICBR) at the University of Florida. A finished assembly and annotation yielded one chromosome with a length of 1,504,659 bp and a G+C content of 35.4%. Compared with other species in the Liberibacter genus, L. crescens has many more genes in thiamine and essential amino acid biosynthesis. This likely explains why L. crescens BT-1 is culturable while the other known Liberibacter strains have not yet been cultured. Similar to Candidatus Liberibacter asiaticus psy62, the L. crescens BT-1 genome contains two prophage regions.

July 7, 2019

Draft genome sequence of Salimicrobium sp. strain MJ3, isolated from Myulchi-Jeot, Korean fermented seafood.

Salimicrobium sp. strain MJ3 was isolated from myulchi-jeot, a traditional fermented seafood made from anchovy in South Korea. Here we announce the draft genome sequence of Salimicrobium sp. MJ3, which is 2,717,782 bp long and consists of 45 contigs (>500 bp in size), and provide a description of its annotation.
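Draft announcements like this one typically report only contigs above a minimum length. A minimal sketch of how such a count and total could be computed from FASTA records follows, assuming standard FASTA input (sequence lines may be wrapped) and the >500 bp cutoff stated above. `contig_lengths` and `summarize_contigs` are hypothetical helpers, and the toy records are not MJ3 data.

```python
from typing import Iterable


def contig_lengths(fasta_lines: Iterable[str]) -> dict[str, int]:
    """Map each FASTA record ID to its sequence length (wrapped lines are summed)."""
    lengths: dict[str, int] = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID is the first whitespace-delimited token after '>'
            parts = line[1:].split()
            current = parts[0] if parts else f"record_{len(lengths) + 1}"
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def summarize_contigs(lengths: dict[str, int], min_len: int = 500) -> tuple[int, int]:
    """Return (number of contigs, total bp) for contigs strictly longer than min_len."""
    kept = [n for n in lengths.values() if n > min_len]
    return len(kept), sum(kept)


if __name__ == "__main__":
    # Toy records; an open file handle can be passed to contig_lengths the same way
    toy = [">c1 toy", "ACGT" * 150, ">c2", "ACGT" * 50, ">c3", "ACGT" * 100, "ACGT" * 100]
    n, total = summarize_contigs(contig_lengths(toy))
    print(f"{n} contigs > 500 bp, {total:,} bp in total")
```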

July 7, 2019

Next generation sequencing technologies and the changing landscape of phage genomics.

The dawn of next-generation sequencing technologies has opened up exciting possibilities for whole-genome sequencing of a plethora of organisms. The 2nd- and 3rd-generation sequencing technologies, based on cloning-free, massively parallel sequencing, have enabled the generation of a deluge of genomic sequences of both prokaryotic and eukaryotic origin in the last seven years. However, whole-genome sequencing of bacterial viruses has not kept pace with this revolution, despite the fact that their genomes are orders of magnitude smaller than those of bacteria and other organisms. Sequencing phage genomes poses several challenges: (1) obtaining pure phage genomic material, (2) PCR amplification biases, and (3) the complex nature of their genetic material, due to features such as methylated bases and repeats that are inherently difficult to sequence and assemble. Here we describe conclusions drawn from our efforts in sequencing hundreds of bacteriophage genomes from a variety of Gram-positive and Gram-negative bacteria using Sanger, 454, Illumina, and PacBio technologies. Based on our experience, we propose several general considerations regarding sample quality, the choice of technology, and a “blended approach” for generating reliable whole-genome sequences of phages.

July 7, 2019

The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation.

The emergence of next-generation sequencing (NGS) has provided the means for rapid, high-throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly, and their quality reflects not only the quality of the sequencing technology used but also that of the analysis software employed for assembly and annotation. In this work, we have explored the quality of microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute (JGI) and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for downstream analyses. In all cases, although assembly results are on average of high quality, they need to be viewed critically, and their sources of error should be considered prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role, to assemblies with multiple small gaps (Illumina), and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
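The progression described above, from drafts with large gaps to assemblies with many small gaps to near-complete genomes, can be tracked by summarizing the gaps in each scaffold. A minimal sketch follows, assuming gaps are encoded as runs of N characters, a common convention though not a universal one. The abstract does not list its metrics, so `gap_summary` is a hypothetical helper rather than the study's method.

```python
import re


def gap_summary(scaffold: str) -> dict[str, int]:
    """Count runs of N (gaps) in a scaffold and report their sizes in bp."""
    sizes = [m.end() - m.start() for m in re.finditer(r"[Nn]+", scaffold)]
    return {
        "gaps": len(sizes),
        "total_gap_bp": sum(sizes),
        "largest_gap_bp": max(sizes, default=0),
    }


if __name__ == "__main__":
    print(gap_summary("ACGTNNNNACGNNT"))
    # prints {'gaps': 2, 'total_gap_bp': 6, 'largest_gap_bp': 4}
```

Under this convention, a draft with few but large gaps and an Illumina assembly with many small gaps separate cleanly on `gaps` versus `largest_gap_bp`.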

July 7, 2019

Next-generation sequencing and large genome assemblies.

The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.
