Genome assembly Archives

July 7, 2019

An improved genome assembly of Azadirachta indica A. Juss.

Neem (Azadirachta indica A. Juss.), an evergreen tree of the Meliaceae family, is known for its medicinal, cosmetic, pesticidal and insecticidal properties. We had previously sequenced and published the draft genome of the plant, using mainly short read sequencing data. In this report, we present an improved genome assembly generated using additional short reads from Illumina and long reads from the Pacific Biosciences SMRT sequencer. We assembled short reads and error-corrected long reads using Platanus, an assembler designed to perform well for heterozygous genomes. Relative to the earlier assembly (v1.0), the updated genome assembly (v2.0) yielded 3- and 3.5-fold increases in N50 and N75, respectively; a 2.6-fold decrease in the total number of scaffolds; a 1.25-fold increase in the number of valid transcriptome alignments; 13.4-fold fewer mis-assemblies; and a 1.85-fold increase in the percentage of repeats. The current assembly also maps better to the genes known to be involved in the terpenoid biosynthesis pathway. Together, the data represent an improved assembly of the A. indica genome. The raw data described in this manuscript are submitted to the NCBI Short Read Archive under the accession numbers SRX1074131, SRX1074132, SRX1074133, and SRX1074134 (SRP013453). Copyright © 2016 Author et al.
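
For readers unfamiliar with the N50 and N75 statistics quoted in this abstract, the following is a minimal sketch of how such values are commonly computed from a list of scaffold lengths. The function name `nx` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def nx(lengths, fraction):
    """Return the Nx statistic for a list of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    cover at least `fraction` of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Hypothetical scaffold lengths (not from the neem assembly).
scaffolds = [100, 80, 60, 40, 20]
n50 = nx(scaffolds, 0.50)  # 80: 100 + 80 reaches half of the 300 total
n75 = nx(scaffolds, 0.75)  # 60: 100 + 80 + 60 reaches three quarters
print(n50, n75)
```

A fold increase such as the one reported is then simply the ratio of the statistic between two assemblies, for example `nx(v2_lengths, 0.5) / nx(v1_lengths, 0.5)` for hypothetical length lists `v1_lengths` and `v2_lengths`.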

July 7, 2019

Complete chloroplast genome sequences of Eucommia ulmoides: genome structure and evolution.

Eucommia ulmoides is an important traditional medicinal plant that is used for the production of locative Eucommia rubber. In this study, the complete chloroplast (cp) genome sequence of E. ulmoides was obtained by total DNA sequencing; this is the first cp genome sequence of the order Garryales. The cp genome of E. ulmoides was 163,341 bp long and included a pair of inverted repeat (IR) regions (31,300 bp each), one large single copy (LSC) region (86,592 bp), and one small single copy (SSC) region (14,149 bp). The genome structure and GC content were similar to those of typical angiosperm cp genomes; the genome contained 115 unique genes, including 80 protein-coding genes, 31 transfer RNAs (tRNAs), and four ribosomal RNAs (rRNAs). Compared with the entire cp genome sequence, three unique genome rearrangements were observed in the LSC region. Moreover, compared with the Sesamum and Nicotiana cp genomes, E. ulmoides contained no indels in the IR regions, and variable regions were identified in noncoding regions. The E. ulmoides cp genome showed extreme expansion at the IR/SSC boundary owing to the integration of an additional complete gene, ycf1. Twenty-nine simple sequence repeats (SSRs) were identified in the E. ulmoides cp genome. In addition, 36 protein-coding genes were used for phylogenetic inference, supporting a sister relationship between E. ulmoides and Aucuba, which belongs to Euasterids I. In summary, we described the complete cp genome sequence of E. ulmoides; this information will be useful for phylogenetic and evolutionary studies.
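
The region sizes and gene counts reported in this abstract can be cross-checked with simple arithmetic. The short sketch below assumes the usual quadripartite layout, in which the 31,300 bp figure applies to each of the two IR copies; the variable names are illustrative.

```python
# Region sizes in bp, as reported in the abstract.
LSC = 86_592   # large single copy region
SSC = 14_149   # small single copy region
IR = 31_300    # one inverted repeat copy (assumed to be the per-copy size)

# A quadripartite chloroplast genome is LSC + SSC + two IR copies.
total = LSC + SSC + 2 * IR
print(total)  # 163341, matching the reported genome length

# Gene categories reported in the abstract.
protein_coding, trna, rrna = 80, 31, 4
unique_genes = protein_coding + trna + rrna
print(unique_genes)  # 115, matching the reported unique gene count
```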

July 7, 2019

Isolation and complete genome sequence of the thermophilic Geobacillus sp. 12AMOR1 from an Arctic deep-sea hydrothermal vent site.

Members of the genus Geobacillus have been isolated from a wide variety of habitats worldwide and are the subject of targeted enzyme utilization in various industrial applications. Here we report the isolation and complete genome sequence of the thermophilic starch-degrading Geobacillus sp. 12AMOR1. The strain 12AMOR1 was isolated from deep-sea hot sediment at the Jan Mayen hydrothermal vent site. The genome of Geobacillus sp. 12AMOR1 consists of a 3,410,035 bp circular chromosome and a 32,689 bp plasmid with a G + C content of 52 % and 47 %, respectively. The genome comprises 3323 protein-coding genes, 88 tRNA species and 10 rRNA operons. The isolate grows on a suite of sugars, complex polysaccharides and proteinous carbon sources. Accordingly, a variety of genes encoding carbohydrate-active enzymes (CAZy) and peptidases were identified in the genome. Expression, purification and characterization of an enzyme of the glycoside hydrolase family 13 revealed a starch-degrading capacity and high thermal stability with a melting temperature of 76.4 °C. Altogether, the data obtained point to a new isolate from a marine hydrothermal vent with a large bioprospecting potential.

July 7, 2019

PEPR: pipelines for evaluating prokaryotic references.

The rapid adoption of microbial whole genome sequencing in public health, clinical testing, and forensic laboratories requires the use of validated measurement processes. Well-characterized, homogeneous, and stable microbial genomic reference materials can be used to evaluate measurement processes, improving confidence in microbial whole genome sequencing results. We have developed a reproducible and transparent bioinformatics tool, PEPR, Pipelines for Evaluating Prokaryotic References, for characterizing the reference genome of prokaryotic genomic materials. PEPR evaluates the quality, purity, and homogeneity of the reference material genome, and purity of the genomic material. The quality of the genome is evaluated using high coverage paired-end sequence data; coverage, paired-end read size and direction, as well as soft-clipping rates, are used to identify mis-assemblies. The homogeneity and purity of the material relative to the reference genome are characterized by comparing base calls from replicate datasets generated using multiple sequencing technologies. Genomic purity of the material is assessed by checking for DNA contaminants. We demonstrate the tool and its output using sequencing data while developing a Staphylococcus aureus candidate genomic reference material. PEPR is open source and available at https://github.com/usnistgov/pepr .

July 7, 2019

The kiwifruit genome

The whole-genome sequence of Actinidia chinensis var. chinensis ‘Hongyang’ was published in 2013 and represented the first publicly available Ericales genome sequence. Publication in 2015 of an improved linkage map for A. chinensis and interspecific comparison analyses, coupled with the availability of a second whole-genome sequence of a genotype closely related to ‘Hongyang’, have enabled the kiwifruit research community to improve the existing whole-genome sequence. This chapter describes the original genome sequence and steps towards its improvement.

July 7, 2019

Genome sequence of Ustilaginoidea virens IPU010, a rice pathogenic fungus causing false smut.

Ustilaginoidea virens is a rice pathogenic fungus that causes false smut disease, a disease that seriously damages the yield and quality of the grain. Analysis of the U. virens IPU010 33.6-Mb genome sequence will aid in the understanding of the pathogenicity of the strain, particularly in regard to effector proteins and secondary metabolic genes. Copyright © 2016 Kumagai et al.

July 7, 2019

Genomic organization of the zebrafish (Danio rerio) T cell receptor alpha/delta locus and analysis of expressed products.

In testing the hypothesis that all jawed vertebrate classes employ immunoglobulin heavy chain V (IgHV) gene segments in their T cell receptor (TCR)δ encoding loci, we found that some basic characterization was required of zebrafish TCRδ. We began by annotating and characterizing the TCRα/δ locus of Danio rerio based on the most recent genome assembly, GRCz10. We identified a total of 141 theoretically functional V segments which we grouped into 41 families based upon 70 % nucleotide identity. This number represents the second greatest count of apparently functional V genes thus far described in an antigen receptor locus with the exception of cattle TCRα/δ. Cloning, relative quantitative PCR, and deep sequencing results corroborate that zebrafish do express TCRδ, but these data suggest only at extremely low levels and in limited diversity in the spleens of the adult fish. While we found no evidence for IgH-TCRδ rearrangements in this fish, by determining the locus organization we were able to suggest how the evolution of the teleost α/δ locus could have lost IgHVs that exist in sharks and frogs. We also found evidence of surprisingly low TCRδ expression and repertoire diversity in this species.
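
The abstract states that V segments were grouped into families at 70 % nucleotide identity but does not say how. As one possible illustration only, the sketch below groups pre-aligned, equal-length sequences by single-linkage clustering with a union-find structure. The clustering rule, the function names `identity` and `group_families`, and the toy sequences are all assumptions, not the authors' procedure.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def group_families(segments, threshold=0.70):
    """Single-linkage grouping: any pair at >= threshold identity is merged."""
    parent = {name: name for name in segments}

    def find(name):
        # Follow parent links to the family representative, compressing paths.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(segments, 2):
        if identity(segments[a], segments[b]) >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in segments:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical toy segments (not zebrafish sequences).
toy = {
    "V1": "ACGTACGTAC",
    "V2": "ACGTACGTTT",  # 8 of 10 positions match V1
    "V3": "TTTTGGGGCC",
}
families = group_families(toy)
print(families)  # [['V1', 'V2'], ['V3']]
```

Single linkage is only one choice; a stricter rule (for example, requiring every pair within a family to meet the threshold) would generally yield more, smaller families.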

July 7, 2019

Elucidating the triplicated ancestral genome structure of radish based on chromosome-level comparison with the Brassica genomes.

This study presents a chromosome-scale draft genome sequence of radish that is assembled into nine chromosomal pseudomolecules. A comprehensive comparative genome analysis with the Brassica genomes provides genomic evidence on the evolution of the mesohexaploid radish genome. Radish (Raphanus sativus L.) is an agronomically important root vegetable crop and its origin and phylogenetic position in the tribe Brassiceae is controversial. Here we present a comprehensive analysis of the radish genome based on the chromosome sequences of R. sativus cv. WK10039. The radish genome was sequenced and assembled into 426.2 Mb spanning >98 % of the gene space, of which 344.0 Mb were integrated into nine chromosome pseudomolecules. Approximately 36 % of the genome was repetitive sequences and 46,514 protein-coding genes were predicted and annotated. Comparative mapping of the tPCK-like ancestral genome revealed that the radish genome has intermediate characteristics between the Brassica A/C and B genomes in the triplicated segments, suggesting an internal origin from the genus Brassica. The evolutionary characteristics shared between radish and other Brassica species provided genomic evidence that the current form of nine chromosomes in radish was rearranged from the chromosomes of a hexaploid progenitor. Overall, this study provides a chromosome-scale draft genome sequence of radish as well as novel insight into the evolution of the mesohexaploid genomes in the tribe Brassiceae.

July 7, 2019

Complete genome sequence of a CTX-M-15-producing Escherichia coli strain from the H30Rx subclone of sequence type 131 from a patient with recurrent urinary tract infections, closely related to a lethal urosepsis isolate from the patient’s sister.

We report here the complete genome sequence, including five plasmid sequences, of Escherichia coli sequence type 131 (ST131) strain JJ1887. The strain was isolated in 2007 in the United States from a patient with recurrent cystitis, whose caregiver sister died from urosepsis caused by a nearly identical strain. Copyright © 2016 Johnson et al.

July 7, 2019

Complete mitochondrial genome sequence of the pezizomycete Pyronema confluens.

The complete mitochondrial genome of the ascomycete Pyronema confluens has been sequenced. The circular genome has a size of 191 kb and contains 48 protein-coding genes, 26 tRNA genes, and two rRNA genes. Of the protein-coding genes, 14 encode conserved mitochondrial proteins, and 31 encode predicted homing endonuclease genes. Copyright © 2016 Nowrousian.

July 7, 2019

OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees.

The assembly of large, repeat-rich eukaryotic genomes represents a significant challenge in genomics. While long-read technologies have made the high-quality assembly of small, microbial genomes increasingly feasible, data generation can be expensive for larger genomes. OPERA-LG is a scalable, exact algorithm for the scaffold assembly of large, repeat-rich genomes, out-performing state-of-the-art programs for scaffold correctness and contiguity. It provides a rigorous framework for scaffolding of repetitive sequences and a systematic approach for combining data from different second-generation and third-generation sequencing technologies. OPERA-LG provides an avenue for systematic augmentation and improvement of thousands of existing draft eukaryotic genome assemblies.

July 7, 2019

Genetic diversity of O-antigens in Hafnia alvei and the development of a suspension array for serotype detection.

Hafnia alvei is a facultative and rod-shaped gram-negative bacterium that belongs to the Enterobacteriaceae family. Although it has been more than 50 years since the genus was identified, very little is known about variations among Hafnia species. Diversity in O-antigens (O-polysaccharide, OPS) is thought to be a major factor in bacterial adaptation to different hosts and situations and variability in the environment. Antigenic variation is also an important factor in pathogenicity that has been used to define clones within a number of species. The genes that are required to synthesize OPS are always clustered within the bacterial chromosome. A serotyping scheme including 39 O-serotypes has been proposed for H. alvei, but it has not been correlated with known OPS structures, and no previous report has described the genetic features of OPS. In this study, we obtained the genome sequences of 21 H. alvei strains (as defined by previous immunochemical studies) with different lipopolysaccharides. This is the first study to show that the O-antigen gene cluster in H. alvei is located between mpo and gnd in the chromosome. All 21 of the OPS gene clusters contain both the wzx gene and the wzy gene and display a large number of polymorphisms. We developed an O serotype-specific wzy-based suspension array to detect all 21 of the distinct OPS forms we identified in H. alvei. To the best of our knowledge, this is the first report to identify the genetic features of H. alvei antigenic variation and to develop a molecular technique to identify and classify different serotypes.

July 7, 2019

Emergence of host-adapted Salmonella Enteritidis through rapid evolution in an immunocompromised host.

Host adaptation is a key factor contributing to the emergence of new bacterial, viral and parasitic pathogens. Many pathogens are considered promiscuous because they cause disease across a range of host species, while others are host-adapted, infecting particular hosts(1). Host adaptation can potentially progress to host restriction where the pathogen is strictly limited to a single host species and is frequently associated with more severe symptoms. Host-adapted and host-restricted bacterial clades evolve from within a broader host-promiscuous species and sometimes target different niches within their specialist hosts, such as adapting from a mucosal to a systemic lifestyle. Genome degradation, marked by gene inactivation and deletion, is a key feature of host adaptation, although the triggers initiating genome degradation are not well understood. Here, we show that a chronic systemic non-typhoidal Salmonella infection in an immunocompromised human patient resulted in genome degradation targeting genes that are expendable for a systemic lifestyle. We present a genome-based investigation of a recurrent blood-borne Salmonella enterica serotype Enteritidis (S. Enteritidis) infection covering 15 years in an interleukin (IL)-12 β-1 receptor-deficient individual that developed into an asymptomatic chronic infection. The infecting S. Enteritidis harbored a mutation in the mismatch repair gene mutS that accelerated the genomic mutation rate. Phylogenetic analysis and phenotyping of multiple patient isolates provide evidence for a remarkable level of within-host evolution that parallels genome changes present in successful host-restricted bacterial pathogens but never before observed on this timescale. Our analysis identifies common pathways of host adaptation and demonstrates the role that immunocompromised individuals can play in this process.

July 7, 2019

Antibiotic failure mediated by a resistant subpopulation in Enterobacter cloacae.

Antibiotic resistance is a major public health threat, further complicated by unexplained treatment failures caused by bacteria that appear antibiotic susceptible. We describe an Enterobacter cloacae isolate harbouring a minor subpopulation that is highly resistant to the last-line antibiotic colistin. This subpopulation was distinct from persisters, became predominant in colistin, returned to baseline after colistin removal and was dependent on the histidine kinase PhoQ. During murine infection, but in the absence of colistin, innate immune defences led to an increased frequency of the resistant subpopulation, leading to inefficacy of subsequent colistin therapy. An isolate with a lower-frequency colistin-resistant subpopulation similarly caused treatment failure but was misclassified as susceptible by current diagnostics once cultured outside the host. These data demonstrate the ability of low-frequency bacterial subpopulations to contribute to clinically relevant antibiotic resistance, elucidating an enigmatic cause of antibiotic treatment failure and highlighting the critical need for more sensitive diagnostics.

July 7, 2019

Assembly of long error-prone reads using de Bruijn graphs.

The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.
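
As background for the de Bruijn graph approach mentioned in this abstract, the following is a minimal sketch of the classical construction on exact k-mers: each k-mer contributes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. It is not the ABruijn generalization for error-prone reads, which the abstract does not detail; the function name `de_bruijn` and the toy reads are illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a classical de Bruijn graph from reads.

    Nodes are (k-1)-mers; every k-mer occurrence in a read adds one
    edge from its prefix to its suffix (repeated k-mers add parallel edges).
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Hypothetical toy reads that overlap by three bases.
reads = ["ACGT", "CGTA"]
graph = de_bruijn(reads, 3)
print(graph)  # {'AC': ['CG'], 'CG': ['GT', 'GT'], 'GT': ['TA']}
```

With accurate short reads, walking such a graph along its edges spells out candidate contigs. With long error-prone reads most k-mers contain errors, which is why this exact-k-mer construction is usually considered unsuitable for them without modification.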
