WGS Archives

July 7, 2019

Complete genome sequence of Pseudomonas sp. strain NC02, isolated from soil.

We report here the complete genome sequence of Pseudomonas sp. strain NC02, isolated from soil in eastern Massachusetts. We assembled PacBio reads into a single closed contig with 132× mean coverage and then polished this contig using Illumina MiSeq reads, yielding a 6,890,566-bp sequence with 61.1% GC content. Copyright © 2018 Cerra et al.
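The GC content and mean coverage figures quoted in announcements like this one are simple ratios. The sketch below shows how they could be computed; the function names `gc_content` and `mean_coverage` and the toy inputs are hypothetical illustrations, not taken from the paper or its pipeline.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def mean_coverage(total_read_bases: int, genome_length: int) -> float:
    """Return mean depth: total sequenced bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return total_read_bases / genome_length


if __name__ == "__main__":
    # Toy inputs only; real values would come from a FASTA file and read statistics.
    print(gc_content("GGCCAATT"))
    print(mean_coverage(200, 100))
```

Under this definition, a reported coverage of 132× means the summed read length is about 132 times the genome length.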

July 7, 2019

Complete genome sequence of Escherichia coli ML35.

We report here the complete genome sequence of Escherichia coli strain ML35. We assembled PacBio reads into a single closed contig with 169× mean coverage and then polished this contig using Illumina MiSeq reads, yielding a 4,918,774-bp sequence with 50.8% GC content. Copyright © 2018 Casale et al.

July 7, 2019

De novo genome assembly of a Plasmodium falciparum NF54 clone using Single-Molecule Real-Time Sequencing.

Plasmodium falciparum is the species of human malaria parasite that causes the most severe form of the disease. Here, we used single-molecule real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) to sequence, assemble de novo, and annotate the genome of a P. falciparum NF54 clone. Copyright © 2018 Bryant et al.

July 7, 2019

Complete genome sequence of Dietzia sp. strain WMMA184, a marine coral-associated bacterium.

Dietzia sp. strain WMMA184 was isolated from the marine coral Montastraea faveolata as part of ongoing drug discovery efforts. Analysis of the 4.16-Mb genome provides information regarding interspecies interactions as it pertains to the regulation of secondary metabolism and natural product biosynthesis potential. Copyright © 2018 Braun et al.

July 7, 2019

Complete genome sequence of Thermoanaerobacterium sp. strain RBIITD, a butyrate- and butanol-producing thermophile.

Thermoanaerobacterium sp. strain RBIITD was isolated from contaminated rich growth medium at 55°C in an anaerobic chamber. It primarily produces butyrate as a fermentation product from plant biomass-derived sugars. The whole-genome sequence of the strain is 3.4 Mbp, with 3,444 genes and 32.48% GC content.

July 7, 2019

Genome sequence of the necrotrophic plant pathogen Alternaria brassicicola Abra43.

Alternaria brassicicola causes dark spot (or black spot) disease, which is one of the most common and destructive fungal diseases of Brassicaceae spp. worldwide. Here, we report the draft genome sequence of strain Abra43. The assembly comprises 29 scaffolds, with an N50 value of 2.1 Mb. The assembled genome was 31,036,461 bp in length, with a G+C content of 50.85%.
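The N50 value reported above is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the example lengths are hypothetical and are not the Abra43 scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Toy example: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```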

July 7, 2019

Complete genome sequence and methylome analysis of Bacillus caldolyticus NEB414.

Bacillus caldolyticus NEB414 is the original source strain for the restriction enzyme BclI. Its complete sequence and full methylome were determined using single-molecule real-time sequencing. Copyright © 2018 Fomenkov et al.

July 7, 2019

Complete genome sequence of industrial dairy strain Streptococcus thermophilus DGCC 7710.

We report here the complete genome sequence of Streptococcus thermophilus DGCC 7710. S. thermophilus is widely used in industrial dairy production.

July 7, 2019

A comprehensive model of DNA fragmentation for the preservation of High Molecular Weight DNA

During DNA extraction the DNA molecule undergoes physical and chemical shearing, causing the DNA to fragment into shorter and shorter pieces. Under common laboratory conditions this fragmentation yields DNA fragments of 5-35 kilobases (kb) in length. This fragment length is more than sufficient for DNA sequencing using short-read technologies, which generate reads 50-600 bp in length, but insufficient for long-read sequencing and linked reads, where fragment lengths of more than 40 kb may be desirable. This study provides a theoretical framework for quality management to ensure access to high molecular weight DNA in samples. Shearing can be divided into physical and chemical shearing, which generate different patterns of fragmentation. Exposure to physical shearing creates a characteristic fragment length where DNA fragments are cut in half by shear stress. This characteristic length can be measured using gel electrophoresis or instruments for DNA fragment analysis. Chemical shearing generates randomly distributed fragment lengths visible as a smear of DNA below the peak fragment length. By measuring the peak DNA fragment length and the proportion of very short DNA fragments, both sources of shearing can be measured using commonly used laboratory techniques, providing a suitable quantification of DNA integrity for sequencing with long-read technologies.

July 7, 2019

An empirical evaluation of error correction methods and tools for next generation sequencing data

[...] research. However, data produced by NGS are affected by errors such as substitutions, deletions, and insertions. For accurate downstream analysis, it is essential to distinguish true biological variants from alterations introduced by sequencing errors. Many methods and tools have been developed for NGS error correction. Some correct only substitution errors, whereas others correct multiple error types. This article presents a comprehensive evaluation of three types of methods (k-spectrum based, multiple sequence alignment based, and hybrid), as implemented in different tools. Experiments were conducted on two different computing platforms to compare performance in terms of runtime and error correction rate. The aim of this comparative evaluation is to provide recommendations for selecting suitable tools to meet the specific needs of users and practitioners. The k-mer spectrum based methodology generated superior results compared with the other methods. Among all the tools evaluated, Racer showed outstanding performance in terms of error correction rate and execution time for both small and large data sets. Among the multiple sequence alignment based tools, Karect showed an excellent error correction rate, whereas Coral showed better execution time for all data sets. Among the hybrid tools, Jabba showed a better error correction rate and execution time than Brownie. Computing platforms mostly affect execution time but have no general effect on error correction rate.
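The k-spectrum idea mentioned in this abstract rests on counting k-mers across all reads: k-mers seen many times are probably genuine, while rare ones often contain a sequencing error. The sketch below illustrates only that counting step under simple assumptions; the names `kmer_spectrum` and `weak_kmers`, the threshold, and the toy reads are hypothetical, and this is not how Racer or any other evaluated tool is implemented.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every k-mer of length k across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, min_count):
    """Return k-mers seen fewer than min_count times (candidate errors)."""
    return {kmer for kmer, c in counts.items() if c < min_count}


if __name__ == "__main__":
    # Toy reads: the third read differs from the others in its last base.
    reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
    spectrum = kmer_spectrum(reads, k=4)
    print(weak_kmers(spectrum, min_count=2))
```

A real corrector would go on to replace each weak k-mer with a nearby frequent one; that step is omitted here.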

July 7, 2019

Ten steps to get started in Genome Assembly and Annotation.

As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR).

July 7, 2019

FMLRC: Hybrid long read error correction using an FM-index.

Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
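To give a feel for the data structure named in this abstract, the sketch below builds a Burrows-Wheeler Transform of a single short string and uses FM-index style backward search to count pattern occurrences. It is a toy, single-string illustration under stated assumptions: FMLRC itself uses a multi-string BWT over short reads, and the function names here (`bwt`, `build_fm_index`, `count_occurrences`) are hypothetical rather than part of its API.

```python
def bwt(text):
    """Return the BWT of text, using '$' as a unique end-of-string sentinel."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_str):
    """Build the two tables needed for backward search."""
    alphabet = sorted(set(bwt_str))
    # first[c]: number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += bwt_str.count(c)
    # occ[c][i]: number of occurrences of c in bwt_str[:i].
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return first, occ


def count_occurrences(pattern, first, occ, n):
    """Count occurrences of pattern by scanning it from right to left."""
    lo, hi = 0, n  # half-open range of matching rows in the sorted rotations
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("ACGTACGT")
    first, occ = build_fm_index(transformed)
    print(transformed, count_occurrences("ACG", first, occ, len(transformed)))
```

Counting how often a short k-mer occurs in the short-read set is the kind of query such an index answers quickly, which is what makes it useful for judging whether a stretch of a long read is supported by short-read evidence.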

July 7, 2019

GenomeLandscaper: Landscape analysis of genome-fingerprints maps assessing chromosome architecture.

Assessing correctness of an assembled chromosome architecture is a central challenge. We create a geometric analysis method (called GenomeLandscaper) to conduct landscape analysis of genome-fingerprints maps (GFM), trace large-scale repetitive regions, and assess their impacts on the global architectures of assembled chromosomes. We develop an alignment-free method for phylogenetics analysis. The human Y chromosomes (GRCh.chrY, HuRef.chrY and YH.chrY) are analysed as a proof-of-concept study. We construct a galaxy of genome-fingerprints maps (GGFM) for them, and a landscape compatibility among relatives is observed. But a long sharp straight line on the GGFM breaks such a landscape compatibility, distinguishing GRCh38p1.chrY (and throughout GRCh38p7.chrY) from GRCh37p13.chrY, HuRef.chrY and YH.chrY. We delete a 1.30-Mbp target segment to rescue the landscape compatibility, matching the antecedent GRCh37p13.chrY. We re-locate it into the modelled centromeric and pericentromeric region of GRCh38p10.chrY, matching a gap placeholder of GRCh37p13.chrY. We decompose it into sub-constituents (such as BACs, interspersed repeats, and tandem repeats) and trace their homologues by phylogenetics analysis. We elucidate that most examined tandem repeats are of reasonable quality, but the BAC-sized repeats, 173U1020C (176.46 Kbp) and 5U41068C (205.34 Kbp), are likely over-repeated. These results offer unique insights into the centromeric and pericentromeric regions of the human Y chromosomes.

July 7, 2019

The odyssey of the ancestral Escherich strain through culture collections: an example of allopatric diversification.

More than a century ago, Theodor Escherich isolated the bacterium that was to become Escherichia coli, one of the most studied organisms. Not long after, the strain began an odyssey and landed in many laboratories across the world. As laboratory culture conditions could be responsible for major changes in bacterial strains, we conducted a genome analysis of isolates of this emblematic strain from different culture collections (England, France, the United States, Germany). Strikingly, many discrepancies between the isolates were observed, as revealed by multilocus sequence typing (MLST), the presence of virulence-associated genes, core genome MLST, and single nucleotide polymorphism/indel analyses. These differences are correlated with the phylogeographic history of the strain and were due to an unprecedented number of mutations in coding DNA repair functions such as mismatch repair (MutL) and oxidized guanine nucleotide pool cleaning (MutT), conferring a specific mutational spectrum and leading to a mutator phenotype. The mutator phenotype was probably acquired during subculturing and corresponded to second-order selection. Furthermore, all of the isolates exhibited hypersusceptibility to antibiotics due to mutations in efflux pump- and porin-encoding genes, as well as a specific mutation in the sigma factor-encoding gene rpoS. These defects reflect a self-preservation and nutritional competence tradeoff allowing survival under the starvation conditions imposed by storage. From a clinical point of view, dealing with such mutator strains can lead microbiologists to draw false conclusions about isolate relatedness and may impact therapeutic effectiveness.

IMPORTANCE Mutator phenotypes have been described in laboratory-evolved bacteria, as well as in natural isolates. Several genes can be impacted, each of them being associated with a typical mutational spectrum. By studying one of the oldest strains available, the ancestral Escherich strain, we were able to identify its mutator status leading to tremendous genetic diversity among the isolates from various collections and allowing us to reconstruct the phylogeographic history of the strain. This mutator phenotype was probably acquired during the storage of the strain, promoting adaptation to a specific environment. Other mutations in rpoS and efflux pump- and porin-encoding genes highlight the acclimatization of the strain through self-preservation and nutritional competence regulation. This strain history can be viewed as unintentional experimental evolution in culture collections all over the world since 1885, mimicking the long-term experimental evolution of E. coli of Lenski et al. (O. Tenaillon, J. E. Barrick, N. Ribeck, D. E. Deatherage, J. L. Blanchard, A. Dasgupta, G. C. Wu, S. Wielgoss, S. Cruveiller, C. Médigue, D. Schneider, and R. E. Lenski, Nature 536:165-170, 2016, https://doi.org/10.1038/nature18959) that shares numerous molecular features.

July 7, 2019

Identification and expression analysis of wheat TaGF14 genes.

The 14-3-3 gene family members play key roles in various cellular processes. However, little is known about the number and roles of 14-3-3 genes in wheat. The aims of this study were to identify the TaGF14 genes in wheat by searching its whole genome with BLAST, to study their phylogenetic relationships with those of other plant species, and to discuss the functions of TaGF14s. The results showed that common wheat harbors 20 TaGF14 genes, located on wheat chromosome groups 2, 3, 4, and 7. Of these, eighteen TaGF14s are non-ε proteins, and two, TaGF14i and TaGF14f, are ε proteins. Phylogenetic analysis indicated that these genes fall into six clusters: cluster 1 (TaGF14d, TaGF14g, TaGF14j, TaGF14h, TaGF14c, and TaGF14n); cluster 2 (TaGF14k); cluster 3 (TaGF14b, TaGF14l, TaGF14m, and TaGF14s); cluster 4 (TaGF14a, TaGF14e, and TaGF14r); cluster 5 (TaGF14i and TaGF14f); and cluster 6 (TaGF14o, TaGF14p, TaGF14q, and TaGF14t). Tissue-specific gene expression suggested that all TaGF14s were likely constitutively expressed, except for two genes, TaGF14p and TaGF14f. The highest levels of TaGF14 transcripts were observed in developing grains at 20 days post anthesis (DPA), especially for TaGF14j and TaGF14l. Five genes, TaGF14c, TaGF14d, TaGF14g, TaGF14h, and TaGF14j, were up-regulated after both 1 and 6 h of drought stress, suggesting that these genes play a vital role in combating drought stress. However, all the TaGF14s were down-regulated after both 1 and 6 h of heat stress, indicating that TaGF14s may be negatively associated with heat stress, either by reducing expression to combat heat stress or through other pathways. These results suggest that cluster 1 genes, e.g., TaGF14j, may participate throughout wheat development, e.g., in grain filling (starch biosynthesis), and may also participate in combating drought stress. Subsequently, a homolog of TaGF14j, TaGF14-JM22, was cloned by RACE and used to validate its function. Immunoblotting results showed that the TaGF14-JM22 protein, closely related to TaGF14d, TaGF14g, and TaGF14j, can interact with AGP-L, SSI, SSII, SBEIIa, and SBEIIb in developing grains, suggesting that TaGF14s located on group 4 may be involved in starch biosynthesis. Therefore, it is possible to develop starch-rich wheat cultivars by modifying TaGF14s.
