Gap filling Archives

July 7, 2019

The genome sequence of Streptomyces lividans 66 reveals a novel tRNA-dependent peptide biosynthetic system within a metal-related genomic island.

The complete genome sequence of the original isolate of the model actinomycete Streptomyces lividans 66, also referred to as 1326, was determined using a combination of next-generation sequencing platforms and a hybrid assembly pipeline. Comparative analysis of the genomes of S. lividans 66 and closely related strains, including S. coelicolor M145 and S. lividans TK24, was used to identify strain-specific genes. The genetic diversity identified included a large genomic island with a mosaic structure, present in S. lividans 66 but not in the strain TK24. Sequence analyses showed that this genomic island has an anomalous (G + C) content, suggesting recent acquisition, and that it is rich in metal-related genes. Sequences previously linked to a mobile conjugative element, termed plasmid SLP3 and defined here as a 94 kb region, could also be identified within this locus. Transcriptional analysis of the response of S. lividans 66 to copper was used to corroborate a role for this large genomic island, including two SLP3-borne "cryptic" peptide biosynthetic gene clusters, in metal homeostasis. Notably, one of these predicted biosynthetic systems includes an unprecedented hybrid organization combining a nonribosomal peptide synthetase with a tRNA-dependent transferase. This observation implies the recruitment of members of the leucyl/phenylalanyl-tRNA-protein transferase family to catalyze peptide bond formation in natural product biosynthesis. Thus, the genome sequence of S. lividans 66 not only explains long-standing genetic and phenotypic differences but also opens the door to further in-depth comparative genomic analyses of model Streptomyces strains, as well as to the discovery of novel natural products through genome-mining approaches.
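An anomalous (G + C) content like the one described above is the kind of signal that can be screened for by comparing local base composition against the genome-wide value. The Python sketch below is illustrative only and is not the authors' pipeline: the function names, window size, step and deviation threshold are hypothetical choices, and ambiguous bases such as N are simply ignored.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous A/C/G/T bases (other symbols ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def anomalous_windows(genome: str, window: int, step: int, threshold: float):
    """Return (start, gc) for windows whose G+C differs from the genome-wide
    value by more than `threshold` (a hypothetical island-screening rule)."""
    overall = gc_fraction(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - overall) > threshold:
            hits.append((start, gc))
    return hits


# Toy "genome": a GC-rich backbone with a 100 bp AT-rich insertion at position 1000.
genome = "GGCC" * 250 + "ATAT" * 25 + "GGCC" * 250
print(anomalous_windows(genome, window=100, step=100, threshold=0.2))  # [(1000, 0.0)]
```

Real genomic-island detection usually combines composition with other evidence (codon usage, mobility genes, absence in related strains, as in the TK24 comparison above); this sketch covers only the composition step.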

July 7, 2019

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species.

The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how best to assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
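As an illustration of the statistical side of assembly evaluation, N50 and its genome-size-normalised variant NG50 are widely used contiguity statistics. The Python sketch below assumes the input is a list of contig or scaffold lengths and, for NG50, an externally supplied genome-size estimate; the function names are hypothetical and this is not the Assemblathon 2 evaluation code.

```python
def nx(lengths, target):
    """Length of the sequence at which the running total (longest first)
    first reaches `target`; 0 if the target is never reached."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0


def n50(lengths):
    """N50: half of the total assembly length."""
    return nx(lengths, sum(lengths) / 2)


def ng50(lengths, genome_size):
    """NG50: half of an estimated genome size, so assemblies of different
    total length are compared against the same yardstick."""
    return nx(lengths, genome_size / 2)


contigs = [100, 80, 50, 30, 20]
print(n50(contigs))         # 80
print(ng50(contigs, 400))   # 50
```

Using the genome size rather than the assembly total keeps a fragmented or inflated assembly from looking more contiguous than it is, which matters when many entries for the same species are compared.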

July 7, 2019

Genome sequence of Phaeobacter daeponensis type strain (DSM 23529(T)), a facultatively anaerobic bacterium isolated from marine sediment, and emendation of Phaeobacter daeponensis.

TF-218(T) is the type strain of the species Phaeobacter daeponensis Yoon et al. 2007, a facultatively anaerobic Phaeobacter species isolated from tidal flats. Here we describe the draft genome sequence and annotation of this bacterium together with previously unreported aspects of its phenotype. We analyzed the genome for genes involved in secondary metabolite production and its anaerobic lifestyle, both of which have also been described for its closest relative, Phaeobacter caeruleus. The 4,642,596 bp genome of strain TF-218(T) contains 4,310 protein-coding genes and 78 RNA genes, including four rRNA operons, and consists of five replicons: one chromosome and four extrachromosomal elements with sizes of 276 kb, 174 kb, 117 kb and 90 kb. Genome analysis showed that TF-218(T) possesses all of the genes for indigoidine biosynthesis, and on specific media the strain showed a blue pigmentation. We also found genes for dissimilatory nitrate reduction, gene-transfer agents, NRPS/PKS genes and signaling systems homologous to the LuxR/I system.

July 7, 2019

Genome sequence of Phaeobacter inhibens type strain (T5(T)), a secondary metabolite producing representative of the marine Roseobacter clade, and emendation of the species description of Phaeobacter inhibens.

Strain T5(T) is the type strain of the species Phaeobacter inhibens Martens et al. 2006, a secondary metabolite producing bacterium affiliated with the Roseobacter clade. Strain T5(T) was isolated from a water sample taken in the German Wadden Sea, southern North Sea. Here we describe the complete genome sequence and annotation of this bacterium with a special focus on secondary metabolism, and compare it with the genomes of the Phaeobacter inhibens strains DSM 17395 and DSM 24588 (2.10), which were selected because of their close phylogenetic relationship based on the 16S rRNA gene sequences of these three strains. The genome of strain T5(T) comprises 4,130,897 bp with 3,923 protein-coding genes and is highly similar in genetic and genomic characteristics to P. inhibens DSM 17395 and DSM 24588 (2.10). Besides the chromosome, strain T5(T) possesses four plasmids, three of which show a high similarity to the plasmids of the strains DSM 17395 and DSM 24588 (2.10). Analysis of the fourth plasmid suggested horizontal gene transfer. Most of the genes on this plasmid are not present in the strains DSM 17395 and DSM 24588 (2.10), including a nitrous oxide reductase, which allows strain T5(T) a facultatively anaerobic lifestyle. The G+C content was calculated from the genome sequence and differs significantly from the previously published value, thus warranting an emendation of the species description.

July 7, 2019

Cerulean: A hybrid assembly using high throughput short and long reads

Genome assembly from high-throughput short-read data arguably remains an unresolvable task for repetitive genomes: when a repeat is longer than the read length, it becomes difficult to connect its flanking regions unambiguously. The emergence of third-generation sequencing (Pacific Biosciences), with its long reads, makes it possible to resolve complicated repeats that short-read data could not. However, these long reads have a high error rate, and it is an uphill task to assemble the genome without additional high-quality short reads. Koren et al. (2012) recently proposed using high-quality short-read data to correct these long reads and thus make assembly from long reads possible. However, because both datasets (short and long reads) are large, error correction of these long reads requires excessively high computational resources, even for small bacterial genomes. In this work, instead of error-correcting the long reads, we first assemble the short reads and then map the long reads onto the assembly graph to resolve repeats.
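The repeat-resolution idea can be pictured with a toy case: a collapsed repeat contig has several entry and exit contigs in the short-read assembly graph, and long reads that span the repeat indicate which entry connects to which exit. The Python sketch below is a simplified illustration under stated assumptions, not Cerulean's actual algorithm: it assumes each long read has already been mapped to an ordered path of contig names, and the function name `resolve_repeat` and the `min_support` threshold are hypothetical.

```python
from collections import Counter


def resolve_repeat(repeat, incoming, outgoing, read_paths, min_support=2):
    """Pair entry and exit contigs of a collapsed repeat using long-read paths.

    Hypothetical helper: each read path is the ordered list of contig names
    that one long read was mapped to in the assembly graph.
    """
    support = Counter()
    for path in read_paths:
        # Examine every (entry, repeat, exit) triple along the read's path.
        for before, node, after in zip(path, path[1:], path[2:]):
            if node == repeat and before in incoming and after in outgoing:
                support[(before, after)] += 1

    # Keep an entry only if exactly one exit has enough spanning reads.
    pairing = {}
    for src in incoming:
        exits = [dst for dst in outgoing if support[(src, dst)] >= min_support]
        if len(exits) == 1:
            pairing[src] = exits[0]

    # Drop exits claimed by more than one entry; those stay ambiguous.
    claimed = Counter(pairing.values())
    return {src: dst for src, dst in pairing.items() if claimed[dst] == 1}


reads = [["A", "R", "C"], ["A", "R", "C"],
         ["B", "R", "D"], ["B", "R", "D"],
         ["B", "R", "C"]]  # one discordant (e.g. erroneous) read
print(resolve_repeat("R", ["A", "B"], ["C", "D"], reads))  # {'A': 'C', 'B': 'D'}
```

Requiring a minimum number of spanning reads is one simple way to tolerate the high error rate of individual long reads without correcting them first; the single discordant read above is outvoted.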

July 7, 2019

Genome sequence of “Candidatus Microthrix parvicella” Bio17-1, a long-chain-fatty-acid-accumulating filamentous actinobacterium from a biological wastewater treatment plant.

Candidatus Microthrix bacteria are deeply branching filamentous actinobacteria which occur at the water-air interface of biological wastewater treatment plants, where they are often responsible for foaming and bulking. Here, we report the first draft genome sequence of a strain from this genus: “Candidatus Microthrix parvicella” strain Bio17-1.

July 7, 2019

A hybrid approach for the automated finishing of bacterial genomes.

Advances in DNA sequencing technology have improved our ability to characterize most genomic diversity. However, accurate resolution of large structural events is challenging because of the short read lengths of second-generation technologies. Third-generation sequencing technologies, which can yield longer multikilobase reads, have the potential to address limitations associated with genome assembly. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy. Complex regions with clinically relevant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 cholera reference strain, we obtained 14 scaffolds of greater than 1 kb for the experimental data and 8 scaffolds of greater than 1 kb for the simulated data, which allowed us to correct several errors in contigs assembled from the short-read data alone. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly.

July 7, 2019

Improving genome assemblies by sequencing PCR products with PacBio.

Advances in sequencing technologies have dramatically reduced the cost of producing high-quality draft genomes. However, many contigs and possibly misassembled regions remain in those draft genomes. Improving the quality of these genomes requires an efficient and economical means to close gaps and resequence some regions. Sequencing pooled gap-region PCR products with Pacific Biosciences (PacBio) provides a significantly less expensive means to meet this need. We have developed a genome improvement pipeline based on this strategy after reducing a loading bias against larger PCR products in the PacBio process. Compared with Sanger technology, this approach is not only cost-effective but can also close gaps greater than 2.5 kb in a single round of reactions and sequence through high-GC regions as well as difficult secondary structures such as small hairpin loops.

July 7, 2019

Next generation sequencing technologies and the changing landscape of phage genomics.

The dawn of next-generation sequencing technologies has opened up exciting possibilities for whole genome sequencing of a plethora of organisms. The 2nd- and 3rd-generation sequencing technologies, based on cloning-free, massively parallel sequencing, have enabled the generation of a deluge of genomic sequences of both prokaryotic and eukaryotic origin in the last seven years. However, whole genome sequencing of bacterial viruses has not kept pace with this revolution, despite the fact that their genomes are orders of magnitude smaller than those of bacteria and other organisms. Sequencing phage genomes poses several challenges: (1) obtaining pure phage genomic material, (2) PCR amplification biases, and (3) the complex nature of their genetic material, due to features such as methylated bases and repeats that are inherently difficult to sequence and assemble. Here we describe conclusions drawn from our efforts in sequencing hundreds of bacteriophage genomes from a variety of Gram-positive and Gram-negative bacteria using Sanger, 454, Illumina and PacBio technologies. Based on our experience, we propose several general considerations regarding sample quality, the choice of technology and a "blended approach" for generating reliable whole genome sequences of phages.

July 7, 2019

Strategies for complete plastid genome sequencing.

Plastid sequencing is an essential tool in the study of plant evolution. This high-copy organelle is one of the most technically accessible regions of the genome, and its sequence conservation makes it a valuable region for comparative genome evolution, phylogenetic analysis and population studies. Here, we discuss recent innovations and approaches for de novo plastid assembly that harness genomic tools. We focus on technical developments including low-cost sequence library preparation approaches for genome skimming, enrichment via hybrid baits and methylation-sensitive capture, sequence platforms with higher read outputs and longer read lengths, and automated tools for assembly. These developments allow for a much more streamlined assembly than conventional short-range PCR. Although newer methods make complete plastid sequencing possible for any land plant or green alga, there are still challenges for producing finished plastomes, particularly from herbarium material or from structurally divergent plastids such as those of parasitic plants.

© 2016 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.

July 7, 2019

Wild tobacco genomes reveal the evolution of nicotine biosynthesis.

Nicotine, the signature alkaloid of Nicotiana species responsible for the addictive properties of human tobacco smoking, functions as a defensive neurotoxin against attacking herbivores. However, the evolution of the genetic features that contributed to the assembly of the nicotine biosynthetic pathway remains unknown. We sequenced and assembled genomes of two wild tobaccos, Nicotiana attenuata (2.5 Gb) and Nicotiana obtusifolia (1.5 Gb), two ecological models for investigating adaptive traits in nature. We show that after the Solanaceae whole-genome triplication event, a repertoire of rapidly expanding transposable elements (TEs) bloated these Nicotiana genomes, promoted expression divergences among duplicated genes, and contributed to the evolution of herbivory-induced signaling and defenses, including nicotine biosynthesis. The biosynthetic machinery that allows for nicotine synthesis in the roots evolved from the stepwise duplications of two ancient primary metabolic pathways: the polyamine and nicotinamide adenine dinucleotide (NAD) pathways. In contrast to the duplication of the polyamine pathway, which is shared among several solanaceous genera producing polyamine-derived tropane alkaloids, we found that lineage-specific duplications within the NAD pathway, together with the evolution of root-specific expression of the duplicated Solanaceae-specific ethylene response factor that activates the expression of all nicotine biosynthetic genes, resulted in the innovative and efficient production of nicotine in the genus Nicotiana. Transcription factor binding motifs derived from TEs may have contributed to the coexpression of nicotine biosynthetic pathway genes and coordinated the metabolic flux. Together, these results provide evidence that TEs and gene duplications facilitated the emergence of a key metabolic innovation relevant to plant fitness.

July 7, 2019

Genome of the pitcher plant Cephalotus reveals genetic changes associated with carnivory

Carnivorous plants exploit animals as a nutritional source and have inspired long-standing questions about the origin and evolution of carnivory-related traits. To investigate the molecular bases of carnivory, we sequenced the genome of the heterophyllous pitcher plant Cephalotus follicularis, in which we succeeded in regulating the developmental switch between carnivorous and non-carnivorous leaves. Transcriptome comparison of the two leaf types and gene repertoire analysis identified genetic changes associated with prey attraction, capture, digestion and nutrient absorption. Analysis of digestive fluid proteins from C. follicularis and three other carnivorous plants with independent carnivorous origins revealed repeated co-options of stress-responsive protein lineages coupled with convergent amino acid substitutions to acquire digestive physiology. These results imply constraints on the available routes to evolve plant carnivory.

July 7, 2019

Complex modular architecture around a simple toolkit of wing pattern genes

Identifying the genomic changes that control morphological variation and understanding how they generate diversity is a major goal of evolutionary biology. In Heliconius butterflies, a small number of genes control the development of diverse wing colour patterns. Here, we used full-genome sequencing of individuals across the Heliconius erato radiation and closely related species to characterize genomic variation associated with wing pattern diversity. We show that variation around colour pattern genes is highly modular, with narrow genomic intervals associated with specific differences in colour and pattern. This modular architecture explains the diversity of colour patterns and provides a flexible mechanism for rapid morphological diversification.

July 7, 2019

Genomic sequence of ‘Candidatus Liberibacter solanacearum’ haplotype C and its comparison with haplotype A and B genomes.

Haplotypes A and B of 'Candidatus Liberibacter solanacearum' (CLso) are associated with diseases of solanaceous plants, especially zebra chip disease of potato, and haplotypes C, D and E are associated with symptoms on apiaceous plants. To date, one complete genome of haplotype B and two high-quality draft genomes of haplotype A have been obtained for these unculturable bacteria using metagenomics from the psyllid vector Bactericera cockerelli. Here, we present the first genomic sequences obtained for the carrot-associated CLso. These two genomic sequences of haplotype C, FIN114 (1.24 Mbp) and FIN111 (1.20 Mbp), were obtained from carrot psyllids (Trioza apicalis) harboring CLso. Genomic comparisons between haplotypes A, B and C revealed that genome organization differs between these haplotypes, due to large inversions and other recombinations. Comparison of protein-coding genes indicated that the core genome of CLso consists of 885 ortholog groups, with the pan-genome consisting of 1327 ortholog groups. Twenty-seven ortholog groups are unique to CLso haplotype C, whilst 11 ortholog groups shared by haplotypes A and B are not found in haplotype C. Some of these ortholog groups that are not part of the core genome may encode functions related to interactions with the different host plant and psyllid species.
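The core and pan-genome counts above follow from simple set operations on ortholog groups: the core genome is the intersection across genomes, the pan-genome is their union, and haplotype-specific or haplotype-missing groups are set differences. The Python sketch below uses hypothetical ortholog-group IDs rather than the CLso data and assumes ortholog groups have already been inferred for each genome; the function names are illustrative.

```python
def core_and_pan(groups):
    """Return (core, pan) sets for a dict mapping genome -> set of ortholog-group IDs."""
    sets = list(groups.values())
    return set.intersection(*sets), set.union(*sets)


def unique_to(groups, name):
    """Ortholog groups present in `name` and in no other genome."""
    others = set().union(*(g for k, g in groups.items() if k != name))
    return groups[name] - others


def shared_but_absent(groups, present, absent):
    """Groups shared by every genome in `present` but missing from `absent`."""
    return set.intersection(*(groups[p] for p in present)) - groups[absent]


# Hypothetical toy data, one set per haplotype.
groups = {
    "A": {"og1", "og2", "og3", "og5"},
    "B": {"og1", "og2", "og3", "og6"},
    "C": {"og1", "og2", "og4"},
}
core, pan = core_and_pan(groups)
print(sorted(core), len(pan))                          # ['og1', 'og2'] 6
print(unique_to(groups, "C"))                          # {'og4'}
print(shared_but_absent(groups, ["A", "B"], "C"))      # {'og3'}
```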

July 7, 2019

Genome scaffolding and annotation for the pathogen vector Ixodes ricinus by ultra-long single molecule sequencing.

Global warming and other ecological changes have facilitated the expansion of Ixodes ricinus tick populations. Ixodes ricinus is the most important carrier of vector-borne pathogens in Europe, transmitting viruses, protozoa and bacteria, in particular Borrelia burgdorferi (sensu lato), the causative agent of Lyme borreliosis, the most prevalent vector-borne disease in humans in the Northern hemisphere. To control this disease vector more quickly, a better understanding of the I. ricinus tick is necessary. To facilitate such studies, we recently published the first reference genome of this highly prevalent pathogen vector. Here, we further extend these studies by scaffolding and annotating the first reference genome using ultra-long sequencing reads from third-generation single-molecule sequencing. In addition, we present the first genome size estimation for I. ricinus ticks and the embryo-derived cell line IRE/CTVM19.

A total of 235,953 contigs were integrated into 204,904 scaffolds, extending the currently known genome length by more than 30%, from 393 to 516 Mb, and increasing the N50 by 87%, from a contig N50 of 1643 bp to a scaffold N50 of 3067 bp. In addition, 25,263 sequences were annotated by comparison with the tick's North American relative Ixodes scapularis. After (conserved) hypothetical proteins, zinc finger proteins, secreted proteins and P450-coding proteins were the most prevalent protein categories annotated. Interestingly, more than 50% of the amino acid sequences matching the homology threshold had 95-100% identity to the corresponding I. scapularis gene models. The sequence information was complemented by the first genome size estimation for this species. Flow cytometry-based genome size analysis revealed a haploid genome size of 2.65 Gb for I. ricinus ticks and 3.80 Gb for the cell line.

We present a first draft sequence map of the I. ricinus genome based on a PacBio-Illumina assembly. The I. ricinus genome was shown to be 26% (500 Mb) larger than the genome of its American relative I. scapularis. Based on the genome size of 2.65 Gb, we estimated that we covered about 67% of the non-repetitive sequences. Genome annotation will facilitate screening for specific molecular pathways in I. ricinus cells and provides an overview of characteristics and functions.
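The percentage gains quoted for the I. ricinus assembly can be checked with the usual relative-change formula, (after - before) / before x 100. The short Python check below uses only the figures stated in the abstract; `percent_increase` is an illustrative helper name.

```python
def percent_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Figures quoted in the I. ricinus abstract.
length_gain = percent_increase(393, 516)   # assembly length, Mb
n50_gain = percent_increase(1643, 3067)    # contig N50 -> scaffold N50, bp

print(f"assembly length: +{length_gain:.1f}%")  # +31.3%
print(f"N50: +{n50_gain:.1f}%")                 # +86.7%
```

Both results agree with the reported "more than 30%" and 87% (after rounding).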
