Gap filling Archives

July 7, 2019

The draft genome of Primula veris yields insights into the molecular basis of heterostyly.

The flowering plant Primula veris is a common spring blooming perennial that is widely cultivated throughout Europe. This species is an established model system in the study of the genetics, evolution, and ecology of heterostylous floral polymorphisms. Despite the long history of research focused on this and related species, the continued development of this system has been restricted due to the absence of genomic and transcriptomic resources. We present here a de novo draft genome assembly of P. veris covering 301.8 Mb, or approximately 63% of the estimated 479.22 Mb genome, with an N50 contig size of 9.5 Kb, an N50 scaffold size of 164 Kb, and containing an estimated 19,507 genes. The results of a RADseq bulk segregant analysis allow for the confident identification of four genome scaffolds that are linked to the P. veris S-locus. RNAseq data from both P. veris and the closely related species P. vulgaris allow for the characterization of 113 candidate heterostyly genes that show significant floral morph-specific differential expression. One candidate gene of particular interest is a duplicated GLOBOSA homolog that may be unique to Primula (PveGLO2), and is completely silenced in L-morph flowers. The P. veris genome represents the first genome assembled from a heterostylous species, and thus provides an immensely important resource for future studies focused on the evolution and genetic dissection of heterostyly. As the first genome assembled from the Primulaceae, the P. veris genome will also facilitate the expanded application of phylogenomic methods in this diverse family and the eudicots as a whole.
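
The N50 and coverage figures quoted in this abstract follow from simple arithmetic on sequence lengths. The sketch below is illustrative only: the function names `n50` and `fraction_covered` are hypothetical, and the toy length list is not data from the P. veris assembly. It uses the common definition of N50 as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fraction_covered(assembled_mb, estimated_mb):
    """Fraction of the estimated genome size represented in the assembly."""
    return assembled_mb / estimated_mb


if __name__ == "__main__":
    # Toy lengths (hypothetical), not taken from any real assembly.
    print(n50([2, 3, 4, 5, 6]))
    # Figures quoted in the abstract: 301.8 Mb assembled of 479.22 Mb estimated.
    print(f"{fraction_covered(301.8, 479.22):.0%}")
```

Applied to the figures in the abstract, `fraction_covered(301.8, 479.22)` is about 0.63, which matches the "approximately 63%" reported.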

July 7, 2019

Complete genome sequence of Bacillus pumilus strain WP8, an efficient plant growth-promoting rhizobacterium.

Bacillus pumilus strain WP8 is an efficient plant growth-promoting rhizobacterium. Here, we present the complete genome of WP8 and its genes involved in plant growth promotion and biocontrol. Copyright © 2015 Kang et al.

July 7, 2019

A draft genome of field pennycress (Thlaspi arvense) provides tools for the domestication of a new winter biofuel crop.

Field pennycress (Thlaspi arvense L.) is being domesticated as a new winter cover crop and biofuel species for the Midwestern United States that can be double-cropped between corn and soybeans. A genome sequence will enable the use of new technologies to make improvements in pennycress. To generate a draft genome, a hybrid sequencing approach was used to generate 47 Gb of DNA sequencing reads from both the Illumina and PacBio platforms. These reads were used to assemble 6,768 genomic scaffolds. The draft genome was annotated using the MAKER pipeline, which identified 27,390 predicted protein-coding genes, with almost all of these predicted peptides having significant sequence similarity to Arabidopsis proteins. A comprehensive analysis of pennycress gene homologues involved in glucosinolate biosynthesis, metabolism, and transport pathways revealed high sequence conservation compared with other Brassicaceae species, and helps validate the assembly of the pennycress gene space in this draft genome. Additional comparative genomic analyses indicate that the knowledge gained from years of basic Brassicaceae research will serve as a powerful tool for identifying gene targets whose manipulation can be predicted to result in improvements for pennycress. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

July 7, 2019

Do echinoderm genomes measure up?

Echinoderm genome sequences are a corpus of useful information about a clade of animals that serve as research models in fields ranging from marine ecology to cell and developmental biology. Genomic information from echinoids has contributed to insights into the gene interactions that drive the developmental process at the molecular level. Such insights often rely heavily on genomic information and the kinds of questions that can be asked thus depend on the quality of the sequence information. Here we describe the history of echinoderm genomic sequence assembly and present details about the quality of the data obtained. All of the sequence information discussed here is posted on the echinoderm information web system, Echinobase.org. Copyright © 2015 Elsevier B.V. All rights reserved.

July 7, 2019

Complete genome sequence of the cyclohexylamine-degrading Pseudomonas plecoglossicida NyZ12.

Pseudomonas plecoglossicida NyZ12 (CCTCC AB 2015057), a Gram-negative bacterium isolated from soil, has the ability to degrade cyclohexylamine. The complete genome sequence of this strain (6,233,254 bp of chromosome length) is presented, with information about the genes of characteristic enzymes responsible for cyclohexylamine oxidation to cyclohexanone and the integrated gene cluster for the metabolic pathway of cyclohexanone oxidation to adipate. Copyright © 2015 Elsevier B.V. All rights reserved.

July 7, 2019

The genome of Dendrobium officinale illuminates the biology of the important traditional Chinese orchid herb.

Dendrobium officinale Kimura et Migo is a traditional Chinese orchid herb that has both ornamental value and a broad range of therapeutic effects. Here, we report the first de novo assembled 1.35 Gb genome sequences for D. officinale by combining the second-generation Illumina Hiseq 2000 and third-generation PacBio sequencing technologies. We found that orchids have a complete inflorescence gene set and have some specific inflorescence genes. We observed gene expansion in gene families related to fungus symbiosis and drought resistance. We analyzed biosynthesis pathways of medicinal components of D. officinale and found extensive duplication of SPS and SuSy genes, which are related to polysaccharide generation, and that the pathway of D. officinale alkaloid synthesis could be extended to generate 16-epivellosimine. The D. officinale genome assembly demonstrates a new approach to deciphering large complex genomes and, as an important orchid species and a traditional Chinese medicine, the D. officinale genome will facilitate future research on the evolution of orchid plants, as well as the study of medicinal components and potential genetic breeding of the dendrobe. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.

July 7, 2019

Best practices in insect genome sequencing: What works and what doesn’t.

The last decade of decreasing DNA sequencing costs and proliferating sequencing services in core labs and companies has brought the de novo genome sequencing and assembly of insect species within reach for many entomologists. However, sequence production alone is not enough to generate a high quality reference genome, and in many cases, poor planning can lead to extremely fragmented genome assemblies preventing high quality gene annotation and other desired analyses. Insect genomes can be problematic to assemble, due to combinations of high polymorphism, inability to breed for genome homozygosity, and small physical sizes limiting the quantity of DNA able to be isolated from a single individual. Recent advances in sequencing technology and assembly strategies are enabling a revolution for insect genome reference sequencing and assembly. Here we review historical and new genome sequencing and assembly strategies, with a particular focus on their application to arthropod genomes. We highlight both the need to design sequencing strategies for the requirements of the assembly software, and new long-read technologies that are enabling a return to traditional assembly approaches. Finally, we compare and contrast very cost-effective short-read draft genome strategies with the long-read approaches that, although entailing additional cost, bring a higher likelihood of success and the possibility of archival assembly qualities approaching that of finished genomes.

July 7, 2019

Saccharina genomes provide novel insight into kelp biology.

Seaweeds are essential for marine ecosystems and have immense economic value. Here we present a comprehensive analysis of the draft genome of Saccharina japonica, one of the most economically important seaweeds. The 537-Mb assembled genomic sequence covered 98.5% of the estimated genome, and 18,733 protein-coding genes are predicted and annotated. Gene families related to cell wall synthesis, halogen concentration, development and defence systems were expanded. Functional diversification of the mannuronan C-5-epimerase and haloperoxidase gene families provides insight into the evolutionary adaptation of polysaccharide biosynthesis and iodine antioxidation. Additional sequencing of seven cultivars and nine wild individuals reveals that the genetic diversity within wild populations is greater than among cultivars. All of the cultivars are descendants of a wild S. japonica accession showing limited admixture with S. longissima. This study represents an important advance toward improving yields and economic traits in Saccharina and provides an invaluable resource for plant genome studies.

July 7, 2019

Resources for genetic and genomic analysis of emerging pathogen Acinetobacter baumannii.

Acinetobacter baumannii is a Gram-negative bacterial pathogen notorious for causing serious nosocomial infections that resist antibiotic therapy. Research to identify factors responsible for the pathogen’s success has been limited by the resources available for genome-scale experimental studies. This report describes the development of several such resources for A. baumannii strain AB5075, a recently characterized wound isolate that is multidrug resistant and displays robust virulence in animal models. We report the completion and annotation of the genome sequence, the construction of a comprehensive ordered transposon mutant library, the extension of high-coverage transposon mutant pool sequencing (Tn-seq) to the strain, and the identification of the genes essential for growth on nutrient-rich agar. These resources should facilitate large-scale genetic analysis of virulence, resistance, and other clinically relevant traits that make A. baumannii a formidable public health threat. Acinetobacter baumannii is one of six bacterial pathogens primarily responsible for antibiotic-resistant infections that have become the scourge of health care facilities worldwide. Eliminating such infections requires a deeper understanding of the factors that enable the pathogen to persist in hospital environments, establish infections, and resist antibiotics. We present a set of resources that should accelerate genome-scale genetic characterization of these traits for a reference isolate of A. baumannii that is highly virulent and representative of current outbreak strains. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

July 7, 2019

Genome expansion via lineage splitting and genome reduction in the cicada endosymbiont Hodgkinia.

Comparative genomics from mitochondria, plastids, and mutualistic endosymbiotic bacteria has shown that the stable establishment of a bacterium in a host cell results in genome reduction. Although many highly reduced genomes from endosymbiotic bacteria are stable in gene content and genome structure, organelle genomes are sometimes characterized by dramatic structural diversity. Previous results from Candidatus Hodgkinia cicadicola, an endosymbiont of cicadas, revealed that some lineages of this bacterium had split into two new cytologically distinct yet genetically interdependent species. It was hypothesized that the long life cycle of cicadas in part enabled this unusual lineage-splitting event. Here we test this hypothesis by investigating the structure of the Ca. Hodgkinia genome in one of the longest-lived cicadas, Magicicada tredecim. We show that the Ca. Hodgkinia genome from M. tredecim has fragmented into multiple new chromosomes or genomes, with at least some remaining partitioned into discrete cells. We also show that this lineage-splitting process has resulted in a complex of Ca. Hodgkinia genomes that are 1.1-Mb pairs in length when considered together, an almost 10-fold increase in size from the hypothetical single-genome ancestor. These results parallel some examples of genome fragmentation and expansion in organelles, although the mechanisms that give rise to these extreme genome instabilities are likely different.

July 7, 2019

GAML: genome assembly by maximum likelihood.

Resolution of repeats and scaffolding of shorter contigs are critical parts of genome assembly. Modern assemblers usually perform such steps by heuristics, often tailored to a particular technology for producing paired or long reads. We propose a new framework that allows systematic combination of diverse sequencing datasets into a single assembly. We achieve this by searching for an assembly with the maximum likelihood in a probabilistic model capturing error rate, insert lengths, and other characteristics of the sequencing technology used to produce each dataset. We have implemented a prototype genome assembler GAML that can use any combination of insert sizes with Illumina or 454 reads, as well as PacBio reads. Our experiments show that we can assemble short genomes with N50 sizes and error rates comparable to ALLPATHS-LG or Cerulean. While ALLPATHS-LG and Cerulean each require a specific combination of datasets, GAML works on any combination. We have introduced a new probabilistic approach to genome assembly and demonstrated that this approach can lead to superior results when used to combine a diverse set of datasets from different sequencing technologies. Data and software are available at http://compbio.fmph.uniba.sk/gaml.
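
The idea of choosing the assembly with the maximum likelihood can be illustrated with a deliberately simplified toy model. The sketch below is not GAML's model or code: it assumes unpaired reads, substitution errors only, a single fixed per-base error rate, and a uniform prior over read start positions, and it ignores insert lengths and reverse complements. All function names and the example sequences are hypothetical.

```python
import math


def read_log_likelihood(read, assembly, error_rate=0.01):
    """Log-probability of one read under a toy substitution-only model.

    The read is assumed to start at any position of the assembly with equal
    probability; each base is miscalled independently with `error_rate`.
    """
    n_positions = len(assembly) - len(read) + 1
    if n_positions <= 0:
        return float("-inf")  # the read cannot be placed on this assembly
    total = 0.0
    for start in range(n_positions):
        window = assembly[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        matches = len(read) - mismatches
        total += (error_rate ** mismatches) * ((1 - error_rate) ** matches)
    return math.log(total / n_positions)


def assembly_log_likelihood(reads, assembly, error_rate=0.01):
    """Sum of per-read log-likelihoods (reads treated as independent)."""
    return sum(read_log_likelihood(r, assembly, error_rate) for r in reads)


def best_assembly(reads, candidates, error_rate=0.01):
    """Pick the candidate assembly with the highest log-likelihood."""
    return max(
        candidates,
        key=lambda a: assembly_log_likelihood(reads, a, error_rate),
    )


if __name__ == "__main__":
    reads = ["ACGT", "CGTA", "GTAC"]         # hypothetical reads
    candidates = ["ACGTAC", "ACGAAC"]        # hypothetical candidate assemblies
    print(best_assembly(reads, candidates))  # the candidate the reads support best
```

A real assembler cannot enumerate candidate assemblies as this toy does; the abstract describes searching for a high-likelihood assembly, and the search strategy is outside the scope of this sketch.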

July 7, 2019

Draft genome sequence of the cellulolytic and xylanolytic thermophile Clostridium clariflavum strain 4-2a.

Clostridium clariflavum strain 4-2a, a novel strain isolated from a thermophilic biocompost pile, has demonstrated an extensive capability to utilize both cellulose and hemicellulose under thermophilic anaerobic conditions. Here, we report the draft genome of this strain. Copyright © 2015 Rooney et al.

July 7, 2019

Draft genome sequence of Frankia sp. strain DC12, an atypical, noninfective, ineffective isolate from Datisca cannabina.

Frankia sp. strain DC12, isolated from root nodules of Datisca cannabina, is a member of the fourth lineage of Frankia, which is unable to reinfect actinorhizal plants. Here, we report its 6.88-Mbp high-quality draft genome sequence, with a G+C content of 71.92% and 5,858 candidate protein-coding genes. Copyright © 2015 Tisa et al.

July 7, 2019

GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments.

Genome assemblies generated with next-generation sequencing (NGS) reads usually contain a number of gaps. Several tools have recently been developed to close the gaps in these assemblies with NGS reads. Although these gap-closing tools efficiently close the gaps, they entail a high rate of misassembly at gap-closing sites. We have found that the assembly error rates caused by these tools are 20-500-fold higher than the rate of errors introduced into contigs by de novo assemblers. We here describe GMcloser, a tool that accurately closes these gaps with a preassembled contig set or a long read set (i.e. error-corrected PacBio reads). GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap closure. We demonstrate with sequencing data from various organisms that the gap-closing accuracy of GMcloser is 3-100-fold higher than those of other available tools, with similar efficiency. GMcloser and an accompanying tool (GMvalue) for evaluating the assembly and correcting misassemblies except SNPs and short indels in the assembly are available at https://sourceforge.net/projects/gmcloser/. Contact: shunichi.kosugi@riken.jp. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
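
Before any gap can be closed it has to be located. The sketch below shows one way this first step might look, assuming the widespread convention that gaps in a scaffold sequence are written as runs of `N` characters. It is not part of GMcloser, and the function name `find_gaps` and the example scaffold are hypothetical.

```python
import re


def find_gaps(scaffold, min_len=1):
    """Return (start, end) pairs for runs of N/n in a scaffold sequence.

    Coordinates are 0-based and half-open, so scaffold[start:end] is the gap.
    Runs shorter than `min_len` are ignored.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", scaffold)
        if m.end() - m.start() >= min_len
    ]


if __name__ == "__main__":
    scaffold = "ACGTNNNNACGTNACGT"  # hypothetical scaffold with two gaps
    for start, end in find_gaps(scaffold):
        print(f"gap at {start}-{end}, length {end - start}")
```

The `min_len` parameter lets a caller skip isolated ambiguous bases and keep only runs long enough to be treated as true scaffold gaps.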

July 7, 2019

Gossypium barbadense genome sequence provides insight into the evolution of extra-long staple fiber and specialized metabolites.

Of the two cultivated species of allopolyploid cotton, Gossypium barbadense produces extra-long fibers for the production of superior textiles. We sequenced its genome (AD)2 and performed a comparative analysis. We identified three bursts of retrotransposons from 20 million years ago (Mya) and a genome-wide uneven pseudogenization peak at 11-20 Mya, which likely contributed to genomic divergences. Among the 2,483 genes preferentially expressed in fiber, a cell elongation regulator, PRE1, is strikingly At biased and fiber specific, echoing the A-genome origin of spinnable fiber. The expansion of the PRE members implies a genetic factor that underlies fiber elongation. Mature cotton fiber consists of nearly pure cellulose. G. barbadense and G. hirsutum contain 29 and 30 cellulose synthase (CesA) genes, respectively; whereas most of these genes (>25) are expressed in fiber, genes for secondary cell wall biosynthesis exhibited a delayed and higher degree of up-regulation in G. barbadense compared with G. hirsutum, conferring an extended elongation stage and highly active secondary wall deposition during extra-long fiber development. The rapid diversification of sesquiterpene synthase genes in the gossypol pathway exemplifies the chemical diversity of lineage-specific secondary metabolites. The G. barbadense genome advances our understanding of allopolyploidy, which will help improve cotton fiber quality.

