Gene prediction Archives

June 1, 2021

Genome assembly strategies of the recent polyploid, Coffea arabica.

Arabica coffee, revered for its taste and aroma, has a complex genome. It is an allotetraploid (2n=4x=44) with a genome size of approximately 1.3 Gb, derived from the recent (< 0.6 Mya) hybridization of two diploid progenitors (2n=2x=22), C. canephora (710 Mb) and C. eugenioides (670 Mb). The two parental species diverged recently (< 4.2 Mya) and their genomes are highly homologous. To facilitate assembly, a dihaploid plant was chosen for sequencing. Initial genome assembly attempts with short-read data produced an assembly covering 1,031 Mb of the C. arabica genome with a contig L50 of 9 kb. Using long-read PacBio sequencing at greater than 50x coverage and cutting-edge PacBio software, a de novo PacBio-only genome assembly was constructed that covers 1,042 Mb of the genome with an L50 of 267 kb. The two assemblies were assessed and compared to determine gene content, chimeric regions, and the ability to separate the parental genomes. A genetic map containing 600 SSRs is being used to anchor the contigs and, together with a search for sub-genome-specific SNPs, to improve sub-genome differentiation. PacBio transcriptome sequencing is currently being added to finalize gene annotation of the polished assembly. The finished genome assembly will be used to guide re-sequencing assemblies of the parental genomes (C. canephora and C. eugenioides) and will serve as a template for GBS analysis and whole-genome re-sequencing of a set of C. arabica accessions representative of the species diversity. The resulting data will provide powerful genomic tools to enable more efficient breeding strategies for this crop, which is highly susceptible to climate change and is the main source of income for millions of small farmers in producing countries.
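The contig statistic quoted above (reported as "L50" in kb, and more often called N50 elsewhere) can be illustrated with a minimal sketch. It assumes the usual definition: the contig length at which contigs of that length or longer hold at least half of the assembled bases. The function name `contig_n50` and the toy lengths are hypothetical and are not taken from the C. arabica assemblies.

```python
def contig_n50(lengths):
    """Return the contig length at which the cumulative sum of contigs,
    sorted from longest to shortest, first reaches half of the total
    assembled bases. (Assumed definition; hypothetical helper name.)"""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy contig lengths in arbitrary units, for illustration only.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = contig_n50(toy_lengths)
```

With these toy values the total is 20, and the two longest contigs (6 and 5) together reach 11, so the statistic is 5. A higher value for the same total length indicates a more contiguous assembly, which is the sense in which 267 kb improves on 9 kb.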

September 22, 2019

The transcriptome of human pluripotent stem cells.

Human embryonic stem cells (hESCs) are in vitro derivatives of the inner cell mass of the blastocyst and are characterized by an undifferentiated, pluripotent state that can be perpetuated indefinitely. hESCs provide a unique opportunity both to dissect the molecular mechanisms responsible for the maintenance of pluripotency and to model the initiation of differentiation and cell commitment within the developing embryo. To fully understand these mechanisms, it is necessary to accurately identify the specific transcriptome of hESCs. Many distinct gene annotation methods, such as cDNA and EST sequencing and RNA-Seq, have been used to identify the transcriptome of hESCs. Recently, we developed a new tool (IDP) that integrates hybrid sequencing data to characterize a more reliable and comprehensive hESC transcriptome, leading to the discovery of many novel transcripts. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.

September 22, 2019

Exploiting single-molecule transcript sequencing for eukaryotic gene prediction.

We develop a method to predict and validate gene models using PacBio single-molecule, real-time (SMRT) cDNA reads. Ninety-eight percent of full-insert SMRT reads span complete open reading frames. Gene model validation using SMRT reads is developed as an automated process. Optimized training and prediction settings, together with mRNA-seq noise reduction of the assisting Illumina reads, result in increased gene prediction sensitivity and precision. Additionally, we present an improved gene set for sugar beet (Beta vulgaris) and the first genome-wide gene set for spinach (Spinacia oleracea). The workflow and guidelines are a valuable resource for obtaining comprehensive gene sets for newly sequenced genomes of non-model eukaryotes.

September 22, 2019

PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme.

High-throughput transcriptome sequencing (RNA-seq) technology promises to discover novel protein-coding and non-coding transcripts, particularly the identification of long non-coding RNAs (lncRNAs) from de novo sequencing data. This requires tools that are not restricted by prior gene annotations, genomic sequences and high-quality sequencing. We present an alignment-free tool called PLEK (predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme), which uses a computational pipeline based on an improved k-mer scheme and a support vector machine (SVM) algorithm to distinguish lncRNAs from messenger RNAs (mRNAs), in the absence of genomic sequences or annotations. The performance of PLEK was evaluated on well-annotated mRNA and lncRNA transcripts. 10-fold cross-validation tests on human RefSeq mRNAs and GENCODE lncRNAs indicated that our tool could achieve accuracy of up to 95.6%. We demonstrated the utility of PLEK on transcripts from other vertebrates using the model built from human datasets. PLEK attained >90% accuracy on most of these datasets. PLEK also performed well using a simulated dataset and two real de novo assembled transcriptome datasets (sequenced by PacBio and 454 platforms) with relatively high indel sequencing errors. In addition, PLEK is approximately eightfold faster than a newly developed alignment-free tool, named Coding-Non-Coding Index (CNCI), and 244 times faster than the most popular alignment-based tool, Coding Potential Calculator (CPC), when run single-threaded. PLEK is an efficient alignment-free computational tool to distinguish lncRNAs from mRNAs in RNA-seq transcriptomes of species lacking reference genomes. PLEK is especially suitable for PacBio or 454 sequencing data and large-scale transcriptome data. Its open-source software can be freely downloaded from https://sourceforge.net/projects/plek/files/.
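The idea of alignment-free k-mer features can be illustrated with a minimal sketch. It computes plain sliding-window k-mer frequencies for a single transcript. This is an assumption made for illustration: the abstract does not describe the details of PLEK's "improved" scheme, and the sketch implements neither that scheme nor the SVM step. The function name `kmer_features` and the parameter `k_max` are hypothetical.

```python
from itertools import product


def kmer_features(seq, k_max=3):
    """Return a dict mapping every k-mer (k = 1..k_max) over A, C, G, T
    to its relative frequency among the valid windows of that k.
    Plain frequencies only; not PLEK's actual feature scheme."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    features = {}
    for k in range(1, k_max + 1):
        # Start every possible k-mer at zero so the vector has fixed length.
        counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
        windows = 0
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip windows containing N or other symbols
                counts[kmer] += 1
                windows += 1
        for kmer, count in counts.items():
            features[kmer] = count / windows if windows else 0.0
    return features


# Toy sequence for illustration only.
toy = kmer_features("ACGT", k_max=2)
```

Because every possible k-mer gets an entry, each transcript maps to a vector of the same length (4 + 16 = 20 entries for `k_max=2`), which is the kind of fixed-size input a classifier such as an SVM expects. No genome alignment or annotation is needed to compute it.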

July 19, 2019

Shared signatures of parasitism and phylogenomics unite Cryptomycota and microsporidia.

Fungi grow within their food, externally digesting it and absorbing nutrients across a semirigid chitinous cell wall. Members of the new phylum Cryptomycota were proposed to represent intermediate fungal forms, lacking a chitinous cell wall during feeding and known almost exclusively from ubiquitous environmental ribosomal RNA sequences that cluster at the base of the fungal tree [1, 2]. Here, we sequence the first Cryptomycotan genome (the water mold endoparasite Rozella allomycis) and unite the Cryptomycota with another group of endoparasites, the microsporidia, based on phylogenomics and shared genomic traits. We propose that Cryptomycota and microsporidia share a common endoparasitic ancestor, with the clade unified by a chitinous cell wall used to develop turgor pressure in the infection process [3, 4]. Shared genomic elements include a nucleotide transporter that is used by microsporidia for stealing energy in the form of ATP from their hosts [5]. Rozella harbors a mitochondrion that contains a very rapidly evolving genome and lacks complex I of the respiratory chain. These degenerate features are offset by the presence of nuclear genes for alternative respiratory pathways. The Rozella proteome has not undergone major contraction like microsporidia; instead, several classes have undergone expansion, such as host-effector, signal-transduction, and folding proteins. Copyright © 2013 Elsevier Ltd. All rights reserved.

July 19, 2019

The architecture of a scrambled genome reveals massive levels of genomic rearrangement during development.

Programmed DNA rearrangements in the single-celled eukaryote Oxytricha trifallax completely rewire its germline into a somatic nucleus during development. This elaborate, RNA-mediated pathway eliminates noncoding DNA sequences that interrupt gene loci and reorganizes the remaining fragments by inversions and permutations to produce functional genes. Here, we report the Oxytricha germline genome and compare it to the somatic genome to present a global view of its massive scale of genome rearrangements. The remarkably encrypted genome architecture contains >3,500 scrambled genes, as well as >800 predicted germline-limited genes expressed, and some posttranslationally modified, during genome rearrangements. Gene segments for different somatic loci often interweave with each other. Single gene segments can contribute to multiple, distinct somatic loci. Terminal precursor segments from neighboring somatic loci map extremely close to each other, often overlapping. This genome assembly provides a draft of a scrambled genome and a powerful model for studies of genome rearrangement. Copyright © 2014 Elsevier Inc. All rights reserved.

July 7, 2019

De novo assembly of the quorum-sensing Pandoraea sp. strain RB-44 complete genome sequence using PacBio Single-Molecule Real-Time Sequencing Technology.

We report the first complete genome sequence of Pandoraea sp. strain RB-44, which was found to possess quorum-sensing properties. To the best of our knowledge, this is the first documentation of both a complete genome sequence and quorum-sensing properties of a Pandoraea species.

July 7, 2019

Complete genome sequence of Pandoraea pnomenusa 3kgm, a quorum-sensing strain isolated from a former landfill site.

Pandoraea pnomenusa strain 3kgm has been identified as a quorum-sensing strain isolated from soil. Here, we report the complete genome sequence of P. pnomenusa strain 3kgm by using the Pacific Biosciences single-molecule real-time (PacBio RS SMRT) sequencer high-resolution technology.

July 7, 2019

Genome sequence and methylome of soil bacterium Gemmatirosa kalamazoonensis KBS708(T), a member of the rarely cultivated Gemmatimonadetes phylum.

Bacteria belonging to the phylum Gemmatimonadetes are found in a wide variety of environments and are particularly abundant in soils. Here, we present the complete genome sequence and methylation pattern of the newly described Gemmatirosa kalamazoonensis type strain.

July 7, 2019

Draft genome sequence of the pathogenic fungus Scedosporium apiospermum.

The first genome of one species of the Scedosporium apiospermum complex, responsible for localized to severe disseminated infections according to the immune status of the host, will contribute to a better understanding of the pathogenicity of these fungi and also to the discovery of the mechanisms underlying their low susceptibility to current antifungals. Copyright © 2014 Vandeputte et al.

July 7, 2019

Draft genome sequences of 10 strains of the genus Exiguobacterium.

High-quality draft genome sequences were determined for 10 Exiguobacterium strains in order to provide insight into their evolutionary strategies for speciation and environmental adaptation. The selected genomes include psychrotrophic and thermophilic species from a range of habitats, which will allow for a comparison of metabolic pathways and stress response genes. Copyright © 2014 Vishnivetskaya et al.

July 7, 2019

Get your high-quality low-cost genome sequence.

The study of whole-genome sequences has become essential for almost all branches of biological research. Next-generation sequencing (NGS) has revolutionized the scalability, speed, and resolution of sequencing and brought genomic science within reach of academic laboratories that study non-model organisms. Here, we show that a high-quality draft genome of a eukaryote can be obtained at relatively low cost by exploiting a hybrid combination of sequencing strategies. Copyright © 2014 Elsevier Ltd. All rights reserved.

July 7, 2019

Genome sequences of Vibrio navarrensis, a potential human pathogen.

Vibrio navarrensis is an aquatic bacterium recently shown to be associated with human illness. We report the first genome sequences of three V. navarrensis strains obtained from clinical and environmental sources. Preliminary analyses of the sequences reveal that V. navarrensis contains genes commonly associated with virulence in other human pathogens. Copyright © 2014 Gladney et al.

July 7, 2019

Whole-genome sequence of Borrelia garinii strain 935T isolated from Ixodes persulcatus in South Korea.

We report here the genome sequence of Borrelia garinii strain 935T isolated from Ixodes persulcatus in South Korea. The 1,176,739 bp (G+C content, 27.73%) genome consists of 1,194 coding regions, 4 rRNA genes, and 33 aminoacyl-tRNA synthetase genes. This is the first whole-genome report of a Korean Borrelia species isolate. Copyright © 2014 Noh et al.

July 7, 2019

Complete genome of Serratia sp. strain FGI 94, a strain associated with leaf-cutter ant fungus gardens.

Serratia sp. strain FGI 94 was isolated from a fungus garden of the leaf-cutter ant Atta colombica. Analysis of its 4.86-Mbp chromosome will help advance our knowledge of symbiotic interactions and plant biomass degradation in this ancient ant-fungus mutualism.
