April 21, 2020

A survey and evaluations of histogram-based statistics in alignment-free sequence comparison.

Since the dawn of the bioinformatics field, sequence alignment scores have been the main method for comparing sequences. However, alignment algorithms are quadratic, requiring long execution time. As alternatives, scientists have developed tens of alignment-free statistics for measuring the similarity between two sequences. We surveyed tens of alignment-free k-mer statistics. Additionally, we evaluated 33 statistics and multiplicative combinations between the statistics and/or their squares. These statistics are calculated on two k-mer histograms representing two sequences. Our evaluations using global alignment scores revealed that the majority of the statistics are sensitive and capable of finding similar sequences to a query sequence. Therefore, any of these statistics can filter out dissimilar sequences quickly. Further, we observed that multiplicative combinations of the statistics are highly correlated with the identity score. Furthermore, combinations involving sequence length difference or Earth Mover’s distance, which takes the length difference into account, are always among the highest correlated paired statistics with identity scores. Similarly, paired statistics including length difference or Earth Mover’s distance are among the best performers in finding the K-closest sequences. Interestingly, similar performance can be obtained using histograms of shorter words, resulting in reducing the memory requirement and increasing the speed remarkably. Moreover, we found that simple single statistics are sufficient for processing next-generation sequencing reads and for applications relying on local alignment. Finally, we measured the time requirement of each statistic. The survey and the evaluations will help scientists with identifying efficient alternatives to the costly alignment algorithm, saving thousands of computational hours. The source code of the benchmarking tool is available as Supplementary Materials. © The Author 2017. Published by Oxford University Press.
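
As a rough illustration of what a histogram-based statistic looks like, the sketch below builds k-mer histograms for two DNA sequences and computes one single statistic (Manhattan distance) alongside the sequence length difference, then multiplies the two as one possible paired statistic. The function names, the choice of Manhattan distance, and the ACGT-only alphabet are assumptions made for this example; they are not taken from the paper's benchmarking tool.

```python
from collections import Counter
from itertools import product


def kmer_histogram(seq, k):
    """Count every overlapping k-mer, in lexicographic order over A, C, G, T."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


def manhattan(h1, h2):
    """Sum of absolute differences between two equal-length histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def length_difference(s1, s2):
    """Absolute difference between the two sequence lengths."""
    return abs(len(s1) - len(s2))


def paired_statistic(s1, s2, k=2):
    """Illustrative multiplicative combination of two single statistics."""
    h1, h2 = kmer_histogram(s1, k), kmer_histogram(s2, k)
    return manhattan(h1, h2) * length_difference(s1, s2)
```

Because each histogram has 4^k entries, a shorter word length shrinks the memory footprint quickly, which is consistent with the abstract's remark about histograms of shorter words.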


April 21, 2020

Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
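
Several abstracts on this page report a contig or scaffold N50. For readers unfamiliar with the metric, here is a minimal sketch of the usual definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly size. The function name and input format (a plain list of lengths) are assumptions for this example, not part of any assembler's API.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig length")
```

For example, contigs of lengths 2, 3, 4, 5 and 6 total 20; the two longest (6 and 5) already cover 11, so the N50 is 5.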


April 21, 2020

Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes.

Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential for impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation. © The Author 2017. Published by Oxford University Press.


April 21, 2020

Genome assembly of a tropical maize inbred line provides insights into structural variation and crop improvement.

Maize is one of the most important crops globally, and it shows remarkable genetic diversity. Knowledge of this diversity could help in crop improvement; however, gold-standard genomes have been elucidated only for modern temperate varieties. Here, we present a high-quality reference genome (contig N50 of 15.78 megabases) of the maize small-kernel inbred line, which is derived from a tropical landrace. Using haplotype maps derived from B73, Mo17 and SK, we identified 80,614 polymorphic structural variants across 521 diverse lines. Approximately 22% of these variants could not be detected by traditional single-nucleotide-polymorphism-based approaches, and some of them could affect gene expression and trait performance. To illustrate the utility of the diverse SK line, we used it to perform map-based cloning of a major effect quantitative trait locus controlling kernel weight, a key trait selected during maize improvement. The underlying candidate gene ZmBARELY ANY MERISTEM1d provides a target for increasing crop yields.


April 21, 2020

Insights into the evolution and drug susceptibility of Babesia duncani from the sequence of its mitochondrial and apicoplast genomes.

Babesia microti and Babesia duncani are the main causative agents of human babesiosis in the United States. While significant knowledge about B. microti has been gained over the past few years, nothing is known about B. duncani biology, pathogenesis, mode of transmission or sensitivity to currently recommended therapies. Studies in immunocompetent wild type mice and hamsters have shown that unlike B. microti, infection with B. duncani results in severe pathology and ultimately death. The parasite factors involved in B. duncani virulence remain unknown. Here we report the first known completed sequence and annotation of the apicoplast and mitochondrial genomes of B. duncani. We found that the apicoplast genome of this parasite consists of a 34 kb monocistronic circular molecule encoding functions that are important for apicoplast gene transcription as well as translation and maturation of the organelle’s proteins. The mitochondrial genome of B. duncani consists of a 5.9 kb monocistronic linear molecule with two inverted repeats of 48 bp at both ends. Using the conserved cytochrome b (Cytb) and cytochrome c oxidase subunit I (coxI) proteins encoded by the mitochondrial genome, phylogenetic analysis revealed that B. duncani defines a new lineage among apicomplexan parasites distinct from B. microti, Babesia bovis, Theileria spp. and Plasmodium spp. Annotation of the apicoplast and mitochondrial genomes of B. duncani identified targets for development of effective therapies. Our studies set the stage for evaluation of the efficacy of these drugs alone or in combination against B. duncani in culture as well as in animal models. Copyright © 2018 Australian Society for Parasitology. Published by Elsevier Ltd. All rights reserved.


April 21, 2020

Antibiotic resistance and heavy metal tolerance plasmids: the antimicrobial bulletproof properties of Escherichia fergusonii isolated from poultry.

We describe the mobilome of Escherichia fergusonii 40A isolated from poultry, consisting of four different plasmids, p46_40A (IncX1, 45,869 bp), p80_40A (non-typable, 79,635 bp), p150_40A (IncI1-ST1, 148,340 bp) and p280_40A (IncHI2A-ST2, 279,537 bp). The mobilome-40A carries a blend of several different resistance and virulence genes, heavy metal tolerance operons and conjugation system. This mobilome 40A is a perfect tool to preserve and disseminate antimicrobial resistance and makes the bacterial isolate incredibly adapted to survive under constant antimicrobial pressure.


April 21, 2020

SMRT long reads and Direct Label and Stain optical maps allow the generation of a high-quality genome assembly for the European barn swallow (Hirundo rustica rustica).

The barn swallow (Hirundo rustica) is a migratory bird that has been the focus of a large number of ecological, behavioral, and genetic studies. To facilitate further population genetics and genomic studies, we present a reference genome assembly for the European subspecies (H. r. rustica). As part of the Genome10K effort on generating high-quality vertebrate genomes (Vertebrate Genomes Project), we have assembled a highly contiguous genome assembly using single molecule real-time (SMRT) DNA sequencing and several Bionano optical map technologies. We compared and integrated optical maps derived from both the Nick, Label, Repair, and Stain technology and from the Direct Label and Stain (DLS) technology. As proposed by Bionano, DLS more than doubled the scaffold N50 with respect to the nickase. The dual enzyme hybrid scaffold led to a further marginal increase in scaffold N50 and an overall increase of confidence in the scaffolds. After removal of haplotigs, the final assembly is approximately 1.21 Gbp in size, with a scaffold N50 value of more than 25.95 Mbp. This high-quality genome assembly represents a valuable resource for future studies of population genetics and genomics in the barn swallow and for studies concerning the evolution of avian genomes. It also represents one of the very first genomes assembled by combining SMRT long-read sequencing with the new Bionano DLS technology for scaffolding. The quality of this assembly demonstrates the potential of this methodology to substantially increase the contiguity of genome assemblies.


April 21, 2020

Competition between mobile genetic elements drives optimization of a phage-encoded CRISPR-Cas system: insights from a natural arms race.

CRISPR-Cas systems function as adaptive immune systems by acquiring nucleotide sequences called spacers that mediate sequence-specific defence against competitors. Uniquely, the phage ICP1 encodes a Type I-F CRISPR-Cas system that is deployed to target and overcome PLE, a mobile genetic element with anti-phage activity in Vibrio cholerae. Here, we exploit the arms race between ICP1 and PLE to examine spacer acquisition and interference under laboratory conditions to reconcile findings from wild populations. Natural ICP1 isolates encode multiple spacers directed against PLE, but we find that single spacers do not interfere equally with PLE mobilization. High-throughput sequencing to assay spacer acquisition reveals that ICP1 can also acquire spacers that target the V. cholerae chromosome. We find that targeting the V. cholerae chromosome proximal to PLE is sufficient to block PLE and is dependent on Cas2-3 helicase activity. We propose a model in which indirect chromosomal spacers are able to circumvent PLE by Cas2-3-mediated processive degradation of the V. cholerae chromosome before PLE mobilization. Generally, laboratory-acquired spacers are much more diverse than the subset of spacers maintained by ICP1 in nature, showing how evolutionary pressures can constrain CRISPR-Cas targeting in ways that are often not appreciated through in vitro analyses. This article is part of a discussion meeting issue ‘The ecology and evolution of prokaryotic CRISPR-Cas adaptive immune systems’.


April 21, 2020

Comparative Genome Characterization of a Petroleum-Degrading Bacillus subtilis Strain DM2.

The complete genome sequence of Bacillus subtilis strain DM2 isolated from petroleum-contaminated soil on the Tibetan Plateau was determined. The genome of strain DM2 consists of a circular chromosome of 4,238,631 bp for 4458 protein-coding genes and a plasmid of 84,240 bp coding for 103 genes. Thirty-four genomic islands coding for 330 proteins and 5 prophages are found in the genome. The DDH value shows that strain DM2 belongs to B. subtilis subsp. subtilis subspecies, but significant variations of the genome are also present. Comparative analysis showed that the genome of strain DM2 encodes some strain-specific proteins in comparison with B. subtilis subsp. subtilis str. 168, such as carboxymuconolactone decarboxylase family protein, gfo/Idh/MocA family oxidoreductases, GlsB/YeaQ/YmgE family stress response membrane protein, HlyC/CorC family transporters, LLM class flavin-dependent oxidoreductase, and LPXTG cell wall anchor domain-containing protein. Most of the common strain-specific proteins in DM2 and MJ01 strains, or proteins unique to DM2 strain, are involved in the pathways related to stress response, signaling, and hydrocarbon degradation. Furthermore, the strain DM2 genome contains 122 genes coding for developed two-component systems and 138 genes coding for ABC transporter systems. The prominent features of the strain DM2 genome reflect the evolutionary fitness of this strain to harsh conditions and hydrocarbon utilization.


April 21, 2020

Characterization of a catalase from red-lip mullet (Liza haematocheila): Demonstration of antioxidative activity and mRNA upregulation in response to immunostimulants.

Reactive oxygen species, generated in all the aerobic organisms, can cause oxidative stress. Excessive ROS may become a source of carcinogen due to DNA damage, lipid peroxidation, cell injury, and cell death. In order to prevent these adverse effects of ROS, antioxidant enzymes have evolved in aerobic organisms. Catalase is a major antioxidant enzyme that breaks down excessive H2O2 and inhibits apoptotic cell death. Here we molecularly characterized catalase from red-lip mullet. The cDNA sequence of LhCAT consists of an ORF of 1545 bp, which encodes a 527 amino acid peptide (~60 kDa). Based on bioinformatics analysis, LhCAT possesses a domain architecture characteristic of catalases, including a catalase proximal active site signature and a catalase proximal heme-ligand signature. It also has heme and NADPH binding sites homologous to previously described catalases. Pairwise alignment with its homologs revealed that LhCAT shares 95.1% identity with Oplegnathus fasciatus catalase and 97.4% similarity with Sparus aurata catalase. An uprooted phylogenetic tree demonstrated that LhCAT resides in a clade with catalases from other teleosts and exhibits a close relationship with Oplegnathus fasciatus catalase. Among twelve tissue types, we observed the highest LhCAT mRNA expression in the liver, followed by blood. Immune challenge by Lactococcus garvieae, or Poly I:C in the blood or spleen resulted in up-regulation at 24 h post injection. We also tested the antioxidant activity of recombinant LhCAT against hydrogen peroxide and found its optimal concentration to be 12.5 µg/mL. Collectively, these data suggested that LhCAT plays an important role in antioxidant defense and immune response of red-lip mullet. Copyright © 2019 Elsevier B.V. All rights reserved.


April 21, 2020

Identification of plasmid encoded osmoregulatory genes from halophilic bacteria isolated from the rhizosphere of halophytes.

Bacterial plasmids carry genes that code for additional traits such as osmoregulation, CO2 fixation, antibiotic and heavy metal resistance, root nodulation and nitrogen fixation. The main objective of the current study was to identify plasmid-conferring osmoregulatory genes in bacteria isolated from rhizospheric and non-rhizospheric soils of halophytes (Salsola stocksii and Atriplex amnicola). More than 55% of halophilic bacteria from the rhizosphere and 70% from non-rhizospheric soils were able to grow at 3 M salt concentrations. All the strains showed optimum growth at 1.5-3.0 M NaCl. Bacterial strains from the Salsola rhizosphere showed maximum (31%) plasmid elimination during curing experiments as compared to bacterial strains from the Atriplex rhizosphere and non-rhizospheric soils. Two plasmid cured strains Bacillus HL2HP6 and Oceanobacillus HL2RP7 lost their ability to grow in halophilic medium, but they grew well on LB medium. The plasmid cured strains also showed a change in sensitivity to specific antibiotics. These plasmids were isolated and transformed into E. coli strains and growth response of wild-type and transformed E. coli strains was compared at 1.5-4 M NaCl concentrations. Chromosomal DNA and plasmids from Bacillus filamentosus HL2HP6 were sequenced by using high throughput sequencing approach. Results of functional analysis of plasmid sequences showed different proteins and enzymes involved in osmoregulation of bacteria, such as trehalose, ectoine synthetase, porins, proline, alanine, inorganic ion transporters, dehydrogenases and peptidases. Our results suggested that plasmid conferring osmoregulatory genes play a vital role to maintain internal osmotic balance of bacterial cells and these genes can be used to develop salt tolerant transgenic crops. Copyright © 2019 Elsevier GmbH. All rights reserved.


April 21, 2020

Bradyrhizobium nanningense sp. nov., Bradyrhizobium guangzhouense sp. nov. and Bradyrhizobium zhanjiangense sp. nov., isolated from effective nodules of peanut in Southeast China.

Nine slow-growing rhizobia isolated from effective nodules on peanut (Arachis hypogaea) were characterized to clarify the taxonomic status using a polyphasic approach. They were assigned to the genus Bradyrhizobium on the basis of 16S rRNA sequences. MLSA of concatenated glnII-recA-dnaK genes classified them into three species represented by CCBAU 53390T, CCBAU 51670T and CCBAU 51778T, which presented the closest similarity to B. guangxiense CCBAU 53363T, B. guangdongense CCBAU 51649T and B. manausense BR 3351T, B. vignae 7-2T and B. forestalis INPA 54BT, respectively. The dDDH (digital DNA-DNA hybridization) and ANI (Average Nucleotide Identity) between the genomes of the three representative strains and type strains for the closest Bradyrhizobium species were less than 42.1% and 91.98%, respectively, below the threshold of species circumscription. Effective nodules could be induced on peanut and Lablab purpureus by all representative strains, while Vigna radiata formed effective nodules only with CCBAU 53390T and CCBAU 51778T. Phenotypic characteristics including sole carbon sources and growth features supported the phylogenetic results. Based on the genotypic and phenotypic features, strains CCBAU 53390T, CCBAU 51670T and CCBAU 51778T are designated the type strains of three novel species, for which the names Bradyrhizobium nanningense sp. nov., Bradyrhizobium guangzhouense sp. nov. and Bradyrhizobium zhanjiangense sp. nov. are proposed, respectively. Copyright © 2019 Elsevier GmbH. All rights reserved.


April 21, 2020

A global survey of full-length transcriptome of Ginkgo biloba reveals transcript variants involved in flavonoid biosynthesis

Ginkgo biloba, which contains flavonoids as bioactive components, is widely used in traditional Chinese medicine. Increasing the flavonoid production of medicinal plants through genetic engineering generally focuses on the key genes involved in flavonoid biosynthesis. However, the molecular mechanisms underlying such biosynthesis are not yet well understood. To understand these mechanisms, a combination of second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing was applied to G. biloba. Eight tissues were sampled for SMRT sequencing to generate a high-quality, full-length transcriptome database. From 23.36 Gb clean reads, 12,954 alternative polyadenylation events, 12,290 alternative splicing events, 929 fusion transcripts, 2,286 novel transcripts, and 1,270 lncRNAs were predicted by removing redundant reads. Further studies reveal that 7 AS, 5 lncRNA, and 6 fusion gene events were identified in flavonoid biosynthesis. A total of 12 gene modules were revealed to be involved in flavonoid metabolism structural genes and transcription factors by constructing co-expression networks. Weighted gene coexpression network analysis (WGCNA) analysis reveals that some hub genes operate during the biosynthesis by identifying transcription factors (TFs) and structure genes. Seven key hub genes were also identified by analyzing the correlation between gene expression level and flavonoids content. The results highlight the importance of SMRT sequencing of the full-length transcriptome in improving genome annotation and elucidating the gene regulation of flavonoid biosynthesis in G. biloba by providing a comprehensive set of reference transcripts.


April 21, 2020

The complete genome sequence of Ethanoligenens harbinense reveals the metabolic pathway of acetate-ethanol fermentation: A novel understanding of the principles of anaerobic biotechnology.

Ethanol-type fermentation is one of three main fermentation types in the acidogenesis of anaerobic treatment systems. Non-spore-forming Ethanoligenens is a typical genus capable of ethanol-type fermentation in mixed culture (i.e. acetate-ethanol fermentation). This genus can produce ethanol, acetate, CO2, and H2 using carbohydrates, and has application potential in anaerobic bioprocesses. Here, the complete genome sequences and methylome of Ethanoligenens harbinense strains with different autoaggregative and coaggregative abilities were obtained using the PacBio single-molecule real-time sequencing platform. The genome size of E. harbinense strains was about 2.97-3.10 Mb with 55.5% G+C content. 3020-3153 genes were annotated, most of which were methylated at specific sites or motifs. The methylation types included 6mA, 4mC, and unknown types. Comparative genomic analysis demonstrated low levels of genetic similarity between E. harbinense and other well-known hydrogen-producing bacteria (i.e., Clostridium and Thermoanaerobacter) in phylogenesis. Hydrogen production of E. harbinense was catalyzed by genes that encode [FeFe]-hydrogenases and that were synthesized by three maturases of [FeFe]-H2ase. The metabolic mechanism of H2-ethanol co-production fermentation, catalyzed by pyruvate ferredoxin oxidoreductase was proposed. This study provides genetic and evolutionary information of a model genus for the further investigation of the metabolic pathway and regulatory network of ethanol-type fermentation and anaerobic bioprocesses for waste or wastewater treatment. Copyright © 2019. Published by Elsevier Ltd.


April 21, 2020

Complete genome sequence data of Flavobacterium anhuiense strain GSE09, a volatile-producing biocontrol bacterium isolated from cucumber (Cucumis sativus) root.

Flavobacterium anhuiense (previously identified as Flavobacterium johnsoniae) strain GSE09 is a volatile-producing bacterium that exhibits significant biocontrol activity against an oomycete pathogen, Phytophthora capsici, on pepper plants. Here, we report the complete genome sequence data of strain GSE09, isolated from surface-sterilized cucumber root. The genome consists of a circular 5,109,718-bp chromosome with a G + C content of 34.30%. A total of 4,138 complete coding sequences including 15 rRNA, 66 tRNA, 3 ncRNA, and 51 pseudogene sequences were retrieved. Thus, the genome sequence data of F. anhuiense GSE09 may facilitate the elucidation of many biological traits related to the biocontrol against plant pathogens.
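
The G + C content quoted for genomes such as this one is simply the percentage of G and C bases among all called bases. A minimal sketch follows, assuming the sequence is supplied as a plain string and that ambiguous symbols such as N are excluded from the denominator; the function name is hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C among the A, C, G and T bases in seq."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / called
```

For real genomes the sequence would typically be read from a FASTA file first; that step is omitted here to keep the example self-contained.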

