July 7, 2019  |  

TriPoly: haplotype estimation for polyploids using sequencing data of related individuals.

Knowledge of haplotypes, i.e. phased and ordered marker alleles on a chromosome, is essential to answer many questions in genetics and genomics. By generating short pieces of DNA sequence, high-throughput modern sequencing technologies make estimation of haplotypes possible for single individuals. In polyploids, however, haplotype estimation methods usually require deep coverage to achieve sufficient accuracy. This often renders sequencing-based approaches too costly to be applied to large populations needed in studies of Quantitative Trait Loci. We propose a novel haplotype estimation method for polyploids, TriPoly, that combines sequencing data with Mendelian inheritance rules to infer haplotypes in parent-offspring trios. Using realistic simulations of both short and long-read sequencing data for banana (Musa acuminata) and potato (Solanum tuberosum) trios, we show that TriPoly yields more accurate progeny haplotypes at low coverages compared to existing methods that work on single individuals. We also apply TriPoly to phase Single Nucleotide Polymorphisms on chromosome 5 for a family of tetraploid potato with 2 parents and 37 offspring sequenced with an RNA capture approach. We show that TriPoly haplotype estimates differ from those of the other methods mainly in regions with imperfect sequencing or mapping difficulties, as it does not rely solely on sequence reads and aims to avoid phasings that are not likely to have been passed from the parents to the offspring. TriPoly has been implemented in Python 3.5.2 (also compatible with Python 2.7.3 and higher) and can be freely downloaded at https://github.com/EhsanMotazedi/TriPoly. Supplementary data are available at Bioinformatics online.


July 7, 2019  |  

Interpreting whole-genome sequence analyses of foodborne bacteria for regulatory applications and outbreak investigations.

Whole-genome sequence (WGS) analysis has revolutionized the food safety industry by enabling high-resolution typing of foodborne bacteria. Higher resolving power allows investigators to identify origins of contamination during illness outbreaks and regulatory activities quickly and accurately. Government agencies and industry stakeholders worldwide are now analyzing WGS data routinely. Although researchers have published many studies that assess the efficacy of WGS data analysis for source attribution, guidance for interpreting WGS analyses is lacking. Here, we provide the framework for interpreting WGS analyses used by the Food and Drug Administration’s Center for Food Safety and Applied Nutrition (CFSAN). We based this framework on the experiences of CFSAN investigators, collaborations and interactions with government and industry partners, and evaluation of the published literature. A fundamental question for investigators is whether two or more bacteria arose from the same source of contamination. Analysts often count the numbers of nucleotide differences [single-nucleotide polymorphisms (SNPs)] between two or more genome sequences to measure genetic distances. However, using SNP thresholds alone to assess whether bacteria originated from the same source can be misleading. Bacteria that are isolated from food, environmental, or clinical samples are representatives of bacterial populations. These populations are subject to evolutionary forces that can change genome sequences. Therefore, interpreting WGS analyses of foodborne bacteria requires a more sophisticated approach. Here, we present a framework for interpreting WGS analyses that combines SNP counts with phylogenetic tree topologies and bootstrap support. We also clarify the roles of WGS, epidemiological, traceback, and other evidence in forming the conclusions of investigations. Finally, we present examples that illustrate the application of this framework to real-world situations.
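As an illustration of the SNP counting mentioned in the abstract above, the following is a minimal sketch that assumes the genomes have already been aligned to equal length. The function names and the example isolates are hypothetical and are not part of the CFSAN framework; the abstract itself cautions that such counts alone can be misleading and should be read together with tree topology and bootstrap support.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two pre-aligned sequences differ.

    Positions where either sequence has an ambiguous base or a gap
    (characters in `ignore`) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snp_distances(aligned):
    """Return {(name_1, name_2): SNP count} for every pair of isolates."""
    return {
        (n1, n2): snp_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Hypothetical aligned fragments from three isolates (illustration only).
isolates = {
    "food_isolate": "ACGTACGTAC",
    "clinical_isolate": "ACGTACGTAC",
    "environmental_isolate": "ACGAACGTNC",
}

distances = pairwise_snp_distances(isolates)
for pair, count in distances.items():
    print(pair, count)
```

A distance of zero between two isolates in a sketch like this would still need supporting phylogenetic and epidemiological evidence before concluding a shared source.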


July 7, 2019  |  

Complete genome sequence of industrial biocontrol strain Paenibacillus polymyxa HY96-2 and further analysis of its biocontrol mechanism.

Paenibacillus polymyxa (formerly known as Bacillus polymyxa) has been extensively studied for agricultural applications as a plant-growth-promoting rhizobacterium and is also an important biocontrol agent. Our team has developed the P. polymyxa strain HY96-2 from the tomato rhizosphere as the first microbial biopesticide based on P. polymyxa for controlling plant diseases around the world, leading to the commercialization of this microbial biopesticide in China. However, further research is essential for understanding its precise biocontrol mechanisms. In this paper, we report the complete genome sequence of HY96-2 and the results of a comparative genomic analysis between different P. polymyxa strains. The complete genome size of HY96-2 was found to be 5.75 Mb and 5207 coding sequences were predicted. HY96-2 was compared with seven other P. polymyxa strains for which complete genome sequences have been published, using phylogenetic tree, pan-genome, and nucleic acid co-linearity analysis. In addition, the genes and gene clusters involved in biofilm formation, antibiotic synthesis, and systemic resistance inducer production were compared between strain HY96-2 and two other strains, namely, SC2 and E681. The results revealed that all three of the P. polymyxa strains have the ability to control plant diseases via the mechanisms of colonization (biofilm formation), antagonism (antibiotic production), and induced resistance (systemic resistance inducer production). However, the variation of the corresponding genes or gene clusters between the three strains may lead to different antimicrobial spectra and biocontrol efficacies. Two possible pathways of biofilm formation in P. polymyxa were reported for the first time after searching the KEGG database. This study provides a scientific basis for the further optimization of the field applications and quality standards of industrial microbial biopesticides based on HY96-2. It may also serve as a reference for studying the differences in antimicrobial spectra and biocontrol capability between different biocontrol agents.


July 7, 2019  |  

Genomes and transcriptomes of duckweeds.

Duckweeds (Lemnaceae family) are the smallest flowering plants and are adapted to the aquatic environment. They are regarded as a promising sustainable feedstock with the characteristics of high starch storage, fast propagation, and global distribution. The duckweed genome size varies 13-fold, ranging from 150 Mb in Spirodela polyrhiza to 1,881 Mb in Wolffia arrhiza. With the development of sequencing technology and bioinformatics, five duckweed genomes from the Spirodela and Lemna genera have been sequenced and assembled. The genome annotations reveal that they share similar protein orthologs, whereas the repeat contents could mainly explain the genome size difference. The gene families responsible for cell growth and expansion, lignin biosynthesis, and flowering are greatly contracted. However, the gene family of glutamate synthase has experienced expansion, indicating its significance in ammonia assimilation and nitrogen transport. The transcriptome has been comprehensively sequenced for the genera Spirodela, Landoltia, and Lemna, including various treatments such as abscisic acid, radiation, heavy metal, and starvation. The analysis of the underlying molecular mechanism and the regulatory network would accelerate their applications in the fields of bioenergy and phytoremediation. Comparative genomics has shown that duckweed genomes contain relatively low gene numbers and more contracted gene families, which may be in parallel with their highly reduced morphology with a simple leaf and primary roots. Still, we are waiting for the advancement of the long read sequencing technology to resolve the complex genomes and transcriptomes for unsequenced Wolffiella and Wolffia due to the large genome sizes and the similarity in their polyploidy.


July 7, 2019  |  

sppIDer: a species identification tool to investigate hybrid genomes with high-throughput sequencing.

The genomics era has expanded our knowledge about the diversity of the living world, yet harnessing high-throughput sequencing data to investigate alternative evolutionary trajectories, such as hybridization, is still challenging. Here we present sppIDer, a pipeline for the characterization of interspecies hybrids and pure species, that illuminates the complete composition of genomes. sppIDer maps short-read sequencing data to a combination genome built from reference genomes of several species of interest and assesses the genomic contribution and relative ploidy of each parental species, producing a series of colorful graphical outputs ready for publication. As a proof-of-concept, we use the genus Saccharomyces to detect and visualize both interspecies hybrids and pure strains, even with missing parental reference genomes. Through simulation, we show that sppIDer is robust to variable reference genome qualities and performs well with low-coverage data. We further demonstrate the power of this approach in plants, animals, and other fungi. sppIDer is robust to many different inputs and provides visually intuitive insight into genome composition that enables the rapid identification of species and their interspecies hybrids. sppIDer exists as a Docker image, which is a reusable, reproducible, transparent, and simple-to-run package that automates the pipeline and installation of the required dependencies (https://github.com/GLBRC/sppIDer; last accessed September 6, 2018).


July 7, 2019  |  

Clustering of circular consensus sequences: accurate error correction and assembly of single molecule real-time reads from multiplexed amplicon libraries.

Targeted resequencing with high-throughput sequencing (HTS) platforms can be used to efficiently interrogate the genomes of large numbers of individuals. A critical issue for research and applications using HTS data, especially from long-read platforms, is error in base calling arising from technological limits and bioinformatic algorithms. We found that the community standard long amplicon analysis (LAA) module from Pacific Biosciences is prone to substantial bioinformatic errors that raise concerns about findings based on this pipeline, prompting the need for a new method. A single molecule real-time (SMRT) sequencing-error correction and assembly pipeline, C3S-LAA, was developed for libraries of pooled amplicons. By uniquely leveraging the structure of SMRT sequence data (comprised of multiple low quality subreads from which higher quality circular consensus sequences are formed) to cluster raw reads, C3S-LAA produced accurate consensus sequences and assemblies of overlapping amplicons from single sample and multiplexed libraries. In contrast, despite read depths in excess of 100X per amplicon, the standard long amplicon analysis module from Pacific Biosciences generated unexpected numbers of amplicon sequences with substantial inaccuracies in the consensus sequences. A bootstrap analysis showed that the C3S-LAA pipeline per se was effective at removing bioinformatic sources of error, but in rare cases a read depth of nearly 400X was not sufficient to overcome minor but systematic errors inherent to amplification or sequencing. C3S-LAA uses a divide and conquer processing algorithm for SMRT amplicon-sequence data that generates accurate consensus sequences and local sequence assemblies. Solving the confounding bioinformatic source of error in LAA allowed for the identification of limited instances of errors due to DNA amplification or sequencing of homopolymeric nucleotide tracts. For research and development in genomics, C3S-LAA allows meaningful conclusions and biological inferences to be made from accurately polished sequence output.


July 7, 2019  |  

Complete genome sequence of the dissimilatory azo reducing thermophilic bacterium Novibacillus thermophiles SG-1.

With the isolation and identification of efficient azo-dye degradation bacteria, bioaugmentation with specific microbial strains has now become an effective strategy to promote the bioremediation of azo dye. However, azo dye wastewater discharged at high temperature restricted the extensive application of the known mesophilic azoreducing microorganisms. Here we present the complete genome sequence of a bacterium capable of reducing azo dye under thermophilic condition, Novibacillus thermophiles SG-1 (=KCTC 33118T =CGMCC 1.12363T). The complete genome of strain SG-1 contains a circular chromosome of 3,629,225 bp with a G + C content of 50.44%. Genome analysis revealed that strain SG-1 possessed genes encoding riboflavin biosynthesis protein that would secrete riboflavin, which could act as electron shuttles to transport the electrons to extracellular azo dye in decolorization process. HPLC analysis showed that the concentration of riboflavin increased from 0.01 µM to 0.255 µM with the growth of strain SG-1 under azo dye reduction. Quantitative real-time PCR analysis further demonstrated that the gene encoding riboflavin biosynthesis protein would be involved in the azo dye decolorization. The results from this study would be beneficial to research the mechanism of anaerobic reduction of azo dye under thermophilic conditions. Copyright © 2018 Elsevier B.V. All rights reserved.


July 7, 2019  |  

The complete genome sequence of Bacillus halotolerans ZB201702 isolated from a drought- and salt-stressed rhizosphere soil.

Bacillus halotolerans is a rhizobacterium with the potential to promote plant growth and tolerance to drought and salinity stress. Here, we present the complete genome sequence of B. halotolerans ZB201702, which consists of 4,150,000 bp in a linear chromosome, including 3074 protein-coding sequences, 30 rRNAs, and 85 tRNAs. Genome analysis revealed many putative gene clusters involved in defense mechanisms. Activity analysis of the strain under salt and simulated drought stress suggests tolerance to abiotic stresses. The complete genome information of B. halotolerans ZB201702 could provide valuable insights into rhizobacteria-mediated plant salt and drought tolerance and rhizobacteria-based solutions for abiotic stress agriculture. Copyright © 2018 Elsevier Ltd. All rights reserved.


July 7, 2019  |  

Industrially-scalable microencapsulation of plant beneficial bacteria in dry cross-linked alginate matrix.

Microencapsulation of plant-beneficial bacteria, such as pink pigmented facultative methylotrophs (PPFM), may greatly extend the shelf life of these Gram-negative microorganisms and facilitate their application to crops for sustainable agriculture. A species of PPFM designated Methylobacterium radiotolerans was microencapsulated in cross-linked alginate microcapsules (CLAMs) prepared by an innovative and industrially scalable process that achieves polymer cross-linking during spray-drying. PPFM survived the spray-drying microencapsulation process with no significant loss in viable population, and the initial population of PPFM in CLAMs exceeded 10^10 CFU/g powder. The PPFM population in CLAMs gradually declined by 4 to 5 log CFU/g over one year of storage. The extent of alginate cross-linking, modulated by adjusting the calcium phosphate content in the spray-dryer feed, did not influence cell viability after spray-drying, viability over storage, or dry particle size. However, particle size measurements and light microscopy of aqueous CLAMs suggest that enhanced crosslinking may limit the release of encapsulated bacteria. This work demonstrates an industrially scalable method for producing alginate-based inoculants that may be suitable for on-seed or foliar spray applications.


July 7, 2019  |  

Complete genome sequence of the halophile bacterium Kushneria konosiri X49T, isolated from salt-fermented Konosirus punctatus

Kushneria konosiri X49T is a member of the Halomonadaceae family within the order Oceanospirillales and can be isolated from salt-fermented larval gizzard shad. The genome of K. konosiri X49T reported here provides a genetic basis for its halophilic character. Diverse genes were involved in salt-in and -out strategies enabling adaptation of X49T to hypersaline environments. Due to resistance to high salt concentrations, genome research of K. konosiri X49T will contribute to the improvement of environmental and biotechnological usage by enhancing understanding of the osmotic equilibrium in the cytoplasm. Its genome consists of 3,584,631 bp, with an average G + C content of 59.1%, and 3261 coding sequences, 12 rRNAs, 66 tRNAs, and 8 miscRNAs.


July 7, 2019  |  

Complete genome sequence of Kocuria rhizophila BT304, isolated from the small intestine of castrated beef cattle.

Members of the species Kocuria rhizophila, belonging to the family Micrococcaceae in the phylum Actinobacteria, have been isolated from a wide variety of natural sources, such as soil, freshwater, fish gut, and clinical specimens. K. rhizophila is important from an industrial viewpoint, because the bacterium grows rapidly with high cell density and exhibits robustness at various growth conditions. However, the bacterium is an opportunistic pathogen involved in human infections. Here, we sequenced and analyzed the genome of the K. rhizophila strain BT304, isolated from the small intestine of adult castrated beef cattle. The genome of K. rhizophila BT304 consisted of a single circular chromosome of 2,763,150 bp with a GC content of 71.2%. The genome contained 2359 coding sequences, 51 tRNA genes, and 9 rRNA genes. Sequence annotations with the RAST server revealed many genes related to amino acid, carbohydrate, and protein metabolism. Moreover, the genome contained genes related to branched chain amino acid biosynthesis and degradation. Analysis of the OrthoANI values revealed that the genome has high similarity (> 97.8%) with other K. rhizophila strains, such as DC2201, FDAARGOS 302, and G2. Comparative genomic analysis further revealed that the antibiotic properties of K. rhizophila vary among the strains. The relatively small number of virulence-related genes and the great potential in production of host available nutrients suggest potential application of the BT304 strain as a probiotic in breeding beef cattle.


July 7, 2019  |  

PGD: Pineapple Genomics Database.

Pineapple occupies an important phylogenetic position as its reference genome is a model for studying the evolution of the Bromeliaceae family and crassulacean acid metabolism (CAM) photosynthesis. Here, we developed a pineapple genomics database (PGD, http://pineapple.angiosperms.org/pineapple/html/index.html) as a central online platform for storing and integrating genomic, transcriptomic, function annotation and genetic marker data for pineapple (Ananas comosus (L.) Merr.). The PGD currently hosts significant search tools and available datasets for researchers to study comparative genomics, gene expression, gene co-expression, molecular markers, and gene annotation of A. comosus (L.). PGD also provides a series of additional pages for a genomic browser that visualizes genomic data interactively, bulk data download, a detailed user manual, and data integration information. PGD was developed with the capacity to integrate future data resources, and will be used as a long-term and open access database to facilitate the study of the biology, distribution, and evolution of pineapple and related plant species. An email-based helpdesk is also available to offer support with the website and requests of specific datasets from the research community.


July 7, 2019  |  

Complete genome sequence of Agrobacterium pusense VsBac-Y9, a bacterial symbiont of the dark septate endophytic fungus Veronaeopsis simplex Y34 with potential for improving fungal colonization in roots.

A Rhizobium-related bacterium (Rhizobium sp. VsBac-Y9) is a symbiont living with the dark septate endophytic (DSE) fungus Veronaeopsis simplex Y34. Co-inoculation of Rhizobium sp. VsBac-Y9 with V. simplex Y34 improves the fungal colonization of tomato roots, resulting in a significant increase in aboveground biomass. This study sequenced the complete genome of this V. simplex-helper bacterium using the PacBio and Illumina MiSeq platforms. Hybrid assembly using SPAdes outputted a circular chromosome, a linear chromid, and a circular plasmid for a total genome 5,321,211 bp in size with a G + C content of 59.2%. Analysis of concatenated housekeeping genes (atpD-dnaK-groEL-lepA-recA-rpoB-thrE) and calculation of average nucleotide identity showed that VsBac-Y9 was affiliated with the species Agrobacterium pusense (syn. Rhizobium pusense). Genome analysis revealed that A. pusense VsBac-Y9 contains a series of genes responsible for the host interactions with both fungus and plant. Such genomic information will provide new insights into developing co-inoculants of endophytic fungus and its symbiotic bacterium in future agricultural innovation. Copyright © 2018 Elsevier B.V. All rights reserved.

