July 7, 2019

The challenge of analyzing the sugarcane genome.

Reference genome sequences have become key platforms for genetics and breeding of the major crop species. Sugarcane is probably the largest crop produced in the world (by weight of crop harvested) but lacks a reference genome sequence. Sugarcane has one of the most complex genomes among crop plants due to its extreme level of polyploidy. The genome of modern sugarcane hybrids includes sub-genomes from two progenitors, Saccharum officinarum and S. spontaneum, with some chromosomes resulting from recombination between these sub-genomes. Advancing DNA sequencing technologies and strategies for genome assembly are making the sugarcane genome more tractable. Advances in long read sequencing have allowed the generation of a more complete set of sugarcane gene transcripts, which is supporting transcript profiling in genetic research. The progenitor genomes are being sequenced. A monoploid coverage of the hybrid genome has been obtained by sequencing BAC clones that cover the gene space of the closely related sorghum genome. The complete polyploid genome is now being sequenced and assembled. The emerging genome will allow comparison of related genomes and increase understanding of the functioning of this polyploid system. Sugarcane breeding for traditional sugar and new energy and biomaterial uses will be enhanced by the availability of these genomic resources.


July 7, 2019

Modular traits of the Rhizobiales root microbiota and their evolutionary relationship with symbiotic Rhizobia.

Rhizobia are a paraphyletic group of soil-borne bacteria that induce nodule organogenesis in legume roots and fix atmospheric nitrogen for plant growth. In non-leguminous plants, species from the Rhizobiales order define a core lineage of the plant microbiota, suggesting additional functional interactions with plant hosts. In this work, genome analyses of 1,314 Rhizobiales isolates along with amplicon studies of the root microbiota reveal the evolutionary history of nitrogen-fixing symbiosis in this bacterial order. Key symbiosis genes were acquired multiple times, and the most recent common ancestor could colonize roots of a broad host range. In addition, root growth promotion is a characteristic trait of Rhizobiales in Arabidopsis thaliana, whereas interference with plant immunity constitutes a separate, strain-specific phenotype of root commensal Alphaproteobacteria. Additional studies with a tripartite gnotobiotic plant system reveal that these traits operate in a modular fashion and thus might be relevant to microbial homeostasis in healthy roots. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.


July 7, 2019

Tracing the de novo origin of protein-coding genes in yeast.

De novo genes are very important for evolutionary innovation. However, how these genes originate and spread remains largely unknown. To better understand this, we rigorously searched for de novo genes in Saccharomyces cerevisiae S288C and examined their spread and fixation in the population. Here, we identified 84 de novo genes in S. cerevisiae S288C since the divergence from their sister groups. Transcriptome and ribosome profiling data revealed at least 8 (10%) and 28 (33%) de novo genes being expressed and translated only under specific conditions, respectively. DNA microarray data, based on 2-fold change, showed that 87% of the de novo genes are regulated during various biological processes, such as nutrient utilization and sporulation. Our comparative and evolutionary analyses further revealed that some factors, including single nucleotide polymorphism (SNP)/indel mutation, high GC content, and DNA shuffling, contribute to the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we also provide evidence suggesting the possible parallel origin of a de novo gene between S. cerevisiae and Saccharomyces paradoxus. Together, our study provides several new insights into the origin and spread of de novo genes.
IMPORTANCE: Emergence of de novo genes has occurred in many lineages during evolution, but the birth, spread, and function of these genes remain unresolved. Here we have searched for de novo genes from Saccharomyces cerevisiae S288C using rigorous methods, which reduced the effects of bad annotation and genomic gaps on the identification of de novo genes. Through this analysis, we have found 84 new genes originating de novo from previously noncoding regions, 87% of which are very likely involved in various biological processes. We noticed that 10% and 33% of de novo genes were only expressed and translated under specific conditions; therefore, verification of de novo genes through transcriptome and ribosome profiling, especially from limited expression data, may underestimate the number of bona fide new genes. We further show that SNP/indel mutation, high GC content, and DNA shuffling could be involved in the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we provide evidence suggesting the possible parallel origin of a new gene. Copyright © 2018 Wu and Knudson.


July 7, 2019

An improved approach for reconstructing consensus repeats from short sequence reads

Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely on either high quality reference genomes or existing repeat libraries. Thus, repeat analysis is still challenging for species with highly repetitive or complex genomes, which often lack good reference genomes or annotated repeat libraries. Recently we developed a computational method called REPdenovo that constructs consensus repeat sequences directly from short sequence reads and outperforms an existing tool called RepARK. One major issue with REPdenovo is that it doesn’t perform well for repeats with relatively high divergence rates or low copy numbers. In this paper, we present an improved approach for constructing consensus repeats directly from short reads. Compared with the original REPdenovo, the improved approach uses more repeat-related k-mers and improves repeat assembly quality using a consensus-based k-mer processing method.
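The general idea of "repeat-related k-mers" can be illustrated with a minimal Python sketch: k-mers that occur unusually often across the reads are candidates for belonging to repeats. This is not the REPdenovo implementation; the function name `frequent_kmers`, the value of k, and the count threshold are all assumptions chosen for illustration.

```python
from collections import Counter

def frequent_kmers(reads, k, min_count):
    """Return k-mers that occur at least min_count times across all reads.

    Hypothetical helper: a real tool would also handle reverse complements,
    sequencing errors, and coverage normalization.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of width k over the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}

# Toy reads (made up for this example) sharing the motif ACGTA.
reads = ["ACGTACGTAC", "TTACGTACGG", "GGGACGTAAA"]
repeat_kmers = frequent_kmers(reads, k=4, min_count=3)
```

In practice the threshold would be set relative to the expected single-copy coverage, so that only k-mers well above that level are treated as repeat-related.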


July 7, 2019

Meeting report: mobile genetic elements and genome plasticity 2018

The Mobile Genetic Elements and Genome Plasticity conference was hosted by Keystone Symposia in Santa Fe, NM, USA, February 11–15, 2018. The organizers were Marlene Belfort, Evan Eichler, Henry Levin and Lynn Maquat. The goal of this conference was to bring together scientists from around the world to discuss the function of transposable elements and their impact on host species. Central themes of the meeting included recent innovations in genome analysis and the role of mobile DNA in disease and evolution. The conference included 200 scientists who participated in poster presentations, short talks selected from abstracts, and invited talks. A total of 58 talks were organized into eight sessions and two workshops. The topics ranged from mechanisms of mobilization to the structure of genomes and their defense strategies against transposable elements.


July 7, 2019

Fast-SG: an alignment-free algorithm for hybrid assembly.

Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.


July 7, 2019

Complete genome sequence of industrial biocontrol strain Paenibacillus polymyxa HY96-2 and further analysis of its biocontrol mechanism.

Paenibacillus polymyxa (formerly known as Bacillus polymyxa) has been extensively studied for agricultural applications as a plant-growth-promoting rhizobacterium and is also an important biocontrol agent. Our team has developed the P. polymyxa strain HY96-2 from the tomato rhizosphere as the first microbial biopesticide based on P. polymyxa for controlling plant diseases around the world, leading to the commercialization of this microbial biopesticide in China. However, further research is essential for understanding its precise biocontrol mechanisms. In this paper, we report the complete genome sequence of HY96-2 and the results of a comparative genomic analysis between different P. polymyxa strains. The complete genome size of HY96-2 was found to be 5.75 Mb and 5207 coding sequences were predicted. HY96-2 was compared with seven other P. polymyxa strains for which complete genome sequences have been published, using phylogenetic tree, pan-genome, and nucleic acid co-linearity analysis. In addition, the genes and gene clusters involved in biofilm formation, antibiotic synthesis, and systemic resistance inducer production were compared between strain HY96-2 and two other strains, namely, SC2 and E681. The results revealed that all three of the P. polymyxa strains have the ability to control plant diseases via the mechanisms of colonization (biofilm formation), antagonism (antibiotic production), and induced resistance (systemic resistance inducer production). However, the variation of the corresponding genes or gene clusters between the three strains may lead to different antimicrobial spectra and biocontrol efficacies. Two possible pathways of biofilm formation in P. polymyxa were reported for the first time after searching the KEGG database. This study provides a scientific basis for the further optimization of the field applications and quality standards of industrial microbial biopesticides based on HY96-2. 
It may also serve as a reference for studying the differences in antimicrobial spectra and biocontrol capability between different biocontrol agents.


July 7, 2019

Genomes and transcriptomes of duckweeds.

Duckweeds (Lemnaceae family) are the smallest flowering plants and are adapted to the aquatic environment. They are regarded as a promising sustainable feedstock owing to their high starch storage, fast propagation, and global distribution. The duckweed genome size varies 13-fold, ranging from 150 Mb in Spirodela polyrhiza to 1,881 Mb in Wolffia arrhiza. With the development of sequencing technology and bioinformatics, five duckweed genomes from the Spirodela and Lemna genera have been sequenced and assembled. The genome annotations reveal that they share similar protein orthologs, whereas repeat content could mainly explain the genome size difference. The gene families responsible for cell growth and expansion, lignin biosynthesis, and flowering are greatly contracted. However, the glutamate synthase gene family has expanded, indicating its significance in ammonia assimilation and nitrogen transport. The transcriptome has been comprehensively sequenced for the genera Spirodela, Landoltia, and Lemna under various treatments, such as abscisic acid, radiation, heavy metal, and starvation. Analysis of the underlying molecular mechanisms and regulatory networks would accelerate their application in the fields of bioenergy and phytoremediation. Comparative genomics has shown that duckweed genomes contain relatively low gene numbers and more contracted gene families, which may parallel their highly reduced morphology with a simple leaf and primary roots. Still, we are waiting for advances in long read sequencing technology to resolve the complex genomes and transcriptomes of the unsequenced Wolffiella and Wolffia, owing to their large genome sizes and the similarity in their polyploidy.


July 7, 2019

The recombination landscape of Drosophila virilis is robust to transposon activation in hybrid dysgenesis

DNA damage in the germline is a double-edged sword. Induced double-strand breaks establish the foundation for meiotic recombination and proper chromosome segregation but can also pose a significant challenge for genome stability. Within the germline, transposable elements are powerful agents of double-strand break formation. How different types of DNA damage are resolved within the germline is poorly understood. For example, little is known about the relationship between the frequency of double-strand breaks, both endogenous and exogenous, and the decision to repair DNA through one of the many pathways, including crossing over and gene conversion. Here we use the Drosophila virilis hybrid dysgenesis model to determine how recombination landscapes change under transposable element activation. In this system, a cross between two strains of D. virilis with divergent transposable element profiles results in the hybrid dysgenesis phenotype, which includes the germline activation of diverse transposable elements, reduced fertility, and male recombination. However, only one direction of the cross results in hybrid dysgenesis. This allows the study of recombination in genetically identical F1 females: those with baseline levels of programmed DNA damage and those with an increased level of DNA damage resulting from transposable element proliferation. Using multiplexed shotgun genotyping to map crossover events, we compared the recombination landscapes of hybrid dysgenic and non-hybrid dysgenic individuals. The frequency and distribution of meiotic recombination appear to be robust during hybrid dysgenesis. However, hybrid dysgenesis is also associated with occasional clusters of recombination derived from single dysgenic F1 mothers. The clusters of recombination are hypothesized to be the result of mitotic crossovers during early germline development. Overall, these results show that meiotic recombination in D. virilis is robust to the damage caused by transposable elements during early development.


July 7, 2019

sppIDer: a species identification tool to investigate hybrid genomes with high-throughput sequencing.

The genomics era has expanded our knowledge about the diversity of the living world, yet harnessing high-throughput sequencing data to investigate alternative evolutionary trajectories, such as hybridization, is still challenging. Here we present sppIDer, a pipeline for the characterization of interspecies hybrids and pure species, that illuminates the complete composition of genomes. sppIDer maps short-read sequencing data to a combination genome built from reference genomes of several species of interest and assesses the genomic contribution and relative ploidy of each parental species, producing a series of colorful graphical outputs ready for publication. As a proof-of-concept, we use the genus Saccharomyces to detect and visualize both interspecies hybrids and pure strains, even with missing parental reference genomes. Through simulation, we show that sppIDer is robust to variable reference genome qualities and performs well with low-coverage data. We further demonstrate the power of this approach in plants, animals, and other fungi. sppIDer is robust to many different inputs and provides visually intuitive insight into genome composition that enables the rapid identification of species and their interspecies hybrids. sppIDer exists as a Docker image, which is a reusable, reproducible, transparent, and simple-to-run package that automates the pipeline and installation of the required dependencies (https://github.com/GLBRC/sppIDer; last accessed September 6, 2018).
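The core idea of assessing each parental species' genomic contribution from reads mapped to a combined reference can be sketched in a few lines of Python. This is only an illustration of the concept, not sppIDer's actual code: the function name `species_fractions`, the `species-contig` naming convention for the combined reference, and the toy input are all assumptions.

```python
from collections import Counter

def species_fractions(mapped_contigs, sep="-"):
    """Fraction of mapped reads assigned to each species.

    mapped_contigs: one contig name per mapped read, where each name is
    assumed (hypothetically) to look like "<species><sep><contig>".
    """
    counts = Counter(name.split(sep, 1)[0] for name in mapped_contigs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in counts.items()}

# Toy example: three reads map to one species, one read to another.
hits = ["Scer-chr1", "Scer-chr2", "Seub-chr1", "Scer-chr1"]
fractions = species_fractions(hits)
```

A real analysis would also filter by mapping quality and compare depth along each chromosome, which is what lets relative ploidy be estimated rather than just overall contribution.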


July 7, 2019

Genome size estimation of Chinese cultured Artemisia annua L.

Almost all antimalarial artemisinin is extracted from the traditional Chinese medicinal plant Artemisia annua L. However, because genomic information is insufficient and genetic backgrounds are unresolved, the regulatory mechanism of the artemisinin biosynthetic pathway is not yet clear. The genome size of genuine A. annua plants is an especially important and fundamental parameter, which is helpful for further genomic studies of artemisinin biosynthesis and improvement. In the current study, the genome sizes of A. annua samples identified by barcoding were estimated to be 1.38-1.49 Gb by flow cytometry (FCM), with Nipponbare as the benchmark calibration standard and soybean and maize as two internal standards, used individually and simultaneously. The genome size estimate for seven A. annua strains from five Chinese provinces (Shandong, Hunan, Chongqing, Sichuan, and Hainan), with a low coefficient of variation (CV = 2.96%), was relatively accurate and was 12.87% (220 Mb) smaller than previous reports on a foreign A. annua species measured with a single control. This result facilitates scheduling of the A. annua whole genome sequencing project, optimization of assembly methods, and insight into its subsequent genetics and evolution.
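The two quantities mentioned above, a genome size estimated against an internal standard and a coefficient of variation across samples, can be sketched in Python as follows. The ratio formula is the usual flow cytometry convention (sample size = standard size × sample peak / standard peak) and is assumed here rather than taken from the study; the function names and all numeric inputs are hypothetical and are not data from the paper.

```python
from statistics import mean, stdev

def estimate_genome_size(sample_peak, standard_peak, standard_size_mb):
    """Genome size (Mb) from the fluorescence peak ratio to a standard.

    Assumes fluorescence is proportional to DNA content.
    """
    return standard_size_mb * sample_peak / standard_peak

def coefficient_of_variation(values):
    """CV in percent, using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical peak positions and standard size, for illustration only.
size_mb = estimate_genome_size(sample_peak=300.0, standard_peak=100.0,
                               standard_size_mb=400.0)
cv_percent = coefficient_of_variation([1.0, 2.0, 3.0])
```

A low CV across strains, as reported above, indicates that the individual estimates cluster tightly around their mean.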


July 7, 2019

PGD: Pineapple Genomics Database.

Pineapple occupies an important phylogenetic position as its reference genome is a model for studying the evolution of the Bromeliaceae family and of crassulacean acid metabolism (CAM) photosynthesis. Here, we developed a pineapple genomics database (PGD, http://pineapple.angiosperms.org/pineapple/html/index.html) as a central online platform for storing and integrating genomic, transcriptomic, functional annotation and genetic marker data for pineapple (Ananas comosus (L.) Merr.). The PGD currently hosts search tools and datasets for researchers to study comparative genomics, gene expression, gene co-expression, molecular markers, and gene annotation of A. comosus (L.). PGD also provides a series of additional pages: a genome browser that visualizes genomic data interactively, bulk data download, a detailed user manual, and data integration information. PGD was developed with the capacity to integrate future data resources, and will be used as a long-term and open access database to facilitate the study of the biology, distribution, and evolution of pineapple and related plant species. An email-based helpdesk is also available to offer support with the website and to handle requests for specific datasets from the research community.


July 7, 2019

Genome-wide analysis of the invertase gene family from maize.

The recent release of the maize genome (AGPv4) contains annotation errors in invertase genes, and therefore the enzymes are best curated manually at the protein level in a comprehensible fashion. The synthesis, transport and degradation of sucrose are determining factors for biomass allocation and yield of crop plants. Invertase (INV) is a key enzyme of carbon metabolism in both source and sink tissues. Current releases of the maize genome correctly annotate only two vacuolar invertases (ivr1 and ivr2) and four cell wall invertases (incw1, incw2 (mn1), incw3, and incw4). Our comprehensive survey identified 21 INV isogenes for which we propose a standard nomenclature grouped phylogenetically by amino acid similarity: three vacuolar (INVVR), eight cell wall (INVCW), and ten alkaline/neutral (INVAN) isogenes, which form separate dendrogram branches due to distinct molecular features. The acidic enzymes were curated for the presence of the DPN tripeptide, which is coded by one of the smallest exons reported in plants. Particular attention was placed on the molecular role of INV in vascular tissues such as the nodes, internodes, leaf sheath, husk leaves and roots. We report the expression profile of most members of the maize INV family in nine tissues at two developmental stages, R1 and R3. INVCW7, INVVR2, INVAN8, INVAN9, INVAN10, and INVAN3 displayed the highest absolute expression in most tissues. INVVR3, INVCW5, INVCW8, and INVAN1 showed low mRNA levels. Expression of most INVs was repressed from stage R1 to R3, except for INVCW7, which increased significantly in all tissues after flowering. The mRNA levels of INVCW7 in the vegetative stem correlated with a higher transport rate of assimilates from leaves to the cob, which led to starch accumulation and growth of the female reproductive organs.


July 7, 2019

Omics in weed science: A perspective from genomics, transcriptomics, and metabolomics approaches

Modern high-throughput molecular and analytical tools offer exciting opportunities to gain a mechanistic understanding of unique traits of weeds. During the past decade, tremendous progress has been made within the weed science discipline using genomic techniques to gain deeper insights into weedy traits such as invasiveness, hybridization, and herbicide resistance. Though the adoption of newer “omics” techniques such as proteomics, metabolomics, and physionomics has been slow, applications of these omics platforms to study plants, especially agriculturally important crops and weeds, have been increasing over the years. In weed science, these platforms are now used more frequently to understand mechanisms of herbicide resistance, weed resistance evolution, and crop–weed interactions. Use of these techniques could help weed scientists to further reduce the knowledge gaps in understanding weedy traits. Although these techniques can provide robust insights about the molecular functioning of plants, employing a single omics platform can rarely elucidate the gene-level regulation and the associated real-time expression of weedy traits due to the complex and overlapping nature of biological interactions. Therefore, it is desirable to integrate the different omics technologies to give a better understanding of the molecular functioning of biological systems. This multidimensional integrated approach can offer new avenues for addressing questions of interest to weed scientists. This review offers a retrospective and prospective examination of omics platforms employed to investigate weed physiology, and of novel approaches and new technologies that can provide holistic and knowledge-based weed management strategies for the future.


July 7, 2019

Identification of woodland strawberry gene coexpression networks

What we think of as a strawberry is botanically not a berry or even a fruit, but rather multiple fruits (achenes that contain the seeds) on the outside of a swollen receptacle. This technicality aside, strawberries are both economically important and a useful system in which to study seed-fruit communication. While cultivated strawberries have a complex octoploid genome, one of their likely progenitors, the woodland strawberry (Fragaria vesca; Fig. 1), is a rapidly growing model system for the Rosaceae family due to its short generation time and capacity to be transformed. A draft of the woodland strawberry diploid genome sequence was released in 2011 (Shulaev et al., 2011), and the recent publication of a high-quality genome based on PacBio sequencing has added almost 1,500 genes to the annotation (Edger et al., 2018). Genetic and epigenetic resources have also been developed for this species (Xu et al., 2016; Hilmarsson et al., 2017).

