Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.
In this chapter, we refer to the expressed portion of the barley genome as the relatively small fraction of the total cellular DNA that either contains the genes that ultimately produce proteins, or that directly or indirectly controls the level, location and/or timing of expression of these genes and production of their proteins. We start by describing the dynamics of tissue- and time-dependent gene expression and how common patterns across multiple samples can provide clues about gene networks involved in common biological processes. We then describe some of the complexities of how a single mRNA template can be differentially processed by alternative splicing to generate multiple different proteins or provide a mechanism to regulate the amount of functional gene product in a cell at a given point in time. We extend our analysis, using a number of biological examples, to address how diverse families of small non-coding microRNAs specifically regulate gene expression, and complete our appraisal by looking at the physical/molecular environment around genes that can result in either the promotion or repression of gene expression. We conclude by assessing some of the issues that remain around our ability to fully exploit the depth and power of current approaches for analysing gene expression and propose improvements that could be made using new but available sequencing and bioinformatics technologies.
Agaves are succulent monocotyledonous plants native to xeric environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, built from short-read RNA-seq data. Our analyses support completeness and accuracy of the de novo transcriptome assemblies, with each species having a minimum of approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, a focus on the transcriptomics of the A. deserti juvenile leaf confirms evolutionary conservation of monocotyledonous leaf physiology and development along the proximal-distal axis. Our work presents a comprehensive transcriptome resource for two Agave species and provides insight into their biology and physiology. These resources are a foundation for further investigation of agave biology and their improvement for bioenergy development.
A high-resolution genetic map of the cereal crown rot pathogen Fusarium pseudograminearum provides a near-complete genome assembly.
Fusarium pseudograminearum is an important pathogen of wheat and barley, particularly in semi-arid environments. Previous genome assemblies for this organism were based entirely on short read data and are highly fragmented. In this work, a genetic map of F. pseudograminearum has been constructed for the first time based on a mapping population of 178 individuals. The genetic map, together with long read scaffolding of a short read-based genome assembly, was used to give a near-complete assembly of the four F. pseudograminearum chromosomes. Large regions of synteny between F. pseudograminearum and F. graminearum, the related pathogen that is the primary causal agent of cereal head blight disease, were previously proposed in the core conserved genome, but the construction of a genetic map to order and orient contigs is critical to the validation of synteny and the placing of species-specific regions. Indeed, our comparative analyses of the genomes of these two related pathogens suggest that rearrangements in the F. pseudograminearum genome have occurred in the chromosome ends. One of these rearrangements includes the transposition of an entire gene cluster involved in the detoxification of the benzoxazolinone (BOA) class of plant phytoalexins. This work provides an important genomic and genetic resource for F. pseudograminearum, which is less well characterized than F. graminearum. In addition, this study provides new insights into the sexual reproduction process in F. pseudograminearum, which inform us of the potential of this pathogen to evolve. © 2016 BSPP AND JOHN WILEY & SONS LTD.
The sea lamprey germline genome provides insights into programmed genome rearrangement and vertebrate evolution.
The sea lamprey (Petromyzon marinus) serves as a comparative model for reconstructing vertebrate evolution. To enable more informed analyses, we developed a new assembly of the lamprey germline genome that integrates several complementary data sets. Analysis of this highly contiguous (chromosome-scale) assembly shows that both chromosomal and whole-genome duplications have played significant roles in the evolution of ancestral vertebrate and lamprey genomes, including chromosomes that carry the six lamprey HOX clusters. The assembly also contains several hundred genes that are reproducibly eliminated from somatic cells during early development in lamprey. Comparative analyses show that gnathostome (mouse) homologs of these genes are frequently marked by polycomb repressive complexes (PRCs) in embryonic stem cells, suggesting overlaps in the regulatory logic of somatic DNA elimination and bivalent states that are regulated by early embryonic PRCs. This new assembly will enhance diverse studies that are informed by lampreys’ unique biology and evolutionary/comparative perspective.
A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.
Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) ‘Hongyang’ draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second A. chinensis genome (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed that a considerable number of translocation events have occurred following a whole genome duplication (WGD) event, some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these, 3114 (9.4%) were identical to a protein within the ‘Hongyang’ Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation, this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned ‘Hort16A’ cDNAs and comparing with the predicted protein models for Red5 and both the original ‘Hongyang’ assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised ‘Hongyang’ annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes.
Our use of the manual annotation tool WebApollo facilitated manual checking and correction of gene models, enabling improvement of computational prediction. This utility was especially relevant for certain types of gene families such as the EXPANSIN-like genes. Finally, this high quality gene set will supply the kiwifruit and general plant community with a new tool for genomics and other comparative analysis.
Land plants evolved from charophytic algae, among which Charophyceae possess the most complex body plans. We present the genome of Chara braunii; comparison of the genome to those of land plants identified evolutionary novelties for plant terrestrialization and land plant heritage genes. C. braunii employs unique xylan synthases for cell wall biosynthesis, a phragmoplast (cell separation) mechanism similar to that of land plants, and many phytohormones. C. braunii plastids are controlled via land-plant-like retrograde signaling, and transcriptional regulation is more elaborate than in other algae. The morphological complexity of this organism may result from expanded gene families, with three cases of particular note: genes effecting tolerance to reactive oxygen species (ROS), LysM receptor-like kinases, and transcription factors (TFs). Transcriptomic analysis of sexual reproductive structures reveals intricate control by TFs, activity of the ROS gene network, and the ancestral use of plant-like storage and stress protection proteins in the zygote. Copyright © 2018 Elsevier Inc. All rights reserved.
Oropetium thomaeum is an emerging model for desiccation tolerance and genome size evolution in grasses. A high-quality draft genome of Oropetium was recently sequenced, but the lack of a chromosome scale assembly has hindered comparative analyses and downstream functional genomics. Here, we reassembled Oropetium, and anchored the genome into ten chromosomes using Hi-C based chromatin interactions. A combination of high-resolution RNAseq data and homology-based gene prediction identified thousands of new, conserved gene models that were absent from the V1 assembly. This includes thousands of new genes with high expression across a desiccation timecourse. The sorghum and Oropetium genomes have a surprising degree of chromosome-level collinearity, and several chromosome pairs have near perfect synteny. Other chromosomes are collinear in the gene rich chromosome arms but have experienced pericentric translocations. Together, these resources will be useful for the grass comparative genomic community and further establish Oropetium as a model resurrection plant.
Understanding how crop plants evolved from their wild relatives and spread around the world can shed light on the origins of agriculture. Here, we review how the rapid development of genomic resources and tools has made it possible to conduct genetic mapping and population genetic studies to unravel the molecular underpinnings of domestication and crop evolution in diverse crop species. We propose three future avenues for the study of crop evolution: establishment of high-quality reference genomes for crops and their wild relatives; genomic characterization of germplasm collections; and the adoption of novel methodologies such as archaeogenetics, epigenomics, and genome editing.
Fast and inexpensive protocols for consistent extraction of high quality DNA and RNA from challenging plant and fungal samples for high-throughput SNP genotyping and sequencing applications.
Modern genotyping techniques, such as SNP analysis and genotyping by sequencing (GBS), are hampered by poor DNA quality and purity, particularly in challenging plant species rich in secondary metabolites. We therefore investigated the utility of a pre-wash step using a buffered sorbitol solution, prior to DNA extraction using a high salt CTAB extraction protocol, in a high-throughput or miniprep setting. This pre-wash appears to remove interfering metabolites, such as polyphenols and polysaccharides, from tissue macerates. We also investigated the adaptability of the sorbitol pre-wash for RNA extraction using a lithium chloride-based protocol. The method was successfully applied to a variety of tissues, including leaf, cambium and fruit of diverse plant species including annual crops, forest and fruit trees, herbarium leaf material and lyophilized fungal mycelium. We consistently obtained good yields of high purity DNA or RNA in all species tested. The protocol has been validated for thousands of DNA samples by generating high data quality in dense SNP arrays. DNA extracted from Eucalyptus spp. leaf and cambium as well as mycelium from Trichoderma spp. was readily digested with restriction enzymes and performed consistently in AFLP assays. Scaled-up DNA extractions were also suitable for long read sequencing. Successful RNA quality control and good RNA-Seq data for Eucalyptus and cashew confirm the effectiveness of the sorbitol buffer pre-wash for high quality RNA extraction.
A complete Cannabis chromosome assembly and adaptive admixture for elevated cannabidiol (CBD) content
Cannabis has been cultivated for millennia with distinct cultivars providing either fiber and grain or tetrahydrocannabinol. Recent demand for cannabidiol rather than tetrahydrocannabinol has favored the breeding of admixed cultivars with extremely high cannabidiol content. Despite several draft Cannabis genomes, the genomic structure of cannabinoid synthase loci has remained elusive. A genetic map derived from a tetrahydrocannabinol/cannabidiol segregating population and a complete chromosome assembly from a high-cannabidiol cultivar together resolve the linkage of cannabidiolic and tetrahydrocannabinolic acid synthase gene clusters which are associated with transposable elements. High-cannabidiol cultivars appear to have been generated by integrating hemp-type cannabidiolic acid synthase gene clusters into a background of marijuana-type cannabis. Quantitative trait locus mapping suggests that overall drug potency, however, is associated with other genomic regions needing additional study.
The chromosome-level quality genome provides insights into the evolution of the biosynthesis genes for aroma compounds of Osmanthus fragrans.
Sweet osmanthus (Osmanthus fragrans) is a very popular ornamental tree species throughout Southeast Asia and the USA, particularly for its extremely fragrant aroma. We constructed a chromosome-level reference genome of O. fragrans to assist in studies of the evolution, genetic diversity, and molecular mechanism of aroma development. A total of over 118 Gb of polished reads was produced from HiSeq (45.1 Gb) and PacBio Sequel (73.35 Gb), giving 100× depth of coverage for long reads. The combination of Illumina short reads, PacBio long reads, and Hi-C data produced the final chromosome-quality genome of O. fragrans with a genome size of 727 Mb and a heterozygosity of 1.45%. The genome was annotated using de novo and homology comparison and further refined with transcriptome data. The genome of O. fragrans was predicted to have 45,542 genes, of which 95.68% were functionally annotated. Genome annotation identified 49.35% of the genome as repetitive sequences, with long terminal repeats (LTRs) being the most abundant (28.94%). Genome evolution analysis indicated evidence of a whole-genome duplication 15 million years ago, which contributed to the current content of 45,242 genes. Metabolic analysis revealed that linalool, a monoterpene, is the main aroma compound. Based on the genome and transcriptome, we further demonstrated the direct connection between terpene synthases (TPSs) and the rich aromatic molecules in O. fragrans. We identified three new flower-specific TPS genes, the expression of which coincided with the production of linalool. Our results suggest that the high number of TPS genes and the flower tissue- and stage-specific expression of TPS genes might drive the strong, unique aroma production of O. fragrans.
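The depth figure quoted in the abstract above can be checked with simple arithmetic: depth of coverage is the total sequenced bases divided by the genome size. The sketch below is only an illustration of that calculation. The function name `depth_of_coverage` is hypothetical, and the inputs are the read yields and the 727 Mb genome size reported in the abstract.

```python
def depth_of_coverage(total_bases_gb: float, genome_size_mb: float) -> float:
    """Return mean depth of coverage (fold) for a given sequencing yield.

    total_bases_gb: sequenced bases in gigabases (Gb)
    genome_size_mb: genome size in megabases (Mb)
    """
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    # Convert Gb to Mb so both quantities share the same unit.
    return (total_bases_gb * 1000.0) / genome_size_mb


# Values taken from the abstract: 73.35 Gb PacBio, 45.1 Gb HiSeq, 727 Mb genome.
pacbio_depth = depth_of_coverage(73.35, 727.0)
hiseq_depth = depth_of_coverage(45.1, 727.0)
print(f"PacBio: {pacbio_depth:.1f}x, HiSeq: {hiseq_depth:.1f}x")
```

Under these inputs the long-read depth comes out close to the 100× stated in the abstract, and the short-read depth is roughly 62×.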
Although several resurrection plant genomes have been sequenced, the lack of suitable dehydration-sensitive outgroups has limited genomic insights into the origin of desiccation tolerance. Here, we utilized a comparative system of closely related desiccation-tolerant (Lindernia brevidens) and -sensitive (Lindernia subracemosa) species to identify gene- and pathway-level changes associated with the evolution of desiccation tolerance. The two high-quality Lindernia genomes we assembled are largely collinear, and over 90% of genes are conserved. L. brevidens and L. subracemosa have evidence of an ancient, shared whole-genome duplication event, and retained genes have neofunctionalized, with desiccation-specific expression in L. brevidens. Tandem gene duplicates also are enriched in desiccation-associated functions, including a dramatic expansion of early light-induced proteins from 4 to 26 copies in L. brevidens. A comparative differential gene coexpression analysis between L. brevidens and L. subracemosa supports extensive network rewiring across early dehydration, desiccation, and rehydration time courses. Many LATE EMBRYOGENESIS ABUNDANT genes show significantly higher expression in L. brevidens compared with their orthologs in L. subracemosa. Coexpression modules uniquely upregulated during desiccation in L. brevidens are enriched with seed-specific and abscisic acid-associated cis-regulatory elements. These modules contain a wide array of seed-associated genes that have no expression in the desiccation-sensitive L. subracemosa. Together, these findings suggest that desiccation tolerance evolved through a combination of gene duplications and network-level rewiring of existing seed desiccation pathways. © 2018 American Society of Plant Biologists. All rights reserved.
B chromosomes (Bs) were discovered a century ago, and since then, most studies have focused on describing their distribution and abundance using traditional cytogenetics. Only recently have attempts been made to understand their structure and evolution at the level of DNA sequence. Many questions regarding the origin, structure, function, and evolution of B chromosomes remain unanswered. Here, we identify B chromosome sequences from several species of cichlid fish from Lake Malawi by examining the ratios of DNA sequence coverage in individuals with or without B chromosomes. We examined the efficiency of this method, and compared results using both Illumina and PacBio sequence data. The B chromosome sequences detected in 13 individuals from 7 species were compared to assess the rates of sequence replacement. B-specific sequences common to at least 12 of the 13 datasets were identified as the “Core” B chromosome. The location of B sequence homologs throughout the genome provides further support for theories of B chromosome evolution. Finally, we identified genes and gene fragments located on the B chromosome, some of which may regulate the segregation and maintenance of the B chromosome.
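The coverage-ratio idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `flag_b_candidate_windows`, the median normalization, the pseudocount, and the ratio threshold of 2.0 are all assumptions chosen for illustration. The input is assumed to be per-window read depth for one individual with B chromosomes and one without, over the same genomic windows.

```python
from statistics import median


def flag_b_candidate_windows(cov_with_b, cov_without_b,
                             min_ratio=2.0, pseudocount=1.0):
    """Return indices of windows whose normalized coverage is elevated
    in the B-carrying sample relative to the sample without Bs.

    All parameters are illustrative assumptions, not published values.
    """
    if len(cov_with_b) != len(cov_without_b):
        raise ValueError("coverage lists must cover the same windows")
    # Normalize each sample by its median depth so that differences in
    # sequencing yield between the two individuals cancel out.
    norm_with = median(cov_with_b)
    norm_without = median(cov_without_b)
    if norm_with <= 0 or norm_without <= 0:
        raise ValueError("median coverage must be positive")

    flagged = []
    for i, (w, wo) in enumerate(zip(cov_with_b, cov_without_b)):
        # The pseudocount avoids division by zero in uncovered windows.
        ratio = ((w + pseudocount) / norm_with) / ((wo + pseudocount) / norm_without)
        if ratio >= min_ratio:
            flagged.append(i)
    return flagged


# Toy data: window 3 has about three times the expected depth in the B carrier.
with_b = [30, 31, 29, 90, 30]
without_b = [30, 30, 30, 30, 30]
print(flag_b_candidate_windows(with_b, without_b))
```

In practice the threshold and window size would need tuning against known B and non-B regions, and, as in the abstract, candidates would be compared across several individuals before being treated as B-specific.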