Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.
Suppressed recombination allows divergence between homologous sex chromosomes and the functionality of their genes. Here, we reveal patterns of the earliest stages of sex-chromosome evolution in the diploid dioecious herb Mercurialis annua on the basis of cytological analysis, de novo genome assembly and annotation, genetic mapping, exome resequencing of natural populations, and transcriptome analysis. The genome assembly contained 34,105 expressed genes, of which 10,076 were assigned to linkage groups. Genetic mapping and exome resequencing of individuals across the species range both identified the largest linkage group, LG1, as the sex chromosome. Although the sex chromosomes of M. annua are karyotypically homomorphic, we estimate that about a third of the Y chromosome has ceased recombining, containing 568 transcripts and spanning 22.3 cM in the corresponding female map. Nevertheless, we found limited evidence for Y-chromosome degeneration in terms of gene loss and pseudogenization, and most X- and Y-linked genes appear to have diverged in the period subsequent to speciation between M. annua and its sister species M. huetii which shares the same sex-determining region. Taken together, our results suggest that the M. annua Y chromosome has at least two evolutionary strata: a small old stratum shared with M. huetii, and a more recent larger stratum that is probably unique to M. annua and that stopped recombining about one million years ago. Patterns of gene expression within the non-recombining region are consistent with the idea that sexually antagonistic selection may have played a role in favoring suppressed recombination. Copyright © 2019, Genetics.
De novo genome assembly of the endangered Acer yangbiense, a plant species with extremely small populations endemic to Yunnan Province, China.
Acer yangbiense is a newly described critically endangered endemic maple tree confined to Yangbi County in Yunnan Province in Southwest China. It was included in a programme for rescuing the most threatened species in China, focusing on “plant species with extremely small populations (PSESP)”. We generated 64, 94, and 110 Gb of raw DNA sequences using Pacific Biosciences Single-molecule Real-time, Illumina HiSeq X, and Hi-C sequencing, respectively, and combined them to obtain a chromosome-level genome assembly of A. yangbiense. The final genome assembly is ~666 Mb, with 13 chromosomes covering ~97% of the genome and a scaffold N50 of 45 Mb. Further, BUSCO analysis recovered 95.5% complete BUSCO genes. Repetitive elements account for 68.0% of the A. yangbiense genome. Genome annotation generated 28,320 protein-coding genes, assisted by a combination of prediction and transcriptome sequencing. In addition, a nearly 1:1 orthology ratio of dot plots of longer syntenic blocks revealed a similar evolutionary history between A. yangbiense and grape, indicating that the genome has not undergone a whole-genome duplication event after the core eudicot common hexaploidization. Here, we report a high-quality de novo genome assembly of A. yangbiense, the first genome for the genus Acer and the family Aceraceae. This will provide fundamental conservation genomics resources, as well as representing a new high-quality reference genome for the economically important Acer lineage and the wider order of Sapindales. © The Author(s) 2019. Published by Oxford University Press.
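The scaffold N50 quoted for the A. yangbiense assembly is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the full set always covers the assembly")


# Hypothetical toy assembly (lengths in arbitrary units), not real data.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
```

For the toy input the total is 20, and the two longest sequences (6 and 5) already cover 11 units, so the N50 is 5. The same function applies unchanged to transcript lengths, as in transcriptome N50 values.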
Cultivated strawberry emerged from the hybridization of two wild octoploid species, both descendants from the merger of four diploid progenitor species into a single nucleus more than 1 million years ago. Here we report a near-complete chromosome-scale assembly for cultivated octoploid strawberry (Fragaria × ananassa) and uncover the origin and evolutionary processes that shaped this complex allopolyploid. We identified the extant relatives of each diploid progenitor species and provide support for the North American origin of octoploid strawberry. We examined the dynamics among the four subgenomes in octoploid strawberry and uncovered the presence of a single dominant subgenome with significantly greater gene content, gene expression abundance, and biased exchanges between homoeologous chromosomes, as compared with the other subgenomes. Pathway analysis showed that certain metabolomic and disease-resistance traits are largely controlled by the dominant subgenome. These findings and the reference genome should serve as a powerful platform for future evolutionary studies and enable molecular breeding in strawberry.
The Genome of Cucurbita argyrosperma (Silver-Seed Gourd) Reveals Faster Rates of Protein-Coding Gene and Long Noncoding RNA Turnover and Neofunctionalization within Cucurbita.
Whole-genome duplications are an important source of evolutionary novelties that change the mode and tempo at which genetic elements evolve within a genome. The Cucurbita genus experienced a whole-genome duplication around 30 million years ago, although the evolutionary dynamics of the coding and noncoding genes in this genus have not yet been scrutinized. Here, we analyzed the genomes of four Cucurbita species, including a newly assembled genome of Cucurbita argyrosperma, and compared the gene contents of these species with those of five other members of the Cucurbitaceae family to assess the evolutionary dynamics of protein-coding and long intergenic noncoding RNA (lincRNA) genes after the genome duplication. We report that Cucurbita genomes have a higher protein-coding gene birth-death rate compared with the genomes of the other members of the Cucurbitaceae family. C. argyrosperma gene families associated with pollination and transmembrane transport had significantly faster evolutionary rates. lincRNA families showed high levels of gene turnover throughout the phylogeny, and 67.7% of the lincRNA families in Cucurbita showed evidence of birth from the neofunctionalization of previously existing protein-coding genes. Collectively, our results suggest that the whole-genome duplication in Cucurbita resulted in faster rates of gene family evolution through the neofunctionalization of duplicated genes. Copyright © 2019 The Author. Published by Elsevier Inc. All rights reserved.
Morella rubra, red bayberry, is an economically important fruit tree in south China. Here, we assembled the first high-quality genome for both a female and a male individual of red bayberry. The genome size was 313 Mb, and 90% of the sequences were assembled into eight pseudochromosomes, with 32,493 predicted genes. By whole-genome comparison between the female and male, and by association analysis with sequences of bulked and individual DNA samples from females and males, a 59-kb female-determining region was identified at the distal end of pseudochromosome 8. This region contains abundant transposable elements and seven putative genes, four of which are related to sex floral development. The 59-kb female-specific region was likely derived from duplication and rearrangement of paralogous genes and has remained non-recombining. Sex-specific molecular markers developed from candidate genes co-segregated with sex in genetically diverse female and male germplasm. We propose that sex determination follows the ZW model of female heterogamety. The genome sequence of red bayberry provides a valuable resource for the study of plant sex chromosome evolution and also provides important insights for molecular biology, genetics and modern breeding in the Myricaceae family. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Recent advances in genomics technologies have greatly accelerated the progress in both fundamental plant science and applied breeding research. Concurrently, high-throughput plant phenotyping is becoming widely adopted in the plant community, promising to alleviate the phenotypic bottleneck. While these technological breakthroughs are significantly accelerating quantitative trait locus (QTL) and causal gene identification, challenges to enable even more sophisticated analyses remain. In particular, care needs to be taken to standardize, describe and conduct experiments robustly while relying on plant physiology expertise. In this article, we review the state of the art regarding genome assembly and the future potential of pangenomics in plant research. We also describe the necessity of standardizing and describing phenotypic studies using the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) standard to enable the reuse and integration of phenotypic data. In addition, we show how deep phenotypic data might yield novel trait-trait correlations and review how to link phenotypic data to genomic data. Finally, we provide perspectives on the golden future of machine learning and its potential in linking phenotypes to genomic features. © 2018 The Authors. The Plant Journal published by John Wiley & Sons Ltd and Society for Experimental Biology.
Full-length transcriptome sequences obtained by a combination of sequencing platforms applied to heat shock proteins and polyunsaturated fatty acids biosynthesis in Pyropia haitanensis
Pyropia haitanensis is a high-yield commercial seaweed in China. Pyropia haitanensis farms often suffer from problems such as severe germplasm degeneration, while the mechanisms underlying resistance to abiotic stresses remain unknown because genomic information is lacking. Although many previous studies used next-generation sequencing (NGS) technologies, the short reads generated by NGS generally prevent the assembly of full-length transcripts, which in turn limits the screening of functional genes. In the present study, which was based on hybrid sequencing (NGS and single-molecule real-time sequencing) of the P. haitanensis thallus transcriptome, we obtained high-quality full-length transcripts with a mean length of 2998 bp and an N50 value of 3366 bp. A total of 14,773 unigenes (93.52%) were annotated in at least one database, while approximately 60% of all unigenes were assembled by short Illumina reads. Moreover, we suggest that the genes involved in the biosynthesis of polyunsaturated fatty acids and heat shock proteins play an important role in development and in resistance to abiotic stresses in P. haitanensis. The present study, together with previously published ones, may facilitate seaweed transcriptome research.
A physical and genetic map of Cannabis sativa identifies extensive rearrangements at the THC/CBD acid synthase loci.
Cannabis sativa is widely cultivated for medicinal, food, industrial, and recreational use, but much remains unknown regarding its genetics, including the molecular determinants of cannabinoid content. Here, we describe a combined physical and genetic map derived from a cross between the drug-type strain Purple Kush and the hemp variety “Finola.” The map reveals that cannabinoid biosynthesis genes are generally unlinked but that aromatic prenyltransferase (AP), which produces the substrate for THCA and CBDA synthases (THCAS and CBDAS), is tightly linked to a known marker for total cannabinoid content. We further identify the gene encoding CBCA synthase (CBCAS) and characterize its catalytic activity, providing insight into how cannabinoid diversity arises in cannabis. THCAS and CBDAS (which determine the drug vs. hemp chemotype) are contained within large (>250 kb) retrotransposon-rich regions that are highly nonhomologous between drug- and hemp-type alleles and are furthermore embedded within ~40 Mb of minimally recombining repetitive DNA. The chromosome structures are similar to those in grains such as wheat, with recombination focused in gene-rich, repeat-depleted regions near chromosome ends. The physical and genetic map should facilitate further dissection of genetic and molecular mechanisms in this commercially and medically important plant. © 2019 Laverty et al.; Published by Cold Spring Harbor Laboratory Press.
Wild almond species accumulate the bitter and toxic cyanogenic diglucoside amygdalin. Almond domestication was enabled by the selection of genotypes harboring sweet kernels. We report the completion of the almond reference genome. Map-based cloning using an F1 population segregating for kernel taste led to the identification of a 46-kilobase gene cluster encoding five basic helix-loop-helix transcription factors, bHLH1 to bHLH5. Functional characterization demonstrated that bHLH2 controls transcription of the P450 monooxygenase-encoding genes PdCYP79D16 and PdCYP71AN24, which are involved in the amygdalin biosynthetic pathway. A nonsynonymous point mutation (Leu to Phe) in the dimerization domain of bHLH2 prevents transcription of the two cytochrome P450 genes, resulting in the sweet kernel trait. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
The diploid strawberry Fragaria vesca serves as an ideal model plant for cultivated strawberry (Fragaria × ananassa, 8x) and the Rosaceae family. The F. vesca genome was initially published in 2011 using older technologies. Recently, a new and greatly improved F. vesca genome, designated V4, was published. However, the number of annotated genes is remarkably reduced in V4 (28,588 genes) compared to the prior annotations (32,831 to 33,673 genes). Additionally, the annotation of V4 (v4.0.a1) implements a new nomenclature for gene IDs (FvH4_XgXXXXX), rather than the previous nomenclature (geneXXXXX). Hence, further improvement of the V4 genome annotation and assigning gene expression levels under the new gene IDs with existing transcriptome data are necessary to facilitate the utility of this high-quality F. vesca genome V4. Here, we built a new and improved annotation, v4.0.a2, for F. vesca genome V4. The new annotation has a total of 34,007 gene models with 98.1% complete Benchmarking Universal Single-Copy Orthologs (BUSCOs). In this v4.0.a2 annotation, gene models of 8,342 existing genes are modified, 9,029 new genes are added, and 10,176 genes possess alternatively spliced isoforms with an average of 1.90 transcripts per locus. Transcription factors/regulators and protein kinases are globally identified. Interestingly, the transcription factor family FAr-red-impaired Response 1 (FAR1) contains 82 genes in v4.0.a2 but only two members in v4.0.a1. Additionally, the expression levels of all genes in the new annotation across a total of 46 different tissues and stages are provided. Finally, miRNAs and their targets are reanalyzed and presented. Altogether, this work provides an updated genome annotation of the F. vesca V4 genome as well as a comprehensive gene expression atlas with the new gene ID nomenclature, which will greatly facilitate gene functional studies in strawberry and other evolutionarily related plant species.
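The FvH4_XgXXXXX gene ID nomenclature mentioned above lends itself to simple programmatic parsing when expression data are being re-keyed to the new IDs. The Python sketch below assumes that the first X is a single chromosome digit and that XXXXX is a five-digit serial number; that reading of the placeholders, the helper name `parse_gene_id`, and the example ID are all assumptions for illustration, not details confirmed by the abstract.

```python
import re
from typing import NamedTuple, Optional

# Assumed reading of "FvH4_XgXXXXX": one chromosome digit, a literal "g",
# then a five-digit serial number.
GENE_ID_PATTERN = re.compile(r"^FvH4_(\d)g(\d{5})$")


class GeneId(NamedTuple):
    chromosome: int
    serial: int


def parse_gene_id(gene_id: str) -> Optional[GeneId]:
    """Split a V4-style gene ID into chromosome and serial number.

    Returns None for IDs that do not match the assumed pattern, such as
    IDs in the older geneXXXXX style.
    """
    match = GENE_ID_PATTERN.match(gene_id)
    if match is None:
        return None
    return GeneId(chromosome=int(match.group(1)), serial=int(match.group(2)))


# Hypothetical IDs used only to exercise the parser.
new_style = parse_gene_id("FvH4_1g00010")
old_style = parse_gene_id("gene12345")
```

Under these assumptions `new_style` holds chromosome 1 and serial 10, while `old_style` is `None`, which makes it easy to separate records that still carry the previous nomenclature.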
The complexity of the Fragaria x ananassa (octoploid) transcriptome by single-molecule long-read sequencing.
Strawberry (Fragaria x ananassa) is an allopolyploid species with diverse and complex transcripts. The regulatory mechanisms of fruit development and maturation have been extensively studied; however, little is known about the signaling mechanisms that direct this process in octoploid strawberry (Fragaria x ananassa). Here, we used long-read sequencing (LRS) technology and RNA-seq analysis to investigate the diversity and complexity of the polyploid transcriptome and differentially expressed transcripts along four successive fruit developmental stages of cultivated strawberry. We obtained a reference transcriptome with 119,897 unique full-length isoforms, including 2017 new isoforms and 2510 long noncoding RNAs. Based on the genome of the plausible progenitor (Fragaria vesca), 20,229 alternative splicing (AS) events were identified. Using this transcriptome, we found 17,485 differentially expressed transcripts during strawberry fruit development, including 527 transcription factors (TFs) belonging to 41 families. The expression profiles of all members of the auxin, ABA pathway, and anthocyanin biosynthesis gene families were also examined, and many of them were highly expressed at the ripe fruit stage, strongly indicating that the role of those genes is in the regulation of fruit ripening. We produce a high-quality reference transcriptome for octoploid strawberry, including much of the full-length transcript diversity, to help understand the regulatory mechanisms of fruit development and maturation of polyploid species, particularly via elucidation of the biochemical pathways involved in auxin, ABA, and anthocyanin biosynthesis.
An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome.
Second generation sequencing has permitted detailed sequence characterisation at the whole genome level of a growing number of non-model organisms, but the data produced have short read-lengths and biased genome coverage leading to fragmented genome assemblies. The PacBio RS long-read sequencing platform offers the promise of increased read length and unbiased genome coverage and thus the potential to produce genome sequence data of a finished quality containing fewer gaps and longer contigs. However, these advantages come at a much greater cost per nucleotide and with a perceived increase in error-rate. In this investigation, we evaluated the performance of the PacBio RS sequencing platform through the sequencing and de novo assembly of the Potentilla micrantha chloroplast genome. Following error-correction, a total of 28,638 PacBio RS reads were recovered with a mean read length of 1,902 bp totalling 54,492,250 nucleotides and representing an average depth of coverage of 320× the chloroplast genome. The dataset covered the entire 154,959 bp of the chloroplast genome in a single contig (100% coverage) compared to seven contigs (90.59% coverage) recovered from Illumina data, and revealed no bias in coverage of GC rich regions. Post-assembly, the data were largely concordant with the Illumina data generated and allowed 187 ambiguities in the Illumina data to be resolved. The additional read length also permitted small differences in the two inverted repeat regions to be assigned unambiguously. This is the first report to our knowledge of a chloroplast genome assembled de novo using PacBio sequence data. The PacBio RS data generated here were assembled into a single large contig spanning the P. micrantha chloroplast genome, with a higher degree of accuracy than an Illumina dataset generated at a much greater depth of coverage, due to longer read lengths and lower GC bias in the data.
The results we present suggest PacBio data will be of immense utility for the development of genome sequence assemblies containing fewer unresolved gaps and ambiguities and a significantly smaller number of contigs than could be produced using short-read sequence data alone.
The availability of plant reference genomes has ushered in a new era of crop genomics. More than 100 plant genomes have been sequenced since 2000, 63% of which are crop species. These genome sequences provide insight into architecture, evolution and novel aspects of crop genomes such as the retention of key agronomic traits after whole genome duplication events. Some crops have very large, polyploid, repeat-rich genomes, which require innovative strategies for sequencing, assembly and analysis. Even low quality reference genomes have the potential to improve crop germplasm through genome-wide molecular markers, which decrease expensive phenotyping and breeding cycles. The next stage of plant genomics will require draft genome refinement, building resources for crop wild relatives, resequencing broad diversity panels, and plant ENCODE projects to better understand the complexities of these highly diverse genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
Background: Numerous completed or on-going whole genome sequencing projects have highlighted the fact that obtaining a high quality genome sequence is necessary to address comparative genomics questions such as structural variations among genotypes and gain or loss of specific function. Despite the spectacular progress that has been made in sequencing technologies, obtaining accurate and reliable data is still a challenge, both at the whole genome scale and when targeting specific genomic regions. These problems are even more noticeable for complex plant genomes. Most plant genomes are known to be particularly challenging due to their size, high density of repetitive elements and various levels of ploidy. To overcome these problems, we have developed a strategy to reduce genome complexity by using the large insert BAC libraries combined with next generation sequencing technologies. Results: We compared two different technologies (Roche-454 and Pacific Biosciences PacBio RS II) to sequence pools of BAC clones in order to obtain the best quality sequence. We targeted nine BAC clones from different species (maize, wheat, strawberry, barley, sugarcane and sunflower) known to be complex in terms of sequence assembly. We sequenced the pools of the nine BAC clones with both technologies. We compared assembly results and highlighted differences due to the sequencing technologies used. Conclusions: We demonstrated that the long reads obtained with the PacBio RS II technology serve to obtain a better and more reliable assembly, notably by preventing errors due to duplicated or repetitive sequences in the same region.