Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.
Genome-wide association studies (GWAS) have identified many genomic loci associated with risk for schizophrenia, but unambiguous identification of the relationship between disease-associated variants and specific genes, and in particular their effect on risk-conferring transcripts, has proven difficult. To better understand the specific molecular mechanism(s) at the schizophrenia locus in 11q25, we undertook cis expression quantitative trait loci (cis-eQTL) mapping for this 2 megabase genomic region using postmortem human brain samples. To comprehensively assess the effects of genetic risk upon local expression, we evaluated multiple transcript features (genes, exons, and exon-exon junctions) in multiple brain regions: dorsolateral prefrontal cortex (DLPFC), hippocampus, and caudate. Genetic risk variants were strongly associated with expression of SNX19 transcript features that tag multiple rare classes of SNX19 transcripts, whereas they only weakly affected expression of an exon-exon junction that tags the majority of abundant transcripts. The most prominent class of SNX19 risk-associated transcripts, which is predicted to be overexpressed, is defined by an exon-exon splice junction between exons 8 and 10 (junc8.10) and is predicted to encode proteins that lack the characteristic nexin C-terminal domain. Risk alleles were also associated with either increased or decreased expression of multiple additional classes of transcripts. With RACE, molecular cloning, and long-read sequencing, we found a number of novel SNX19 transcripts that further define the set of potential etiological transcripts. We explored epigenetic regulation of SNX19 expression and found that DNA methylation at CpG sites near the primary transcription start site and within exon 2 partially mediates the effects of risk variants on risk-associated expression. 
ATAC sequencing revealed that some of the most strongly risk-associated SNPs are located within a region of open chromatin, suggesting a nearby regulatory element is involved. These findings indicate a potentially complex molecular etiology, in which risk alleles for schizophrenia generate epigenetic alterations and dysregulation of multiple classes of SNX19 transcripts.
Background: New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from ‘finished’. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies. Results: We employed three gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: six with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and three with new assemblies based on re-scaffolding or Pacific Biosciences long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: seven for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further seven with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi. Conclusions: Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our comparisons show that gene synteny-based computational methods represent a valuable alternative or complementary approach. 
Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.
Abbreviations: AD, ADSEQ; AGO, AGOUTI-based; AGOUTI, annotated genome optimization using transcriptome information tool; ALN, alignment-based; CAMSA, comparative analysis and merging of scaffold assemblies tool; DP, dynamic programming; FISH, fluorescence in situ hybridization; GA, GOS-ASM; GOS-ASM, Gene order scaffold assembler; Kbp, kilobasepairs; Mbp, megabasepairs; OS, ORTHOSTITCH; PacBio, Pacific Biosciences; PB, PacBio-based; PHY, physical-mapping-based; RNAseq, RNA sequencing; QTL, quantitative trait loci; SYN, synteny-based.
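The abstract above does not spell out how predictions from the three synteny-based methods were combined into consensus adjacency sets. The sketch below illustrates one simple possibility, a majority vote over undirected scaffold pairs. The function names, the `min_support` threshold, the method labels, and the toy scaffold identifiers are all hypothetical. It ignores scaffold orientation and does not resolve conflicting neighbours, both of which a real pipeline would need to handle.

```python
from collections import Counter


def canonical(adjacency):
    """Return an order-independent key for an undirected scaffold pair."""
    return tuple(sorted(adjacency))


def consensus_adjacencies(predictions, min_support=2):
    """Keep adjacencies predicted by at least `min_support` methods.

    `predictions` maps a method name to a list of (scaffold, scaffold) pairs.
    This is an assumed voting scheme, not the procedure used in the source.
    """
    votes = Counter()
    for method_pairs in predictions.values():
        # Count each pair at most once per method.
        for key in {canonical(pair) for pair in method_pairs}:
            votes[key] += 1
    return sorted(key for key, count in votes.items() if count >= min_support)


# Toy input: three hypothetical methods predicting scaffold neighbours.
toy_predictions = {
    "method_a": [("s1", "s2"), ("s2", "s3")],
    "method_b": [("s2", "s1"), ("s3", "s4")],
    "method_c": [("s1", "s2"), ("s4", "s3")],
}
consensus = consensus_adjacencies(toy_predictions)
print(consensus)
```

With the toy input, the pair s1/s2 is supported by all three methods and s3/s4 by two, so both pass the default threshold, while s2/s3 (one method) is dropped.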
Brassica napus (AACC, 2n = 38) is an important oilseed crop grown worldwide. However, little is known about the population evolution of this species, the genomic differences between its major genetic groups, such as European and Asian rapeseed, and the impacts of historical large-scale introgression events on this young tetraploid. In this study, we report the de novo assembly of the genome sequences of an Asian rapeseed (B. napus), Ningyou 7, and its four progenitors and compare these genomes with other available genomic data from diverse European and Asian cultivars. Our results show that Asian rapeseed was originally derived from European rapeseed but subsequently diverged significantly, with rapid genome differentiation after hybridization and intensive local selective breeding. The first historical introgression of B. rapa dramatically broadened the allelic pool but decreased the deleterious variations of Asian rapeseed. The second historical introgression of the double-low traits of European rapeseed (canola) has reshaped Asian rapeseed into two groups (double-low and double-high), accompanied by an increase in genetic load in the double-low group. This study demonstrates distinctive genomic footprints and deleterious SNP (single nucleotide polymorphism) variants for local adaptation by recent intra- and interspecies introgression events and provides novel insights for understanding the rapid genome evolution of a young allopolyploid crop. © 2019 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
De Novo Sequencing and Hybrid Assembly of the Biofuel Crop Jatropha curcas L.: Identification of Quantitative Trait Loci for Geminivirus Resistance.
Jatropha curcas is an important perennial, drought-tolerant plant that has been identified as a potential biodiesel crop. We report here the hybrid de novo genome assembly of J. curcas generated using Illumina and PacBio sequencing technologies, and the identification of quantitative trait loci for Jatropha Mosaic Virus (JMV) resistance. In this study, we generated scaffolds totaling 265.7 Mbp, which capture 84.8% of the gene space as assessed by Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis. Additionally, 96.4% of predicted protein-coding genes were captured in RNA sequencing data, which further supports the accuracy of the assembled genome. The genome was utilized to identify 12,103 dinucleotide simple sequence repeat (SSR) markers, which were exploited in genetic diversity analysis to identify genetically distinct lines. A total of 207 polymorphic SSR markers were employed to construct a genetic linkage map for JMV resistance, using an interspecific F2 mapping population involving susceptible J. curcas and resistant Jatropha integerrima as parents. Quantitative trait locus (QTL) analysis led to the identification of three minor QTLs for JMV resistance, which were validated in an alternate F2 mapping population. These validated QTLs were utilized in marker-assisted breeding for JMV resistance. Comparative genomics of oil-producing genes across selected oil-producing species revealed 27 conserved genes and 2986 orthologous protein clusters in Jatropha. This reference genome assembly provides insight into the complex genetic structure of Jatropha, and serves as a source for the development of agronomically improved virus-resistant and oil-producing lines.
Finding the needle in a haystack: Mapping antifungal drug resistance in fungal pathogen by genomic approaches.
Fungi are ubiquitous on earth and are essential for the maintenance of the global ecological equilibrium. Despite providing benefits to living organisms, they can also target specific hosts and inflict damage. These fungal pathogens are known to affect, for example, plants and mammals and thus reduce crop production necessary to sustain food supply and cause mortality in humans and animals. Designing defenses against these fungi is essential for the control of food resources and human health. As far as fungal pathogens are concerned, the principal option has been the use of antifungal agents, also called fungicides when they are used in the environment.
The commercial release of third-generation sequencing technologies (TGSTs), giving long and ultra-long sequencing reads, has stimulated the development of new tools for assembling highly contiguous genome sequences with unprecedented accuracy across complex repeat regions. We survey here a wide range of emerging sequencing platforms and analytical tools for de novo assembly, provide background information for each of their steps, and discuss the spectrum of available options. Our decision tree recommends workflows for the generation of a high-quality genome assembly when used in combination with the specific needs and resources of a project. Copyright © 2019 Elsevier Ltd. All rights reserved.
In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence resolved 99,604 insertions, deletions, and inversions including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity. Copyright © 2018 Elsevier Inc. All rights reserved.
Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense.
Allotetraploid cotton species (Gossypium hirsutum and Gossypium barbadense) have long been cultivated worldwide for natural renewable textile fibers. The draft genome sequences of both species are available but they are highly fragmented and incomplete [1-4]. Here we report reference-grade genome assemblies and annotations for G. hirsutum accession Texas Marker-1 (TM-1) and G. barbadense accession 3-79 by integrating single-molecule real-time sequencing, BioNano optical mapping and high-throughput chromosome conformation capture techniques. Compared with previously assembled draft genomes [1,3], these genome sequences show considerable improvements in contiguity and completeness for regions with high content of repeats such as centromeres. Comparative genomics analyses identify extensive structural variations that probably occurred after polyploidization, highlighted by large paracentric/pericentric inversions in 14 chromosomes. We constructed an introgression line population to introduce favorable chromosome segments from G. barbadense to G. hirsutum, allowing us to identify 13 quantitative trait loci associated with superior fiber quality. These resources will accelerate evolutionary and functional genomic studies in cotton and inform future breeding programs for fiber improvement.
Development of a Molecular Marker Linked to the A4 Locus and the Structure of HD Genes in Pleurotus eryngii
Allelic differences in the A and B mating-type loci are a prerequisite for the progression of mating in Pleurotus eryngii; thus, crossing is hampered by this biological barrier during inbreeding. Molecular markers linked to mating types of P. eryngii KNR2312 were investigated with randomly amplified polymorphic DNA to enhance crossing efficiency. An A4-linked sequence was identified and used to find the adjacent genomic region with the entire motif of the A locus from a contig sequenced by PacBio. The sequence-characterized amplified region (SCAR) marker 7-2299 distinguished A4 mating-type monokaryons from KNR2312 and other strains. A BLAST search of flanking sequences revealed that the A4 locus had a general feature consisting of the putative HD1 and HD2 genes. Both putative HD transcription factors contain a homeodomain sequence and a nuclear localization sequence; however, valid dimerization motifs were found only in the HD1 protein. The ACAAT motif, which was reported to have relevance to sex determination, was found in the intergenic region. The SCAR marker could be applicable in the classification of mating types in the P. eryngii breeding program, and the A4 locus could be the basis for a multi-allele detection marker.
Sucrose accumulation and decreased photosynthesis are early symptoms of yellow canopy syndrome (YCS) in sugarcane (Saccharum spp.), and precede the visual yellowing of the leaves. To investigate broad-scale gene expression changes during YCS-onset, transcriptome analyses coupled to metabolome analyses were performed. Across leaf tissues, the greatest number of differentially expressed genes related to the chloroplast, and the metabolic processes relating to nitrogen and carbohydrates. Five genes represented 90% of the TPM (Transcripts Per Million) associated with the downregulation of transcription during YCS-onset, which included PSII D1 (PsbA). This differential expression was consistent with a feedback regulatory effect upon photosynthesis. Broad-scale gene expression analyses did not reveal a cause for leaf sugar accumulation during YCS-onset. Interestingly, the midrib showed the greatest accumulation of sugars, followed by symptomatic lamina. To investigate if phloem loading/reloading may be compromised on a gene expression level – to lead to leaf sucrose accumulation – sucrose transport-related proteins of SWEETs, Sucrose Transporters (SUTs), H+-ATPases and H+-pyrophosphatases (H+-PPases) were characterised from a sugarcane transcriptome and expression analysed. Two clusters of Type I H+-PPases, with one upregulated and the other downregulated, were evident. Although less pronounced, a similar pattern of change was observed for the H+-ATPases. The disaccharide transporting SWEETs were downregulated after visual symptoms were present, and a monosaccharide transporting SWEET upregulated preceding, as well as after, symptom development. SUT gene expression was the least responsive to YCS development. The results are consistent with a reduction of photoassimilate movement through the phloem leading to sucrose build-up in the leaf.
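For readers unfamiliar with the TPM unit used above, the following is a minimal sketch of the standard Transcripts Per Million normalisation: read counts are divided by transcript length, and the resulting rates are rescaled so they sum to one million. The function name and the toy counts and lengths are illustrative assumptions and are not taken from the study.

```python
def tpm(read_counts, lengths_bp):
    """Convert raw read counts to Transcripts Per Million (TPM).

    Each count is first divided by its transcript length to give a
    per-base rate; rates are then scaled so that they sum to 1e6.
    """
    if len(read_counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("transcript lengths must be positive")
    rates = [count / length for count, length in zip(read_counts, lengths_bp)]
    total_rate = sum(rates)
    if total_rate == 0:
        raise ValueError("at least one transcript must have reads")
    return [rate / total_rate * 1e6 for rate in rates]


# Toy example (hypothetical values): three transcripts.
toy_counts = [100, 200, 300]
toy_lengths = [1000, 2000, 1000]
toy_tpm = tpm(toy_counts, toy_lengths)
print([round(value) for value in toy_tpm])
```

Because TPM values within a sample always sum to one million, a statement such as "five genes represented 90% of the TPM" for a gene set is a statement about each gene's share of that set's summed TPM.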
Full-length transcriptome sequences obtained by a combination of sequencing platforms applied to heat shock proteins and polyunsaturated fatty acids biosynthesis in Pyropia haitanensis
Pyropia haitanensis is a high-yield commercial seaweed in China. Pyropia haitanensis farms often suffer from problems such as severe germplasm degeneration, while the mechanisms underlying resistance to abiotic stresses remain unknown because genomic information is lacking. Although many previous studies have used next-generation sequencing (NGS) technologies, the short reads generated by NGS generally prevent the assembly of full-length transcripts and thus limit the screening of functional genes. In the present study, which was based on hybrid sequencing (NGS and single-molecule real-time sequencing) of the P. haitanensis thallus transcriptome, we obtained high-quality full-length transcripts with a mean length of 2998 bp and an N50 value of 3366 bp. A total of 14,773 unigenes (93.52%) were annotated in at least one database, while approximately 60% of all unigenes were assembled by short Illumina reads. Moreover, we suggest that the genes involved in the biosynthesis of polyunsaturated fatty acids and heat shock proteins play an important role in development and resistance to abiotic stresses in P. haitanensis. The present study, together with previously published ones, may facilitate seaweed transcriptome research.
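The N50 statistic reported above can be illustrated with a short sketch. N50 is the length L such that sequences of length L or longer together make up at least half of the total assembled bases. The function name and the toy lengths below are illustrative only and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical transcript lengths in bp).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))                      # N50
print(sum(toy_lengths) / len(toy_lengths))   # mean length, for comparison
```

N50 weights long sequences more heavily than the arithmetic mean does, which is why the abstract's N50 (3366 bp) exceeds its mean transcript length (2998 bp).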
Many of our major crop species are polyploids, containing more than one genome or set of chromosomes. Polyploid crops present unique challenges, including difficulties in genome assembly, in discriminating between multiple gene and sequence copies, and in genetic mapping, hindering use of genomic data for genetics and breeding. Polyploid genomes may also be more prone to containing structural variation, such as loss of gene copies or sequences (presence–absence variation) and the presence of genes or sequences in multiple copies (copy-number variation). Although the two main types of genomic structural variation commonly identified are presence–absence variation and copy-number variation, we propose that homeologous exchanges constitute a third major form of genomic structural variation in polyploids. Homeologous exchanges involve the replacement of one genomic segment by a similar copy from another genome or ancestrally duplicated region, and are known to be extremely common in polyploids. Detecting all kinds of genomic structural variation is challenging, but recent advances such as optical mapping and long-read sequencing offer potential strategies to help identify structural variants even in complex polyploid genomes. All three major types of genomic structural variation (presence–absence, copy-number, and homeologous exchange) are now known to influence phenotypes in crop plants, with examples of flowering time, frost tolerance, and adaptive and agronomic traits. In this review, we summarize the challenges of genome analysis in polyploid crops, describe the various types of genomic structural variation and the genomics technologies and data that can be used to detect them, and collate information produced to date related to the impact of genomic structural variation on crop phenotypes. We highlight the importance of genomic structural variation for the future genetic improvement of polyploid crops.
Wild almond species accumulate the bitter and toxic cyanogenic diglucoside amygdalin. Almond domestication was enabled by the selection of genotypes harboring sweet kernels. We report the completion of the almond reference genome. Map-based cloning using an F1 population segregating for kernel taste led to the identification of a 46-kilobase gene cluster encoding five basic helix-loop-helix transcription factors, bHLH1 to bHLH5. Functional characterization demonstrated that bHLH2 controls transcription of the P450 monooxygenase-encoding genes PdCYP79D16 and PdCYP71AN24, which are involved in the amygdalin biosynthetic pathway. A nonsynonymous point mutation (Leu to Phe) in the dimerization domain of bHLH2 prevents transcription of the two cytochrome P450 genes, resulting in the sweet kernel trait. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Crop domestication changed the course of human evolution, and domestication of maize (Zea mays L. subspecies mays), today the world’s most important crop, enabled civilizations to flourish and has played a major role in shaping the world we know today. Archaeological and ethnobotanical research help us understand the development of the cultures and the movements of the peoples who carried maize to new areas where it continued to adapt. Ancient remains of maize cobs and kernels have been found in the place of domestication, the Balsas River Valley (~9,000 years before present era), and the cultivation center, the Tehuacan Valley (~5,000 years before present era), and have been used to study the process of domestication. Paleogenomic data showed that some of the genes controlling the stem and inflorescence architecture were comparable to modern maize, while other genes controlling ear shattering and starch biosynthesis retain high levels of variability, similar to those found in the wild relative teosinte. These results indicate that the domestication process was both gradual and complex, where different genetic loci were selected at different points in time, and that the transformation of teosinte to maize was completed in the last 5,000 years. Mesoamerican native cultures domesticated teosinte and developed maize from a 6 cm long, popping-kernel ear to what we now recognize as modern maize with its wide variety in ear size, kernel texture, color, size, and adequacy for diverse uses and also invented nixtamalization, a process key to maximizing its nutrition. Used directly for human and animal consumption, processed food products, bioenergy, and many cultural applications, it is now grown on six of the world’s seven continents. The study of its evolution and domestication from the wild grass teosinte helps us understand the nature of genetic diversity of maize and its wild relatives and gene expression. 
Genetic barriers to direct use of teosinte or Tripsacum in maize breeding have challenged our ability to identify valuable genes and traits, let alone incorporate them into elite, modern varieties. Genomic information and newer genetic technologies will facilitate the use of wild relatives in crop improvement; hence it is more important than ever to ensure their conservation and availability, fundamental to future food security. In situ conservation efforts dedicated to preserving remnant populations of wild relatives in Mexico are key to safeguarding the genetic diversity of maize and its genepool, as well as enabling these species to continue to adapt to dynamic climate and environmental changes. Genebank ex situ efforts are crucial to securely maintain collected wild relative resources and to provide them for gene discovery and other research efforts.