April 21, 2020  |  

Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes

As they migrated out of Africa and into Europe and Asia, anatomically modern humans interbred with archaic hominins, such as Neanderthals and Denisovans. The result of this genetic introgression on the recipient populations has been of considerable interest, especially in cases of selection for specific archaic genetic variants. Hsieh et al. characterized adaptive structural variants and copy number variants that are likely targets of positive selection in Melanesians. Focusing on population-specific regions of the genome that carry duplicated genes and show an excess of amino acid replacements provides evidence for one of the mechanisms by which genetic novelty can arise and result in differentiation between human genomes.

Science, this issue p. eaax2083

INTRODUCTION
Characterizing genetic variants underlying local adaptations in human populations is one of the central goals of evolutionary research. Most studies have focused on adaptive single-nucleotide variants that either arose as new beneficial mutations or were introduced after interbreeding with our now-extinct relatives, including Neanderthals and Denisovans. The adaptive role of copy number variants (CNVs), another well-known form of genomic variation generated through deletions or duplications that affect more base pairs in the genome, is less well understood, despite evidence that such mutations are subject to stronger selective pressures.

RATIONALE
This study focuses on the discovery of introgressed and adaptive CNVs that have become enriched in specific human populations. We combine whole-genome CNV calling and population genetic inference methods to discover CNVs and then assess signals of selection after controlling for demographic history. We examine 266 publicly available modern human genomes from the Simons Genome Diversity Project and genomes of three ancient hominins: a Denisovan, a Neanderthal from the Altai Mountains in Siberia, and a Neanderthal from Croatia. We apply long-read sequencing methods to sequence-resolve complex CNVs of interest specifically in the Melanesians, an Oceanian population distributed from Papua New Guinea to as far east as the islands of Fiji and known to harbor some of the greatest amounts of Neanderthal and Denisovan ancestry.

RESULTS
Consistent with the hypothesis of archaic introgression outside Africa, we find a significant excess of CNV sharing between modern non-African populations and archaic hominins (P = 0.039). Among Melanesians, we observe an enrichment of CNVs with potential signals of positive selection (n = 37 CNVs), of which 19 CNVs likely introgressed from archaic hominins. We show that Melanesian-stratified CNVs are significantly associated with signals of positive selection (P = 0.0323). Many map near or within genes associated with metabolism (e.g., ACOT1 and ACOT2), development and cell cycle or signaling (e.g., TNFRSF10D and CDK11A and CDK11B), or immune response (e.g., IFNLR1). We characterize two of the largest and most complex CNVs on chromosomes 16p11.2 and 8p21.3 that introgressed from Denisovans and Neanderthals, respectively, and are absent from most other human populations. At chromosome 16p11.2, we sequence-resolve a large duplication of >383 thousand base pairs (kbp) that originated from Denisovans and introgressed into the ancestral Melanesian population 60,000 to 170,000 years ago. This large duplication occurs at high frequency (>79%) in diverse Melanesian groups, shows signatures of positive selection, and maps adjacent to Homo sapiens–specific duplications that predispose to rearrangements associated with autism. On chromosome 8p21.3, we identify a Melanesian haplotype that carries two CNVs, a ~6-kbp deletion and a ~38-kbp duplication, with a Neanderthal origin and that introgressed into non-Africans 40,000 to 120,000 years ago. This CNV haplotype occurs at high frequency (44%) and shows signals consistent with a partial selective sweep in Melanesians. Using long-read sequencing genomic and transcriptomic data, we reconstruct the structure and complex evolutionary history for these two CNVs and discover previously undescribed duplicated genes (TNFRSF10D1, TNFRSF10D2, and NPIPB16) that show an excess of amino acid replacements consistent with the action of positive selection.

CONCLUSION
Our results suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation that is absent from current reference genomes.

Large adaptive-introgressed CNVs at chromosomes 8p21.3 and 16p11.2 in Melanesians. The magnifying glasses highlight structural differences between the archaic (top) and reference (bottom) genomes. Neanderthal (red) and Denisovan (blue) haplotypes encompassing large CNVs occur at high frequencies in Melanesians (44 and 79%, respectively) but are absent (black) in all non-Melanesians. These CNVs create positively selected genes (TNFRSF10D1, TNFRSF10D2, and NPIPB16) that are absent from the reference genome.

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.


April 21, 2020  |  

Parallels between natural selection in the cold-adapted crop-wild relative Tripsacum dactyloides and artificial selection in temperate adapted maize.

Artificial selection has produced varieties of domesticated maize that thrive in temperate climates around the world. However, the direct progenitor of maize, teosinte, is indigenous only to a relatively small range of tropical and subtropical latitudes and grows poorly or not at all outside of this region. Tripsacum, a sister genus to maize and teosinte, is naturally endemic to the majority of areas in the western hemisphere where maize is cultivated. A full-length reference transcriptome for Tripsacum dactyloides generated using long-read Iso-Seq data was used to characterize independent adaptation to temperate climates in this clade. Genes related to phospholipid biosynthesis, a critical component of cold acclimation in other cold-adapted plant lineages, were enriched among those genes experiencing more rapid rates of protein sequence evolution in T. dactyloides. In contrast with previous studies of parallel selection, we find that there is a significant overlap between the genes that were targets of artificial selection during the adaptation of maize to temperate climates and those that were targets of natural selection in temperate-adapted T. dactyloides. Genes related to growth, development, response to stimulus, signaling, and organelles were enriched in the set of genes identified as both targets of natural and artificial selection. © 2019 The Authors The Plant Journal © 2019 John Wiley & Sons Ltd.


April 21, 2020  |  

Genetic map-guided genome assembly reveals a virulence-governing minichromosome in the lentil anthracnose pathogen Colletotrichum lentis.

Colletotrichum lentis causes anthracnose, which is a serious disease on lentil and can account for up to 70% crop loss. Two pathogenic races, 0 and 1, have been described in the C. lentis population from lentil. To unravel the genetic control of virulence, an isolate of the virulent race 0 was sequenced at 1481-fold genomic coverage. The 56.10-Mb genome assembly consists of 50 scaffolds with N50 scaffold length of 4.89 Mb. A total of 11 436 protein-coding gene models was predicted in the genome with 237 coding candidate effectors, 43 secondary metabolite biosynthetic enzymes and 229 carbohydrate-active enzymes (CAZymes), suggesting a contraction of the virulence gene repertoire in C. lentis. Scaffolds were assigned to 10 core and two minichromosomes using a population (race 0 × race 1, n = 94 progeny isolates) sequencing-based, high-density (14 312 single nucleotide polymorphisms) genetic map. Composite interval mapping revealed a single quantitative trait locus (QTL), qClVIR-11, located on minichromosome 11, explaining 85% of the variability in virulence of the C. lentis population. The QTL covers a physical distance of 0.84 Mb with 98 genes, including seven candidate effector and two secondary metabolite genes. Taken together, the study provides genetic and physical evidence for the existence of a minichromosome controlling the C. lentis virulence on lentil. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.
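
For readers unfamiliar with the N50 statistic quoted in the abstract above, the following is a minimal Python sketch of how it is commonly computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up lengths (hypothetical, not the C. lentis assembly).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
```

With these toy values the total is 300, so the running sum reaches 150 at the second scaffold and the N50 is 70.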


April 21, 2020  |  

Iron-associated protein interaction networks reveal the key functional modules related to survival and virulence of Pasteurella multocida.

Pasteurella multocida causes respiratory infectious diseases in a multitude of birds and mammals. A number of virulence-associated genes have been reported across different strains of P. multocida, including those involved in iron transport and metabolism. However, the iron-associated genes of P. multocida from different animal hosts, and their interaction networks, have not been fully compared. Therefore, this study aimed to identify the iron-associated genes from the core- and pan-genomes of fourteen P. multocida strains and to construct, using genome-scale network analysis, iron-associated protein interaction networks that might be associated with virulence. Results showed that these fourteen strains had 1587 genes in the core-genome and 3400 genes constituting their pan-genome. Out of these, 2651 genes associated with iron transport and metabolism were selected to construct the protein interaction networks and 361 genes were incorporated into the iron-associated protein interaction network (iPIN) consisting of nine different iron-associated functional modules. After comparison with the virulence factor database (VFDB), 21 virulence-associated proteins were identified, and 11 of these belonged to the heme biosynthesis module. From this study, the core heme biosynthesis module and the core outer membrane hemoglobin receptor HgbA were proposed as candidate targets for designing novel antibiotics and vaccines to prevent pasteurellosis across serotypes and animal hosts, supporting precision agriculture and sustainable food security. Copyright © 2018. Published by Elsevier Ltd.
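
The core- and pan-genome counts in the abstract above rest on a simple set relationship: the core-genome is the set of genes present in every strain, and the pan-genome is the set of genes present in at least one. A minimal Python sketch of that relationship follows; the function name, strain labels, and gene labels are hypothetical placeholders, and real analyses first cluster genes into ortholog groups, which this sketch assumes has already been done.

```python
def core_and_pan(gene_sets):
    """Return (core, pan) gene sets from a mapping of strain -> genes.

    core: genes found in every strain (set intersection)
    pan:  genes found in at least one strain (set union)
    """
    sets = [set(genes) for genes in gene_sets.values()]
    if not sets:
        raise ValueError("need at least one strain")
    core = set.intersection(*sets)
    pan = set.union(*sets)
    return core, pan


# Hypothetical strains and ortholog-group labels, for illustration only.
toy_strains = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneB", "geneC", "geneE"},
}
toy_core, toy_pan = core_and_pan(toy_strains)
```

In this toy case the core-genome is {geneA, geneB} and the pan-genome contains five genes.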


April 21, 2020  |  

Complete Genome Sequence of Saccharospirillum mangrovi HK-33T Sheds Light on the Ecological Role of a Bacterium in Mangrove Sediment Environment.

We present the genome sequence of Saccharospirillum mangrovi HK-33T, isolated from a mangrove sediment sample in Haikou, China. The complete genome of S. mangrovi HK-33T consisted of a single circular chromosome of 3,686,911 bp with an average G + C content of 57.37%, and contained 3,383 protein-coding genes, 4 operons of 16S-23S-5S rRNA genes, and 52 tRNA genes. Genomic annotation indicated that the genome of S. mangrovi HK-33T had many genes related to oligosaccharide and polysaccharide degradation and utilization of polyhydroxyalkanoate. For the nitrogen cycle, genes encoding nitrate and nitrite reductase, glutamate dehydrogenase, glutamate synthase, and glutamine synthetase were found. For the phosphorus cycle, genes related to polyphosphate kinases (ppk1 and ppk2), the high-affinity phosphate-specific transport (Pst) system, and the low-affinity inorganic phosphate transporter (pitA) were predicted. For the sulfur cycle, genes coding for cysteine synthase and type III acyl coenzyme A transferase (dddD) were identified. This study provides evidence about the carbon, nitrogen, phosphorus, and sulfur metabolic patterns of S. mangrovi HK-33T and broadens our understanding of the ecological roles of this bacterium in the mangrove sediment environment.
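
The GC content figure reported in the abstract above is the percentage of bases in the sequence that are G or C. As a minimal Python sketch, assuming ambiguous bases such as N are excluded from the denominator (a convention that varies between tools), it can be computed as follows. The function name and the toy sequences are illustrative and are not taken from the study.

```python
def gc_content(sequence):
    """Return the G + C percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N, is ignored in both numerator and denominator.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy sequence (made up): four of its eight bases are G or C.
toy_gc = gc_content("GGCCAATT")
```

For the toy sequence the result is 50.0.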


April 21, 2020  |  

Plastid genomes from diverse glaucophyte genera reveal a largely conserved gene content and limited architectural diversity.

Plastid genome (ptDNA) data of Glaucophyta have been limited for many years to the genus Cyanophora. Here, we sequenced the ptDNAs of Gloeochaete wittrockiana, Cyanoptyche gloeocystis, Glaucocystis incrassata, and Glaucocystis sp. BBH. The reported sequences are the first genome-scale plastid data available for these three poorly studied glaucophyte genera. Although the Glaucophyta plastids appear morphologically “ancestral,” they actually bear derived genomes not radically different from those of red algae or viridiplants. The glaucophyte plastid coding capacity is highly conserved (112 genes shared) and the architecture of the plastid chromosomes is relatively simple. Phylogenomic analyses recovered Glaucophyta as the earliest diverging Archaeplastida lineage, but the position of viridiplants as the first branching group was not rejected by the approximately unbiased test. Pairwise distances estimated from 19 different plastid genes revealed that the highest sequence divergence between glaucophyte genera is frequently higher than distances between species of different classes within red algae or viridiplants. Gene synteny and sequence similarity in the ptDNAs of the two Glaucocystis species analyzed is conserved. However, the ptDNA of Gla. incrassata contains a 7.9-kb insertion not detected in Glaucocystis sp. BBH. The insertion contains ten open reading frames that include four coding regions similar to bacterial serine recombinases (two open reading frames), DNA primases, and peptidoglycan aminohydrolases. These three enzymes, often encoded in bacterial plasmids and bacteriophage genomes, are known to participate in the mobilization and replication of DNA mobile elements. It is therefore plausible that the insertion in Gla. incrassata ptDNA is derived from a DNA mobile element.


April 21, 2020  |  

Global-level population genomics reveals differential effects of geography and phylogeny on horizontal gene transfer in soil bacteria.

Although microorganisms are known to dominate Earth’s biospheres and drive biogeochemical cycling, little is known about the geographic distributions of microbial populations or the environmental factors that pattern those distributions. We used a global-level hierarchical sampling scheme to comprehensively characterize the evolutionary relationships and distributional limitations of the nitrogen-fixing bacterial symbionts of the crop chickpea, generating 1,027 draft whole-genome sequences at the level of bacterial populations, including 14 high-quality PacBio genomes from a phylogenetically representative subset. We find that diverse Mesorhizobium taxa perform symbiosis with chickpea and have largely overlapping global distributions. However, sampled locations cluster based on the phylogenetic diversity of Mesorhizobium populations, and diversity clusters correspond to edaphic and environmental factors, primarily soil type and latitude. Despite long-standing evolutionary divergence and geographic isolation, the diverse taxa observed to nodulate chickpea share a set of integrative conjugative elements (ICEs) that encode the major functions of the symbiosis. This symbiosis ICE takes 2 forms in the bacterial chromosome (tripartite and monopartite), with tripartite ICEs confined to a broadly distributed superspecies clade. The pairwise evolutionary relatedness of these elements is controlled as much by geographic distance as by the evolutionary relatedness of the background genome. In contrast, diversity in the broader gene content of Mesorhizobium genomes follows a tight linear relationship with core genome phylogenetic distance, with little detectable effect of geography. These results illustrate how geography and demography can operate differentially on the evolution of bacterial genomes and offer useful insights for the development of improved technologies for sustainable agriculture.


April 21, 2020  |  

Methylome and Metabolome Analyses Reveal Adaptive Mechanisms in Geobacter sulfurreducens Grown on Different Terminal Electron Acceptors.

The Geobacter species evolved respiratory versatility to utilize a wide range of terminal electron acceptors. To explore this adaptive mechanism, Fe(III) citrate, hydrous ferric oxide, and fumarate were selected as electron acceptors, and the methylome and metabolome of Geobacter sulfurreducens PCA grown on each electron acceptor were investigated via third-generation, single-molecule real-time DNA sequencing and gas chromatography/time-of-flight mass spectrometry-based metabolomics, respectively. Results showed that the patterns of 4-methylcytosine (m4C) and 6-methyladenine (m6A) modification, the concentrations of fatty acids (e.g., caprylic acid, capric acid, and squalene), and the activity of antioxidant enzymes (e.g., superoxide dismutase, catalase, and glutathione reductase) were all varied in different electron acceptor cultures. Moreover, genes (e.g., GSU0466 and GSU1467) with low expression levels generally had high methylation levels. These findings suggest that m4C and m6A modifications, fatty acids, and antioxidant enzymes all play a role in the adaptation of G. sulfurreducens to diverse electron acceptors, and DNA methylation may be involved in the adaptation mainly via gene expression regulation.


April 21, 2020  |  

Genomic Plasticity Mediated by Transposable Elements in the Plant Pathogenic Fungus Colletotrichum higginsianum.

Phytopathogen genomes are under constant pressure to change, as pathogens are locked in an evolutionary arms race with their hosts, where pathogens evolve effector genes to manipulate their hosts, whereas the hosts evolve immune components to recognize the products of these genes. Colletotrichum higginsianum (Ch), a fungal pathogen with no known sexual morph, infects Brassicaceae plants including Arabidopsis thaliana. Previous studies revealed that Ch differs in its virulence toward various Arabidopsis thaliana ecotypes, indicating the existence of coevolutionary selective pressures. However, between-strain genomic variations in Ch have not been studied. Here, we sequenced and assembled the genome of a Ch strain, resulting in a highly contiguous genome assembly, which was compared with the chromosome-level genome assembly of another strain to identify genomic variations between strains. We found that the two closely related strains vary in terms of large-scale rearrangements, the existence of strain-specific regions, and effector candidate gene sets and that these variations are frequently associated with transposable elements (TEs). Ch has a compartmentalized genome consisting of gene-sparse, TE-dense regions with more effector candidate genes and gene-dense, TE-sparse regions harboring conserved genes. Additionally, analysis of the conservation patterns and syntenic regions of effector candidate genes indicated that the two strains vary in their effector candidate gene sets because of de novo evolution, horizontal gene transfer, or gene loss after divergence. Our results reveal mechanisms for generating genomic diversity in this asexual pathogen, which are important for understanding its adaption to hosts. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


April 21, 2020  |  

A High-Quality Draft Genome Sequence of Colletotrichum gloeosporioides sensu stricto SMCG1#C, a Causal Agent of Anthracnose on Cunninghamia lanceolata in China.

Colletotrichum has a broad host range and causes major yield losses in crops. The fungus Colletotrichum gloeosporioides is associated with anthracnose on Chinese fir. In this study, we present a high-quality draft genome sequence of C. gloeosporioides sensu stricto SMCG1#C, providing reference genomic data for further research on anthracnose of Chinese fir and other hosts.


April 21, 2020  |  

Genetic basis for the establishment of endosymbiosis in Paramecium.

The single-celled ciliate Paramecium bursaria is an indispensable model for investigating endosymbiosis between protists and green-algal symbionts. To elucidate the mechanism of this type of endosymbiosis, we combined PacBio and Illumina sequencing to assemble a high-quality and near-complete macronuclear genome of P. bursaria. The genomic characteristics and phylogenetic analyses indicate that P. bursaria is the basal clade of the Paramecium genus. Through comparative genomic analyses with its close relatives, we found that P. bursaria encodes more genes related to nitrogen metabolism and mineral absorption, but encodes fewer genes involved in oxygen binding and N-glycan biosynthesis. A comparison of the transcriptomic profiles of P. bursaria with and without endosymbiotic Chlorella showed differential expression of a wide range of metabolic genes. We selected the 32 most differentially expressed genes for RNA interference experiments in P. bursaria, and found that P. bursaria can regulate the abundance of its symbionts through glutamine supply. This study provides novel insights into Paramecium evolution and will extend our knowledge of the molecular mechanism for the induction of endosymbiosis between P. bursaria and green algae.


April 21, 2020  |  

A global survey of full-length transcriptome of Ginkgo biloba reveals transcript variants involved in flavonoid biosynthesis

Ginkgo biloba, which contains flavonoids as bioactive components, is widely used in traditional Chinese medicine. Increasing the flavonoid production of medicinal plants through genetic engineering generally focuses on the key genes involved in flavonoid biosynthesis. However, the molecular mechanisms underlying such biosynthesis are not yet well understood. To understand these mechanisms, a combination of second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing was applied to G. biloba. Eight tissues were sampled for SMRT sequencing to generate a high-quality, full-length transcriptome database. From 23.36 Gb of clean reads, 12,954 alternative polyadenylation events, 12,290 alternative splicing (AS) events, 929 fusion transcripts, 2,286 novel transcripts, and 1,270 lncRNAs were predicted after removing redundant reads. Further analysis identified 7 AS, 5 lncRNA, and 6 fusion gene events in flavonoid biosynthesis. Constructing co-expression networks revealed a total of 12 gene modules involving structural genes and transcription factors of flavonoid metabolism. Weighted gene coexpression network analysis (WGCNA), used to identify transcription factors (TFs) and structural genes, reveals that some hub genes operate during the biosynthesis. Seven key hub genes were also identified by analyzing the correlation between gene expression level and flavonoid content. The results highlight the importance of SMRT sequencing of the full-length transcriptome in improving genome annotation and elucidating the gene regulation of flavonoid biosynthesis in G. biloba by providing a comprehensive set of reference transcripts.


April 21, 2020  |  

The Single-molecule long-read sequencing of Scylla paramamosain.

Scylla paramamosain is an important aquaculture crab with great economic and nutritional value. To the best of our knowledge, few full-length crab transcriptomes are available. In this study, a library composed of 12 different tissues including gill, hepatopancreas, muscle, cerebral ganglion, eyestalk, thoracic ganglia, intestine, heart, testis, ovary, sperm reservoir, and hemocyte was constructed and sequenced using Pacific Biosciences single-molecule real-time (SMRT) long-read sequencing technology. A total of 284803 full-length non-chimeric reads were obtained, from which 79005 high-quality unique transcripts were obtained after error correction, sequence clustering, and redundancy removal. Additionally, a total of 52544 transcripts were annotated against protein databases (NCBI nonredundant, Swiss-Prot, KOG, and KEGG). A total of 23644 long non-coding RNAs (lncRNAs) and 131561 simple sequence repeats (SSRs) were identified. The isoforms of many genes were also identified in this study. Our study provides a rich set of full-length cDNA sequences for S. paramamosain, which will greatly facilitate S. paramamosain research.


April 21, 2020  |  

Genomic Analyses Reveal Evidence of Independent Evolution, Demographic History, and Extreme Environment Adaptation of Tibetan Plateau Agaricus bisporus.

Agaricus bisporus distributed in the Tibetan Plateau of China has high stress resistance that is valuable for breeding improvements. However, its evolutionary history, specialization, and adaptation to the extreme Tibetan Plateau environment are largely unknown. Here, we performed de novo genome sequencing of a representative Tibetan Plateau wild strain, ABM, and comparative genomic analysis with the reported European strains H97 and H39. The assembled ABM genome was 30.4 Mb in size and comprised 8,562 protein-coding genes. The ABM genome shared highly conserved syntenic blocks and a few inversions with H97 and H39. The phylogenetic tree constructed from 1,276 single-copy orthologous genes in nine fungal species showed that the Tibetan Plateau and European A. bisporus diverged ~5.5 million years ago. Population genomic analysis using genome resequencing of 29 strains revealed that the Tibetan Plateau population underwent significant differentiation from the European and American populations and evolved independently, and that global climate changes critically shaped the demographic history of the Tibetan Plateau population. Moreover, we identified key genes, related to the cell wall and membrane system and to the development and defense systems, that regulate the adaptation of A. bisporus to the harsh Tibetan Plateau environment. These findings highlight the value of genomic data in assessing the evolution and adaptation of mushrooms and will enhance future genetic improvements of A. bisporus.


April 21, 2020  |  

The genome of broomcorn millet.

Broomcorn millet (Panicum miliaceum L.) is the most water-efficient cereal and one of the earliest domesticated plants. Here we report its high-quality, chromosome-scale genome assembly using a combination of short-read sequencing, single-molecule real-time sequencing, Hi-C, and a high-density genetic map. Phylogenetic analyses reveal two sets of homologous chromosomes that may have merged ~5.6 million years ago, both of which exhibit strong synteny with other grass species. Broomcorn millet contains 55,930 protein-coding genes and 339 microRNA genes. We find Paniceae-specific expansion in several subfamilies of the BTB (broad complex/tramtrack/bric-a-brac) subunit of ubiquitin E3 ligases, suggesting enhanced regulation of protein dynamics may have contributed to the evolution of broomcorn millet. In addition, we identify the coexistence of all three C4 subtypes of carbon fixation candidate genes. The genome sequence is a valuable resource for breeders and will provide the foundation for studying the exceptional stress tolerance as well as C4 biology.

