July 7, 2019  |  

The evolution and population diversity of human-specific segmental duplications

Segmental duplications contribute to human evolution, adaptation and genomic instability but are often poorly characterized. We investigate the evolution, genetic variation and coding potential of human-specific segmental duplications (HSDs). We identify 218 HSDs based on analysis of 322 deeply sequenced archaic and contemporary hominid genomes. We sequence 550 human and nonhuman primate genomic clones to reconstruct the evolution of the largest, most complex regions with protein-coding potential (N = 80 genes from 33 gene families). We show that HSDs are non-randomly organized, associate preferentially with ancestral ape duplications termed ‘core duplicons’ and evolved primarily in an interspersed inverted orientation. In addition to Homo sapiens-specific gene expansions (such as TCAF1/TCAF2), we highlight ten gene families (for example, ARHGAP11B and SRGAP2C) where copy number never returns to the ancestral state, there is evidence of mRNA splicing and no common gene-disruptive mutations are observed in the general population. Such duplicates are candidates for the evolution of human-specific adaptive traits.


July 7, 2019  |  

A vast genomic deletion in the C56BL/6 genome affects different genes within the Ifi200 cluster on chromosome 1 and mediates obesity and insulin resistance.

Obesity, the excessive accumulation of body fat, is a highly heritable and genetically heterogeneous disorder. The complex, polygenic basis for the disease, consisting of a network of different gene variants, is still not completely known. In the current study we generated a BAC library of the obese-prone NZO strain to clarify the genomic alteration within the gene cluster Ifi200 on chr.1 including Ifi202b, an obesity gene that, in contrast to NZO, is not expressed in the lean B6 mouse. With the PacBio sequencing data of NZO BAC clones we identified a deletion spanning approximately 261.8 kb in the B6 reference genome. The deletion affects different members of the Ifi200 gene family, including the original first exon and 5′-regulatory parts of the Ifi202b gene, and appears to be the relevant cause of its expression deficiency in B6. In addition, the generation and characterization of congenic mice carrying the critical fragment on the B6 background demonstrate its crucial role for obesity and insulin resistance. Our data reveal the reconstruction of a complex genomic region on mouse chr.1 resulting from deletions and duplications of Ifi200 genes and suggest that this region is relevant for the development of obesity. The results further demonstrate the complexity of the disease and highlight the importance of studying rare genetic variants as they can be causal for large effects.


July 7, 2019  |  

Epigenetic origin of evolutionary novel centromeres.

Most evolutionary new centromeres (ENC) are composed of large arrays of satellite DNA and surrounded by segmental duplications. However, the hypothesis is that ENCs are seeded in an anonymous sequence and only over time have acquired the complexity of “normal” centromeres. Up to now evidence to test this hypothesis was lacking. We recently discovered that the well-known polymorphism of orangutan chromosome 12 was due to the presence of an ENC. We sequenced the genome of an orangutan homozygous for the ENC, and we focused our analysis on the comparison of the ENC domain with respect to its wild type counterpart. No significant variations were found. This finding is the first clear evidence that ENC seedings are epigenetic in nature. The compaction of the ENC domain was found significantly higher than the corresponding WT region and, interestingly, the expression of the only gene embedded in the region was significantly repressed.


July 7, 2019  |  

Patterns of polymorphism at the self-incompatibility locus in 1,083 Arabidopsis thaliana genomes.

Although the transition to selfing in the model plant Arabidopsis thaliana involved the loss of the self-incompatibility (SI) system, it clearly did not occur due to the fixation of a single inactivating mutation at the locus determining the specificities of SI (the S-locus). At least three groups of divergent haplotypes (haplogroups), corresponding to ancient functional S-alleles, have been maintained at this locus, and extensive functional studies have shown that all three carry distinct inactivating mutations. However, the historical process of loss of SI is not well understood, in particular its relation with the last glaciation. Here, we took advantage of recently published genomic resequencing data in 1,083 Arabidopsis thaliana accessions that we combined with BAC sequencing to obtain polymorphism information for the whole S-locus region at a species-wide scale. The accessions differed by several major rearrangements including large deletions and interhaplogroup recombinations, forming a set of haplogroups that are widely distributed throughout the native range and largely overlap geographically. “Relict” A. thaliana accessions that directly derive from glacial refugia are polymorphic at the S-locus, suggesting that the three haplogroups were already present when glacial refugia from the last Ice Age became isolated. Interhaplogroup recombinant haplotypes were highly frequent, and detailed analysis of recombination breakpoints suggested multiple independent origins. These findings suggest that the complete loss of SI in A. thaliana involved independent self-compatible mutants that arose prior to the last Ice Age, and experienced further rearrangements during postglacial colonization. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019  |  

The unique genomic landscape surrounding the EPSPS gene in glyphosate resistant Amaranthus palmeri: a repetitive path to resistance.

The expanding number and global distributions of herbicide resistant weedy species threaten food, fuel, fiber and bioproduct sustainability and agroecosystem longevity. Amongst the most competitive weeds, Amaranthus palmeri S. Wats has rapidly evolved resistance to glyphosate primarily through massive amplification and insertion of the 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) gene across the genome. Increased EPSPS gene copy numbers result in higher titers of the EPSPS enzyme, the target of glyphosate, and confer resistance to glyphosate treatment. To understand the genomic unit and mechanism of EPSPS gene copy number proliferation, we developed and used a bacterial artificial chromosome (BAC) library from a highly resistant biotype to sequence the local genomic landscape flanking the EPSPS gene. By sequencing overlapping BACs, a 297 kb sequence was generated, hereafter referred to as the “EPSPS cassette.” This region included several putative genes, dense clusters of tandem and inverted repeats, putative helitron and autonomous replication sequences, and regulatory elements. Whole genome shotgun sequencing (WGS) of two biotypes exhibiting high and no resistance to glyphosate was performed to compare genomic representation across the EPSPS cassette. Mapping of sequences for both biotypes to the reference EPSPS cassette revealed significant differences in upstream and downstream sequences relative to EPSPS with regard to both repetitive units and coding content between these biotypes. The differences in sequence may have resulted from a compounded-building mechanism such as repetitive transpositional events. The association of putative helitron sequences with the cassette suggests a possible amplification and distribution mechanism. Flow cytometry revealed that the EPSPS cassette added measurable genomic content. The adoption of glyphosate resistant cropping systems in major crops such as corn, soybean, cotton and canola coupled with excessive use of glyphosate herbicide has led to evolved glyphosate resistance in several important weeds. In Amaranthus palmeri, the amplification of the EPSPS cassette, characterized by a complex array of repetitive elements and putative helitron sequences, suggests an adaptive structural genomic mechanism that drives amplification and distribution around the genome. The added genomic content not found in glyphosate sensitive plants may be driving evolution through genome expansion.


July 7, 2019  |  

Complete gene sequence of spider attachment silk protein (PySp1) reveals novel linker regions and extreme repeat homogenization.

Spiders use a myriad of silk types for daily survival, and each silk type has a unique suite of task-specific mechanical properties. Of all spider silk types, pyriform silk is distinct because it is a combination of a dry protein fiber and wet glue. Pyriform silk fibers are coated with wet cement and extruded into “attachment discs” that adhere silks to each other and to substrates. The mechanical properties of spider silk types are linked to the primary and higher-level structures of spider silk proteins (spidroins). Spidroins are often enormous molecules (>250 kDa) and have a lengthy repetitive region that is flanked by relatively short (~100 amino acids), non-repetitive amino- and carboxyl-terminal regions. The amino acid sequence motifs in the repetitive region vary greatly between spidroin type, while motif length and number underlie the remarkable mechanical properties of spider silk fibers. Existing knowledge of pyriform spidroins is fragmented, making it difficult to define links between the structure and function of pyriform spidroins. Here, we present the full-length sequence of the gene encoding pyriform spidroin 1 (PySp1) from the silver garden spider Argiope argentata. The predicted protein is similar to previously reported PySp1 sequences but the A. argentata PySp1 has a uniquely long and repetitive “linker”, which bridges the amino-terminal and repetitive regions. Predictions of the hydrophobicity and secondary structure of A. argentata PySp1 identify regions important to protein self-assembly. Analysis of the full complement of A. argentata PySp1 repeats reveals extreme intragenic homogenization, and comparison of A. argentata PySp1 repeats with other PySp1 sequences identifies variability in two sub-repetitive expansion regions. Overall, the full-length A. argentata PySp1 sequence provides new evidence for understanding how pyriform spidroins contribute to the properties of pyriform silk fibers. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.


July 7, 2019  |  

Identification of a gene cluster for telomestatin biosynthesis and heterologous expression using a specific promoter in a clean host.

Telomestatin, a strong telomerase inhibitor with G-quadruplex stabilizing activity, is a potential therapeutic agent for treating cancers. Difficulties in isolating telomestatin from microbial cultures and in chemical synthesis are bottlenecks impeding its wider use. Therefore, improvement in telomestatin production and structural diversification are required for further utilization and application. Here, we discovered the gene cluster responsible for telomestatin biosynthesis, and achieved production of telomestatin by heterologous expression of this cluster in the engineered Streptomyces avermitilis SUKA strain. Utilization of an optimal promoter was essential for successful production. Gene disruption studies revealed that the tlsB, tlsC, and tlsO-T genes play key roles in telomestatin biosynthesis. Moreover, exchanging TlsC core peptide sequences resulted in the production of novel telomestatin derivatives. This study sheds light on the expansion of chemical diversity of natural peptide products for drug development.


July 7, 2019  |  

Loss of pollen-specific phospholipase NOT LIKE DAD triggers gynogenesis in maize.

Gynogenesis is an asexual mode of reproduction common to animals and plants, in which stimuli from the sperm cell trigger the development of the unfertilized egg cell into a haploid embryo. Fine mapping restricted a major maize QTL (quantitative trait locus) responsible for the aptitude of inducer lines to trigger gynogenesis to a zone containing a single gene NOT LIKE DAD (NLD) coding for a patatin-like phospholipase A. In all surveyed inducer lines, NLD carries a 4-bp insertion leading to a predicted truncated protein. This frameshift mutation is responsible for haploid induction because complementation with wild-type NLD abolishes the haploid induction capacity. Activity of the NLD promoter is restricted to mature pollen and pollen tube. The translational NLD::citrine fusion protein likely localizes to the sperm cell plasma membrane. In Arabidopsis roots, the truncated protein is no longer localized to the plasma membrane, contrary to the wild-type NLD protein. In conclusion, an intact pollen-specific phospholipase is required for successful sexual reproduction and its targeted disruption may allow establishing powerful haploid breeding tools in numerous crops. © 2017 The Authors.


July 7, 2019  |  

Butterfly genomics: insights from the genome of Melitaea cinxia

The first lepidopteran genome (Bombyx mori) was published in 2004. Ten years later the genome of Melitaea cinxia came out as the third butterfly genome published, and the first eukaryotic genome sequenced in Finland. Owing to Ilkka Hanski, the M. cinxia system in the Åland Islands has become a famous model for metapopulation biology. More than 20 years of research on this system provides a strong ecological basis upon which a genetic framework could be built. Genetic knowledge is an essential addition for understanding eco-evolutionary dynamics and the genetic basis of variability in life history traits. Here we review the process of the M. cinxia genome project, its implications for lepidopteran genome evolution, and describe how the genome has been used for gene expression studies to identify genetic consequences of habitat fragmentation. Finally, we introduce some future possibilities and challenges for genomic research in M. cinxia and other Lepidoptera.


July 7, 2019  |  

Genome sequencing reveals the origin of the allotetraploid Arabidopsis suecica.

Polyploidy is an example of instantaneous speciation when it involves the formation of a new cytotype that is incompatible with the parental species. Because new polyploid individuals are likely to be rare, establishment of a new species is unlikely unless polyploids are able to reproduce through self-fertilization (selfing), or asexually. Conversely, selfing (or asexuality) makes it possible for polyploid species to originate from a single individual, a bona fide speciation event. The extent to which this happens is not known. Here, we consider the origin of Arabidopsis suecica, a selfing allopolyploid between Arabidopsis thaliana and Arabidopsis arenosa, which has hitherto been considered to be an example of a unique origin. Based on whole-genome re-sequencing of 15 natural A. suecica accessions, we identify ubiquitous shared polymorphism with the parental species, and hence conclusively reject a unique origin in favor of multiple founding individuals. We further estimate that the species originated after the last glacial maximum in Eastern Europe or central Eurasia (rather than Sweden, as the name might suggest). Finally, annotation of the self-incompatibility loci in A. suecica revealed that both loci carry non-functional alleles. The locus inherited from the selfing A. thaliana is fixed for an ancestral non-functional allele, whereas the locus inherited from the outcrossing A. arenosa is fixed for a novel loss-of-function allele. Furthermore, the allele inherited from A. thaliana is predicted to transcriptionally silence the allele inherited from A. arenosa, suggesting that loss of self-incompatibility may have been instantaneous. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019  |  

Hybrid assembly with long and short reads improves discovery of gene family expansions.

Long-read and short-read sequencing technologies offer competing advantages for eukaryotic genome sequencing projects. Combinations of both may be appropriate for surveys of within-species genomic variation. We developed a hybrid assembly pipeline called “Alpaca” that can operate on 20X long-read coverage plus about 50X short-insert and 50X long-insert short-read coverage. To preclude collapse of tandem repeats, Alpaca relies on base-call-corrected long reads for contig formation. Compared to two other assembly protocols, Alpaca demonstrated the most reference agreement and repeat capture on the rice genome. On three accessions of the model legume Medicago truncatula, Alpaca generated the most agreement to a conspecific reference and predicted tandemly repeated genes absent from the other assemblies. Our results suggest Alpaca is a useful tool for investigating structural and copy number variation within de novo assemblies of sampled populations.


July 7, 2019  |  

Genome-wide identification of the mutation underlying fleece variation and discriminating ancestral hairy species from modern woolly sheep.

The composition and structure of fleece variation observed in mammals is a consequence of a strong selective pressure for fiber production after domestication. In sheep, fleece variation discriminates ancestral species carrying a long and hairy fleece from modern domestic sheep (Ovis aries) owning a short and woolly fleece. Here, we report that the “woolly” allele results from the insertion of an antisense EIF2S2 retrogene (called asEIF2S2) into the 3′ UTR of the IRF2BP2 gene leading to an abnormal IRF2BP2 transcript. We provide evidence that this chimeric IRF2BP2/asEIF2S2 messenger 1) targets the genuine sense EIF2S2 RNA and 2) creates a long endogenous double-stranded RNA which alters the expression of both EIF2S2 and IRF2BP2 mRNA. This represents a unique example of a phenotype arising via an RNA-RNA hybrid, itself generated through a retroposition mechanism. Our results bring new insights on the sheep population history thanks to the identification of the molecular origin of an evolutionary phenotypic variation. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019  |  

Sequencing the genomic regions flanking S-linked PvGLO sequences confirms the presence of two GLO loci, one of which lies adjacent to the style-length determinant gene CYP734A50.

Primula vulgaris contains two GLOBOSA loci, one located adjacent to the style length determinant gene CYP734A50 which lies within the S-locus. Using a combination of BAC walking and PacBio sequencing, we have sequenced two substantial genomic contigs in and around the S-locus of Primula vulgaris. Using these data, we were able to demonstrate that two alleles of PvGlo (P) as well as PvGlo (T) can be present in the genome of a single plant, providing empirical evidence that these two forms of the MADS-box gene GLOBOSA are separate loci and not allelic as previously reported. We propose they should be renamed PvGLO1 and PvGLO2. BAC contigs extending from each GLOBOSA locus were identified and fully sequenced. No homologous genes were found between the contigs other than the GLOBOSA genes themselves, consistent with their identity as separate loci. Exons of the recently identified style-length determinant gene CYP734A50 were identified on one end of the contig containing PvGLO2 and these genes are adjacent in the genome, suggesting that PvGLO2 lies either within or at least very close to the S-locus. Current evidence suggests that both CYP734A50 and GLO2 are specific to the S-morph mating type and are hemizygous rather than heterozygous in the Primula genome. This finding contrasts with classical models of the HSI locus, which propose that components of the S-locus are allelic, suggesting that these models may need to be reconsidered.


July 7, 2019  |  

The evolution of the natural killer complex; a comparison between mammals using new high-quality genome assemblies and targeted annotation.

Natural killer (NK) cells are a diverse population of lymphocytes with a range of biological roles including essential immune functions. NK cell diversity is in part created by the differential expression of cell surface receptors which modulate activation and function, including multiple subfamilies of C-type lectin receptors encoded within the NK complex (NKC). Little is known about the gene content of the NKC beyond rodent and primate lineages, other than it appears to be extremely variable between mammalian groups. We compared the NKC structure between mammalian species using new high-quality draft genome assemblies for cattle and goat; re-annotated sheep, pig, and horse genome assemblies; and the published human, rat, and mouse lemur NKC. The major NKC genes are largely in the equivalent positions in all eight species, with significant independent expansions and deletions between species, allowing us to propose a model for NKC evolution during mammalian radiation. The ruminant species, cattle and goats, have independently evolved a second KLRC locus flanked by KLRA and KLRJ, and a novel KLRH-like gene has acquired an activating tail. This novel gene has duplicated several times within cattle, while other activating receptor genes have been selectively disrupted. Targeted genome enrichment in cattle identified varying levels of allelic polymorphism between the NKC genes concentrated in the predicted extracellular ligand-binding domains. This novel recombination and allelic polymorphism is consistent with NKC evolution under balancing selection, suggesting that this diversity influences individual immune responses and may impact on differential outcomes of pathogen infection and vaccination.


July 7, 2019  |  

Pearl millet genome sequence provides a resource to improve agronomic traits in arid environments.

Pearl millet [Cenchrus americanus (L.) Morrone] is a staple food for more than 90 million farmers in arid and semi-arid regions of sub-Saharan Africa, India and South Asia. We report the ~1.79 Gb draft whole genome sequence of reference genotype Tift 23D2B1-P1-P5, which contains an estimated 38,579 genes. We highlight the substantial enrichment for wax biosynthesis genes, which may contribute to heat and drought tolerance in this crop. We resequenced and analyzed 994 pearl millet lines, enabling insights into population structure, genetic diversity and domestication. We use these resequencing data to establish marker trait associations for genomic selection, to define heterotic pools, and to predict hybrid performance. We believe that these resources should empower researchers and breeders to improve this important staple crop.

