April 21, 2020  |  

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


April 21, 2020  |  

Convergent horizontal gene transfer and cross-talk of mobile nucleic acids in parasitic plants.

Horizontal gene transfer (HGT), the movement and genomic integration of DNA across species boundaries, is commonly associated with bacteria and other microorganisms, but functional HGT (fHGT) is increasingly being recognized in heterotrophic parasitic plants that obtain their nutrients and water from their host plants through direct haustorial feeding. Here, in the holoparasitic stem parasite Cuscuta, we identify 108 transcribed and probably functional HGT events in Cuscuta campestris and related species, plus 42 additional regions with host-derived transposon, pseudogene and non-coding sequences. Surprisingly, 18 Cuscuta fHGTs were acquired from the same gene families by independent HGT events in Orobanchaceae parasites, and the majority are highly expressed in the haustorial feeding structures in both lineages. Convergent retention and expression of HGT sequences suggests an adaptive role for specific additional genes in parasite biology. Between 16 and 20 of the transcribed HGT events are inferred as ancestral in Cuscuta based on transcriptome sequences from species across the phylogenetic range of the genus, implicating fHGT in the successful radiation of Cuscuta parasites. Genome sequencing of C. campestris supports transfer of genomic DNA, rather than retroprocessed RNA, as the mechanism of fHGT. Many of the C. campestris genes horizontally acquired are also frequent sources of 24-nucleotide small RNAs that are typically associated with RNA-directed DNA methylation. One HGT encoding a leucine-rich repeat protein kinase overlaps with a microRNA that has been shown to regulate host gene expression, suggesting that HGT-derived parasite small RNAs may function in the parasite-host interaction. This study enriches our understanding of HGT by describing a parasite-host system with unprecedented gene exchange that points to convergent evolution of HGT events and the functional importance of horizontally transferred coding and non-coding sequences.


April 21, 2020  |  

Convergent evolution of linked mating-type loci in basidiomycetes: an ancient fusion event that has stood the test of time

Sexual development is a key evolutionary innovation of eukaryotes. In many species, mating involves interaction between compatible mating partners that can undergo cell and nuclear fusion and subsequent steps of development including meiosis. Mating compatibility in fungi is governed by mating type determinants, which are localized at mating type (MAT) loci. In basidiomycetes, the ancestral state is hypothesized to be tetrapolar (bifactorial), with two genetically unlinked MAT loci containing homeodomain transcription factor genes (HD locus) and pheromone and pheromone receptor genes (P/R locus), respectively. Alleles at both loci must differ between mating partners for completion of sexual development. However, there are also basidiomycete species with bipolar (unifactorial) mating systems, which can arise through genomic linkage of the HD and P/R loci. In the order Tremellales, which is comprised of mostly yeast-like species, bipolarity is found only in the human pathogenic Cryptococcus species. Here, we describe the analysis of MAT loci from the Trichosporonales, a sister order to the Tremellales. We analyzed genome sequences from 29 strains that belong to 24 species, including two new genome sequences generated in this study. Interestingly, in all of the species analyzed, the MAT loci are fused and a single HD gene is present in each mating type. This is similar to the organization in the pathogenic Cryptococci, which also have linked MAT loci and carry only one HD gene per MAT locus instead of the usual two HD genes found in the vast majority of basidiomycetes. However, the HD and P/R allele combinations in the Trichosporonales are different from those in the pathogenic Cryptococcus species. 
The differences in allele combinations compared to the bipolar Cryptococci as well as the existence of tetrapolar Tremellales sister species suggest that fusion of the HD and P/R loci and differential loss of one of the two HD genes per MAT allele occurred independently in the Trichosporonales and pathogenic Cryptococci. This finding supports the hypothesis of convergent evolution at the molecular level towards fused mating-type regions in fungi, similar to previous findings in other fungal groups. Unlike the fused MAT loci in several other basidiomycete lineages though, the gene content and gene order within the fused MAT loci are highly conserved in the Trichosporonales, and there is no apparent suppression of recombination extending from the MAT loci to adjacent chromosomal regions, suggesting different mechanisms for the evolution of physically linked MAT loci in these groups.


April 21, 2020  |  

The Ptr1 locus of Solanum lycopersicoides confers resistance to race 1 strains of Pseudomonas syringae pv. tomato and to Ralstonia pseudosolanacearum by recognizing the type III effectors AvrRpt2/RipBN.

Race 1 strains of Pseudomonas syringae pv. tomato, which cause bacterial speck disease of tomato, are becoming increasingly common and no simply-inherited genetic resistance to such strains is known. We discovered that a locus in Solanum lycopersicoides, termed Pseudomonas tomato race 1 (Ptr1), confers resistance to race 1 Pst strains by detecting the activity of type III effector AvrRpt2. In Arabidopsis, AvrRpt2 degrades the RIN4 protein thereby activating RPS2-mediated immunity. Using site-directed mutagenesis of AvrRpt2 we found that, like RPS2, activation of Ptr1 requires AvrRpt2 proteolytic activity. Ptr1 also detected the activity of AvrRpt2 homologs from diverse bacteria including one in Ralstonia pseudosolanacearum. The genome sequence of S. lycopersicoides revealed no RPS2 homolog in the Ptr1 region. Ptr1 could play an important role in controlling bacterial speck disease and its future cloning may shed light on an example of convergent evolution for recognition of a widespread type III effector.


April 21, 2020  |  

Lateral transfers of large DNA fragments spread functional genes among grasses.

A fundamental tenet of multicellular eukaryotic evolution is that vertical inheritance is paramount, with natural selection acting on genetic variants transferred from parents to offspring. This lineal process means that an organism’s adaptive potential can be restricted by its evolutionary history, the amount of standing genetic variation, and its mutation rate. Lateral gene transfer (LGT) theoretically provides a mechanism to bypass many of these limitations, but the evolutionary importance and frequency of this process in multicellular eukaryotes, such as plants, remains debated. We address this issue by assembling a chromosome-level genome for the grass Alloteropsis semialata, a species surmised to exhibit two LGTs, and screen it for other grass-to-grass LGTs using genomic data from 146 other grass species. Through stringent phylogenomic analyses, we discovered 57 additional LGTs in the A. semialata nuclear genome, involving at least nine different donor species. The LGTs are clustered in 23 laterally acquired genomic fragments that are up to 170 kb long and have accumulated during the diversification of Alloteropsis. The majority of the 59 LGTs in A. semialata are expressed, and we show that they have added functions to the recipient genome. Functional LGTs were further detected in the genomes of five other grass species, demonstrating that this process is likely widespread in this globally important group of plants. LGT therefore appears to represent a potent evolutionary force capable of spreading functional genes among distantly related grass species. Copyright © 2019 the Author(s). Published by PNAS.


April 21, 2020  |  

Large-scale ruminant genome sequencing provides insights into their evolution and distinct traits.

The ruminants are one of the most successful mammalian lineages, exhibiting morphological and habitat diversity and containing several key livestock species. To better understand their evolution, we generated and analyzed de novo assembled genomes of 44 ruminant species, representing all six Ruminantia families. We used these genomes to create a time-calibrated phylogeny to resolve topological controversies, overcoming the challenges of incomplete lineage sorting. Population dynamic analyses show that population declines commenced between 100,000 and 50,000 years ago, which is concomitant with expansion in human populations. We also reveal genes and regulatory elements that possibly contribute to the evolution of the digestive system, cranial appendages, immune system, metabolism, body size, cursorial locomotion, and dentition of the ruminants. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.


April 21, 2020  |  

A microbial factory for defensive kahalalides in a tripartite marine symbiosis.

Chemical defense against predators is widespread in natural ecosystems. Occasionally, taxonomically distant organisms share the same defense chemical. Here, we describe an unusual tripartite marine symbiosis, in which an intracellular bacterial symbiont (“Candidatus Endobryopsis kahalalidefaciens”) uses a diverse array of biosynthetic enzymes to convert simple substrates into a library of complex molecules (the kahalalides) for chemical defense of the host, the alga Bryopsis sp., against predation. The kahalalides are subsequently hijacked by a third partner, the herbivorous mollusk Elysia rufescens, and employed similarly for defense. “Ca. E. kahalalidefaciens” has lost many essential traits for free living and acts as a factory for kahalalide production. This interaction between a bacterium, an alga, and an animal highlights the importance of chemical defense in the evolution of complex symbioses. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.


April 21, 2020  |  

De novo genome assembly of the endangered Acer yangbiense, a plant species with extremely small populations endemic to Yunnan Province, China.

Acer yangbiense is a newly described critically endangered endemic maple tree confined to Yangbi County in Yunnan Province in Southwest China. It was included in a programme for rescuing the most threatened species in China, focusing on “plant species with extremely small populations (PSESP)”. We generated 64, 94, and 110 Gb of raw DNA sequences and obtained a chromosome-level genome assembly of A. yangbiense through a combination of Pacific Biosciences Single-molecule Real-time, Illumina HiSeq X, and Hi-C mapping, respectively. The final genome assembly is ~666 Mb, with 13 chromosomes covering ~97% of the genome and a scaffold N50 of 45 Mb. Further, BUSCO analysis recovered 95.5% complete BUSCO genes. Repetitive elements account for 68.0% of the A. yangbiense genome. Genome annotation generated 28,320 protein-coding genes, assisted by a combination of prediction and transcriptome sequencing. In addition, a nearly 1:1 orthology ratio of dot plots of longer syntenic blocks revealed a similar evolutionary history between A. yangbiense and grape, indicating that the genome has not undergone a whole-genome duplication event after the core eudicot common hexaploidization. Here, we report a high-quality de novo genome assembly of A. yangbiense, the first genome for the genus Acer and the family Aceraceae. This will provide fundamental conservation genomics resources, as well as representing a new high-quality reference genome for the economically important Acer lineage and the wider order of Sapindales. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020  |  

Chromulinavorax destructans, a pathogen of microzooplankton that provides a window into the enigmatic candidate phylum Dependentiae.

Members of the major candidate phylum Dependentiae (a.k.a. TM6) are widespread across diverse environments from showerheads to peat bogs; yet, with the exception of two isolates infecting amoebae, they are only known from metagenomic data. The limited knowledge of their biology indicates that they have a long evolutionary history of parasitism. Here, we present Chromulinavorax destructans (Strain SeV1) the first isolate of this phylum to infect a representative from a widespread and ecologically significant group of heterotrophic flagellates, the microzooplankter Spumella elongata (Strain CCAP 955/1). Chromulinavorax destructans has a reduced 1.2 Mb genome that is so specialized for infection that it shows no evidence of complete metabolic pathways, but encodes an extensive transporter system for importing nutrients and energy in the form of ATP from the host. Its replication causes extensive reorganization and expansion of the mitochondrion, effectively surrounding the pathogen, consistent with its dependency on the host for energy. Nearly half (44%) of the inferred proteins contain signal sequences for secretion, including many without recognizable similarity to proteins of known function, as well as 98 copies of proteins with an ankyrin-repeat domain; ankyrin-repeats are known effectors of host modulation, suggesting the presence of an extensive host-manipulation apparatus. These observations help to cement members of this phylum as widespread and diverse parasites infecting a broad range of eukaryotic microbes.


April 21, 2020  |  

Capacity to utilize raffinose dictates pneumococcal disease phenotype.

Streptococcus pneumoniae is commonly carried asymptomatically in the human nasopharynx, but it also causes serious and invasive diseases such as pneumonia, bacteremia, and meningitis, as well as less serious but highly prevalent infections such as otitis media. We have previously shown that closely related pneumococci (of the same capsular serotype and multilocus sequence type [ST]) can display distinct pathogenic profiles in mice that correlate with clinical isolation site (e.g., blood versus ear), suggesting stable niche adaptation within a clonal lineage. This has provided an opportunity to identify determinants of disease tropism. Genomic analysis identified 17 and 27 single nucleotide polymorphisms (SNPs) or insertions/deletions in protein coding sequences between blood and ear isolates of serotype 14 ST15 and serotype 3 ST180, respectively. SNPs in raffinose uptake and utilization genes (rafR or rafK) were detected in both serotypes/lineages. Ear isolates were consistently defective in growth in media containing raffinose as the sole carbon source, as well as in expression of raffinose pathway genes aga, rafG, and rafK, relative to their serotype/ST-matched blood isolates. Similar differences were also seen between serotype 23F ST81 blood and ear isolates. Analysis of rafR allelic exchange mutants of the serotype 14 ST15 blood and ear isolates demonstrated that the SNP in rafR was entirely responsible for their distinct in vitro phenotypes and was also the determinant of differential tropism for the lungs versus ear and brain in a mouse intranasal challenge model. These data suggest that the ability of pneumococci to utilize raffinose determines the nature of disease. IMPORTANCE: S. pneumoniae is a component of the commensal nasopharyngeal microflora of humans, but from this reservoir, it can progress to localized or invasive disease with a frequency that translates into massive global morbidity and mortality.
However, the factors that govern the switch from commensal to pathogen, as well as those that determine disease tropism, are poorly understood. Here we show that capacity to utilize raffinose can determine the nature of the disease caused by a given pneumococcal strain. Moreover, our findings provide an interesting example of convergent evolution, whereby pneumococci belonging to two unrelated serotypes/lineages exhibit SNPs in separate genes affecting raffinose uptake and utilization that correlate with distinct pathogenic profiles in vivo. This further underscores the critical role of differential carbohydrate metabolism in the pathogenesis of localized versus invasive pneumococcal disease. Copyright © 2019 Minhas et al.


April 21, 2020  |  

Genome of Crucihimalaya himalaica, a close relative of Arabidopsis, shows ecological adaptation to high altitude.

Crucihimalaya himalaica, a close relative of Arabidopsis and Capsella, grows on the Qinghai-Tibet Plateau (QTP) about 4,000 m above sea level and represents an attractive model system for studying speciation and ecological adaptation in extreme environments. We assembled a draft genome sequence of 234.72 Mb encoding 27,019 genes and investigated its origin and adaptive evolutionary mechanisms. Phylogenomic analyses based on 4,586 single-copy genes revealed that C. himalaica is most closely related to Capsella (estimated divergence 8.8 to 12.2 Mya), whereas both species form a sister clade to Arabidopsis thaliana and Arabidopsis lyrata, from which they diverged between 12.7 and 17.2 Mya. LTR retrotransposons in C. himalaica proliferated shortly after the dramatic uplift and climatic change of the Himalayas from the Late Pliocene to Pleistocene. Compared with closely related species, C. himalaica showed significant contraction and pseudogenization in gene families associated with disease resistance and also significant expansion in gene families associated with ubiquitin-mediated proteolysis and DNA repair. We identified hundreds of genes involved in DNA repair, ubiquitin-mediated proteolysis, and reproductive processes with signs of positive selection. Gene families showing dramatic changes in size and genes showing signs of positive selection are likely candidates for C. himalaica’s adaptation to intense radiation, low temperature, and pathogen-depauperate environments in the QTP. Loss of function at the S-locus, the reason for the transition to self-fertilization of C. himalaica, might have enabled its QTP occupation. Overall, the genome sequence of C. himalaica provides insights into the mechanisms of plant adaptation to extreme environments. Copyright © 2019 the Author(s). Published by PNAS.


April 21, 2020  |  

A New Species of the γ-Proteobacterium Francisella, F. adeliensis Sp. Nov., Endocytobiont in an Antarctic Marine Ciliate and Potential Evolutionary Forerunner of Pathogenic Species.

The study of the draft genome of an Antarctic marine ciliate, Euplotes petzi, revealed foreign sequences of bacterial origin belonging to the γ-proteobacterium Francisella, a genus that includes pathogenic and environmental species. TEM and FISH analyses confirmed the presence of a Francisella endocytobiont in E. petzi. This endocytobiont was isolated and found to be a new species, named F. adeliensis sp. nov. F. adeliensis grows well at wide ranges of temperature, salinity, and carbon dioxide concentrations, implying that it may colonize new organisms living in deeply diversified habitats. The F. adeliensis genome includes the igl and pdp gene sets (pdpC and pdpE excepted) of the Francisella pathogenicity island needed for intracellular growth. Consistent with an ancient symbiotic lifestyle of F. adeliensis, it also contains a single insertion-sequence element. Instead, it lacks genes for the biosynthesis of essential amino acids such as cysteine, lysine, methionine, and tyrosine. In a genome-based phylogenetic tree, F. adeliensis forms a new early branching clade, basal to the evolution of pathogenic species. The correlations of this clade with the other clades raise doubts about a genuine free-living nature of the environmental Francisella species isolated from natural and man-made environments, and suggest viewing F. adeliensis as a pioneer in the Francisella colonization of eukaryotic organisms.


April 21, 2020  |  

In-depth analysis of the genome of Trypanosoma evansi, an etiologic agent of surra.

Trypanosoma evansi is the causative agent of the animal trypanosomiasis surra, a disease with serious economic burden worldwide. The availability of the genome of its closely related parasite Trypanosoma brucei allows us to compare their genetic and evolutionarily shared and distinct biological features. The complete genomic sequence of the T. evansi YNB strain was obtained using a combination of genomic and transcriptomic sequencing, de novo assembly, and bioinformatic analysis. The genome size of the T. evansi YNB strain was 35.2 Mb, showing 96.59% similarity in sequence and 88.97% in scaffold alignment with T. brucei. A total of 8,617 protein-coding genes, accounting for 31% of the genome, were predicted. Approximately 1,641 alternative splicing events of 820 genes were identified, with a majority mediated by intron retention, which represented a major difference in post-transcriptional regulation between T. evansi and T. brucei. Disparities in gene copy number of the variant surface glycoprotein, expression site-associated genes, microRNAs, and RNA-binding protein were clearly observed between the two parasites. The results revealed the genomic determinants of T. evansi, which encoded specific biological characteristics that distinguished them from other related trypanosome species.


April 21, 2020  |  

Multiple modes of convergent adaptation in the spread of glyphosate-resistant Amaranthus tuberculatus.

The selection pressure exerted by herbicides has led to the repeated evolution of herbicide resistance in weeds. The evolution of herbicide resistance on contemporary timescales in turn provides an outstanding opportunity to investigate key questions about the genetics of adaptation, in particular the relative importance of adaptation from new mutations, standing genetic variation, or geographic spread of adaptive alleles through gene flow. Glyphosate-resistant Amaranthus tuberculatus poses one of the most significant threats to crop yields in the Midwestern United States, with both agricultural populations and herbicide resistance only recently emerging in Canada. To understand the evolutionary mechanisms driving the spread of resistance, we sequenced and assembled the A. tuberculatus genome and investigated the origins and population genomics of 163 resequenced glyphosate-resistant and susceptible individuals from Canada and the United States. In Canada, we discovered multiple modes of convergent evolution: in one locality, resistance appears to have evolved through introductions of preadapted US genotypes, while in another, there is evidence for the independent evolution of resistance on genomic backgrounds that are historically nonagricultural. Moreover, resistance on these local, nonagricultural backgrounds appears to have occurred predominantly through the partial sweep of a single haplotype. In contrast, resistant haplotypes arising from the Midwestern United States show multiple amplification haplotypes segregating both between and within populations. Therefore, while the remarkable species-wide diversity of A. tuberculatus has facilitated geographic parallel adaptation of glyphosate resistance, more recently established agricultural populations are limited to adaptation in a more mutation-limited framework. Copyright © 2019 the Author(s). Published by PNAS.


April 21, 2020  |  

A siphonous macroalgal genome suggests convergent functions of homeobox genes in algae and land plants.

Genome evolution and development of unicellular, multinucleate macroalgae (siphonous algae) are poorly known, although various multicellular organisms have been studied extensively. To understand macroalgal developmental evolution, we assembled the ~26 Mb genome of a siphonous green alga, Caulerpa lentillifera, with high contiguity, containing 9,311 protein-coding genes. Molecular phylogeny using 107 nuclear genes indicates that the diversification of the class Ulvophyceae, including C. lentillifera, occurred before the split of the Chlorophyceae and Trebouxiophyceae. Compared with other green algae, the TALE superclass of homeobox genes, which expanded in land plants, shows a series of lineage-specific duplications in this siphonous macroalga. Plant hormone signalling components were also expanded in a lineage-specific manner. Expanded transport regulators, which show spatially different expression, suggest that the structural patterning strategy of a multinucleate cell depends on diversification of nuclear pore proteins. These results not only imply functional convergence of duplicated genes among green plants, but also provide insight into evolutionary roots of green plants. Based on the present results, we propose cellular and molecular mechanisms involved in the structural differentiation in the siphonous alga. © The Author(s) 2019. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

