Background: Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions in their entirety with higher continuity and accuracy than is possible with other methods. Results: We used trio binning to assemble reference genomes for two species from a single individual using an interspecies cross of yak (Bos grunniens) and cattle (Bos taurus). The high heterozygosity inherent to interspecies hybrids allowed us to confidently assign >99% of long reads from the F1 offspring to parental bins using unique k-mers from parental short reads. Both the maternal (yak) and paternal (cattle) assemblies contain over one third of the acrocentric chromosomes, including the two largest chromosomes, in single haplotigs. Conclusions: These haplotigs are the first vertebrate chromosome arms to be assembled gap-free and fully phased, and the first time assemblies for two species have been created from a single individual. Both assemblies are the most continuous currently available for non-model vertebrates. Abbreviations: Mb, megabases; kb, kilobases; MYA, millions of years ago; MHC, major histocompatibility complex; SMRT, single molecule real time.
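The read-classification step of trio binning described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy sequences, and the simple "more hits wins" rule are assumptions made for illustration, and the sketch ignores reverse complements, sequencing errors, and any normalization of hit counts that a production tool would need.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    # Shared k-mers carry no haplotype information, so they are discarded.
    return mat - pat, pat - mat


def classify_read(read, mat_only, pat_only, k):
    """Assign an offspring long read to the parent whose unique k-mers it shares most."""
    read_kmers = kmers(read, k)
    mat_hits = len(read_kmers & mat_only)
    pat_hits = len(read_kmers & pat_only)
    if mat_hits > pat_hits:
        return "maternal"
    if pat_hits > mat_hits:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


# Toy data (hypothetical sequences, far shorter than real reads).
K = 4
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTACGTCC"], K)
for read in ["TACGTAAG", "TACGTCCA", "ACGTACG"]:
    print(read, classify_read(read, mat_only, pat_only, K))
```

In this sketch, higher heterozygosity between the parents yields more parent-specific k-mers, so fewer reads fall into the "unassigned" bin, which is consistent with the abstract's point that the method benefits from heterozygosity.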
Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.
The development of high-throughput sequencing techniques has greatly benefited our understanding of microbial ecology; yet methods that produce short reads suffer from limited species-level resolution and uncertain identification. Here we optimize PacBio-based metabarcoding protocols covering the Internal Transcribed Spacer (ITS) region and partial Small Subunit (SSU) of the rRNA gene for species-level identification of all eukaryotes, with a specific focus on Fungi (including Glomeromycota) and Stramenopila (particularly Oomycota). Based on tests on composite soil samples and mock communities, we propose the most suitable degenerate primers, ITS9munngs + ITS4ngsUni, for eukaryotes and selected groups therein, and discuss the pros and cons of long-read-based identification of eukaryotes.
Morphological and genomic characterisation of the hybrid schistosome infecting humans in Europe reveals a complex admixture between Schistosoma haematobium and Schistosoma bovis parasites
Schistosomes cause schistosomiasis, the world's second most important parasitic disease after malaria. A peculiar feature of schistosomes is their ability to produce viable and fertile hybrids. Originally only present in the tropics, schistosomiasis is now also endemic in Europe. Based on two genetic markers, the European species had been identified as a hybrid between the ruminant-infective Schistosoma bovis and the human-infective Schistosoma haematobium. Here we describe for the first time the genomic composition of the European schistosome hybrid (77% of S. haematobium and 23% of S. bovis origins), its morphometric parameters, and its compatibility with the European vector snail and intermediate host. Compatibility is a key parameter for the parasite's life cycle progression. We also show that egg morphology (a classical diagnostic parameter) does not allow for differential diagnosis, whereas genetic tests do. Additionally, we performed genome assembly improvement and annotation of S. bovis, the parental species for which no satisfactory genome assembly was available. For the first time since the discovery of hybrid schistosomes, these results reveal at the whole-genome level a complex admixture of parental genomes, highlighting (i) the high permeability of schistosomes to other species' alleles, and (ii) the importance of hybrid formation for pushing species boundaries not only conceptually but also geographically.
The ruminants are one of the most successful mammalian lineages, exhibiting morphological and habitat diversity and containing several key livestock species. To better understand their evolution, we generated and analyzed de novo assembled genomes of 44 ruminant species, representing all six Ruminantia families. We used these genomes to create a time-calibrated phylogeny to resolve topological controversies, overcoming the challenges of incomplete lineage sorting. Population dynamic analyses show that population declines commenced between 100,000 and 50,000 years ago, which is concomitant with expansion in human populations. We also reveal genes and regulatory elements that possibly contribute to the evolution of the digestive system, cranial appendages, immune system, metabolism, body size, cursorial locomotion, and dentition of the ruminants. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Xylella fastidiosa is an economically important bacterial plant pathogen. With insights gained from 72 genomes, this study investigated differences among the three main subspecies, which have allopatric origins: X. fastidiosa subsp. fastidiosa, multiplex, and pauca. The origin of recombinogenic X. fastidiosa subsp. morus and sandyi was also assessed. The evolutionary rate of the 622 genes of the species core genome was estimated at the scale of an X. fastidiosa subsp. pauca subclade (7.62 × 10⁻⁷ substitutions per site per year), which was subsequently used to estimate divergence time for the subspecies and introduction events. The study characterized genes present in the accessory genome of each of the three subspecies and investigated the core genome to detect genes potentially under positive selection. Recombination is recognized to be the major driver of diversity in X. fastidiosa, potentially facilitating shifts to novel plant hosts. The relative effect of recombination in comparison to point mutation was calculated (r/m = 2.259). Evidence of recombination was uncovered in the core genome alignment; X. fastidiosa subsp. fastidiosa in the United States was less prone to recombination, with an average of 3.22 of the 622 core genes identified as recombining regions, whereas a specific clade of X. fastidiosa subsp. multiplex was found to have on average 9.60 recombining genes, 93.2% of which originated from X. fastidiosa subsp. fastidiosa. Interestingly, for X. fastidiosa subsp. morus, which was initially thought to be the outcome of genome-wide recombination between X. fastidiosa subsp. fastidiosa and X. fastidiosa subsp. multiplex, intersubspecies homologous recombination levels reached 15.30% in the core genome. Finally, there is evidence of X. fastidiosa subsp. pauca strains from citrus containing genetic elements acquired from strains infecting coffee plants as well as genetic elements from both X. fastidiosa subsp. fastidiosa and X. fastidiosa subsp. multiplex. In summary, our data provide new insights into the evolution and epidemiology of this plant pathogen. IMPORTANCE Xylella fastidiosa is an important vector-borne plant pathogen. We used a set of 72 genomes that constitutes the largest assembled data set for this bacterial species so far to investigate genetic relationships and the impact of recombination on phylogenetic clades and to compare genome content at the subspecies level, and we used a molecular dating approach to infer the evolutionary rate of X. fastidiosa. The results demonstrate that recombination is important in shaping the genomes of X. fastidiosa and that each of the main subspecies is under different selective pressures. We hope insights from this study will improve our understanding of X. fastidiosa evolution and biology. Copyright © 2019 American Society for Microbiology.
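To show how a per-site substitution rate such as the one reported above can be turned into a divergence time, here is a minimal Python sketch under a strict molecular clock, where two lineages that split T years ago differ by roughly d = 2 × rate × T substitutions per site. This is a textbook simplification and an assumption of this example, not the dating method used in the study. The pairwise divergence value is hypothetical, and the function name is illustrative.

```python
def divergence_time_years(pairwise_divergence, rate_per_site_per_year):
    """Strict-clock estimate: T = d / (2 * rate).

    The factor of 2 accounts for substitutions accumulating
    independently along both lineages since the split.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return pairwise_divergence / (2.0 * rate_per_site_per_year)


RATE = 7.62e-7                  # substitutions/site/year, as reported in the abstract
HYPOTHETICAL_DIVERGENCE = 0.01  # substitutions/site; made-up value for illustration

t = divergence_time_years(HYPOTHETICAL_DIVERGENCE, RATE)
print(f"Estimated split: about {t:.0f} years ago")
```

Real dating analyses typically also model rate variation among lineages and uncertainty in the rate estimate, which this sketch leaves out.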
De novo genome assembly of the endangered Acer yangbiense, a plant species with extremely small populations endemic to Yunnan Province, China.
Acer yangbiense is a newly described, critically endangered maple tree endemic to Yangbi County in Yunnan Province in Southwest China. It was included in a programme for rescuing the most threatened species in China, focusing on “plant species with extremely small populations (PSESP)”. We generated 64, 94, and 110 Gb of raw DNA sequences and obtained a chromosome-level genome assembly of A. yangbiense through a combination of Pacific Biosciences Single-molecule Real-time, Illumina HiSeq X, and Hi-C mapping, respectively. The final genome assembly is ~666 Mb, with 13 chromosomes covering ~97% of the genome and a scaffold N50 of 45 Mb. Further, BUSCO analysis recovered 95.5% complete BUSCO genes. Repetitive elements account for 68.0% of the A. yangbiense genome. Genome annotation generated 28,320 protein-coding genes, assisted by a combination of prediction and transcriptome sequencing. In addition, a nearly 1:1 orthology ratio in dot plots of longer syntenic blocks revealed a similar evolutionary history between A. yangbiense and grape, indicating that the genome has not undergone a whole-genome duplication event after the core eudicot common hexaploidization. Here, we report a high-quality de novo genome assembly of A. yangbiense, the first genome for the genus Acer and the family Aceraceae. This will provide fundamental conservation genomics resources, as well as representing a new high-quality reference genome for the economically important Acer lineage and the wider order of Sapindales. © The Author(s) 2019. Published by Oxford University Press.
Yellowhorn (Xanthoceras sorbifolium) is a species of the Sapindaceae family native to China and is an oil tree that can withstand cold and drought conditions. A pseudomolecule-level genome assembly for this species will not only contribute to understanding the evolution of its genes and chromosomes but also bring yellowhorn breeding into the genomic era. Here, we generated 15 pseudomolecules of yellowhorn chromosomes, on which 97.04% of scaffolds were anchored, using the combined Illumina HiSeq, Pacific Biosciences Sequel, and Hi-C technologies. The length of the final yellowhorn genome assembly was 504.2 Mb with a contig N50 size of 1.04 Mb and a scaffold N50 size of 32.17 Mb. Genome annotation revealed that 68.67% of the yellowhorn genome was composed of repetitive elements. Gene modelling predicted 24,672 protein-coding genes. By comparing orthologous genes, the divergence time of yellowhorn and its close sister species longan (Dimocarpus longan) was estimated at ~33.07 million years ago. Gene cluster and chromosome synteny analysis demonstrated that the yellowhorn genome shared a conserved genome structure with its ancestor in some chromosomes. This genome assembly represents a high-quality reference genome for yellowhorn. Integrated genome annotations provide a valuable dataset for genetic and molecular research in this species. We did not detect whole-genome duplication in the genome. The yellowhorn genome carries syntenic blocks from ancient chromosomes. These data sources will enable this genome to serve as an initial platform for breeding better yellowhorn cultivars. © The Author(s) 2019. Published by Oxford University Press.
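The contig and scaffold N50 values quoted in this and neighbouring abstracts follow the usual definition: the length L such that sequences of length at least L together contain at least half of the total assembly. The following Python sketch shows the calculation. The function name and the example lengths are made up for illustration and are not taken from any of the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths in Mb (total = 290 Mb).
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))  # 80 + 70 = 150 >= 145, so N50 is 70
```

Because N50 depends only on the length distribution, a high value indicates contiguity but says nothing about correctness, which is why the abstracts also report measures such as BUSCO completeness or anchoring rates.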
Pecan (Carya illinoinensis) and Chinese hickory (C. cathayensis) are important commercially cultivated nut trees in the genus Carya (Juglandaceae), with high nutritional value and substantial health benefits. We obtained >187.22 and 178.87 gigabases of sequence, and ~288× and 248× genome coverage, for a pecan cultivar (“Pawnee”) and a domesticated Chinese hickory landrace (ZAFU-1), respectively. The total assembly size is 651.31 megabases (Mb) for pecan and 706.43 Mb for Chinese hickory. Two genome duplication events before the divergence from walnut were found in these species. Gene family analysis highlighted key genes in biotic and abiotic tolerance, oil, polyphenols, essential amino acids, and B vitamins. Further analyses of reduced-coverage genome sequences of 16 Carya and 2 Juglans species provide additional phylogenetic perspective on crop wild relatives. Cooperative characterization of these valuable resources provides a window to their evolutionary development and a valuable foundation for future crop improvement. © The Author(s) 2019. Published by Oxford University Press.
Chromosome-scale genome assembly of kiwifruit Actinidia eriantha with single-molecule sequencing and chromatin interaction mapping.
Kiwifruit (Actinidia spp.) is a dioecious plant with fruits containing abundant vitamin C and minerals. A handful of kiwifruit species have been domesticated, among which Actinidia eriantha is increasingly favored in breeding owing to its superior commercial traits. Recently, elite cultivars from A. eriantha have been successfully selected, and further studies on their biology and breeding potential require genomic information, which is currently unavailable. We assembled a chromosome-scale genome sequence of A. eriantha cultivar White using single-molecule sequencing and chromatin interaction map-based scaffolding. The assembly has a total size of 690.6 megabases and an N50 of 21.7 megabases. Approximately 99% of the assembly was in 29 pseudomolecules corresponding to the 29 kiwifruit chromosomes. Forty-three percent of the A. eriantha genome consists of repetitive sequences, and the non-repetitive part encodes 42,988 protein-coding genes, of which 39,075 have homologues from other plant species or protein domains. The divergence time between A. eriantha and its close relative Actinidia chinensis is estimated to be 3.3 million years, and after diversification, 1,727 and 1,506 gene families are expanded and contracted in A. eriantha, respectively. We provide a high-quality reference genome for kiwifruit A. eriantha. This chromosome-scale genome assembly is substantially better than 2 published kiwifruit assemblies from A. chinensis in terms of genome contiguity and completeness. The availability of the A. eriantha genome provides a valuable resource for facilitating kiwifruit breeding and studies of kiwifruit biology. © The Author(s) 2019. Published by Oxford University Press.
Genome Sequencing of Cladobotryum protrusum Provides Insights into the Evolution and Pathogenic Mechanisms of the Cobweb Disease Pathogen on Cultivated Mushroom.
Cladobotryum protrusum is one of the mycoparasites that cause cobweb disease on cultivated edible mushrooms. However, the molecular mechanisms of evolution and pathogenesis of C. protrusum on mushrooms are largely unknown. Here, we report a high-quality genome sequence of C. protrusum using the single-molecule, real-time sequencing platform of PacBio and perform a comparative analysis with closely related fungi in the family Hypocreaceae. The C. protrusum genome, the first complete genome to be sequenced in the genus Cladobotryum, is 39.09 Mb long, with an N50 of 4.97 Mb, encoding 11,003 proteins. The phylogenomic analysis confirmed its inclusion in Hypocreaceae, with its evolutionary divergence time estimated to be ~170.1 million years ago. The genome encodes a large and diverse set of genes involved in secreted peptidases, carbohydrate-active enzymes, cytochrome P450 enzymes, pathogen-host interactions, mycotoxins, and pigments. Moreover, C. protrusum harbors arrays of genes with the potential to produce bioactive secondary metabolites and stress response-related proteins that are significant for adaptation to hostile environments. Knowledge of the genome will foster a better understanding of the biology of C. protrusum and mycoparasitism in general, as well as help with the development of effective disease control strategies to minimize economic losses from cobweb disease in cultivated edible mushrooms.
Divergent evolution in the genomes of closely related lacertids, Lacerta viridis and L. bilineata, and implications for speciation.
Lacerta viridis and Lacerta bilineata are sister species of European green lizards (eastern and western clades, respectively) that, until recently, were grouped together as the L. viridis complex. Genetic incompatibilities were observed between lacertid populations through crossing experiments, which led to the delineation of two separate species within the L. viridis complex. The population history of these sister species and processes driving divergence are unknown. We constructed the first high-quality de novo genome assemblies for both L. viridis and L. bilineata through Illumina and PacBio sequencing, with annotation support provided from transcriptome sequencing of several tissues. To estimate gene flow between the two species and identify factors involved in reproductive isolation, we studied their evolutionary history, identified genomic rearrangements, and detected signatures of selection on non-coding RNA and on protein-coding genes. Here we show that gene flow was primarily unidirectional from L. bilineata to L. viridis after their split at least 1.15 million years ago. We detected positive selection of the non-coding repertoire; mutations in transcription factors; accumulation of divergence through inversions; selection on genes involved in neural development, reproduction, and behavior, as well as in ultraviolet-response, possibly driven by sexual selection, whose contribution to reproductive isolation between these lacertid species needs to be further evaluated. The combination of short and long sequence reads resulted in one of the most complete lizard genome assemblies. The characterization of a diverse array of genomic features provided valuable insights into the demographic history of divergence among European green lizards, as well as key species differences, some of which are candidates that could have played a role in speciation.
In addition, our study generated valuable genomic resources that can be used to address conservation-related issues in lacertids. © The Author(s) 2018. Published by Oxford University Press.
Foodborne infections caused by lung flukes of the genus Paragonimus are a significant and widespread public health problem in tropical areas. Approximately 50 Paragonimus species have been reported to infect animals and humans, but Paragonimus westermani is responsible for the bulk of human disease. Despite their medical and economic importance, no genome sequence for any Paragonimus species is available. We sequenced and assembled the genome of P. westermani, which is among the largest of the known pathogen genomes with an estimated size of 1.1 Gb. A 922.8 Mb genome assembly was generated from Illumina and Pacific Biosciences (PacBio) sequence data, covering 84% of the estimated genome size. The genome has a high proportion (45%) of repeat-derived DNA, particularly of the long interspersed element and long terminal repeat subtypes, and the expansion of these elements may explain some of the large size. We predicted 12,852 protein-coding genes, showing a high level of conservation with related trematode species. The majority of proteins (80%) had homologs in the human liver fluke Opisthorchis viverrini, with an average sequence identity of 64.1%. Assembly of the P. westermani mitochondrial genome from long PacBio reads resulted in a single high-quality circularized 20.6 kb contig. The contig harbored a 6.9 kb region of non-coding repetitive DNA composed of three distinct repeat units. Our results suggest that the region is highly polymorphic in P. westermani, possibly even within single worm isolates. The generated assembly represents the first Paragonimus genome sequence and will facilitate future molecular studies of this important, but neglected, parasite group.
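The 84% figure above follows directly from the two sizes given in the abstract (a 922.8 Mb assembly against an estimated 1.1 Gb genome). A small Python sketch of the arithmetic is given below. The function name is illustrative, and 1 Gb is taken as 1,000 Mb.

```python
def assembly_fraction(assembly_mb, estimated_genome_mb):
    """Fraction of the estimated genome size represented in the assembly."""
    if estimated_genome_mb <= 0:
        raise ValueError("estimated genome size must be positive")
    return assembly_mb / estimated_genome_mb


ASSEMBLY_MB = 922.8          # assembly size reported in the abstract
ESTIMATED_GENOME_MB = 1100   # 1.1 Gb estimated genome size, as reported

fraction = assembly_fraction(ASSEMBLY_MB, ESTIMATED_GENOME_MB)
print(f"{fraction:.1%} of the estimated genome is assembled")
```

The remaining portion is not necessarily missing unique sequence. In a repeat-rich genome such as this one, collapsed repeats are a common reason for an assembly being shorter than the estimated genome size.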
Recent studies suggest that closely related species can accumulate substantial genetic and phenotypic differences despite ongoing gene flow, thus challenging traditional ideas regarding the genetics of speciation. Baboons (genus Papio) are Old World monkeys consisting of six readily distinguishable species. Baboon species hybridize in the wild, and prior data imply a complex history of differentiation and introgression. We produced a reference genome assembly for the olive baboon (Papio anubis) and whole-genome sequence data for all six extant species. We document multiple episodes of admixture and introgression during the radiation of Papio baboons, thus demonstrating their value as a model of complex evolutionary divergence, hybridization, and reticulation. These results help inform our understanding of similar cases, including modern humans, Neanderthals, Denisovans, and other ancient hominins.
A whole genome scan of SNP data suggests a lack of abundant hard selective sweeps in the genome of the broad host range plant pathogenic fungus Sclerotinia sclerotiorum.
The pathogenic fungus Sclerotinia sclerotiorum infects over 600 plant species. It is present in numerous environments throughout the world and causes significant damage to many agricultural crops. Fragmentation and lack of gene flow between populations may lead to population sub-structure. Within discrete recombining populations, positive selection may lead to a ‘selective sweep’. This is characterised by an increase in frequency of a favourable allele leading to reduction in genotypic diversity in a localised genomic region due to the phenomenon of genetic hitchhiking. We aimed to assess whether isolates of S. sclerotiorum from around the world formed genotypic clusters associated with geographical origin and to determine whether signatures of population-specific positive selection could be detected. To do this, we sequenced the genomes of 25 isolates of S. sclerotiorum collected from four different continents: Australia, Africa (north and south), Europe, and North America (Canada and the northern United States), and conducted SNP-based analyses of population structure and selective sweeps. Among the 25 isolates, there was evidence for two major population clusters. One of these consisted of 11 isolates from Canada, the USA and France (population 1), and the other consisted of nine isolates from Australia and one from Morocco (population 2). The rest of the isolates were genotypic outliers. We found that there was evidence of outcrossing in these two populations based on linkage disequilibrium decay. However, only a single candidate selective sweep was observed, and it was present in population 2. This sweep was close to a Major Facilitator Superfamily transporter gene, and we speculate that this gene may have a role in nutrient uptake from the host. The low abundance of selective sweeps in the S. sclerotiorum genome contrasts with the numerous examples in the genomes of other fungal pathogens.
This may be a result of its slow rate of evolution and low effective recombination rate due to self-fertilisation and vegetative reproduction.