April 21, 2020

De novo assembly of a wild pear (Pyrus betuleafolia) genome.

China is the origin and evolutionary centre of Oriental pears. Pyrus betuleafolia is a wild species native to China and distributed in the northern region, and it is widely used as rootstock. Here, we report the de novo assembly of the genome of P. betuleafolia-Shanxi Duli using an integrated strategy that combines PacBio sequencing, BioNano mapping and chromosome conformation capture (Hi-C) sequencing. The genome assembly size was 532.7 Mb, with a contig N50 of 1.57 Mb. A total of 59 552 protein-coding genes and 247.4 Mb of repetitive sequences were annotated for this genome. The expansion genes in P. betuleafolia were significantly enriched in secondary metabolism, which may account for the organism’s considerable environmental adaptability. An alignment analysis of orthologous genes showed that fruit size, sugar metabolism and transport, and photosynthetic efficiency were positively selected in Oriental pear during domestication. A total of 573 nucleotide-binding site (NBS)-type resistance gene analogues (RGAs) were identified in the P. betuleafolia genome, 150 of which are TIR-NBS-LRR (TNL)-type genes, which represented the greatest number of TNL-type genes among the published Rosaceae genomes and explained the strong disease resistance of this wild species. The study of flavour metabolism-related genes showed that the anthocyanidin reductase (ANR) metabolic pathway affected the astringency of pear fruit and that sorbitol transporter (SOT) transmembrane transport may be the main factor affecting the accumulation of soluble organic matter. This high-quality P. betuleafolia genome provides a valuable resource for the utilization of wild pear in fundamental pear studies and breeding. © 2019 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.


April 21, 2020

Strengths and potential pitfalls of hay-transfer for ecological restoration revealed by RAD-seq analysis in floodplain Arabis species

Achieving high intraspecific genetic diversity is a critical goal in ecological restoration as it increases the adaptive potential and long-term resilience of populations. Thus, we investigated genetic diversity within and between pristine sites in a fossil floodplain and compared it to sites restored by hay-transfer between 1997 and 2014. RAD-seq genotyping revealed that the stenoecious floodplain species Arabis nemorensis co-occurs with individuals that, based on ploidy, ITS-sequencing and morphology, probably belong to the close relative Arabis sagittata, which has a documented preference for dry calcareous grasslands but has not been reported in floodplain meadows. We show that hay-transfer maintains genetic diversity for both species. Additionally, in A. sagittata, transfer from multiple genetically isolated pristine sites resulted in restored sites with increased diversity and admixed local genotypes. In A. nemorensis, transfer did not create novel admixture dynamics because genetic diversity between pristine sites was less differentiated. Thus, the effects of hay-transfer on genetic diversity also depend on the genetic makeup of the donor communities of each species, especially when local material is mixed. Our results demonstrate the efficiency of hay-transfer for habitat restoration and emphasize the importance of pre-restoration characterization of micro-geographic patterns of intraspecific diversity of the community to guarantee that restoration practices reach their goal, i.e. maximize the adaptive potential of the entire restored plant community. Overlooking these patterns may alter the balance between species in the community. Additionally, our comparison of summary statistics obtained from de novo and reference-based RAD-seq pipelines shows that the genomic impact of restoration can be reliably monitored in species lacking prior genomic knowledge.


April 21, 2020

Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies

Background: New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from ‘finished’. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies. Results: We employed three gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: six with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and three with new assemblies based on re-scaffolding or Pacific Biosciences long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: seven for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further seven with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi. Conclusions: Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our comparisons show that gene synteny-based computational methods represent a valuable alternative or complementary approach.
Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.

Abbreviations: AD, ADSEQ; AGO, AGOUTI-based; AGOUTI, annotated genome optimization using transcriptome information tool; ALN, alignment-based; CAMSA, comparative analysis and merging of scaffold assemblies tool; DP, dynamic programming; FISH, fluorescence in situ hybridization; GA, GOS-ASM; GOS-ASM, Gene order scaffold assembler; Kbp, kilobasepairs; Mbp, megabasepairs; OS, ORTHOSTITCH; PacBio, Pacific Biosciences; PB, PacBio-based; PHY, physical-mapping-based; RNAseq, RNA sequencing; QTL, quantitative trait loci; SYN, synteny-based.


April 21, 2020

Genome-wide selection footprints and deleterious variations in young Asian allotetraploid rapeseed.

Brassica napus (AACC, 2n = 38) is an important oilseed crop grown worldwide. However, little is known about the population evolution of this species, the genomic difference between its major genetic groups, such as European and Asian rapeseed, and the impacts of historical large-scale introgression events on this young tetraploid. In this study, we reported the de novo assembly of the genome sequences of an Asian rapeseed (B. napus), Ningyou 7, and its four progenitors and compared these genomes with other available genomic data from diverse European and Asian cultivars. Our results showed that Asian rapeseed originally derived from European rapeseed but subsequently significantly diverged, with rapid genome differentiation after hybridization and intensive local selective breeding. The first historical introgression of B. rapa dramatically broadened the allelic pool but decreased the deleterious variations of Asian rapeseed. The second historical introgression of the double-low traits of European rapeseed (canola) has reshaped Asian rapeseed into two groups (double-low and double-high), accompanied by an increase in genetic load in the double-low group. This study demonstrates distinctive genomic footprints and deleterious SNP (single nucleotide polymorphism) variants for local adaptation by recent intra- and interspecies introgression events and provides novel insights for understanding the rapid genome evolution of a young allopolyploid crop. © 2019 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.


April 21, 2020

Chromosomal-level assembly of the blood clam, Scapharca (Anadara) broughtonii, using long sequence reads and Hi-C.

The blood clam, Scapharca (Anadara) broughtonii, is an economically and ecologically important marine bivalve of the family Arcidae. Efforts to study their population genetics, breeding, cultivation, and stock enrichment have been somewhat hindered by the lack of a reference genome. Herein, we report the complete genome sequence of S. broughtonii, a first reference genome of the family Arcidae. A total of 75.79 Gb clean data were generated with the Pacific Biosciences and Oxford Nanopore platforms, which represented approximately 86× coverage of the S. broughtonii genome. De novo assembly of these long reads resulted in an 884.5-Mb genome, with a contig N50 of 1.80 Mb and scaffold N50 of 45.00 Mb. Genome Hi-C scaffolding resulted in 19 chromosomes containing 99.35% of bases in the assembled genome. Genome annotation revealed that nearly half of the genome (46.1%) is composed of repeated sequences, while 24,045 protein-coding genes were predicted and 84.7% of them were annotated. We report here a chromosomal-level assembly of the S. broughtonii genome based on long-read sequencing and Hi-C scaffolding. The genomic data can serve as a reference for the family Arcidae and will provide a valuable resource for the scientific community and aquaculture sector. © The Author(s) 2019. Published by Oxford University Press.
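As a rough check on the coverage figure quoted above, fold coverage can be estimated as total sequenced bases divided by genome size. The sketch below is only illustrative: the function name `fold_coverage` is hypothetical, and it assumes the 884.5-Mb assembly size as the denominator, whereas the authors may have used a separate genome size estimate.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Return the average sequencing depth: sequenced bases per genome base."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures quoted in the abstract; using the assembly size as the
# denominator is an assumption made for this illustration.
clean_data_bp = 75.79e9   # 75.79 Gb of clean long-read data
assembly_bp = 884.5e6     # 884.5-Mb assembled genome

coverage = fold_coverage(clean_data_bp, assembly_bp)
print(f"Estimated coverage: {coverage:.1f}x")
```

Under that assumption the estimate rounds to the approximately 86× reported in the abstract.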


April 21, 2020

A chromosome-scale genome assembly of cucumber (Cucumis sativus L.).

Accurate and complete reference genome assemblies are fundamental for biological research. Cucumber is an important vegetable crop and model system for sex determination and vascular biology. Low-coverage Sanger sequences and high-coverage short Illumina sequences have been used to assemble draft cucumber genomes, but the incompleteness and low quality of these genomes limit their use in comparative genomics and genetic research. A high-quality and complete cucumber genome assembly is therefore essential. We assembled single-molecule real-time (SMRT) long reads to generate an improved cucumber reference genome. This version contains 174 contigs with a total length of 226.2 Mb and an N50 of 8.9 Mb, and provides 29.0 Mb more sequence data than previous versions. Using 10X Genomics and high-throughput chromosome conformation capture (Hi-C) data, 89 contigs (~211.0 Mb) were directly linked into 7 pseudo-chromosome sequences. The newly assembled regions show much higher guanine-cytosine or adenine-thymine content than found previously, which is likely to have been inaccessible to Illumina sequencing. The new assembly contains 1,374 full-length long terminal retrotransposons and 1,078 novel genes including 239 tandemly duplicated genes. For example, we found 4 tandemly duplicated tyrosylprotein sulfotransferases, in contrast to the single copy of the gene found previously and in most other plants. This high-quality genome presents novel features of the cucumber genome and will serve as a valuable resource for genetic research in cucumber and plant comparative genomics. © The Author(s) 2019. Published by Oxford University Press.
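Several of these abstracts report a contig or scaffold N50. For readers unfamiliar with the statistic, it is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below shows the standard definition; the function name `n50` and the toy contig lengths are illustrative assumptions, not data from any of the listed papers.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example (hypothetical lengths, arbitrary units).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

In the toy example the total is 290, so the two longest contigs (80 + 70 = 150) already cover half of it, and the N50 is 70.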


April 21, 2020

The genome assembly and annotation of yellowhorn (Xanthoceras sorbifolium Bunge).

Yellowhorn (Xanthoceras sorbifolium Bunge), a deciduous shrub or small tree native to north China, is of great economic value. Seeds of yellowhorn are rich in oil containing unsaturated long-chain fatty acids that have been used for producing edible oil and nervonic acid capsules. However, the lack of a high-quality genome sequence hampers the understanding of its evolution and gene functions. In this study, a whole genome of yellowhorn was sequenced and assembled by integration of Illumina sequencing, Pacific Biosciences single-molecule real-time sequencing, 10X Genomics linked reads, Bionano optical maps, and Hi-C. The yellowhorn genome assembly was 439.97 Mb, which comprised 15 pseudo-chromosomes covering 95.42% (419.84 Mb) of the assembled genome. The repetitive fractions accounted for 56.39% of the yellowhorn genome. The genome contained 21,059 protein-coding genes. Of them, 18,503 (87.86%) genes were found to be functionally annotated with ≥1 “annotation” term by searching against other databases. Transcriptomic analysis showed that 341, 135, 125, 113, and 100 genes were specifically expressed in hermaphrodite flower, staminate flower, young fruit, leaf, and shoot, respectively. Phylogenetic analysis suggested that yellowhorn and Dimocarpus longan diverged from their most recent common ancestor ~46 million years ago. The availability and subsequent annotation of the yellowhorn genome, as well as the identification of tissue-specific functional genes, provides a valuable reference for plant comparative genomics, evolutionary studies, and molecular design breeding. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020

A critical comparison of technologies for a plant genome sequencing project.

A high-quality genome sequence of any model organism is an essential starting point for genetic and other studies. Older clone-based methods are slow and expensive, whereas faster, cheaper short-read-only assemblies can be incomplete and highly fragmented, which minimizes their usefulness. The last few years have seen the introduction of many new technologies for genome assembly. These new technologies and associated new algorithms are typically benchmarked on microbial genomes or, if they scale appropriately, on larger (e.g., human) genomes. However, plant genomes can be much more repetitive and larger than the human genome, and plant biochemistry often makes obtaining high-quality DNA that is free from contaminants difficult. Reflecting their challenging nature, we observe that plant genome assembly statistics are typically poorer than for vertebrates. Here, we compare Illumina short read, Pacific Biosciences long read, 10x Genomics linked reads, Dovetail Hi-C, and BioNano Genomics optical maps, singly and combined, in producing high-quality long-range genome assemblies of the potato species Solanum verrucosum. We benchmark the assemblies for completeness and accuracy, as well as DNA compute requirements and sequencing costs. The field of genome sequencing and assembly is reaching maturity, and the differences we observe between assemblies are surprisingly small. We expect that our results will be helpful to other genome projects, and that these datasets will be used in benchmarking by assembly algorithm developers. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020

Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data.

Construction of chromosome-level assembly is a vital step in achieving the goal of a ‘Platinum’ genome, but it remains a major challenge to assemble and anchor sequences to chromosomes in autopolyploid or highly heterozygous genomes. High-throughput chromosome conformation capture (Hi-C) technology serves as a robust tool to dramatically advance chromosome scaffolding; however, existing approaches are mostly designed for diploid genomes and often with the aim of reconstructing a haploid representation, thereby having limited power to reconstruct chromosomes for autopolyploid genomes. We developed a novel algorithm (ALLHiC) that is capable of building allele-aware, chromosomal-scale assembly for autopolyploid genomes using Hi-C paired-end reads with innovative ‘prune’ and ‘optimize’ steps. Application on simulated data showed that ALLHiC can phase allelic contigs and substantially improve ordering and orientation when compared to other mainstream Hi-C assemblers. We applied ALLHiC on an autotetraploid and an autooctoploid sugarcane genome and successfully constructed the phased chromosomal-level assemblies, revealing allelic variations present in these two genomes. The ALLHiC pipeline enables de novo chromosome-level assembly of autopolyploid genomes, separating each allele. Haplotype chromosome-level assembly of allopolyploid and heterozygous diploid genomes can be achieved using ALLHiC, overcoming obstacles in assembling complex genomes.


April 21, 2020

Tools and Strategies for Long-Read Sequencing and De Novo Assembly of Plant Genomes.

The commercial release of third-generation sequencing technologies (TGSTs), giving long and ultra-long sequencing reads, has stimulated the development of new tools for assembling highly contiguous genome sequences with unprecedented accuracy across complex repeat regions. We survey here a wide range of emerging sequencing platforms and analytical tools for de novo assembly, provide background information for each of their steps, and discuss the spectrum of available options. Our decision tree recommends workflows for the generation of a high-quality genome assembly when used in combination with the specific needs and resources of a project. Copyright © 2019 Elsevier Ltd. All rights reserved.


April 21, 2020

Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense.

Allotetraploid cotton species (Gossypium hirsutum and Gossypium barbadense) have long been cultivated worldwide for natural renewable textile fibers. The draft genome sequences of both species are available but they are highly fragmented and incomplete [1-4]. Here we report reference-grade genome assemblies and annotations for G. hirsutum accession Texas Marker-1 (TM-1) and G. barbadense accession 3-79 by integrating single-molecule real-time sequencing, BioNano optical mapping and high-throughput chromosome conformation capture techniques. Compared with previous assembled draft genomes [1,3], these genome sequences show considerable improvements in contiguity and completeness for regions with high content of repeats such as centromeres. Comparative genomics analyses identify extensive structural variations that probably occurred after polyploidization, highlighted by large paracentric/pericentric inversions in 14 chromosomes. We constructed an introgression line population to introduce favorable chromosome segments from G. barbadense to G. hirsutum, allowing us to identify 13 quantitative trait loci associated with superior fiber quality. These resources will accelerate evolutionary and functional genomic studies in cotton and inform future breeding programs for fiber improvement.


April 21, 2020

Computational aspects underlying genome to phenome analysis in plants.

Recent advances in genomics technologies have greatly accelerated the progress in both fundamental plant science and applied breeding research. Concurrently, high-throughput plant phenotyping is becoming widely adopted in the plant community, promising to alleviate the phenotypic bottleneck. While these technological breakthroughs are significantly accelerating quantitative trait locus (QTL) and causal gene identification, challenges to enable even more sophisticated analyses remain. In particular, care needs to be taken to standardize, describe and conduct experiments robustly while relying on plant physiology expertise. In this article, we review the state of the art regarding genome assembly and the future potential of pangenomics in plant research. We also describe the necessity of standardizing and describing phenotypic studies using the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) standard to enable the reuse and integration of phenotypic data. In addition, we show how deep phenotypic data might yield novel trait-trait correlations and review how to link phenotypic data to genomic data. Finally, we provide perspectives on the golden future of machine learning and its potential in linking phenotypes to genomic features. © 2018 The Authors The Plant Journal published by John Wiley & Sons Ltd and Society for Experimental Biology.


April 21, 2020

A physical and genetic map of Cannabis sativa identifies extensive rearrangements at the THC/CBD acid synthase loci.

Cannabis sativa is widely cultivated for medicinal, food, industrial, and recreational use, but much remains unknown regarding its genetics, including the molecular determinants of cannabinoid content. Here, we describe a combined physical and genetic map derived from a cross between the drug-type strain Purple Kush and the hemp variety “Finola.” The map reveals that cannabinoid biosynthesis genes are generally unlinked but that aromatic prenyltransferase (AP), which produces the substrate for THCA and CBDA synthases (THCAS and CBDAS), is tightly linked to a known marker for total cannabinoid content. We further identify the gene encoding CBCA synthase (CBCAS) and characterize its catalytic activity, providing insight into how cannabinoid diversity arises in cannabis. THCAS and CBDAS (which determine the drug vs. hemp chemotype) are contained within large (>250 kb) retrotransposon-rich regions that are highly nonhomologous between drug- and hemp-type alleles and are furthermore embedded within ~40 Mb of minimally recombining repetitive DNA. The chromosome structures are similar to those in grains such as wheat, with recombination focused in gene-rich, repeat-depleted regions near chromosome ends. The physical and genetic map should facilitate further dissection of genetic and molecular mechanisms in this commercially and medically important plant. © 2019 Laverty et al.; Published by Cold Spring Harbor Laboratory Press.


April 21, 2020

Improvement of the Pacific bluefin tuna (Thunnus orientalis) reference genome and development of male-specific DNA markers.

The Pacific bluefin tuna, Thunnus orientalis, is a highly migratory species that is widely distributed in the North Pacific Ocean. Like other marine species, T. orientalis has no external sexual dimorphism; thus, identifying sex-specific variants from whole genome sequence data is a useful approach to develop an effective sex identification method. Here, we report an improved draft genome of T. orientalis and male-specific DNA markers. Combining PacBio long reads and Illumina short reads sufficiently improved genome assembly, with a 38-fold increase in scaffold contiguity (to 444 scaffolds) compared to the first published draft genome. Through analysing re-sequence data of 15 males and 16 females, 250 male-specific SNPs were identified from more than 30 million polymorphisms. All male-specific variants were male-heterozygous, suggesting that T. orientalis has a male heterogametic sex-determination system. The largest linkage disequilibrium block (3,174 bp on scaffold_064) contained 51 male-specific variants. PCR primers and a PCR-based sex identification assay were developed using these male-specific variants. The sex of 115 individuals (56 males and 59 females; sex was diagnosed by visual examination of the gonads) was identified with high accuracy using the assay. This easy, accurate, and practical technique facilitates the control of sex ratios in tuna farms. Furthermore, this method could be used to estimate the sex ratio and/or the sex-specific growth rate of natural populations.
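The abstract does not describe the filtering procedure, so the following is only a minimal sketch of one way a male-heterozygous pattern could be screened for. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) and that a candidate site is one where every male is heterozygous and every female shares the same homozygous genotype. The function name `is_male_specific` and the example genotypes are hypothetical.

```python
def is_male_specific(male_gts, female_gts):
    """Return True if all males are heterozygous (1) and all females
    share a single homozygous genotype (all 0 or all 2).

    Genotypes are alternate-allele counts; this coding and the strict
    all-samples rule are assumptions made for illustration only.
    """
    if not male_gts or not female_gts:
        return False
    males_het = all(g == 1 for g in male_gts)
    females_hom = len(set(female_gts)) == 1 and female_gts[0] in (0, 2)
    return males_het and females_hom


# Hypothetical sites: (male genotypes, female genotypes).
sites = {
    "site_a": ([1, 1, 1], [0, 0, 0]),  # fits the male-heterozygous pattern
    "site_b": ([1, 0, 1], [0, 0, 0]),  # one male is homozygous
    "site_c": ([1, 1, 1], [0, 1, 0]),  # one female is heterozygous
}
candidates = [name for name, (m, f) in sites.items() if is_male_specific(m, f)]
print(candidates)
```

A real analysis would also need to tolerate genotyping errors and missing calls, which this strict filter does not.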


April 21, 2020

A high-quality genome of Eragrostis curvula grass provides insights into Poaceae evolution and supports new strategies to enhance forage quality.

The Poaceae constitute a taxon of flowering plants (grasses) that covers almost all of Earth’s habitable range and comprises some of the genera most commonly used for human and animal nutrition. Many of these crops have been sequenced, like rice, Brachypodium, maize and, more recently, wheat. Some important members are still considered orphan crops, lacking a sequenced genome, but having important traits that make them attractive for sequencing. Among these traits is apomixis, clonal reproduction by seeds, present in some members of the Poaceae like Eragrostis curvula. A de novo, high-quality genome assembly and annotation for E. curvula have been obtained by sequencing 602 Mb of a diploid genotype using a strategy that combined long-read length sequencing with chromosome conformation capture. The scaffold N50 for this assembly was 43.41 Mb and the annotation yielded 56,469 genes. The availability of this genome assembly has allowed us to identify regions associated with forage quality and to develop strategies to sequence and assemble the complex tetraploid genotypes which harbor the apomixis control region(s). Understanding and subsequently manipulating the genetic drivers underlying apomixis could revolutionize agriculture.

