A recent research partnership with KeyGene, a Dutch plant genomics and crop improvement company, has resulted in an integrated whole-genome assembly and transcriptome of Gossypium hirsutum, or tetraploid cotton. This is the first known complete assembly for a polyploid crop with a genome larger than 2 Gb.
KeyGene has a long-established reputation for generating high-quality data, even for very complex genomes. For this project, the cotton genome was sequenced to 38x coverage using Single-Molecule, Real-Time (SMRT®) Sequencing. Assembly of PacBio® long reads reduced the number of contigs from more than 1 million in an existing short-read assembly to fewer than 22,000, representing a 47-fold increase in contiguity.
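The two figures in the paragraph above rest on simple arithmetic: mean coverage is total sequenced bases divided by genome size, and the contiguity gain can be read as old contig count divided by new contig count. The sketch below is only an illustration; the post gives bounds rather than exact values ("larger than 2 Gb", "more than 1 million", "fewer than 22,000"), so the inputs are lower-bound placeholders, not KeyGene's actual numbers.

```python
def required_bases(coverage: float, genome_size_bp: float) -> float:
    """Total sequenced bases needed for a given mean coverage (coverage = bases / genome size)."""
    return coverage * genome_size_bp


def fold_reduction(contigs_before: int, contigs_after: int) -> float:
    """How many times fewer contigs the new assembly has than the old one."""
    return contigs_before / contigs_after


# Placeholder inputs taken from the bounds stated in the post (assumption:
# the true genome size and contig counts are not given exactly).
GENOME_SIZE_BP = 2_000_000_000  # "larger than 2 Gb", used as a lower bound

bases = required_bases(38, GENOME_SIZE_BP)
print(f"{bases / 1e9:.0f} Gb of reads at 38x")  # 76 Gb

fold = fold_reduction(1_000_000, 22_000)
print(f"at least {fold:.1f}-fold fewer contigs")  # 45.5
```

Because the real contig count was above 1 million and the new count below 22,000, the true ratio must exceed about 45, which is consistent with the reported 47-fold figure. Likewise, 38x on a genome larger than 2 Gb implies more than 76 Gb of long-read data.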
KeyGene also studied gene expression in cotton, using Pacific Biosciences’ Iso-Seq™ method to generate full-length transcript reads. These reads were then used for evidence-based annotation of the new reference. The team analyzed expression patterns in leaf, stem, and root tissues and discovered novel tissue- and haplotype-specific splice variants. KeyGene is now investigating the biological significance of these variants.
To improve the reference further, KeyGene incorporated its proprietary Whole Genome Profiling (WGP™) technology, building a physical map based on partially sequenced BAC fragments. This step reduced the number of contigs in the final cotton assembly even more, resulting in the most comprehensive tetraploid cotton reference to date.
KeyGene has incorporated all of the resulting data into its proprietary crop-specific genome database, which will be available to its commercial partners around the world engaged in the breeding and genetic improvement of cotton.
For more about KeyGene’s scientific work, check out our case study featuring Michiel van Eijk, the company’s Chief Scientific Officer.