In this presentation at PAG 2020, Bart Nijland of Genetwister Technologies explains how his team set out to make a haplotype-aware assembly of the highly complex tetraploid Rosa x hybrida L. genome in order to capture its full range of genetic variation. HiFi reads generated from PacBio’s Sequel II System have made it possible to parse out critical information from many of the plant’s parental genes.
Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of…
Giant groupers, the largest grouper type in the world, are of economic importance in marine aquaculture for their rapid growth. At the same time, bacterial and viral diseases have become the main threats to the grouper industry. Here, we report a high-quality genome of a giant grouper sequenced by an Illumina HiSeq X-Ten and PacBio Bioscience Sequel platform. A total of 254 putative antimicrobial peptide (AMP) genes were identified, which can be divided into 34 classes according to the annotation of the Antimicrobial Peptides Database (APD3). Their locations in pseudochromosomes were also determined. Thrombin-, lectin-, and scolopendin-derived putative AMPs were…
Scutellaria baicalensis Georgi is important in Chinese traditional medicine where preparations of dried roots, “Huang Qin,” are used for liver and lung complaints and as complementary cancer treatments. We report a high-quality reference genome sequence for S. baicalensis where 93% of the 408.14-Mb genome has been assembled into nine pseudochromosomes with a super-N50 of 33.2 Mb. Comparison of this sequence with those of closely related species in the order Lamiales, Sesamum indicum and Salvia splendens, revealed that a specialized metabolic pathway for the synthesis of 4′-deoxyflavone bioactives evolved in the genus Scutellaria. We found that the gene encoding a specific…
Allotetraploid cotton is an economically important natural-fiber-producing crop worldwide. After polyploidization, Gossypium hirsutum L. evolved to produce a higher fiber yield and to better survive harsh environments than Gossypium barbadense, which produces superior-quality fibers. The global genetic and molecular bases for these interspecies divergences were unknown. Here we report high-quality de novo-assembled genomes for these two cultivated allotetraploid species with pronounced improvement in repetitive-DNA-enriched centromeric regions. Whole-genome comparative analyses revealed that species-specific alterations in gene expression, structural variations and expanded gene families were responsible for speciation and the evolutionary history of these species. These findings help to elucidate the evolution…
Cultivated strawberry emerged from the hybridization of two wild octoploid species, both descendants from the merger of four diploid progenitor species into a single nucleus more than 1 million years ago. Here we report a near-complete chromosome-scale assembly for cultivated octoploid strawberry (Fragaria?×?ananassa) and uncovered the origin and evolutionary processes that shaped this complex allopolyploid. We identified the extant relatives of each diploid progenitor species and provide support for the North American origin of octoploid strawberry. We examined the dynamics among the four subgenomes in octoploid strawberry and uncovered the presence of a single dominant subgenome with significantly greater gene…
Jatropha curcas (physic nut), a non-edible oilseed crop, represents one of the most promising alternative energy sources due to its high seed oil content, rapid growth and adaptability to various environments. We report ~339 Mbp draft whole genome sequence of J. curcas var. Chai Nat using both the PacBio and Illumina sequencing platforms. We identified and categorized differentially expressed genes related to biosynthesis of lipid and toxic compound among four stages of seed development. Triacylglycerol (TAG), the major component of seed storage oil, is mainly synthesized by phospholipid:diacylglycerol acyltransferase in Jatropha, and continuous high expression of homologs of oleosin over seed development…
Carthamus tinctorius L, also known as safflower, is an important oil crop planted worldwide. The com- plete chloroplast (cp) genome was reported in this study using the PacBio Sequel Platform. The cp genome with a total size of 152,963bp consisted of two inverted repeats (25,128bp) separated by a large single-copy region (84,124bp) and a small single-copy region (18,583bp). Further annotation revealed the cp genome contains 112 genes, including 79 protein-coding genes, 29 tRNA genes, and 4 rRNA genes. The information of the cp genome will be useful for investigation of evolution and molecular breeding of safflower in the future.
The Populus shoot undergoes primary growth (longitudinal growth) followed by secondary growth (radial growth), which produces biomass that is an important source of energy worldwide. We adopted joint PacBio Iso-Seq and RNA-seq analysis to identify differentially expressed transcripts along a developmental gradient from the shoot apex to the fifth internode of Populus Nanlin895. We obtained 87 150 full-length transcripts, including 2081 new isoforms and 62 058 new alternatively spliced isoforms, most of which were produced by intron retention, that were used to update the Populus annotation. Among these novel isoforms, there are 1187 long non-coding RNAs and 356 fusion genes.…
Haematococcus pluvialis is a freshwater species of Chlorophyta, family Haematococcaceae. It is well known for its capacity to synthesize high amounts of astaxanthin, which is a strong antioxidant that has been utilized in aquaculture and cosmetics. To improve astaxanthin yield and to establish genetic resources for H. pluvialis, we performed whole-genome sequencing, assembly, and annotation of this green microalga. A total of 83.1 Gb of raw reads were sequenced. After filtering the raw reads, we subsequently generated a draft assembly with a genome size of 669.0?Mb, a scaffold N50 of 288.6?kb, and predicted 18,545 genes. We also established a robust…
Many of our major crop species are polyploids, containing more than one genome or set of chromosomes. Polyploid crops present unique challenges, including difficulties in genome assembly, in discriminating between multiple gene and sequence copies, and in genetic mapping, hindering use of genomic data for genetics and breeding. Polyploid genomes may also be more prone to containing structural variation, such as loss of gene copies or sequences (presence–absence variation) and the presence of genes or sequences in multiple copies (copy-number variation). Although the two main types of genomic structural variation commonly identified are presence–absence variation and copy-number variation, we propose…
Flavonoids, theanine and caffeine are the main secondary metabolites of the tea plant (Camellia sinensis), which account for the tea’s unique flavor quality and health benefits. The biosynthesis pathways of these metabolites have been extensively studied at the transcriptional level, but the regulatory mechanisms are still unclear. In this study, to explore the transcriptome diversity and complexity of tea plant, PacBio Iso-Seq and RNA-seq analysis were combined to obtain full-length transcripts and to profile the changes in gene expression during the leaf development. A total of 1,388,066 reads of insert (ROI) were generated with an average length of 1,762?bp, and…
Gypenosides are a group of triterpene saponins from Gynostemma pentaphyllum that are the same as or very similar to ginsenosides from the Panax species. Several enzymes involved in ginsenoside biosynthesis have been characterized, which provide important clues for elucidating the gypenoside biosynthetic pathway. We suppose that gypenosides and ginsenosides may have a similar biosynthetic mechanism and that the corresponding enzymes in the two pathways may have considerable similarity in their sequences. To further understand gypenoside biosynthesis, we sequenced the G. pentaphyllum transcriptome with a hybrid sequencing-based strategy and then determined the candidate genes involved in this pathway using phylogenetic tree…
Gossypium australe F. Mueller (2n?=?2x?=?26, G2 genome) possesses valuable characteristics. For example, the delayed gland morphogenesis trait causes cottonseed protein and oil to be edible while retaining resistance to biotic stress. However, the lack of gene sequences and their alternative splicing (AS) in G. australe remain unclear, hindering to explore species-specific biological morphogenesis.Here, we report the first sequencing of the full-length transcriptome of the Australian wild cotton species, G. australe, using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) from the pooled cDNA of ten tissues to identify transcript loci and splice isoforms. We reconstructed the G. australe full-length transcriptome and…
Taxus cuspidata is well known worldwide for its ability to produce Taxol, one of the top-selling natural anticancer drugs. However, current Taxol production cannot match the increasing needs of the market, and novel strategies should be considered to increase the supply of Taxol. Since the biosynthetic mechanism of Taxol remains largely unknown, elucidating this pathway in detail will be very helpful in exploring alternative methods for Taxol production.Here, we sequenced Taxus cuspidata transcriptomes with next-generation sequencing (NGS) and third-generation sequencing (TGS) platforms. After correction with Illumina reads and removal of redundant reads, more than 180,000 nonredundant transcripts were generated from…