Genome-wide association studies (GWAS) may be powerful tools for the identification of genes underlying complex traits, but what if you have an incredibly complex, uncharacterized genome, with no sequenced progenitor or related species?
A team of scientists from the Chinese Academy of Agricultural Sciences in Changsha, China came up with a solution: a transcriptome-referenced association study (TRAS), powered by our Iso-Seq method.
The approach, outlined in this DNA Research paper, utilized a transcriptome generated by SMRT Sequencing as a reference to score population variation at both transcript sequence and expression levels. The team, led by Touming Liu and first author Xiaojun Chen, used the approach to study the shape of garlic cloves.
Cultivated globally for more than 5,000 years as a vegetable, spice, and medicinal plant, garlic (Allium sativum L.) is a diploid species with a giant genome: ~15.9 Gb, 32 times larger than that of rice. The most widely consumed part of the plant, the bulb, consists of several cloves that are actually abnormal axillary buds rarely found among vascular plants. Clove shape traits are economically important quantitative traits, but their genetic mechanisms are poorly understood.
Plant quantitative traits are typically controlled by several major and minor effect genes that constitute complex regulatory networks, and characterization of these traits is time-consuming and labor-intensive when using traditional mapping methods that involve the identification and cloning of dozens of trait-control genes. Previous studies that conducted de novo assembly of the garlic transcriptome were able to produce more than 120,000 transcripts, but many were considered incomplete, with an average length of less than 600 bp, of which only 35–42% were functionally annotated.
So, Liu and colleagues collected bulb samples from 92 landraces in China and 10 from other countries, and selected one candidate from China for Iso-Seq long-read RNA transcript sequencing. From this, they created a high-quality reference transcriptome that consisted of 36,321 transcripts of lengths ranging from 120 to 4,803 bp, accounting for 54.48 million bases in total.
The Iso-Seq method “significantly improved the transcriptome quality—the mean length of the transcripts was 1,500 bp; more than 70% of the transcripts had a complete 3′ end; and only less than 1% of the transcripts remained functionally unannotated,” the authors wrote.
To characterize the genotypes of the 102 landraces at both the sequence and expression levels, they sequenced the transcriptomes of developing bulbs across the population. The reads were aligned to the reference transcriptome, and variation in both transcript sequence (SNPs) and gene expression (GE) was scored.
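To make the two scoring layers concrete, here is a minimal Python sketch of how one sample's aligned reads might be turned into a SNP genotype and a length-normalized expression value. It is not the authors' pipeline: the function names, the depth cutoff, and the allele-fraction thresholds are all hypothetical, and TPM is assumed here only as one common expression unit.

```python
def call_genotype(ref_reads, alt_reads, min_depth=8, het_low=0.2, het_high=0.8):
    """Score one SNP in one sample from allele-specific read counts.

    Returns 0 (homozygous reference), 1 (heterozygous), 2 (homozygous
    alternate), or None when coverage is too low to call.
    All thresholds are illustrative assumptions.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    alt_fraction = alt_reads / depth
    if alt_fraction < het_low:
        return 0
    if alt_fraction > het_high:
        return 2
    return 1


def tpm(read_counts, lengths_bp):
    """Convert raw per-transcript read counts to transcripts per million."""
    # Reads per base, so that long transcripts are not favored.
    rates = {t: read_counts[t] / lengths_bp[t] for t in read_counts}
    total = sum(rates.values())
    return {t: 1e6 * rate / total for t, rate in rates.items()}


# Toy data for a single sample (made up for illustration).
genotype = call_genotype(ref_reads=5, alt_reads=5)
expression = tpm({"tx_A": 100, "tx_B": 100}, {"tx_A": 1000, "tx_B": 2000})
```

Repeating this for every landrace would give two matrices, genotypes and expression values per transcript, which are the inputs an association test needs.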
The team ultimately identified 22 candidate transcripts, most of which showed extensive interactions. Eight transcripts were long non-coding RNAs (lncRNAs), and the others encoded proteins involved mainly in carbohydrate metabolism and protein degradation. These findings can provide a basis for improving clove shape traits in garlic breeding, as well as validate the TRAS approach, the authors said.
“Our results demonstrate that TRAS is a useful approach for association studies, and its independence from a reference genome will extend the applicability of association studies to a broad range of species,” they wrote. TRAS also offered additional advantages in comparison with the GWAS approach, the team noted.
It can directly detect candidate transcripts for a trait by integrating sequence data with expression data, in contrast to GWAS, which identifies only a genomic region in which markers are in linkage disequilibrium with the loci controlling the trait. Also, unlike GWAS, after identifying a region based on sequence variation, TRAS uses information on transcript expression in that region to determine whether the corresponding transcript is associated with the trait. TRAS can also detect potential interactions among transcripts through eQTL analysis, and the inferred relationships among transcripts are helpful for further validation of these interactions.
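The idea of combining the two data layers can be sketched in a few lines of Python. This toy example flags a transcript only when both its SNP genotype and its expression level track the trait across samples. It uses a plain Pearson correlation with an arbitrary cutoff as a stand-in for a real association model, so the function names, the threshold, and the data are all assumptions rather than the method used in the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return sxy / sqrt(sxx * syy)


def tras_candidates(genotypes, expression, trait, min_abs_r=0.6):
    """Return transcripts whose SNP genotype AND expression both track the trait.

    min_abs_r is a hypothetical cutoff chosen for illustration only.
    """
    candidates = []
    for transcript in genotypes:
        r_snp = pearson(genotypes[transcript], trait)
        r_expr = pearson(expression[transcript], trait)
        if abs(r_snp) >= min_abs_r and abs(r_expr) >= min_abs_r:
            candidates.append(transcript)
    return candidates


# Made-up data: six samples, one trait value each (e.g. a clove shape index).
trait = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
genotypes = {"tx_A": [0, 0, 1, 1, 2, 2], "tx_B": [0, 2, 1, 0, 2, 1]}
expression = {"tx_A": [5, 6, 9, 10, 15, 16], "tx_B": [8, 8, 7, 9, 8, 8]}

result = tras_candidates(genotypes, expression, trait)
```

Here only `tx_A` passes, because both its genotype and its expression rise with the trait, while `tx_B` shows neither signal. A real analysis would also need to account for population structure and multiple testing, which this sketch ignores.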