Setting a New Gold Standard in Strain Sequencing: Webinar Describes Cannabis Pan-Genome Project
Monday, July 1, 2019
What makes one strain of cannabis have potent psychoactive properties, and another more suitable for medicinal purposes? Scientists are several steps closer to figuring it all out, thanks to PacBio long-read sequencing and transcriptome analysis of the Cannabis sativa plant.
In a recent webinar, Kevin McKernan (@Kevin_McKernan) of Medicinal Genomics (MGC) described how his company's effort to create a cannabis pan-genome has already yielded interesting results.
Using MGC's assembly of the female Jamaican Lion cultivar as a baseline, the team isolated genomic DNA from a sibling male plant and multiple offspring. They sequenced it on the Sequel II System to identify structural variants and other important types of genetic variation.
This "family" sequencing strategy yields a recombination map and forms the basis for a cannabis pan-genome. It has also helped the team identify the genetic variants that determine whether a plant produces mainly THC (tetrahydrocannabinol, which can cause intoxication), mainly CBD (a key active ingredient in medicinal cannabis), or a mixture of the two. These profiles, referred to as chemotypes (I-IV), are key to breeding for cannabis yield, potency, and a host of other traits.
McKernan explained that PacBio sequencing is what made the project possible. He had used a combination of short- and long-read sequencing technologies in the past, but found that short reads could not adequately capture structural variation, especially in a genome he described as many times more complex than the human genome. Between 25% and 35% of the genome was not mappable with short reads, he said.
Relying on SNP chips is also problematic, because their primers are designed against an assumed reference sequence.
“When there are variants under these primers, of which there are lots in cannabis, the assays fail to produce clean signal,” McKernan said. “A high number of SNP chip assays will fail as they were designed before this was known.”
Using SMRT Sequencing, McKernan's team found more than 116 Mb of structural variation in their trio "family" data. That amounts to about one-eighth of the genome and includes more than 1 million small insertions and deletions (under 50 base pairs). By comparison, the human genome contains about 7 Mb of structural variation, less than 1% of the genome.
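To make the scale of this comparison concrete, here is a small worked check of the arithmetic. It is an illustrative sketch, not part of MGC's analysis. The human genome size (about 3,100 Mb, haploid) is an assumption added here. The cannabis genome size is not given in the webinar summary, so the sketch back-calculates it from the quoted "116 Mb, about one-eighth of the genome" figure. Because both source numbers are rounded, the implied size is only approximate.

```python
# Worked check of the structural-variation (SV) proportions quoted above.
# Assumption: human haploid genome taken as ~3,100 Mb.
# Assumption: cannabis genome size is back-calculated from "116 Mb = 1/8 of the
# genome"; it is not a reported assembly size.

CANNABIS_SV_MB = 116.0         # SV reported in the trio "family" data
CANNABIS_SV_FRACTION = 1 / 8   # share of the genome quoted for that SV
HUMAN_SV_MB = 7.0              # SV quoted for the human genome
HUMAN_GENOME_MB = 3100.0       # assumed haploid human genome size


def implied_genome_size(sv_mb: float, fraction: float) -> float:
    """Genome size implied by an SV total and the fraction of the genome it covers."""
    return sv_mb / fraction


def percent_of_genome(sv_mb: float, genome_mb: float) -> float:
    """SV expressed as a percentage of genome size."""
    return 100.0 * sv_mb / genome_mb


cannabis_size = implied_genome_size(CANNABIS_SV_MB, CANNABIS_SV_FRACTION)
cannabis_pct = percent_of_genome(CANNABIS_SV_MB, cannabis_size)
human_pct = percent_of_genome(HUMAN_SV_MB, HUMAN_GENOME_MB)

print(f"Implied cannabis genome size: {cannabis_size:.0f} Mb")
print(f"Cannabis SV share: {cannabis_pct:.1f}%")
print(f"Human SV share:    {human_pct:.2f}%")
print(f"Ratio of shares:   {cannabis_pct / human_pct:.0f}x")
```

Under these assumptions, the script reports an implied cannabis genome of about 928 Mb. Structural variation covers 12.5% of that genome, against roughly 0.23% of the human genome. That puts the cannabis share about 55 times higher.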
“Genotyping needs to dance around all this variation,” McKernan said. “Now we have it all beautifully resolved with PacBio.”
They also performed single-molecule mRNA isoform sequencing (Iso-Seq analysis) on five different parts of the plant. This allowed them to characterize between 45,000 and 85,000 genes expressed in cannabis. Working with Phase Genomics and New England Biolabs, they also created maps of methylation, splice sites, and recombination hotspots.
As a result, they now have a better picture of the Y chromosome, which could shed light on hermaphroditism within the species.
“Life always finds a way. When cannabis is stressed, it switches sex and self pollinates,” McKernan said. “We recently found an S1 gene that might be playing a role in the process.”
Among the surprises they encountered was the level of chloroplast heteroplasmy in cannabis. Chloroplast genomes are the most popular targets for improving yield, so they are important to understand, McKernan noted.
“We identified eight different haplotypes. We are still sorting through what this means,” he said.
They also found that cannabinoid synthase genes contain introns, and they identified previously unknown genes involved in pathogen defense. The discovery of other novel genes "might allow us to resurrect other synthases that have been bred out of existence," McKernan said.
Beyond MGC's own findings, other teams have already incorporated the data MGC posted publicly into their research. For example, Conor Jenkins and Ben Orsburn of Think20 Labs in Maryland recently constructed a draft map of the cannabis proteome.
The enhanced reference genome and transcriptome data also enable other applications, including an mRNA LAMP assay to distinguish hemp from cannabis. Under U.S. law, hemp is defined as containing no more than 0.3% THC. CBDA and THCA (the acidic precursors of CBD and THC) are produced in inverse proportion. Measuring active CBDA production will therefore let MGC determine whether a strain qualifies as hemp.
Watch the complete webinar to learn more: