When A Single Reference is Not Enough
Monday, January 15, 2018
Maize is amazingly diverse. A study comparing genome segments from two inbred lines, for instance, revealed that half of the sequence and one-third of the gene content were not shared. That is more diversity within a single species than between some separate species, such as humans and chimpanzees, which exhibit more than 98 percent sequence similarity.
So how can researchers and commercial breeders rely upon a single reference genome to represent the genetic diversity in their germplasm? More and more scientists are deciding they cannot.
At DuPont Pioneer, where DNA sequencing is paramount for R&D to reveal the genetic basis for traits of interest in a variety of commercial crops, an ambitious project has begun: a pan-genome reference collection based on high-speed, high-throughput SMRT Sequencing and assembly.
As described in this case study, the company has developed a way to assemble high-quality reference genomes in just one month, and it has started to create them for several of its own elite breeding lines, as well as select wild strains.
Having multiple genome assemblies of the same high standard for several genotypes will be increasingly important as researchers try to achieve a greater understanding of the impacts of structural variation on plant genomes, says Research Scientist Kevin Fengler, of the Data Science and Informatics group at DuPont Pioneer.
“We want to focus on true structural variation and have confidence in the new discoveries we find in these genomes,” he adds. “Until now, focusing on one reference genome has limited our view. We are just beginning to explore what we have been missing all along.”
From commercialization to crop improvement to answering basic questions about biology, generating and analyzing multiple reference genomes has myriad benefits in a variety of lab settings.
“It has become clear that one single genome is not enough to represent the huge amount of variation in rice genomes,” writes Zhi-Kang Li of the Chinese Academy of Agricultural Sciences in this recent Nature Scientific Data paper about the assembly of an early-mature japonica rice genome.
Rod Wing of the Arizona Genomics Institute agrees. He is aiming to build high-quality reference genomes for 23 additional species of rice using SMRT Sequencing. Beyond providing highly accurate, long-read sequence data, Wing said the PacBio platform is also useful for full-length RNA sequencing and for characterizing the methylome. As he notes in this blog post and case study, he can take rice tissues at several developmental stages and under many different environmental conditions, isolate RNA, and run Iso-Seq analysis on those samples to enable whole-plant transcriptome analysis, which could help the community map gene networks.
Other researchers are also eager to expand their set of references:
- At CROPS 2017, a three-day event focused on genomic technologies and their use in crop improvement and breeding programs, Jeremy Schmutz of the HudsonAlpha Institute for Biotechnology and the Joint Genome Institute discussed his use of PacBio SMRT Sequencing to create several cotton genomes as well as Brachypodium, peanut, sorghum, and more. As he notes in this blog post, SMRT Sequencing has been successful even for very challenging plant genomes with highly repetitive elements, GC-rich regions, areas of high and low complexity, and varying degrees of ploidy.
- Grapes are getting a thorough cataloging, thanks to the efforts of the Cantu Lab at the University of California, Davis. Cantu's colleague Steve Knapp is also leading efforts to expand the selection of reference genome assemblies for strawberries.
- There is also a pressing need to add to the germplasm of the world’s most valuable plant: coffee. As noted in this case study, there are about 100 species in the Coffea genus, but the particular strains cultivated to produce coffee (a market valued at $90 billion) have very little genetic diversity. In her quest to address disease and climate pressures that threaten the plant, Cornell University researcher Marcela Yepes has begun to create new references for several varieties, starting with Coffea arabica and Coffea eugenioides.
We are looking forward to hearing about additional efforts to expand the reference genome library at the ongoing Plant and Animal Genome XXVI Conference. Look out for PacBio @ PAG, and swing by booth 418 to say hi. We will also be hosting a half-day informatics conference on Wednesday, Jan. 17.