Maize Collaborators Embark on Ambitious 26-line Pangenome Project
Wednesday, July 11, 2018
The first reference genome for maize variety B73, completed in 2009, was a major milestone, and an improved version released by Cold Spring Harbor Laboratory scientists in 2017 provided a deeper dive into the genetics of the complex crop. Yet even this new robust reference is not enough for Kelly Dawe, Doreen Ware and Matt Hufford, who have taken up another ambitious project: creating a 26-line pangenome reference collection in just two years.
“Maize is not only an important crop, but an important study species for answering basic questions about how plants grow and adapt to different environments,” says Ware, a computational biologist at USDA and Cold Spring Harbor Laboratory.
The maize genome also varies strikingly between individual lines. A study comparing the genome segments associated with kernel color in two inbred lines revealed that 12 percent of the gene content was not shared. That represents far more diversity within a single species than exists between humans and chimpanzees, whose genomes share more than 98 percent sequence similarity. The new project will create multiple reference genomes to capture this diversity.
“By relying on a single type specimen as the sequence reference for most of the genetic information in maize, we may be missing much of the highly valuable natural variation in maize,” Ware says.
Beyond B73, the most extensively researched maize lines are the core set of 25 inbreds known as the NAM (nested association mapping) founder lines, which represent a broad cross section of modern maize diversity. The new $2.8 million National Science Foundation-funded project, led by Dawe at the University of Georgia, will use SMRT Sequencing and Bionano optical mapping, the technologies that were essential to the groundbreaking 2017 B73 maize reference. The team will create comprehensive, high-quality assemblies of these 25 inbreds, plus an additional line carrying abnormal chromosome 10.
Plant genomes are notoriously difficult to sequence, and maize is particularly challenging: a staggering 85 percent of the 2.3 Gb genome of this diploid crop is made up of highly repetitive transposable elements that other sequencing technologies can't resolve. Understanding these regulatory and structural elements is crucial to modern breeding efforts that aim to improve productivity in marginal environments and under a changing climate.
“The sequenced lines will include varieties from both tropical and temperate regions, and their sequences should help us understand how corn has adapted to these different environments,” said Hufford, a co-principal investigator on the project and assistant professor at Iowa State University. “Understanding the ways corn adapts can facilitate development of lines for novel conditions.”
PacBio Sequencing will be essential as the team assesses the role of structural variation, such as presence-absence variation and copy number variation, in determining agronomic traits, Ware says.
The assemblies, along with information about the genes and their expression patterns, will be cataloged and made publicly available through Gramene.org, the data resource Ware's group maintains.
“To go from a single reference to a broad perspective on the entire genetic repertoire of genes and gene expression patterns will be a major step forward in how we approach genome analysis in crops,” said Dawe, Distinguished Research Professor in the department of genetics in UGA’s Franklin College of Arts and Sciences and principal investigator on the grant. “It’s something that has not happened for any crop at this scale.”
Read about Doreen Ware’s original comprehensive maize genome project and about efforts at Corteva Agriscience™, Agriculture Division of DowDuPont™ (formerly DuPont Pioneer), to create its own multiple maize reference library.