Project to Rapidly Sequence Maize Pangenome Delivers Publicly Available Resource
Thursday, January 23, 2020
Maize researchers have been rejoicing over a New Year’s gift delivered by a group of 33 scientists: a 26-line “pangenome” reference collection.
The multi-institutional consortium of researchers used the PacBio Sequel System and BioNano Genomics optical mapping to create the assemblies and high-confidence annotations. They released the results on January 9 and discussed them in several presentations at the Plant and Animal Genome XXVIII Conference, less than two years after the ambitious project was funded by a $2.8 million National Science Foundation grant.
The collection includes comprehensive, high-quality assemblies of 26 inbreds known as the nested association mapping (NAM) founder lines (the most extensively researched maize lines that represent a broad cross section of modern maize diversity), as well as an additional line containing abnormal chromosome 10.
Scientists can download the project’s raw whole genome sequencing data, RNA sequencing data, optical map data, gene annotations and gene models at MaizeGDB. The site also features browsing and data visualization tools.
Led by faculty investigators R. Kelly Dawe (@corncolors), Distinguished Research Professor at the University of Georgia; Matt Hufford (@mbhufford), associate professor at Iowa State University; and Doreen Ware, a computational biologist at USDA and Cold Spring Harbor Laboratory, the NAM Consortium also included scientists from Corteva Agriscience, who are conducting their own large-scale sequencing effort on the company’s maize lines.
“People have been using these particular lines for years, so everybody has been really excited to get these new references as a resource,” Hufford said. “The assemblies that have come out are better than anything else that’s out in maize.”
Maize has been extremely challenging to sequence because the vast majority of its 2.3 Gb genome, a staggering 85 percent, is made up of highly repetitive transposable elements. It is also amazingly diverse. A study comparing genome segments associated with kernel color from two inbred lines revealed that 12 percent of the gene content was not shared. That is much more diversity within the species than between humans and chimpanzees, which exhibit more than 98 percent sequence similarity.
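To put those figures in perspective, here is a minimal Python sketch of the arithmetic. It uses only the genome size and repeat percentage quoted above, which work out to just under 2 Gb of repetitive sequence and roughly a third of a gigabase of everything else. The `fraction_not_shared` helper and the gene IDs are hypothetical, added purely for illustration: they show one simple way a "not shared" share could be computed from two gene lists, and are not the method used in the study mentioned above.

```python
# Figures quoted in the article.
GENOME_SIZE_GB = 2.3    # approximate maize genome size, in gigabases
REPEAT_FRACTION = 0.85  # share made up of transposable elements

repeat_gb = GENOME_SIZE_GB * REPEAT_FRACTION
non_repeat_gb = GENOME_SIZE_GB - repeat_gb


def fraction_not_shared(genes_a, genes_b):
    """Fraction of the combined gene set found in only one of the two lines.

    Assumption: "not shared" is measured against the union of both gene
    sets. The study cited in the article may define it differently.
    """
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0
    # Symmetric difference: genes present in exactly one of the two lines.
    return len(a ^ b) / len(union)


# Hypothetical gene IDs for two inbred lines (illustrative only, not real data).
line_a = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}
line_b = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g9"}

print(f"Repetitive sequence: about {repeat_gb:.2f} Gb")
print(f"Remaining sequence:  about {non_repeat_gb:.2f} Gb")
print(f"Toy gene content not shared: {fraction_not_shared(line_a, line_b):.0%}")
```

With real annotations, the two sets would hold gene identifiers from each line's gene models, and the result would depend heavily on how genes are matched between lines.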
The 26 varieties were prepped at the Arizona Genomics Institute, sequenced at the University of Georgia, Oregon State University, and Brigham Young University, and assembled by the NAM Consortium using PacBio long reads. Scaffolds were validated by BioNano optical mapping, then ordered and oriented using linkage and pangenome marker data. RNA-seq data from multiple tissues were used to annotate each genome using a pipeline that included BRAKER, Mikado and PASA.
“We spent a lot of time on gene model annotation, validation and benchmarking against B73 (the first reference genome annotations for maize, created by Ware’s lab in 2009, and updated in 2017) and other maize genes that have been manually curated by the community,” Hufford said.
Now comes the fun part: Peering into all the data and seeing what secrets it will reveal.
“For the last few months, we have started to see the cool biology emerging,” Hufford said. “What we are seeing is a lot of structural variation linked to phenotypic traits we haven’t been able to explain before.”
In addition to answering questions about basic biology and agronomic variation, the data are shedding light on the evolution of the different maize lines.
“We’re learning about the tempo of gene loss following a genome doubling event several million years ago. It appears to be ongoing, and still in flux,” Hufford said.
Next steps for the consortium include additional functional annotations for the NAM gene models, such as transposable elements, SNPs and insertions, as well as methylome and ATAC-Seq data.
“These data will help the maize community assess the role of variation in the determination of agronomic traits,” Hufford said.
Hufford will also be using SMRT Sequencing on the Sequel II System for two other large assembly projects, covering teosintes (wild relatives of maize) and other grass species.
“I think it’s really going to help with some of these complex varieties,” he said.