When Doreen Ware and her team’s latest genome project is complete, the plant science community will have a critical new tool that once seemed virtually impossible: a robust reference assembly for the maize genome. This resource will support breeding of hardier, higher-yielding lines of maize, the number one crop plant in both the U.S. and China.
Ware, a computational biologist with the USDA at Cold Spring Harbor Laboratory, says climate change and protecting the environment are major challenges facing agriculture. “We know that we must increase yield in order to meet the expected 9 billion people in less than 25 years,” she says. “Having the genome and the genome content allows us to accelerate the improvement of maize varieties and the germplasm.”
Maize, with about 2.3 Gb of sequence, seems almost designed to evade genomic characterization. Previous sequencing efforts with Sanger and other technologies were stymied by the plant’s complex universe of transposable elements and highly repetitive regions. The genome also differs significantly between plants: the gene complement of any two maize plants, even of the exact same variety, can differ by as much as five percent. For population-level crop improvement, Ware says, the field needs not one reference assembly but many.
Her team is well on its way to delivering the first truly high-quality reference assembly for maize, with plans to produce several more in the near future. The work pairs two technologies: Single Molecule, Real-Time (SMRT) Sequencing from PacBio, and Next-Generation Mapping from BioNano Genomics’ Irys System. Together, these approaches have enabled Ware and her team to produce an assembly with greater contiguity and accuracy than was previously possible for this challenging genome, providing the first look at important regulatory and structural elements that could influence breeding approaches.
The team began with PacBio sequencing, generating a de novo assembly of stunning quality. The previous version of the publicly available assembly for maize (B73 RefGen_v3), which was based on Sanger-sequenced BACs and 454 sequencing data, was broken up into about 140,000 contigs; the PacBio assembly has just 3,300 contigs. Contig N50 lengths rose from about 19 kb in the previous assembly to more than 1 Mb in the PacBio assembly.
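For readers unfamiliar with the metric, contig N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembled bases. The sketch below shows one common way to compute it in Python. The function name `n50` and the toy contig lengths are illustrative assumptions and are not data from the maize assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of
    the contig at which the running total first reaches half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (made-up lengths, arbitrary units): the same 32 bases
# split into many small pieces versus a few large ones.
fragmented = [1] * 32
contiguous = [8, 8, 4, 3, 3, 2, 2, 2]

print(n50(fragmented))
print(n50(contiguous))
```

Both toy assemblies contain the same number of bases, but the second has a much higher N50 because most of its sequence sits in a few long contigs. That is the sense in which a jump from roughly 19 kb to more than 1 Mb signals a far less fragmented assembly.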
“The contigs that we’re looking at now have almost no unknown bases in them,” Ware says. In every chromosome, existing sequence length was extended, filling in gaps that had previously peppered the assembly. The new assembly includes most of the centromeric sequence across all 10 chromosomes and even represents portions of the telomeres.
The next step was layering BioNano Mapping into the new assembly. Again, the results represented a significant improvement made possible only by pairing the technologies: The total contig count was reduced to just 768, while the contig N50 length increased to more than 9.5 Mb.
“What the PacBio assembly and the BioNano map allowed us to do is to achieve improved contigs and scaffolding,” Ware says. She says that each tool offers value — nucleotide-level resolution and long reads with PacBio sequencing, and massive scaffolds with BioNano Mapping — but that together they’re even more effective. “The combination gives you a better product than either technology does alone,” she adds.
Being able to view repeat content gives the team its first clear look at gene regulation. “Which transposable element is near a gene may determine the methylation status surrounding that gene, and the methylation status can directly impact expression,” Ware notes. “With the complete reference, we can start to understand how these genes are regulated.”
Now, the team is plowing ahead with plans to sequence three or four additional maize genomes in the coming year. “We don’t want just one reference, we want hundreds of references,” Ware says. “Having more genomes will let us see the full complement of genes within the species, how much of the regulatory regions are conserved, and how diverse they are.”