Scientists Publish High-Quality, Near-Complete Genome of Resurrection Grass Oropetium
Wednesday, November 11, 2015
We’re excited about a new Nature paper from the winners of our 2014 “Most Interesting Genome in the World” SMRT Grant program. “Single-molecule sequencing of the desiccation tolerant grass Oropetium thomaeum” comes from lead authors Robert VanBuren and Doug Bryant along with senior author Todd Mockler at the Donald Danforth Plant Science Center, as well as a number of collaborators at other institutions. In it, the authors report a virtually complete genome of Oropetium thomaeum, a grass with an estimated genome size of 245 Mb and the handy ability to regrow even after extreme drought once water becomes available.
The scientists believe that a better understanding of the plant’s genome could shed light on the mechanisms underpinning these so-called resurrection plants, and ultimately enable the engineering of crop plants to withstand severe drought and stress.
For this study, the team worked with about 72x coverage of the Oropetium genome generated by the PacBio system. That’s “equivalent to <1 week of sequencing time and <$10k in reagents,” according to the paper. Assembled with HGAP and polished with Quiver, the resulting assembly covered 99% of the genome in 625 contigs, with an accuracy of 99.99995% and a contig N50 length of 2.4 Mb.
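For readers less familiar with these assembly statistics, here is a minimal Python sketch of how they are conventionally derived. It is an illustration only, not the authors' pipeline: the function names are hypothetical, and the toy contig lengths are invented to show the N50 calculation (they are not Oropetium data). The 245 Mb genome size, 72x coverage and 99.99995% accuracy are taken from the figures quoted above.

```python
import math


def n50(contig_lengths):
    """Return the contig N50: the length L such that contigs of
    length >= L together cover at least half of the assembly."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def raw_bases(genome_size_bp, coverage):
    """Approximate sequenced bases implied by a given fold coverage."""
    return genome_size_bp * coverage


def phred_qv(accuracy_percent):
    """Convert a percent accuracy into a Phred-scaled quality value."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return -10.0 * math.log10(error_rate)


# Toy contig lengths (made up for illustration, NOT from the paper).
toy_contigs = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_contigs)

# Figures quoted in the post: 245 Mb genome, ~72x coverage, 99.99995% accuracy.
sequenced = raw_bases(245_000_000, 72)
qv = phred_qv(99.99995)

print(f"toy N50: {toy_n50}")
print(f"sequenced bases at 72x: {sequenced:,}")
print(f"Phred QV for 99.99995% accuracy: {qv:.1f}")
```

Under these assumptions, 72x coverage of a 245 Mb genome works out to roughly 17.6 Gb of sequence, and 99.99995% accuracy corresponds to about one error per two million bases, or a Phred quality of roughly 63.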
VanBuren et al. note that the contiguity of the assembly sets it apart from draft genomes produced from short-read sequencers. “Most NGS-based genomes have on the order of tens of thousands of short contigs distributed in thousands of scaffolds,” the scientists write. Because the assemblies are so fragmented, “they are missing biologically meaningful sequences including entire genes, regulatory regions, transposable elements (TEs), centromeres, telomeres and haplotype-specific structural variations.”
SMRT Sequencing, by contrast, allowed the team to characterize those elements in the Oropetium genome, with its predicted 28,446 protein-coding genes and a significant proportion of repeat regions. The authors note that “the largest tandem array contains five identical and one partial 9 kb repeats collectively spanning 51 kb; this is approaching the theoretical limit given the current read-length distributions of PacBio.” The assembly includes telomere and centromere sequence, long terminal-repeat retrotransposons, tandem duplicated genes, and other difficult-to-access genomic elements. In addition, the scientists produced the full chloroplast genome in a single contig that includes “~25 kb of inverted repeat regions which typically collapse into a single copy during assembly,” they report.
“The Oropetium genome showcases the utility of SMRT sequencing for assembling high-quality plant and other eukaryotic genomes,” the scientists note, “and serves as a valuable resource for the plant comparative genomics community.”