July 7, 2019

Genome graphs

Authors: Novak, Adam M and Hickey, Glenn and Garrison, Erik and Blum, Sean and Connelly, Abram and Dilthey, Alexander and Eizenga, Jordan and Elmohamed, M. A. Saleh and Guthrie, Sally and Kahles, André and Keenan, Stephen and Kelleher, Jerome and Kural, Deniz and Li, Heng and Lin, Michael F and Miga, Karen and Ouyang, Nancy and Rakocevic, Goran and Smuga-Otto, Maciek and Zaranek, Alexander Wait and Durbin, Richard and McVean, Gil and Haussler, David and Paten, Benedict

There is increasing recognition that a single, monoploid reference genome is a poor universal reference structure for human genetics, because it represents only a tiny fraction of human variation. Adding this missing variation results in a structure that can be described as a mathematical graph: a genome graph. We demonstrate that, in comparison to the existing reference genome (GRCh38), genome graphs can substantially improve the fractions of reads that map uniquely and perfectly. Furthermore, we show that this fundamental simplification of read mapping transforms the variant calling problem from one in which many non-reference variants must be discovered de novo to one in which the vast majority of variants are simply re-identified within the graph. Using standard benchmarks as well as a novel reference-free evaluation, we show that a simplistic variant calling procedure on a genome graph can already call variants at least as well as, and in many cases better than, a state-of-the-art method on the linear human reference genome. We anticipate that graph-based references will supplant linear references in humans and in other applications where cohorts of sequenced individuals are available.
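The central idea of the abstract is that adding known variants turns a linear sequence into a graph, and a read carrying a non-reference allele can then still match some path exactly. A toy sketch can illustrate this. The `GenomeGraph` class, its methods, and the four-node single-SNP example are hypothetical. The code uses brute-force path enumeration over a small acyclic graph and is not the indexing, mapping, or variant calling method the authors use.

```python
class GenomeGraph:
    """Toy sequence graph: nodes hold DNA fragments, edges join them (hypothetical)."""

    def __init__(self):
        self.seq = {}    # node id -> DNA sequence on that node
        self.edges = {}  # node id -> list of successor node ids

    def add_node(self, node_id, sequence):
        self.seq[node_id] = sequence
        self.edges.setdefault(node_id, [])

    def add_edge(self, src, dst):
        self.edges[src].append(dst)

    def paths(self, start, end):
        """Yield every walk from start to end; assumes the graph is acyclic."""
        if start == end:
            yield [end]
            return
        for nxt in self.edges[start]:
            for rest in self.paths(nxt, end):
                yield [start] + rest

    def spelled(self, path):
        """Concatenate the sequences along a path of node ids."""
        return "".join(self.seq[n] for n in path)

    def matches_exactly(self, read, start, end):
        """True if the read occurs verbatim in the sequence of some start-to-end path."""
        return any(read in self.spelled(p) for p in self.paths(start, end))


# A single SNP: node 2 carries the reference allele A, node 3 the alternate G.
graph = GenomeGraph()
for node_id, s in [(1, "ACGT"), (2, "A"), (3, "G"), (4, "TTGC")]:
    graph.add_node(node_id, s)
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    graph.add_edge(src, dst)

reference = graph.spelled([1, 2, 4])  # "ACGTATTGC", the linear reference path
read = "GTGTT"                        # spans the SNP and carries the alternate G

print(read in reference)                   # False: no perfect match on the linear reference
print(graph.matches_exactly(read, 1, 4))   # True: matches the path through node 3
```

Enumerating every path grows exponentially with the number of variant sites, so it is only suitable for a toy example. Practical genome graph tools rely on graph indexes instead. The sketch shows only why a read that is imperfect against a linear reference can become a perfect match once the variant is part of the graph.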

Journal: BioRxiv
DOI: 10.1101/101378
Year: 2017

