September 22, 2019

Jointly aligning a group of DNA reads improves accuracy of identifying large deletions.

Authors: Shrestha, Anish M S and Frith, Martin C and Asai, Kiyoshi and Richard, Hugues

Performing sequence alignment to identify structural variants, such as large deletions, from genome sequencing data is a fundamental task, but current methods are far from perfect. The current practice is to independently align each DNA read to a reference genome. We show that the propensity of genomic rearrangements to accumulate in repeat-rich regions imposes severe ambiguities in these alignments, and consequently on the variant calls: with current read lengths, this affects more than one third of known large deletions in the C. Venter genome. We present a method to jointly align reads to a genome, whereby alignment ambiguity of one read can be resolved by other reads. We show this leads to a significant improvement in the accuracy of identifying large deletions (≥20 bases), while imposing minimal computational overhead and maintaining an overall running time that is on par with current tools. A software implementation is available as an open-source Python program called JRA at

Journal: Nucleic acids research
DOI: 10.1093/nar/gkx1175
Year: 2018

