Whole genome sequencing (WGS) has increased in popularity and decreased in cost over the past decade, making it a viable and sensitive method for variant detection. In addition to its utility for single nucleotide variant detection, WGS data can be used to detect copy number variants (CNVs) at fine resolution. Many CNV detection software packages have been developed, exploiting four main types of approach: read pair, split read, read depth, and assembly based methods. The aim of this study was to evaluate the efficiency of each of these main approaches in detecting germline deletions. WGS data and high confidence deletion calls for the individual NA12878 from the Genome in a Bottle consortium were the benchmark dataset. The performance of BreakDancer, CNVnator, Delly, FermiKit, and Pindel was assessed by comparing the accuracy and sensitivity of each software package in detecting deletions exceeding 1 kb. There was considerable variability in the outputs of the different WGS CNV detection programs. The best performance was seen from BreakDancer and Delly, with 92.6% and 96.7% sensitivity, respectively, and 34.5% and 68.5% false discovery rate (FDR), respectively. In comparison, Pindel, CNVnator, and FermiKit were less effective, with sensitivities of 69.1%, 66.0%, and 15.8%, respectively, and FDRs of 91.3%, 69.0%, and 31.7%, respectively. Concordance across software packages was poor, with only 27 of the total 612 benchmark deletions identified by all five methodologies. The WGS based CNV detection tools evaluated show disparate performance in identifying deletions ≥1 kb, particularly those utilising different input data characteristics. Software that exploits read pair based data, namely BreakDancer and Delly, had the highest sensitivity. BreakDancer also had the second lowest false discovery rate.
Therefore, in this analysis read pair methods (BreakDancer in particular) were the best performing approaches for the identification of deletions ≥1 kb, balancing accuracy and sensitivity. There is potential for improvement in the detection algorithms, particularly for reducing FDR. This analysis has validated the utility of WGS based CNV detection software to reliably identify deletions, and these findings will be of use when choosing appropriate software for deletion detection, in both research and diagnostic medicine.
Copyright © 2019 Elsevier Inc. All rights reserved.
Journal: Journal of Biomedical Informatics
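
The sensitivity, FDR, and concordance figures reported in the abstract can be illustrated with a minimal sketch. The abstract does not state how a call was matched to a benchmark deletion, so the 50% reciprocal overlap rule used here is an assumption, as are the names `Deletion`, `reciprocal_overlap`, and `evaluate`. The coordinates are invented toy data, not results from the study.

```python
from typing import NamedTuple


class Deletion(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def reciprocal_overlap(a: Deletion, b: Deletion, min_frac: float = 0.5) -> bool:
    """True if the overlap covers at least min_frac of BOTH intervals.

    The 50% threshold is an assumed matching rule, not taken from the study.
    """
    if a.chrom != b.chrom:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a.end - a.start)
            and overlap >= min_frac * (b.end - b.start))


def evaluate(calls, benchmark, min_frac=0.5):
    """Return (sensitivity, FDR, indices of benchmark deletions detected)."""
    detected = {
        i for i, b in enumerate(benchmark)
        if any(reciprocal_overlap(c, b, min_frac) for c in calls)
    }
    false_calls = sum(
        1 for c in calls
        if not any(reciprocal_overlap(c, b, min_frac) for b in benchmark)
    )
    # Sensitivity: fraction of benchmark deletions recovered by the tool.
    sensitivity = len(detected) / len(benchmark) if benchmark else 0.0
    # FDR: fraction of the tool's calls that match no benchmark deletion.
    fdr = false_calls / len(calls) if calls else 0.0
    return sensitivity, fdr, detected


# Toy benchmark and call sets (hypothetical coordinates, all deletions >= 1 kb).
benchmark = [
    Deletion("chr1", 10000, 12000),
    Deletion("chr1", 50000, 53000),
    Deletion("chr2", 20000, 21500),
    Deletion("chr2", 80000, 85000),
]
tool_a = [
    Deletion("chr1", 10100, 12050),
    Deletion("chr1", 50000, 53000),
    Deletion("chr2", 79500, 85200),
    Deletion("chr3", 1000, 3000),    # no benchmark counterpart
]
tool_b = [
    Deletion("chr1", 50200, 52900),
    Deletion("chr2", 20000, 21500),
    Deletion("chr2", 80000, 81000),  # covers too little of the benchmark event
]

sens_a, fdr_a, found_a = evaluate(tool_a, benchmark)
sens_b, fdr_b, found_b = evaluate(tool_b, benchmark)

# Concordance: benchmark deletions detected by every tool.
concordant = found_a & found_b

print(f"tool_a: sensitivity={sens_a:.2f}, FDR={fdr_a:.2f}")
print(f"tool_b: sensitivity={sens_b:.2f}, FDR={fdr_b:.2f}")
print(f"detected by both tools: {len(concordant)} of {len(benchmark)}")
```

Under this definition a tool can combine high sensitivity with a high FDR, as reported for Delly, because sensitivity is normalised by the size of the benchmark set while FDR is normalised by the size of the tool's own call set.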