Insertions and deletions (indels) have been implicated in dozens of human diseases through the radical alteration of gene function by short frameshift indels as well as long indels. However, the accurate detection of these indels from next-generation sequencing data is still challenging. This is particularly true for intermediate-size indels (=50?bp), due to the short DNA sequencing reads. Here, we developed a new method that predicts intermediate-size indels using BWA soft-clipped fragments (unmatched fragments in partially mapped reads) and unmapped reads. We report the performance comparison of our method, GATK, PINDEL and ScanIndel, using whole exome sequencing data from the same samples. False positive and false negative counts were determined through Sanger sequencing of all predicted indels across these four methods. The harmonic mean of the recall and precision, F-measure, was used to measure the performance of each method. Our method achieved the highest F-measure of 0.84 in one sample, compared to 0.56 for GATK, 0.52 for PINDEL and 0.46 for ScanIndel. Similar results were obtained in additional samples, demonstrating that our method was superior to the other methods for detecting intermediate-size indels. We believe that this methodology will contribute to the discovery of intermediate-size indels associated with human disease.
Journal: Scientific reports