Error correction is typically the first step of de novo genome assembly from NGS data. This step has an important impact on the quality and speed of the assembly process. However, the majority of available stand-alone error correction solutions can only detect and correct mismatches. Therefore, these solutions only support correcting reads generated by Illumina sequencers. Several solutions support insertions and deletions (indels) and are capable of working with multiple technologies. However, these solutions are limited by correction performance and resource consumption. In this paper, we introduce MuffinEc, an indel-aware multi-technology correction method for NGS data. This method uses a greedy approach to create groups of reads and subsequently corrects them using their consensus. MuffinEc surpasses existing solutions by offering better correction ratios for multiple technologies. This method also exploits parallel processing via OpenMP and uses less computational resources than similar programs, thereby being capable of handling large datasets. MuffinEc is open source and freely available at http://muffinec.sourceforge.net.
Journal: Information sciences