July 7, 2019  |  

GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments.

Authors: Kosugi, Shunichi and Hirakawa, Hideki and Tabata, Satoshi

Genome assemblies generated with next-generation sequencing (NGS) reads usually contain a number of gaps. Several tools have recently been developed to close the gaps in these assemblies with NGS reads. Although these gap-closing tools efficiently close the gaps, they entail a high rate of misassembly at gap-closing sites.We have found that the assembly error rates caused by these tools are 20-500-fold higher than the rate of errors introduced into contigs by de novo assemblers. We here describe GMcloser, a tool that accurately closes these gaps with a preassembled contig set or a long read set (i.e. error-corrected PacBio reads). GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap closure. We demonstrate with sequencing data from various organisms that the gap-closing accuracy of GMcloser is 3-100-fold higher than those of other available tools, with similar efficiency.GMcloser and an accompanying tool (GMvalue) for evaluating the assembly and correcting misassemblies except SNPs and short indels in the assembly are available at[email protected] data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: [email protected].

Journal: Bioinformatics
DOI: 10.1093/bioinformatics/btv465
Year: 2015

Read publication

Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.