Menu
July 7, 2019  |  

CoLoRMap: Correcting Long Reads by Mapping short reads.

Authors: Haghshenas, Ehsan and Hach, Faraz and Sahinalp, S Cenk and Chauve, Cedric

Second generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. The recent developments in long reads sequencing methods offer a promising way to address this issue. However, so far long reads are characterized by a high error rate, and assembling from long reads require a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads.We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods.The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/[email protected] or [email protected] data are available at Bioinformatics online.© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: [email protected].

Journal: Bioinformatics
DOI: 10.1093/bioinformatics/btw463
Year: 2016

Read publication

Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.