July 7, 2019

Cerulean: A hybrid assembly using high throughput short and long reads

Authors: Deshpande, Viraj and Fung, Eric D K and Pham, Son and Bafna, Vineet

Genome assembly from high-throughput short-read data arguably remains an unresolvable task for repetitive genomes: when a repeat is longer than the read length, the regions flanking it cannot be connected unambiguously. Third-generation sequencing (Pacific Biosciences) produces long reads that make it possible to resolve complicated repeats that short-read data cannot. However, these long reads have a high error rate, and assembling a genome from them without additional high-quality short reads is an uphill task. Recently, Koren et al. (2012) proposed using high-quality short reads to correct the long reads, thereby making assembly from long reads possible. However, because both datasets (short and long reads) are large, error correction of the long reads requires excessive computational resources, even for small bacterial genomes. In this work, instead of error-correcting the long reads, we first assemble the short reads and then map the long reads onto the assembly graph to resolve repeats.
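The idea of using long reads to connect the flanks of a repeat in a short-read assembly graph can be illustrated with a toy sketch. This is not Cerulean's actual algorithm, which the abstract does not describe in detail. The contig names, the `resolve_repeat` helper, and the representation of each mapped long read as an ordered list of contig IDs are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def resolve_repeat(repeat, long_read_paths):
    """Count how often long reads connect each (left flank, right flank)
    pair through the repeat contig. Hypothetical helper for illustration."""
    support = Counter()
    for path in long_read_paths:
        # Only positions with a contig on both sides span the repeat.
        for i in range(1, len(path) - 1):
            if path[i] == repeat:
                support[(path[i - 1], path[i + 1])] += 1
    return support


# Assumed toy graph: contigs A and B both enter repeat R; C and D both exit it.
# Short reads alone cannot tell A-R-C / B-R-D apart from A-R-D / B-R-C.
long_read_paths = [
    ["A", "R", "C"],
    ["A", "R", "C"],
    ["B", "R", "D"],
    ["R", "D"],  # does not span the repeat, so it contributes no evidence
]

support = resolve_repeat("R", long_read_paths)
for (left, right), count in sorted(support.items()):
    print(f"{left} -> R -> {right}: supported by {count} long read(s)")
```

In this sketch, only long reads that cover both flanks of the repeat contribute evidence. A real pipeline would also have to handle alignment errors and conflicting support.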

Journal:
DOI: 10.1007/978-3-642-40453-5_27
Year: 2013
