July 7, 2019

Efficient cardinality estimation for k-mers in large DNA sequencing data sets

Authors: Luiz Carlos Irber Junior and C. Titus Brown

We present an open implementation of the HyperLogLog cardinality estimation sketch for counting fixed-length substrings of DNA strings (k-mers). The HyperLogLog sketch implementation is in C++ with a Python interface, and is distributed as part of the khmer software package. khmer is freely available from https://github.com/dib-lab/khmer under a BSD License. The features presented here are included in version 1.4 and later.
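
To make the idea concrete, the following is a minimal pure-Python sketch of HyperLogLog applied to k-mers. It is not khmer's implementation or API: the names `kmers`, `HyperLogLog`, and `count_distinct_kmers` are hypothetical, SHA-1 truncated to 64 bits stands in for whatever hash function the real code uses, and forward and reverse-complement k-mers are assumed to be counted as distinct.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq, in order."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class HyperLogLog:
    """Illustrative HyperLogLog sketch with m = 2**p registers (assumes p >= 7)."""

    def __init__(self, p=12):
        if p < 7:
            raise ValueError("this sketch assumes p >= 7 (m >= 128)")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Assumption: a 64-bit prefix of SHA-1 serves as the hash function.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                    # first p bits select a register
        rest = h & ((1 << width) - 1)       # remaining bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        # Raw estimate: normalized harmonic mean of 2**register values.
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            e = self.m * math.log(self.m / zeros)
        return e


def count_distinct_kmers(seq, k, p=12):
    """Estimate the number of distinct k-mers in seq."""
    hll = HyperLogLog(p)
    for kmer in kmers(seq, k):
        hll.add(kmer)
    return hll.estimate()
```

Memory use is fixed at 2**p registers regardless of how many k-mers are seen, and the expected relative error is roughly 1.04 / sqrt(2**p), so p trades memory for accuracy.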

Journal: BioRxiv
DOI: 10.1101/056846
Year: 2016

