We present an open implementation of the HyperLogLog cardinality estimation sketch for counting fixed-length substrings of DNA strings (k-mers). The HyperLogLog sketch implementation is in C++ with a Python interface, and is distributed as part of the khmer software package. khmer is freely available from https://github.com/dib-lab/khmer under a BSD License. The features presented here are included in version 1.4 and later.
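To illustrate the idea behind the abstract above, here is a minimal pure-Python sketch of HyperLogLog applied to k-mers. It is not khmer's implementation or API: the names `kmers` and `HyperLogLog`, the SHA-1 hash, the default precision `p=12`, and the decision to ignore reverse complements are all assumptions made for illustration.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every length-k substring of seq (hypothetical helper)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class HyperLogLog:
    """Illustrative HyperLogLog sketch; not khmer's implementation."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Assumption: take the first 64 bits of SHA-1 as the hash value.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw


def estimate_distinct_kmers(seq, k, p=12):
    """Estimate the number of distinct k-mers in seq (forward strand only)."""
    hll = HyperLogLog(p)
    for kmer in kmers(seq, k):
        hll.add(kmer)
    return hll.estimate()
```

With `p=12` the sketch keeps 4096 small registers regardless of input size, and the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%. A call such as `estimate_distinct_kmers(sequence, 21)` returns a floating-point estimate rather than an exact count.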
CD1 molecules are antigen-presenting glycoproteins primarily found on dendritic cells (DCs) responsible for lipid antigen presentation to CD1-restricted T cells. Despite their pivotal role in immunity, little is known about CD1 protein expression in dogs, notably due to a lack of isoform-specific antibodies. The canine (Canis familiaris) CD1 locus was previously found to contain three functional CD1A genes: canCD1A2, canCD1A6, and canCD1A8, where two variants of canCD1A8, canCD1A8.1 and canCD1A8.2, were assumed to be allelic variants. However, we hypothesized that these instead represented two separate genes. Sequencing of three overlapping bacterial artificial chromosomes (BACs) spanning the entire canine CD1 locus revealed…
Finished genome sequences are presented for four Escherichia coli strains isolated from bloodstream infections at San Francisco General Hospital. These strains provide reference sequences for four major fimH-identified sublineages within the multilocus sequence type (MLST) ST95 group, and provide insights into pathogenicity and differential antimicrobial susceptibility within this group. Copyright © 2015 Stephens et al.
Simon Chan (UC Davis) on how PacBio long-read sequencing revealed higher-order repeats in the centromeres of switchgrass that would have remained hidden had the analysis been restricted to the much shorter Sanger reads.