BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper

De novo assembly is the process of reconstructing genomes from DNA fragments (reads), which may contain redundancy and errors. Longer reads simplify assembly and improve the contiguity of the output, but current long-read technologies come with high error rates. A crucial step of de novo genome assembly for long reads consists of finding overlapping reads. We present the Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper (BELLA), which implements a novel approach to computing overlaps using Sparse Generalized Matrix Multiplication (SpGEMM). We present a probabilistic model that demonstrates the soundness of using short, fixed-length k-mers to detect overlaps, avoiding expensive pairwise alignment of all reads against all others. We then introduce a notion of reliable k-mers based on our probabilistic model. The use of reliable k-mers eliminates both the k-mer set explosion that would otherwise occur with highly erroneous reads and the spurious overlaps due to k-mers originating from repetitive regions. Finally, we present a new method to separate true alignments from false positives based on the alignment score. Using this methodology, which is employed in BELLA's precise mode, the probability of false positives drops exponentially as the length of overlap between sequences increases. On simulated data, BELLA achieves an average of 2.26% higher recall than state-of-the-art tools in its sensitive mode and 18.90% higher precision than state-of-the-art tools in its precise mode, while remaining competitive in performance.
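To illustrate the idea of detecting overlaps through sparse matrix multiplication, the following is a minimal sketch, not BELLA's implementation. It assumes a binary reads-by-k-mers matrix A and computes the nonzero entries of A·Aᵀ, so that each entry counts the k-mers two reads share. The function names and the `max_freq` cutoff are hypothetical; `max_freq` is only a crude stand-in for the reliable k-mer selection, which the abstract derives from a probabilistic model.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_counts(reads, k, max_freq=None):
    """Count shared k-mers per read pair.

    Conceptually this computes the nonzeros above the diagonal of A * A^T,
    where A[r][m] = 1 if read r contains k-mer m.
    max_freq is a hypothetical cutoff: k-mers present in more than
    max_freq reads are ignored (e.g. k-mers from repeats).
    """
    # Column-wise view of the sparse matrix A: k-mer -> reads containing it.
    columns = defaultdict(list)
    for rid, seq in enumerate(reads):
        for m in kmers(seq, k):
            columns[m].append(rid)

    counts = defaultdict(int)
    for rids in columns.values():
        # A k-mer seen in a single read cannot support any overlap.
        if len(rids) < 2:
            continue
        if max_freq is not None and len(rids) > max_freq:
            continue
        # Each pair of reads sharing this k-mer adds 1 to entry (i, j).
        for i, j in combinations(sorted(rids), 2):
            counts[(i, j)] += 1
    return dict(counts)


reads = ["ACGTACGGA", "TACGGATTC", "GGGGCCCCA"]
candidates = shared_kmer_counts(reads, k=4)
```

Read pairs with a nonzero count are candidate overlaps; only those pairs would then be passed to pairwise alignment, which is what avoids aligning all reads against all others.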

Genomics and biochemistry investigation on the metabolic pathway of milled wood and alkali lignin-derived aromatic metabolites of Comamonas serinivorans SP-35.

The efficient depolymerization and utilization of lignin is one of the most important goals for the renewable use of lignocelluloses. The degradation and complete mineralization of lignin by bacteria also represent a key step for carbon recycling in land ecosystems. However, many aspects of this process remain unclear, for example, the complex network of metabolic pathways involved in the degradation of lignin and the catabolic pathway of intermediate aromatic metabolites. To address these subjects, we characterized the deconstruction and mineralization of milled wood lignin (MWL, the most representative molecule of lignin in its native state) and alkali lignin (AL) by the bacterium Comamonas serinivorans SP-35, and elucidated the metabolic pathways of their intermediate metabolites. The degradation rate of MWL reached 30.9%, and its particle size range decreased from 6-30 µm to 2-4 µm when cultured with C. serinivorans SP-35 over 7 days. FTIR analysis showed that the C-C and C-O-C bonds between the phenyl propane structures of lignin were oxidized and cleaved and that the side chain structure was modified. More than twenty intermediate aromatic metabolites were identified in the MWL and AL cultures based on GC-MS analysis. Through genome sequencing and annotation, together with GC-MS analysis, 93 genes encoding 33 enzymes and 5 regulatory factors that may be involved in lignin degradation were identified, and more than nine metabolic pathways of lignin and its intermediates were predicted. Of particular note, the metabolic pathway that forms the powerful antioxidant 3,4-dihydroxyphenylglycol is described for the first time in bacteria. Elucidation of the β-aryl ether cleavage pathway in strain SP-35 indicates that the β-aryl ether catabolic system is present not only in the family Sphingomonadaceae but also in other bacterial species.
These newly elucidated catabolic pathways of lignin in strain SP-35, and the enzymes responsible for them, provide exciting biotechnological opportunities for lignin valorization in the future.

Emergence of tigecycline resistance in Escherichia coli co-producing MCR-1 and NDM-5 during tigecycline salvage treatment.

Here, we report a case of severe infection caused by Escherichia coli that harbored mcr-1 and blaNDM-5 and acquired resistance to tigecycline during tigecycline salvage therapy. Antimicrobial susceptibility testing, Southern blot hybridization, and complete genome sequencing of the strains were carried out. The genetic characteristics of the mcr-1 and blaNDM-5 plasmids were analyzed. The whole sequence of the mcr-1-containing plasmid was completed. Finally, putative single nucleotide polymorphisms and deletion mutations in the tigecycline-resistant strain were predicted. Three E. coli isolates were obtained from the ascites, pleural effusion, and stool of a patient; they were resistant to almost all the tested antibiotics. The first two strains, isolated from ascites (E-FQ) and hydrothorax (E-XS), were susceptible to amikacin and tigecycline; however, the third strain, from stool (E-DB), was resistant to tigecycline after nearly 3 weeks of treatment with tigecycline. All three isolates possessed both mcr-1 and blaNDM-5. The blaNDM-5 gene was found on the IncX3 plasmid, whereas mcr-1, fosA3, and blaCTX-M-14 were located on the IncHI2 plasmid. Mutations in acrB and lon were the reason for the resistance to tigecycline. This is the first report of a colistin-, carbapenem-, and tigecycline-resistant E. coli in China. Tigecycline resistance acquired during tigecycline therapy is of great concern because tigecycline is a drug of last resort for treating carbapenem-resistant Gram-negative bacterial infections. Furthermore, the transmission of such extensively drug-resistant isolates may pose a great threat to public health.

Complete genome sequence of the polymyxin E (colistin)-producing Paenibacillus sp. strain B-LR.

Paenibacillus bacteria are recovered from varied niches, including human lung, rhizosphere, marine sediments, and hemolymph. Paenibacilli can have plant growth-promoting activities and be antibiotic producers. They can produce exopolysaccharides and enzymes of industrial interest. Illumina and PacBio reads were used to produce a complete genome sequence of the colistin producer Paenibacillus sp. strain B-LR.

Evaluation of the suppressive effect of Bacillus velezensis L-1 on Botrytis cinerea and Penicillium pathogens of pear, and whole-genome analysis

Complete genome sequence of Lactobacillus koreensis 26-25, a ginsenoside-converting bacterium, isolated from Korean kimchi

Lactobacillus koreensis 26-25, a Gram-positive, rod-shaped, ivory-colored, motile bacterium, was isolated from Korean kimchi. Strain 26-25 was able to convert major ginsenosides into minor ginsenosides, and its whole genome was therefore sequenced. The whole genome sequence of Lactobacillus koreensis 26-25 consists of one circular chromosome of 3,006,812 bp with a DNA G + C content of 49.23%. Whole genome analysis of strain 26-25 revealed many glycoside hydrolase genes, which may help identify the genes responsible for transforming major ginsenosides into minor ginsenosides, which are valued for their high pharmacological effects.
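For reference, the G + C content reported above is the percentage of bases in a sequence that are G or C. The sketch below shows one common way to compute it; the function name is hypothetical, and the choice to exclude ambiguous bases such as N from the denominator is an assumption, not something the abstract states.

```python
def gc_content(seq):
    """Return the G + C percentage of a DNA sequence.

    Assumption: only unambiguous bases (A, C, G, T) are counted,
    so ambiguous symbols such as N are ignored.
    """
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    total = sum(1 for base in seq if base in "ACGT")
    # An empty or fully ambiguous sequence has no defined G + C content;
    # this sketch returns 0.0 in that case.
    return 100.0 * gc / total if total else 0.0


example = gc_content("GGCCAATT")
```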

Complete genome of the multidrug-resistant Escherichia coli strain KBN10P04869 isolated from a patient with acute myeloid leukemia

Recently, we isolated a multidrug-resistant Escherichia coli strain, KBN10P04869, from a patient with acute myeloid leukemia. We report the complete genome of this strain, which consists of 5,104,264 bp with 4,457 protein-coding genes, 88 tRNAs, and 22 rRNAs, and the co-occurrence of multidrug resistance genes including blaCMY-2, blaTEM-1, blaCTX-M-15, blaNDM-5, and blaOXA-18.

