July 7, 2019

Ten steps to get started in Genome Assembly and Annotation.

As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR).


July 7, 2019

Complete genome sequence of a heavy metal resistant bacterium Maribacter cobaltidurans B1T, isolated from the deep-sea sediment of the South Atlantic Ocean

Many bacteria in the environment have adapted to the presence of toxic heavy metals. Here we present the complete genome sequence of a heavy metal resistant bacterium, Maribacter cobaltidurans B1T (=CGMCC 1.15508T=KCTC 52882T=MCCC 1K03318T), which was isolated from a deep-sea sediment sample collected from the South Atlantic Ocean. Strain B1T is able to resist high concentrations of Co2+ (10.0 mM) in Marine Agar 2216. The genome of strain B1T comprises 4,639,957 bp in a circular chromosome with a G+C content of 39.7 mol%. Resistance to Co2+ is mainly based on efflux systems encoded in the genome of strain B1T, including czcCBA operons, czcD genes, and corC genes, among others. Compared with the closely related species M. orientalis DSM 16471T, the genome of B1T harbors twenty more copies of genes in czcCBA operons and two copies of the czcD genes related to Co2+ efflux. These genes may contribute to the high level of cobalt resistance, suggesting a potential application of the strain in the biotechnology industry.
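The G+C content quoted in genome announcements like this one is a simple base-composition ratio. As a minimal sketch, assuming the genome is available as a plain nucleotide string and that ambiguous bases such as N are excluded from the denominator (the function name `gc_percent` is hypothetical, not from the paper):

```python
def gc_percent(sequence: str) -> float:
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    # Count only unambiguous bases; N and other IUPAC codes are ignored (an assumption).
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


if __name__ == "__main__":
    # Toy sequence for illustration only, not data from strain B1T.
    print(f"{gc_percent('GGCCAATT'):.1f}%")
```

For a real chromosome, the same function could be applied to the concatenated sequence lines of a FASTA record.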


July 7, 2019

Complete genome sequence of Tsukamurella sp. MH1: A wide-chain length alkane-degrading actinomycete.

Tsukamurella sp. strain MH1, capable of using a wide range of n-alkanes as its sole carbon source, was isolated from petroleum-contaminated soil (Pitești, Romania), and its complete genome was sequenced. The 4,922,396 bp genome consists of a single circular chromosome with a G + C content of 71.12%, much higher than that of the type strains of this genus (68.4%). Based on 16S rRNA gene sequence similarity, strain MH1 was taxonomically identified as Tsukamurella carboxydivorans. Genome analyses revealed that strain MH1 harbors only one gene encoding an alkB-like hydroxylase, arranged in a complete alkane monooxygenase operon. This is the first complete genome of the species T. carboxydivorans, and it will provide insights into the potential of Tsukamurella sp. MH1 and related strains for bioremediation of petroleum hydrocarbon-contaminated sites and into the environmental role of these bacteria. Copyright © 2017. Published by Elsevier B.V.


July 7, 2019

A fast approximate algorithm for mapping long reads to large reference databases.

Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long-read mapping algorithms. Alignment-based seed-and-extend methods demonstrate good accuracy, but face limited scalability, while faster alignment-free methods typically trade decreased precision for efficiency. In this article, we combine a fast approximate read mapping algorithm based on minimizers with a novel MinHash identity estimation technique to achieve both scalability and precision. In contrast to prior methods, we develop a mathematical framework that defines the types of mapping targets we uncover, establish probabilistic estimates of p-value and sensitivity, and demonstrate tolerance for alignment error rates up to 20%. With this framework, our algorithm automatically adapts to different minimum length and identity requirements and provides both positional and identity estimates for each mapping reported. For mapping human PacBio reads to the hg38 reference, our method is 290× faster than Burrows-Wheeler Aligner-MEM with a lower memory footprint and recall rate of 96%. We further demonstrate the scalability of our method by mapping noisy PacBio reads (each ≥5 kbp in length) to the complete NCBI RefSeq database containing 838 Gbp of sequence and >60,000 genomes.
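The two ingredients named in the abstract, minimizer sampling and MinHash-based identity estimation, can be illustrated with a short sketch. This is not the authors' implementation: the hash function, the parameter values, and all function names are assumptions, and the identity conversion uses the Mash-style relation between Jaccard similarity J and per-base divergence, d = -ln(2J / (1 + J)) / k.

```python
import hashlib
import math


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (blake2b is an arbitrary choice)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minimizers(seq: str, k: int, w: int) -> set[int]:
    """Set of minimizer hashes: the smallest k-mer hash in each window of w consecutive k-mers."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    if len(hashes) < w:
        # Sequence too short for a full window: fall back to a single minimizer.
        return {min(hashes)} if hashes else set()
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def minhash_jaccard(a: set[int], b: set[int], s: int) -> float:
    """Bottom-s MinHash estimate of the Jaccard similarity between two hash sets."""
    sketch = sorted(a | b)[:s]  # the s smallest hashes of the union
    if not sketch:
        return 0.0
    shared = sum(1 for h in sketch if h in a and h in b)
    return shared / len(sketch)


def estimate_identity(jaccard: float, k: int) -> float:
    """Convert a Jaccard estimate to an approximate sequence identity in [0, 1]."""
    if jaccard <= 0.0:
        return 0.0
    divergence = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return max(0.0, 1.0 - divergence)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    read = "ACGTACGTTAGCCGATAGGCTA" * 3
    region = read[:40] + "T" + read[41:]
    k, w, s = 5, 4, 10
    j = minhash_jaccard(minimizers(read, k, w), minimizers(region, k, w), s)
    print(f"Jaccard estimate: {j:.2f}, identity estimate: {estimate_identity(j, k):.3f}")
```

Comparing small sketches instead of aligning bases is what lets this style of method scale to large reference databases, at the cost of producing an estimate rather than an exact alignment.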


July 7, 2019

Tigmint: correcting assembly errors using linked reads from large molecules.

Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the underlying genome, and heterozygosity. As a result, assembly errors are common. In the absence of a reference genome, these misassemblies may be identified by comparing the sequencing data to the assembly and looking for discrepancies between the two. Once identified, these misassemblies may be corrected, improving the quality of the assembled sequence. Although tools exist to identify and correct misassemblies using Illumina paired-end and mate-pair sequencing, no such tool yet exists that makes use of the long distance information of the large molecules provided by linked reads, such as those offered by the 10x Genomics Chromium platform. We have developed the tool Tigmint to address this gap. To demonstrate the effectiveness of Tigmint, we applied it to assemblies of a human genome using short reads assembled with ABySS 2.0 and other assemblers. Tigmint reduced the number of misassemblies identified by QUAST in the ABySS assembly by 216 (27%). While scaffolding with ARCS alone more than doubled the scaffold NGA50 of the assembly from 3 to 8 Mbp, the combination of Tigmint and ARCS improved the scaffold NGA50 of the assembly over five-fold to 16.4 Mbp. This notable improvement in contiguity highlights the utility of assembly correction in refining assemblies. We demonstrate the utility of Tigmint in correcting the assemblies of multiple tools, as well as in using Chromium reads to correct and scaffold assemblies of long single-molecule sequencing. Scaffolding an assembly that has been corrected with Tigmint yields a final assembly that is both more correct and substantially more contiguous than an assembly that has not been corrected.
Using single-molecule sequencing in combination with linked reads enables a genome sequence assembly that achieves both a high sequence contiguity as well as high scaffold contiguity, a feat not currently achievable with either technology alone.
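The idea of comparing sequencing data to the assembly and looking for discrepancies can be made concrete with a small sketch. It assumes, purely for illustration, that linked reads have already been grouped into molecules with known start and end coordinates on a contig, and it flags windows that few molecules span as candidate breakpoints. The function name, window size, and threshold are hypothetical and are not taken from the abstract or from the Tigmint software.

```python
def low_support_windows(
    contig_length: int,
    molecules: list[tuple[int, int]],
    window: int = 1000,
    min_spanning: int = 2,
) -> list[tuple[int, int]]:
    """Return (start, end) windows spanned by fewer than min_spanning molecules.

    molecules holds half-open (start, end) intervals on the contig. A molecule
    spans a window when it starts at or before the window start and ends at or
    after the window end.
    """
    flagged = []
    for start in range(0, contig_length - window + 1, window):
        end = start + window
        spanning = sum(
            1 for m_start, m_end in molecules if m_start <= start and m_end >= end
        )
        if spanning < min_spanning:
            flagged.append((start, end))
    return flagged


if __name__ == "__main__":
    # Toy data: two groups of molecules that never bridge positions 5000-6000.
    molecules = [(0, 5000), (0, 5000), (6000, 10000), (6000, 10000)]
    print(low_support_windows(10000, molecules))
```

A region that no large molecule bridges is a plausible misjoin, since genuinely adjacent sequence would normally be covered by molecules extending across it. A practical tool would also need to treat contig ends and low-coverage regions specially, which this sketch does not attempt.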


July 7, 2019

Draft genome sequence of a Shewanella halifaxensis strain isolated from the intestine of marine red seabream (Pagrus major), which includes an integrative conjugative element with macrolide resistance genes.

Shewanella halifaxensis strain 6JANF4-E-4 was isolated from the intestine of a red seabream (Pagrus major). Here, we report the draft genome sequence of this bacterium, which includes an integrative conjugative element of the SXT/R391 family, where the macrolide resistance determinants mef(C) and mph(G) exist. Copyright © 2018 Sugimoto et al.


July 7, 2019

Complete genome sequence of a ciprofloxacin-resistant Salmonella enterica subsp. enterica serovar Kentucky sequence type 198 strain, PU131, isolated from a human patient in Washington State.

Strains of the ciprofloxacin-resistant (Cipr) Salmonella enterica subsp. enterica serovar Kentucky sequence type 198 (ST198) have rapidly and extensively disseminated globally to become a major food safety and public health concern. Here, we report the complete genome sequence of a Cipr S. Kentucky ST198 strain, PU131, isolated from a human patient in Washington State (USA).


July 7, 2019

Complete genome sequence of the freshwater bacterium Beggiatoa leptomitoformis strain D-401.

Here, we report the complete closed genome sequence and methylome analysis of Beggiatoa leptomitoformis strain D-401 (DSM 14945, UNIQEM U 779), which is quite different from the previously described Beggiatoa leptomitoformis neotype strain D-402T (DSM 14946, UNIQEM U 779) with regard to morphology and lithotrophic growth in the presence of thiosulfate. Copyright © 2018 Fomenkov et al.


July 7, 2019

Complete genome sequence of Colwellia hornerae PAMC 20917, a cold-active enzyme-producing bacterium isolated from the Arctic Ocean sediment

Psychrophilic bacteria are considered a source of cold-active enzymes that can be used in industrial applications. The Arctic bacterium Colwellia hornerae PAMC 20917 strain has been isolated from the offshore sediment near Ny-Ålesund, Svalbard. The optimal growth temperature of the strain was 10 °C on marine agar. The cell lysate showed alkaline phosphatase activities. Analysis of the enzymatic properties showed that the alkaline phosphatase was cold-active and thermolabile. To explore useful cold-active industrial enzymes further, the entire genome of the PAMC 20917 strain was sequenced. The genome of the strain contained 4,684,314 nucleotides, with 37.87% G+C content. Genome mining analysis revealed that, in the complete genome sequence, three proteins were annotated as alkaline phosphatases. The genome of PAMC 20917 encodes cold shock proteins and an ice-binding protein that inhibits the growth of ice, allowing the bacterium to adapt to cold environments. This genome information may be useful for understanding mechanisms of adaptation to cold stress.


July 7, 2019

Complete genome sequence of Bacillus licheniformis BL-010.

The biodegradation of aflatoxin B1 (AFB1) is of increasing industrial importance. Bacillus licheniformis BL-010 was isolated from aflatoxin-contaminated corn feed in storage and was shown to degrade AFB1 efficiently. Here we present the complete genome sequence of BL-010. The genome comprises 4,287,714 bp in a circular chromosome with a GC content of 46.12% and contains genes encoding AFB1-degrading enzymes. The genome sequence showed that this strain contains genes involved in the production of laccase and aromatic ring-opening dioxygenase, which could detoxify AFB1. The complete genome sequence of strain BL-010 provides a genomic basis for its biotechnological application as an effective means of degrading AFB1. Copyright © 2018 Elsevier Ltd. All rights reserved.


July 7, 2019

Complete genome sequence of Bacillus subtilis strain DKU_NT_02, isolated from traditional Korean food using soybean (chung-gook-jang) for high-quality poly-γ-glutamic acid activity.

The complete genome sequence of Bacillus subtilis strain DKU_NT_02, isolated from traditional Korean food using soybeans (chung-gook-jang), is presented here. This strain was chosen to help identify genetic factors with high-quality poly-γ-glutamic acid (γPGA) activity. Copyright © 2018 Bang et al.

