July 7, 2019

Contiguity: Contig adjacency graph construction and visualisation

Contiguity is interactive software for the visualization and manipulation of de novo genome assemblies. Contiguity creates and displays information on contig adjacency, contextualized by the simultaneous display of a comparison between the assembled contigs and a reference sequence. Whereas scaffolders resolve unambiguous connections between contigs into a single scaffold, Contiguity allows the user to create all potential scaffolds in ambiguous regions of the genome. This enables the resolution of novel sequence or structural variants from the assembly. In addition, Contiguity provides a sequencing- and assembly-agnostic approach for the creation of contig adjacency graphs. To maximize the number of contig adjacencies determined, Contiguity combines information from read pair mappings, sequence overlap and de Bruijn graph exploration. We demonstrate how highly sensitive graphs can be achieved using this method. Contig adjacency graphs allow the user to visualize potential arrangements of contigs in unresolvable areas of the genome. By combining adjacency information with comparative genomics, Contiguity provides an intuitive approach for exploring and improving sequence assemblies. It is also useful in guiding manual closure of long-read sequence assemblies. Contiguity is an open source application, implemented in Python with the Tkinter GUI package, that runs on Unix, OS X and Windows. It has been designed and optimized for bacterial assemblies. Contiguity is available at http://mjsull.github.io/Contiguity.
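As a rough illustration of the sequence-overlap evidence mentioned in this abstract, the sketch below links two contigs when the end of one (on either strand) exactly matches the start of another over at least `min_overlap` bases. It is a hypothetical minimal example, not Contiguity's own code: the function names, the exact-match criterion and the `min_overlap` parameter are assumptions, and it ignores the read-pair and de Bruijn graph evidence that Contiguity also combines.

```python
from itertools import product

# Hypothetical helpers for illustration; not part of Contiguity's code base.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_edges(contigs: dict, min_overlap: int = 20) -> list:
    """Return (a, strand_a, b, strand_b, k) tuples where the last k bases of
    contig a (read on strand_a) exactly equal the first k bases of contig b
    (read on strand_b), with k >= min_overlap. Only the longest k is kept."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    oriented = {}
    for name, seq in contigs.items():
        oriented[(name, "+")] = seq
        oriented[(name, "-")] = revcomp(seq)

    edges = []
    for (a, sa), (b, sb) in product(oriented, repeat=2):
        if a == b:  # skip self-overlaps (e.g. circular contigs) in this sketch
            continue
        left, right = oriented[(a, sa)], oriented[(b, sb)]
        longest = min(len(left), len(right))
        # Try the longest possible overlap first and stop at the first hit.
        for k in range(longest, min_overlap - 1, -1):
            if left[-k:] == right[:k]:
                edges.append((a, sa, b, sb, k))
                break
    return edges


contigs = {
    "c1": "ATGCGTACGTTAGCGGATCCAAGT",
    "c2": "GGATCCAAGTTTTTCCCAGG",
}
for edge in overlap_edges(contigs, min_overlap=10):
    print(edge)
```

With these toy contigs the sketch reports the 10-bp junction twice, as c1+ followed by c2+ and as its reverse-complement twin c2- followed by c1-, because the same adjacency can be read from either strand.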


July 7, 2019

Twenty years of bacterial genome sequencing.

Twenty years ago, the publication of the first bacterial genome sequence, from Haemophilus influenzae, shook the world of bacteriology. In this Timeline, we review the first two decades of bacterial genome sequencing, which have been marked by three revolutions: whole-genome shotgun sequencing, high-throughput sequencing and single-molecule long-read sequencing. We summarize the social history of sequencing and its impact on our understanding of the biology, diversity and evolution of bacteria, while also highlighting spin-offs and translational impact in the clinic. We look forward to a ‘sequencing singularity’, where sequencing becomes the method of choice for as-yet unthinkable applications in bacteriology and beyond.


July 7, 2019

A rebeccamycin analog provides plasmid-encoded niche defense.

Bacterial symbionts of fungus-growing ants occupy a highly specialized ecological niche and face the constant existential threat of displacement by another strain of ant-adapted bacteria. As part of a systematic study of the small molecules underlying this fraternal competition, we discovered an analog of the antitumor agent rebeccamycin, a member of the increasingly important indolocarbazole family. While several gene clusters consistent with this molecule’s newly reported modification had previously been identified in metagenomic studies, the metabolite itself had remained cryptic. The biosynthetic gene cluster for 9-methoxyrebeccamycin is encoded on a plasmid in a manner reminiscent of plasmid-derived peptide antimicrobials that commonly mediate antagonism among closely related Gram-negative bacteria.


July 7, 2019

Complete genome sequence of Bacillus cereus FORC_005, a food-borne pathogen from the soy sauce braised fish-cake with quail-egg.

Due to abundant contamination in various foods, the pathogenesis of Bacillus cereus has been widely studied at the physiological and molecular levels. B. cereus FORC_005 was isolated from a Korean side dish, soy sauce braised fish-cake with quail-egg, in South Korea. Twenty-one complete genome sequences of B. cereus have been announced to date; this strain was completely sequenced, analyzed, and compared with the other complete B. cereus genome sequences to elucidate the distinct pathogenic features of a strain isolated in South Korea. The genome consists of a circular chromosome of 5,349,617 bp with a GC content of 35.29%. It was predicted to have 5170 open reading frames (ORFs), 106 tRNA genes, and 42 rRNA genes. Among the predicted ORFs, 3892 (75.28%) were annotated as encoding functional proteins and 1278 were predicted to encode hypothetical proteins (748 conserved and 530 non-conserved hypothetical proteins). This genome information on B. cereus FORC_005 would extend our understanding of its pathogenesis at the genomic level, supporting efficient control of its contamination in foods and of the resulting food poisoning.


July 7, 2019

Genome analysis of Staphylococcus agnetis, an agent of lameness in broiler chickens.

Lameness in broiler chickens is a significant animal welfare and financial issue. Rearing young broilers on wire flooring can increase the incidence of lameness. We have identified Staphylococcus agnetis as significantly involved in bacterial chondronecrosis with osteomyelitis (BCO) of the proximal tibiae and femora, leading to lameness in broiler chickens in the wire floor system. Administration of S. agnetis in water induces lameness. This poorly described pathogen has previously been reported in some cases of cattle mastitis; this is the first report of it in chickens. We used long- and short-read next-generation sequencing to assemble single finished contigs for the genome and a large plasmid of the chicken pathogen. Comparison of the S. agnetis genome with those of other pathogenic staphylococci shows that S. agnetis contains a distinct repertoire of virulence determinants. Additionally, the S. agnetis genome has several regions that differ substantially from the genomes of other pathogenic staphylococci. Comparison of our finished genome with a recent draft genome of a cattle mastitis isolate suggests that future investigations should focus on the evolutionary epidemiology of this emerging pathogen of domestic animals.


July 7, 2019

Next-generation sequencing and comparative analysis of sequential outbreaks caused by multidrug-resistant Acinetobacter baumannii at a large academic burn center.

Next-generation sequencing (NGS) analysis has emerged as a promising molecular epidemiological method for investigating health care-associated outbreaks. Here, we used NGS to investigate a 3-year outbreak of multidrug-resistant Acinetobacter baumannii (MDRAB) at a large academic burn center. A reference genome from the index case was generated using de novo assembly of PacBio reads. Forty-six MDRAB isolates were analyzed by pulsed-field gel electrophoresis (PFGE) and sequenced using an Illumina platform. After mapping to the index case reference genome, four samples were excluded due to low coverage, leaving 42 samples for further analysis. Multilocus sequence types (MLST) and the presence of acquired resistance genes were also determined from the sequencing data. A transmission network was inferred from genomic and epidemiological data using a Bayesian framework. Based on single-nucleotide variant (SNV) differences, this MDRAB outbreak represented three sequential outbreaks caused by distinct clones. The first and second outbreaks were caused by sequence type 2 (ST2), while the third outbreak was caused by ST79. For the second outbreak, the MLST and PFGE results were discordant. However, NGS-based SNV typing detected a recombination event and consequently enabled a more accurate phylogenetic analysis. The distribution of resistance genes varied among the three outbreaks. The first- and second-outbreak strains possessed a blaOXA-23-like group, while the third-outbreak strains harbored a blaOXA-40-like group. NGS-based analysis demonstrated the superior resolution of outbreak transmission networks for MDRAB and provided insight into the mechanisms of strain diversification between sequential outbreaks through recombination. Copyright © 2016, American Society for Microbiology. All Rights Reserved.


July 7, 2019

Mucinivorans hirudinis gen. nov., sp. nov., an anaerobic, mucin-degrading bacterium isolated from the digestive tract of the medicinal leech Hirudo verbana.

Three anaerobic bacterial strains were isolated from the digestive tract of the medicinal leech Hirudo verbana, using mucin as the primary carbon and energy source. These strains, designated M3(T), M4 and M6, were Gram-stain-negative, non-spore-forming and non-motile. Cells were elongated bacilli approximately 2.4 µm long and 0.6 µm wide. Growth only occurred anaerobically under mesophilic and neutral pH conditions. All three strains could utilize multiple simple and complex sugars as carbon sources, with glucose fermented to acid by-products. The DNA G+C contents of strains M3(T), M4 and M6 were 44.9, 44.8 and 44.8 mol%, respectively. The major cellular fatty acid of strain M3(T) was iso-C15:0. Phylogenetic analysis of full-length 16S rRNA gene sequences revealed that the three strains shared >99% similarity with each other and represent a new lineage within the family Rikenellaceae of the order Bacteroidales, phylum Bacteroidetes. The most closely related bacteria to strain M3(T) based on 16S rRNA gene sequences were Rikenella microfusus DSM 15922(T) (87.3% similarity) and Alistipes finegoldii AHN 2437(T) (87.4%). On the basis of phenotypic, genotypic and physiological evidence, strains M3(T), M4 and M6 are proposed as representing a novel species of a new genus within the family Rikenellaceae, for which the name Mucinivorans hirudinis gen. nov., sp. nov. is proposed. The type strain of Mucinivorans hirudinis is M3(T) (=ATCC BAA-2553(T) = DSM 27344(T)). © 2015 IUMS.


July 7, 2019

FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets.

High-throughput next-generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms. Background contaminating organisms are observed in sequenced mixtures of organisms. In silico samples provide exact truth. To be most valuable for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible. FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing-platform independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim can convert target sequences into in silico reads with the specific error profiles obtained in the characterization step. FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQSim tool hold several advantages over natural datasets: they are sequencing-platform independent, extremely well characterized, and less expensive to generate.
Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge datasets for detecting genetic engineering toolmarks, etc.
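The characterize-then-simulate workflow described in this abstract can be sketched roughly as follows. This is a hypothetical, simplified example, not FASTQSim's implementation: it assumes well-formed four-line FASTQ records with Phred+33 quality encoding, records only read lengths and an overall mean quality, and models errors as uniform substitutions at a fixed `sub_rate`, whereas FASTQSim also profiles indel rates and indel sizes. All function names are illustrative.

```python
import random
from collections import Counter
from statistics import mean


def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header[1:], seq, qual


def characterize(records, phred_offset=33):
    """Return (read-length counts, mean of the per-read mean Phred quality)."""
    lengths = Counter()
    per_read_quality = []
    for _, seq, qual in records:
        lengths[len(seq)] += 1
        per_read_quality.append(mean(ord(ch) - phred_offset for ch in qual))
    return lengths, mean(per_read_quality)


def simulate_reads(reference, lengths, n_reads, sub_rate, seed=0):
    """Sample reads from `reference` using the empirical length distribution
    and introduce uniform random substitutions at `sub_rate` per base."""
    rng = random.Random(seed)
    sizes, weights = zip(*lengths.items())
    reads = []
    for _ in range(n_reads):
        length = min(rng.choices(sizes, weights=weights)[0], len(reference))
        start = rng.randrange(len(reference) - length + 1)
        read = list(reference[start:start + length])
        for i, base in enumerate(read):
            if rng.random() < sub_rate:
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads


fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGTAC", "+", "!!!!!!"]
lengths, mean_q = characterize(parse_fastq(fastq))
reads = simulate_reads("ACGTACGTACGTTTGACCA", lengths, n_reads=3, sub_rate=0.01)
```

In this toy input, 'I' encodes Q40 and '!' encodes Q0, so the length counts are {4: 1, 6: 1} and the mean quality is 20; the simulated reads then inherit those 4- and 6-base lengths.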


July 7, 2019

Automated ensemble assembly and validation of microbial genomes.

The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome.
Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
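To make the idea of scoring and selecting among candidate assemblies concrete, the sketch below ranks them by N50, one widely used contiguity statistic. The abstract does not say which validation metrics iMetAMOS combines or how it weights them, so the single-metric score, the function names and the assembler labels here are assumptions for illustration only.

```python
def n50(contig_lengths):
    """N50: walk contigs from longest to shortest and return the length at which
    the running total first reaches half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def pick_best(assemblies):
    """Hypothetical single-metric selection: name of the assembly with the highest N50."""
    return max(assemblies, key=lambda name: n50(assemblies[name]))


candidates = {
    "assembler_a": [100, 80, 50, 30, 20],  # N50 = 80
    "assembler_b": [200, 40, 40],          # N50 = 200
}
print(pick_best(candidates))  # assembler_b
```

Ranking on N50 alone rewards aggressive joining, including misjoins, which is one reason an ensemble pipeline would validate with several metrics rather than one.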


July 7, 2019

Genome sequence of Candidatus Nitrososphaera evergladensis from group I.1b enriched from Everglades soil reveals novel genomic features of the ammonia-oxidizing archaea.

The activity of ammonia-oxidizing archaea (AOA) leads to the loss of nitrogen from soil, pollution of water sources and elevated emissions of greenhouse gas. To date, eight AOA genomes are available in public databases: seven are from group I.1a of the Thaumarchaeota and only one, isolated from hot springs, is from group I.1b. Many soils are dominated by AOA from group I.1b, but the genomes of soil representatives of this group have not been sequenced and functionally characterized. The lack of knowledge of the metabolic pathways of soil AOA presents a critical gap in understanding their role in biogeochemical cycles. Here, we describe the first complete genome of a soil archaeon, Candidatus Nitrososphaera evergladensis, which has been reconstructed from metagenomic sequencing of a highly enriched culture obtained from an agricultural soil. The AOA enrichment was sequenced with the high-throughput next-generation sequencing platforms from Pacific Biosciences and Ion Torrent. De novo assembly of the sequences resulted in one 2.95 Mb contig. Annotation of the reconstructed genome revealed many similarities in basic metabolism with the other sequenced AOA. Ca. N. evergladensis belongs to group I.1b and shares only 40% whole-genome homology with its closest sequenced relative, Ca. N. gargensis. Detailed analysis of the genome revealed coding sequences that are completely absent from group I.1a. These unique sequences code for proteins involved in control of DNA integrity, transporters, two-component systems and a versatile CRISPR defense system. Notably, genomes from group I.1b have more gene duplications than genomes from group I.1a. We suggest that the presence of these unique genes and gene duplications may be associated with the environmental versatility of this group.


July 7, 2019

Genome sequence of the chromate-resistant bacterium Leucobacter salsicius type strain M1-8(T).

Leucobacter salsicius M1-8(T) is a member of the family Microbacteriaceae within the order Actinomycetales. This strain is a Gram-positive, rod-shaped bacterium that was previously isolated from a Korean fermented food. Most members of the genus Leucobacter are chromate-resistant, a feature that could be exploited in biotechnological applications. However, despite its potential importance, the genus Leucobacter is poorly characterized at the genome level. The present study therefore determined the features of Leucobacter salsicius M1-8(T), as well as its genome sequence and annotation. The genome comprised 3,185,418 bp with a G+C content of 64.5%, including 2,865 protein-coding genes and 68 RNA genes. This strain possessed two predicted genes associated with chromate resistance, which might facilitate its growth in heavy metal-rich environments.


July 7, 2019

Sequence alignment tools: one parallel pattern to rule them all?

In this paper, we advocate a high-level programming methodology for next-generation sequencing (NGS) alignment tools, for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools with versions ported onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. A high-level approach frees programmers from the complex aspects of parallel programming, such as synchronisation protocols and task scheduling, and leaves more room for seamless performance tuning. In this work, we show use cases in which parallelising NGS tools with a high-level approach yields comparable or even better absolute performance on all the datasets used.


July 7, 2019

Thirty-thousand-year-old distant relative of giant icosahedral DNA viruses with a pandoravirus morphology.

The largest known DNA viruses infect Acanthamoeba and belong to two markedly different families. The Megaviridae exhibit pseudo-icosahedral virions up to 0.7 µm in diameter and adenine-thymine (AT)-rich genomes of up to 1.25 Mb encoding a thousand proteins. Like their prototype Mimivirus, discovered 10 y ago, they replicate entirely within cytoplasmic virion factories. In contrast, the recently discovered Pandoraviruses exhibit larger amphora-shaped virions 1 µm in length and guanine-cytosine-rich genomes up to 2.8 Mb long encoding up to 2,500 proteins. Their replication involves the host nucleus. Whereas the Megaviridae share some general features with the previously described icosahedral large DNA viruses, the Pandoraviruses appear unrelated to them. Here we report the discovery of a third type of giant virus combining an even larger pandoravirus-like particle 1.5 µm in length with a surprisingly small 600-kb AT-rich genome, a gene content more similar to Iridoviruses and Marseillevirus, and a fully cytoplasmic replication reminiscent of the Megaviridae. This suggests that pandoravirus-like particles may be associated with a variety of virus families more diverse than previously envisioned. This giant virus, named Pithovirus sibericum, was isolated from a >30,000-y-old radiocarbon-dated sample when we initiated a survey of the virome of Siberian permafrost. The revival of such an ancestral amoeba-infecting virus, used as a safe indicator of the possible presence of pathogenic DNA viruses, suggests that the thawing of permafrost, whether from global warming or industrial exploitation of circumpolar regions, might not be exempt from future threats to human or animal health.


July 7, 2019

Detecting authorized and unauthorized genetically modified organisms containing vip3A by real-time PCR and next-generation sequencing.

The growing number of biotech crops with novel genetic elements increasingly complicates the detection of genetically modified organisms (GMOs) in food and feed samples using conventional screening methods. Unauthorized GMOs (UGMOs) in food and feed are currently identified by combining GMO element screening with sequencing of the DNA flanking these elements. In this study, a specific and sensitive qPCR assay was developed for vip3A element detection based on the vip3Aa20 coding sequences of the recently marketed MIR162 maize and COT102 cotton. Furthermore, SiteFinding-PCR in combination with Sanger, Illumina or Pacific Biosciences (PacBio) sequencing was performed targeting the flanking DNA of the vip3Aa20 element in MIR162. De novo assembly and Basic Local Alignment Search Tool searches were used to mimic UGMO identification. PacBio data resulted in relatively long contigs in the upstream (1,326 nucleotides (nt); 95 % identity) and downstream (1,135 nt; 92 % identity) regions, whereas Illumina data resulted in two smaller contigs of 858 and 1,038 nt with higher sequence identity (>99 % identity). Both approaches outperformed Sanger sequencing, underlining the potential for next-generation sequencing in UGMO identification.

