July 7, 2019

FinisherSC: a repeat-aware tool for upgrading de novo assembly using long reads.

We introduce FinisherSC, a repeat-aware and scalable tool for upgrading de novo assembly using long reads. Experiments with real data suggest that FinisherSC can provide longer and higher-quality contigs than existing tools while maintaining high concordance. Availability: the tool and data are available and will be maintained at http://kakitone.github.io/finishingTool/. Contact: dntse@stanford.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.


Complete genome sequence of Arthrobacter sp. ERGS1:01, a putative novel bacterium with prospective cold active industrial enzymes, isolated from East Rathong glacier in India.

We report the complete genome sequence of Arthrobacter sp. ERGS1:01, a novel bacterium that produces industrial enzymes at low temperature. The East Rathong glacier in the Sikkim Himalayas is untouched and unexplored for microbial diversity, although the region is rich in glaciers, alpine habitats and meadows. The genome sequence provides a basis for understanding the bacterium's adaptation to the harsh conditions of the Himalayan glacier and its ability to produce cold-active industrial enzymes, and it opens opportunities for microbial bioprospecting in the East Rathong glacier. Copyright © 2015. Published by Elsevier B.V.


In vivo evolution of bacterial resistance in two cases of Enterobacter aerogenes infections during treatment with imipenem.

Infections caused by multidrug-resistant (MDR) bacteria are a major concern worldwide. Changes in membrane permeability, including decreased influx and/or increased efflux of antibiotics, are known to be key contributors to bacterial MDR. Therefore, it is of critical importance to understand the molecular mechanisms that link membrane permeability to MDR in order to design new antimicrobial strategies. In this work, we describe genotype-phenotype correlations in Enterobacter aerogenes, a clinically problematic and antibiotic-resistant bacterium. To do this, a series of clinical isolates was periodically collected from two patients during chemotherapy with imipenem. The isolates exhibited different levels of resistance towards multiple classes of antibiotics, consistent with the presence or absence of porins and efflux pumps. Transport assays were used to characterize membrane permeability defects. A simultaneous genome-wide analysis allowed the identification of putative mutations responsible for MDR. The genome of the imipenem-susceptible isolate G7 was sequenced to closure and used as a reference for comparative genomics. This approach uncovered several loci that were specifically mutated in MDR isolates and whose products are known to control membrane permeability: omp35 and omp36, encoding the two major porins; rob, encoding a global AraC-type transcriptional activator; and cpxA, phoQ and pmrB, encoding the sensor kinases of the CpxRA, PhoPQ and PmrAB two-component regulatory systems, respectively. This report provides a comprehensive analysis of membrane alterations relative to the mutational steps in the evolution of MDR in a recognized nosocomial pathogen.
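The filtering step behind "loci specifically mutated in MDR isolates" can be sketched as set arithmetic, assuming each isolate's mutations have already been called against the G7 reference. The isolate names, locus names and the rule "mutated in at least one MDR isolate and in no susceptible isolate" are hypothetical simplifications for illustration, not the authors' pipeline.

```python
def mdr_specific_loci(mutations, is_mdr):
    """Return loci mutated in some MDR isolate but in no susceptible isolate.

    mutations: isolate name -> set of loci mutated relative to the reference
    is_mdr:    isolate name -> True if the isolate is multidrug resistant
    (both inputs are hypothetical; real data would come from variant calls)
    """
    in_mdr = set().union(*(loci for iso, loci in mutations.items() if is_mdr[iso]))
    in_susceptible = set().union(*(loci for iso, loci in mutations.items() if not is_mdr[iso]))
    # loci seen only in the resistant background are candidate MDR determinants
    return sorted(in_mdr - in_susceptible)
```

For example, with `mutations = {"iso1": {"locusA", "locusB"}, "iso2": {"locusB", "locusC"}, "iso3": {"locusC"}}` and only `iso3` susceptible, the function returns `["locusA", "locusB"]`.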


Hybrid de novo tandem repeat detection using short and long reads.

As one of the most studied genome rearrangements, tandem repeats have a considerable impact on the genetic background of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high-quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads obtained with second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation has been addressed by the long reads obtained with third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length came with a significant increase in error rate, and a main objective of current studies on long reads is to handle error rates of up to 16%. In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads with the large length of long reads. Our hybrid algorithm uses the set of short reads for tandem repeat pattern detection based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies. MixTaR is tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of the number of tandem repeats detected and the length of their patterns. Our method shows high precision and sensitivity. With low false-positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths varying within a significant interval.
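The abstract does not spell out how patterns are read off the de Bruijn graph. One common idea, assumed here, is that the k-mers of consecutive repeat copies fold into a cycle whose length equals the repeat unit. This minimal sketch builds a plain (non-compacted) graph with (k-1)-mer nodes from short reads and reports the unit spelled by each simple cycle of at most `max_unit` nodes; the long-read validation and local greedy assemblies are not shown, and `max_unit` is an illustrative parameter.

```python
from collections import defaultdict


def build_debruijn(reads, k):
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def tandem_repeat_units(graph, max_unit=10):
    """Return the repeat unit spelled by every simple cycle of <= max_unit nodes.

    Each cycle is enumerated once, starting from its lexicographically
    smallest node, by only extending paths through larger nodes.
    """
    units = set()
    for start in sorted(graph):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt == start:
                    # cycle closed: the first base of each node spells one unit
                    units.add("".join(n[0] for n in path))
                elif nxt > start and nxt not in path and len(path) < max_unit:
                    stack.append((nxt, path + [nxt]))
    return units
```

With `k = 4`, the read `"ACGACGACGACG"` produces the cycle ACG → CGA → GAC → ACG, so `tandem_repeat_units(build_debruijn(["ACGACGACGACG"], 4))` returns `{"ACG"}`, while a non-repetitive read yields an empty set.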


Genome sequence of a native-feather-degrading extremely thermophilic Eubacterium, Fervidobacterium islandicum AW-1.

Fervidobacterium islandicum AW-1 (KCTC 4680) is an extremely thermophilic anaerobe isolated from a hot spring in Indonesia. This bacterium can completely degrade native chicken feathers at 70 °C within 48 h, which is potentially important for environmental and agricultural applications such as bioremediation and the development of eco-friendly bioprocesses for treating native feathers. However, its genomic and phylogenetic characteristics remain unclear. Here, we report the high-quality draft genome sequence of this extremely thermophilic anaerobe, F. islandicum AW-1. The genome consists of 2,359,755 bp and encodes 2,184 protein-coding genes and 64 RNA-encoding genes. The sequence may provide insights into anaerobic metabolism for keratin degradation and also offer a biological option for poultry waste treatment.


Jitterbug: somatic and germline transposon insertion detection at single-nucleotide resolution.

Transposable elements are major players in genome evolution. Transposon insertion polymorphisms can translate into phenotypic differences in plants and animals and are linked to different diseases including human cancer, making their characterization highly relevant to the study of genome evolution and genetic diseases. Here we present Jitterbug, a novel tool that identifies transposable element insertion sites at single-nucleotide resolution based on the paired-end mapping and clipped-read signatures produced by NGS alignments. Jitterbug can be easily integrated into existing NGS analysis pipelines, using the standard BAM format produced by frequently applied alignment tools (e.g. bwa, bowtie2), with no need to realign reads to a set of consensus transposon sequences. Jitterbug is highly sensitive and able to recall transposon insertions with very high specificity, as demonstrated by benchmarks in the human and Arabidopsis genomes, and by validation using long PacBio reads. In addition, Jitterbug estimates the zygosity of transposon insertions with high accuracy and can also identify somatic insertions. We demonstrate that Jitterbug can identify mosaic somatic transposon movement using sequenced tumor-normal sample pairs and allows for estimating the cancer cell fraction of clones containing a somatic TE insertion. We suggest that the independent methods we use to evaluate performance are a step towards creating a gold standard dataset for benchmarking structural variant prediction tools.
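Jitterbug's implementation is not reproduced here. A minimal sketch of the clipped-read signature alone: alignments are given as hypothetical `(0-based reference start, CIGAR string)` pairs, as they would be read from a BAM file, and the reference coordinates where soft clips begin are counted. Positions supported by at least `min_support` reads are reported as candidate insertion breakpoints; the threshold is an assumption, and a real caller would combine this with discordant paired-end evidence.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_positions(pos, cigar):
    """Yield reference coordinates where a read's soft clip begins."""
    # hard clips (H) carry no bases, so drop them before looking at the ends
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    # M, D, N, = and X consume reference bases
    ref_span = sum(n for n, op in ops if op in "MDN=X")
    if ops and ops[0][1] == "S":
        yield pos                # left clip: breakpoint just before the first aligned base
    if ops and ops[-1][1] == "S":
        yield pos + ref_span     # right clip: breakpoint just after the last aligned base


def candidate_breakpoints(alignments, min_support=3):
    """Count clip coordinates over all reads and keep well-supported ones."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(clip_positions(pos, cigar))
    return sorted(p for p, c in counts.items() if c >= min_support)
```

Two left-clipped reads starting at 1000 and one right-clipped read ending at 1000 pile up on the same coordinate, so `candidate_breakpoints([(1000, "30S70M"), (1000, "25S75M"), (930, "70M30S"), (500, "100M")])` returns `[1000]`.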


Hybrid de novo genome assembly of the Chinese herbal plant danshen (Salvia miltiorrhiza Bunge)

Danshen (Salvia miltiorrhiza Bunge), also known as Chinese red sage, is a member of the Lamiaceae family. It is valued in traditional Chinese medicine, primarily for the treatment of cardiovascular and cerebrovascular diseases. Because of its pharmacological potential, ongoing research aims to identify novel bioactive compounds in danshen and their biosynthetic pathways. To date, only expressed sequence tag (EST) and RNA-seq data for this herbal plant are available to the public. We therefore propose that the construction of a reference genome for danshen will help elucidate the biosynthetic pathways of important secondary metabolites, thereby advancing the investigation of novel drugs from this plant.


Whole-genome sequence of an evolved Clostridium pasteurianum strain reveals Spo0A deficiency responsible for increased butanol production and superior growth.

Biodiesel production generates crude glycerol waste from the transesterification of fatty acids (10% w/w). The solventogenic Clostridium pasteurianum, an anaerobic Firmicute, can produce butanol from glycerol as the sole carbon source. Coupling butanol fermentation with biodiesel production can improve the overall economic viability of biofuels. However, crude glycerol contains growth-inhibiting byproducts that reduce feedstock consumption and solvent production. To obtain a strain with improved characteristics, a random mutagenesis and directed evolution selection technique was used. A wild-type C. pasteurianum (ATCC 6013) culture was chemically mutagenized, and the resulting population underwent 10 days of selection in increasing concentrations of crude glycerol (80-150 g/L). The best-performing mutant (M150B) showed a 91% increase in butanol production in 100 g/L crude glycerol compared to the wild-type strain, as well as an increased growth rate, a higher final optical density, and less production of the side product PDO (1,3-propanediol). Wild-type and M150B strains were sequenced via Single Molecule Real-Time (SMRT) sequencing. Mutations introduced into the M150B genome were identified by sequence comparison to the wild-type and published closed sequences. A major mutation (a deletion) in the gene of the master transcriptional regulator of sporulation, Spo0A, was identified. A spo0A single-gene knockout strain was constructed using a double-crossover genome-editing method. The Spo0A-deficient strain showed similar tolerance to crude glycerol as the evolved mutant strain M150B. Methylation patterns on genomic DNA, identified by SMRT sequencing, were used to enable transformation of plasmid DNA by overcoming the native C. pasteurianum restriction endonuclease. Solvent production in the absence of Spo0A shows that C. pasteurianum differs from other solventogenic Clostridium species in its regulation of solvent production. Growth-associated butanol production shows C. pasteurianum to be an attractive option for further engineering, as it may prove a better candidate for butanol production through continuous fermentation.
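As a toy illustration of "identifying mutations by sequence comparison to the wild type", the sketch below lists substitutions, deletions and insertions between two short sequence fragments using Python's standard `difflib.SequenceMatcher`. This is an assumption-laden stand-in: whole-genome comparisons of SMRT assemblies rely on dedicated aligners and variant callers, and the fragments used here are invented.

```python
from difflib import SequenceMatcher


def list_mutations(wild_type, mutant):
    """Return (kind, wild-type position, wild-type bases, mutant bases) per difference.

    kind is difflib's opcode tag: 'replace' (substitution),
    'delete' (bases missing in the mutant) or 'insert' (extra bases).
    """
    events = []
    matcher = SequenceMatcher(None, wild_type, mutant, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            events.append((tag, i1, wild_type[i1:i2], mutant[j1:j2]))
    return events
```

For the invented fragments `"ATGCCGTAAGCT"` (wild type) and `"ATGCCGCT"` (mutant), the function reports a single deletion, `[("delete", 6, "TAAG", "")]`.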


Botrytis, the good, the bad and the ugly

Botrytis spp. are efficient pathogens, causing devastating diseases and significant crop losses in a wide variety of plant species. Here we review these pathogens and highlight the major advances of the past 10 years in studying Botrytis in interaction with its hosts. Progress in molecular genetics, and in particular the development of relevant phylogenetic markers, has resulted in the characterisation of approximately 30 species. The host range of Botrytis spp. includes plant species that are members of 170 families of cultivated plants.


The effects of read length, quality and quantity on microsatellite discovery and primer development: from Illumina to PacBio.

The advent of next-generation sequencing (NGS) technologies has transformed the way microsatellites are isolated for ecological and evolutionary investigations. Recent attempts to employ NGS for microsatellite discovery have used the 454, Illumina, and Ion Torrent platforms, but other methods including single-molecule real-time DNA sequencing (Pacific Biosciences or PacBio) remain viable alternatives. We outline a workflow from sequence quality control to microsatellite marker validation in three plant species using PacBio circular consensus sequencing (CCS). We then evaluate the performance of PacBio CCS in comparison with other NGS platforms for microsatellite isolation, through simulations that focus on variations in read length, read quantity and sequencing error rate. Although quality control of CCS reads reduced microsatellite yield by around 50%, hundreds of microsatellite loci that are expected to have improved conversion efficiency to functional markers were retrieved for each species. The simulations quantitatively validate the advantages of long reads and emphasize the detrimental effects of sequencing errors on NGS-enabled microsatellite development. In view of the continuing improvement in read length on NGS platforms, sequence quality and the corresponding strategies of quality control will become the primary factors to consider for effective microsatellite isolation. Among current options, PacBio CCS may be optimal for rapid, small-scale microsatellite development due to its flexibility in scaling sequencing effort, while platforms such as Illumina MiSeq will provide cost-efficient solutions for multispecies microsatellite projects. © 2014 John Wiley & Sons Ltd.
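The abstract does not name the repeat-finding tool used in the workflow. A minimal sketch, assuming perfect (uninterrupted) repeats of 2-6 bp motifs with at least five copies, scans a read with a regular expression: a lazily matched motif followed by a backreference to itself. Both thresholds are illustrative, not the paper's settings.

```python
import re


def find_microsatellites(seq, min_copies=5):
    """Return (start, motif, copy number) for perfect 2-6 bp repeats in seq."""
    # lazy {2,6}? tries the shortest motif first; \1 requires exact further copies
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((m.start(), motif, (m.end() - m.start()) // len(motif)))
    return hits
```

On the made-up read `"GGT" + "AC" * 6 + "TTC" + "ATG" * 5 + "C"` this reports a dinucleotide and a trinucleotide repeat, `[(3, "AC", 6), (18, "ATG", 5)]`. Because sequencing errors break the exact backreference, a single error inside a repeat splits or hides it, which is one way to see why read quality matters so much for microsatellite yield.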


LoRDEC: accurate and efficient long read error correction.

PacBio single-molecule real-time sequencing is a third-generation sequencing technique that produces long reads, with comparatively lower throughput and a higher error rate. Errors include numerous indels and complicate downstream analyses such as mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping short reads onto long reads provides sufficient coverage to eliminate up to 99% of errors, but at the expense of prohibitive running times and considerable amounts of disk and memory space. We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads, and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. In comparison, LoRDEC is at least six times faster and requires at least 93% less memory or disk space than available tools, while achieving comparable accuracy. Availability and implementation: LoRDEC is written in C++, tested on Linux platforms and freely available at http://atgc.lirmm.fr/lordec. © The Author 2014. Published by Oxford University Press.
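LoRDEC itself is written in C++ on a succinct de Bruijn graph; the Python sketch below is only a toy rendering of the idea described above, under stated assumptions. A plain set of "solid" k-mers (seen at least `min_count` times in the short reads) stands in for the graph, weak regions are the stretches between consecutive solid k-mers in a long read, and each weak region is replaced by the graph path between its flanking solid k-mers that has the smallest edit distance to the original bases. The `slack` parameter, the exhaustive depth-first path search and leaving uncorrected read ends untouched are simplifications, not LoRDEC's actual choices.

```python
from collections import Counter


def solid_kmers(short_reads, k, min_count=2):
    """k-mers seen at least min_count times in the short reads."""
    counts = Counter(r[i:i + k] for r in short_reads for i in range(len(r) - k + 1))
    return {km for km, c in counts.items() if c >= min_count}


def edit_distance(a, b):
    """Levenshtein distance with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def paths_between(src, dst, kmers, max_steps):
    """Spell every path src -> dst of at most max_steps edges in the implicit graph."""
    results, stack = [], [(src, src)]
    while stack:
        kmer, spelled = stack.pop()
        steps = len(spelled) - len(src)
        if kmer == dst and steps > 0:
            results.append(spelled)
            continue
        if steps >= max_steps:
            continue
        for base in "ACGT":
            nxt = kmer[1:] + base          # successor shares k-1 bases
            if nxt in kmers:
                stack.append((nxt, spelled + base))
    return results


def correct_long_read(read, kmers, k, slack=2):
    solid = [i for i in range(len(read) - k + 1) if read[i:i + k] in kmers]
    if not solid:
        return read                        # nothing to anchor a correction on
    pieces = [read[:solid[0]]]             # weak prefix left as-is in this sketch
    for i, j in zip(solid, solid[1:]):
        if j == i + 1:
            pieces.append(read[i])         # adjacent solid k-mers: base is trusted
            continue
        region = read[i:j + k]
        candidates = paths_between(read[i:i + k], read[j:j + k], kmers, (j - i) + slack)
        if candidates:
            best = min(candidates, key=lambda p: edit_distance(p, region))
            pieces.append(best[:-k])       # drop the target k-mer; it is added later
        else:
            pieces.append(read[i:j])       # no path: keep the original bases
    pieces.append(read[solid[-1]:])
    return "".join(pieces)
```

With `ref = "ATGGCGTACCTTAGCAGT"`, short reads `[ref, ref]` and `k = 5`, the long read `"ATGGCGTACATTAGCAGT"` (one substitution at position 9) is corrected back to `ref`.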


Sequence alignment tools: one parallel pattern to rule them all?

In this paper, we advocate a high-level programming methodology for next-generation sequencing (NGS) alignment tools, for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools with their ports to the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are freed from all complex aspects of parallel programming, such as synchronisation protocols and task scheduling, gaining more scope for seamless performance tuning. In this work, we show some use cases in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance on all the datasets used.
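FastFlow is a C++ framework and the sketch below does not use its API. Assuming the shared paradigm is a task farm over independent reads (an emitter that splits the input into batches, a pool of workers, and an order-preserving collector), the pattern can be mimicked with Python's standard `concurrent.futures`. The toy "aligner" (exact substring search via `str.find`), batch size and worker count are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def align_batch(reference, batch):
    """Toy aligner: exact-match offset of each read in the reference (-1 if absent)."""
    return [reference.find(read) for read in batch]


def farm_align(reference, reads, batch_size=2, workers=4):
    # emitter: split the read set into independent tasks
    batches = [reads[i:i + batch_size] for i in range(0, len(reads), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # workers: each batch is aligned independently; map keeps input order
        results = list(pool.map(lambda b: align_batch(reference, b), batches))
    # collector: flatten per-batch results back into per-read order
    return [pos for batch in results for pos in batch]
```

For instance, `farm_align("ACGTACGGTT", ["ACG", "GGT", "TTT", "TAC"])` returns `[0, 6, -1, 3]`. The point of the pattern is that the alignment kernel stays sequential while scheduling and synchronisation live entirely in the farm skeleton.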


Characterization of structural variants with single molecule and hybrid sequencing approaches.

Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent ‘third-generation’ sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates. We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single-molecule sequencing data, paired-read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired-read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
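MultiBreak-SV's own probabilistic model is not described in this abstract. As a hedged illustration of the paired-read evidence that SV callers commonly draw on, the sketch classifies one read pair by orientation and insert size, assuming a forward-reverse (FR) library and `left_pos <= right_pos`. The mean, standard deviation and 3-SD cut-off are hypothetical defaults, and real callers cluster many such pairs before calling an SV.

```python
def classify_pair(left_pos, left_strand, right_pos, right_strand,
                  mean=500.0, sd=50.0, n_sd=3.0):
    """Label a read pair with the SV class its mapping is consistent with."""
    if left_strand == right_strand:
        return "inversion"               # both mates on the same strand
    if left_strand == "-" and right_strand == "+":
        return "tandem_duplication"      # everted (RF) orientation
    insert = right_pos - left_pos
    if insert > mean + n_sd * sd:
        return "deletion"                # mates map too far apart
    if insert < mean - n_sd * sd:
        return "insertion"               # mates map too close together
    return "concordant"
```

With the default library model, `classify_pair(1000, "+", 1500, "-")` is `"concordant"`, while moving the right mate to 3000 gives `"deletion"`.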

