July 7, 2019

Hybrid de novo tandem repeat detection using short and long reads.

As one of the most studied genome rearrangements, tandem repeats have a considerable impact on the genetic backgrounds of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high-quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads obtained with second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation was tackled by the long reads obtained with third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length came with a significant increase in the error rate. The main objective of current studies on long reads is to handle this high error rate, which can reach 16%. In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads and the large length of long reads. Our hybrid algorithm uses the set of short reads for tandem repeat pattern detection based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies. MixTaR is tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of the number of tandem repeats detected and the length of their patterns. Our method shows high precision and sensitivity. With low false positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths varying within a significant interval.
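The idea of finding repeat patterns in a de Bruijn graph can be illustrated with a toy example. The sketch below is not MixTaR's implementation, which the abstract does not describe in detail. It only shows, under simplifying assumptions (error-free reads, a single small k, and hypothetical function names), how a tandem repeat appears as a cycle in a graph built from short-read k-mers.

```python
from collections import defaultdict, deque


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def shortest_cycle_pattern(graph, start):
    """Breadth-first search for the shortest cycle through `start`.

    Returns the string spelled by the cycle's edges (a candidate repeat
    unit), or None if no cycle passes through `start`.
    """
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        node, spelled = queue.popleft()
        for nxt in sorted(graph.get(node, ())):
            if nxt == start:
                return spelled + nxt[-1]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, spelled + nxt[-1]))
    return None


# Made-up short reads covering a region with the repeat unit "ACG".
reads = ["TTACGACGACG", "CGACGACGAGG"]
graph = build_de_bruijn(reads, k=4)
print(shortest_cycle_pattern(graph, "ACG"))  # candidate repeat unit
print(shortest_cycle_pattern(graph, "TTA"))  # flanking node, no cycle
```

In a hybrid setting, a candidate pattern found this way would still need to be checked against the long reads, as the abstract describes, because short cycles can also arise from sequencing errors or dispersed repeats.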


July 7, 2019

Complete genome and plasmid sequences of three Canadian strains of Salmonella enterica subsp. enterica serovar Enteritidis belonging to phage types 8, 13, and 13a.

Salmonella enterica subsp. enterica serovar Enteritidis is a prominent cause of human salmonellosis frequently linked to poultry products. In Canada, S. Enteritidis phage types 8, 13, and 13a predominate among both clinical and poultry isolates. Here, we report the complete genome and plasmid sequences of poultry isolates of these three phage types. Copyright © 2015 Rehman et al.


July 7, 2019

Genome and transcriptome of the regeneration-competent flatworm, Macrostomum lignano.

The free-living flatworm Macrostomum lignano has an impressive regenerative capacity. Following injury, it can regenerate almost an entirely new organism because of the presence of an abundant somatic stem cell population, the neoblasts. This set of unique properties makes many flatworms attractive organisms for studying the evolution of pathways involved in tissue self-renewal, cell-fate specification, and regeneration. The use of these organisms as models, however, is hampered by the lack of well-assembled and annotated genome sequences, which are fundamental to modern genetic and molecular studies. Here we report the genomic sequence of M. lignano and an accompanying characterization of its transcriptome. The genome structure of M. lignano is remarkably complex, with ~75% of its sequence consisting of simple repeats and transposon sequences. This has made high-quality assembly from Illumina reads alone impossible (N50 = 222 bp). We therefore generated 130× coverage by long sequencing reads from the Pacific Biosciences platform to create a substantially improved assembly with an N50 of 64 Kbp. We complemented the reference genome with an assembled and annotated transcriptome, and used both of these datasets in combination to probe gene-expression patterns during regeneration, examining pathways important to stem cell function.
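The N50 values quoted above summarize assembly contiguity: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of the calculation follows; the function name and the contig lengths are illustrative and are not taken from the M. lignano assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up contig lengths in bp; total is 290, so half is 145.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))  # 80 + 70 = 150 >= 145, so this prints 70
```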


July 7, 2019

One Codex: A sensitive and accurate data platform for genomic microbial identification

High-throughput sequencing (HTS) is increasingly being used for broad applications of microbial characterization, such as microbial ecology, clinical diagnosis, and outbreak epidemiology. However, the analytical task of comparing short sequence reads against the known diversity of microbial life has proved to be computationally challenging. The One Codex data platform was created with the dual goals of analyzing microbial data against the largest possible collection of microbial reference genomes and presenting those results in a format that is consumable by applied end-users. One Codex identifies microbial sequences using a “k-mer based” taxonomic classification algorithm through a web-based data platform, using a reference database that currently includes approximately 40,000 bacterial, viral, fungal, and protozoan genomes. In order to evaluate whether this classification method and associated database provided quantitatively different performance for microbial identification, we created a large and diverse evaluation dataset containing 50 million reads from 10,639 genomes, as well as sequences from six novel species not included in the reference databases of any of the tested classifiers. Quantitative evaluation of several published microbial detection methods shows that One Codex has the highest degree of sensitivity and specificity (AUC = 0.97, compared to 0.82-0.88 for other methods), both when detecting well-characterized species and when detecting newly sequenced, “taxonomically novel” organisms.
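The general idea behind k-mer based classification can be shown with a toy example. This is not the One Codex algorithm, whose details are not given in the abstract. It is a minimal sketch that assumes a tiny in-memory index, hypothetical taxon names and sequences, and a simple voting rule in which only k-mers unique to one taxon count.

```python
from collections import Counter


def build_kmer_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(taxon)
    return index


def classify_read(read, index, k):
    """Assign a read to the taxon with the most taxon-specific k-mer hits."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        taxa = index.get(read[i:i + k], ())
        if len(taxa) == 1:  # ignore k-mers shared between taxa
            votes.update(taxa)
    if not votes:
        return None  # no informative k-mer: leave the read unclassified
    return votes.most_common(1)[0][0]


# Hypothetical reference sequences, far shorter than real genomes.
references = {
    "taxon_A": "ACGTACGTGACCTGA",
    "taxon_B": "TTGACCGGTAACCGT",
}
index = build_kmer_index(references, k=5)
print(classify_read("CGTGACCTG", index, k=5))  # taxon_A
print(classify_read("CGGTAACC", index, k=5))   # taxon_B
print(classify_read("GGGGGGGG", index, k=5))   # None
```

Production classifiers typically use much longer k-mers and resolve shared k-mers through a taxonomy (for example by assigning them to a common ancestor) rather than discarding them.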


July 7, 2019

CHOgenome.org 2.0: Genome resources and website updates.

Chinese hamster ovary (CHO) cells are a major host cell line for the production of therapeutic proteins, and CHO cell and Chinese hamster (CH) genomes have recently been sequenced using next-generation sequencing methods. CHOgenome.org was launched in 2011 (version 1.0) to serve as a database repository and to provide bioinformatics tools for the CHO community. CHOgenome.org (version 1.0) maintained GenBank CHO-K1 genome data, identified CHO-omics literature, and provided a CHO-specific BLAST service. Recent major updates to CHOgenome.org (version 2.0) include new sequence and annotation databases for both CHO and CH genomes, a more user-friendly website, and new research tools, including a proteome browser and a genome viewer. CHO cell-line specific sequences and annotations facilitate cell line development opportunities, several of which are discussed. Moving forward, CHOgenome.org will host the increasing amount of CHO-omics data and continue to make useful bioinformatics tools available to the CHO community. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.


July 7, 2019

Complete genome sequence of the probiotic bacterium Bifidobacterium breve KCTC 12201BP isolated from a healthy infant.

We present the completely sequenced genome of Bifidobacterium breve CBT BR3, which was isolated from the feces of a healthy infant. The 2.43-Mb genome contains several kinds of genetic factors associated with health promotion of the human host such as oligosaccharide-degrading genes and vitamin-biosynthetic genes. Copyright © 2015. Published by Elsevier B.V.


July 7, 2019

Insights on virulence from the complete genome of Staphylococcus capitis.

Staphylococcus capitis is an opportunistic pathogen of the coagulase negative staphylococci (CoNS). Functional genomic studies of S. capitis have thus far been limited by a lack of available complete genome sequences. Here, we determined the closed S. capitis genome and methylome using Single Molecule Real Time (SMRT) sequencing. The strain, AYP1020, harbors a single circular chromosome of 2.44 Mb encoding 2304 predicted proteins, which is the smallest of all complete staphylococcal genomes sequenced to date. AYP1020 harbors two large mobile genetic elements: a plasmid designated pAYP1020 (59.6 Kb) and a prophage, ΦAYP1020 (48.5 Kb). Methylome analysis identified significant adenine methylation across the genome involving two distinct methylation motifs (1972 putative 6-methyladenine (m6A) residues identified). Putative adenine methyltransferases were also identified. Comparative analysis of AYP1020 and the closely related CoNS, S. epidermidis RP62a, revealed a host of virulence factors that likely contribute to S. capitis pathogenicity, most notably genes important for biofilm formation and a suite of phenol soluble modulins (PSMs); the expression/production of these factors was corroborated by functional assays. The complete S. capitis genome will aid future studies on the evolution and pathogenesis of the coagulase negative staphylococci.


July 7, 2019

PAFFT: A new homology search algorithm for third-generation sequencers.

DNA sequencers that can conduct real-time sequencing from a single polymerase molecule are known as third-generation sequencers. Third-generation sequencers enable sequencing of reads that are several kilobases long. However, the raw data generated from third-generation sequencers are known to be error-prone. Because of sequencing errors, it is difficult to identify which genes are homologous to the reads obtained using third-generation sequencers. In this study, a new homology search algorithm, PAFFT, is developed. This method is an extension of the MAFFT algorithm, which is used for multiple alignments. PAFFT detects global homology rather than local homology so that homologous regions can be detected even when the error rate of sequencing is high. PAFFT will boost application of third-generation sequencers. Copyright © 2015 Elsevier Inc. All rights reserved.


July 7, 2019

Methicillin-susceptible, vancomycin-resistant Staphylococcus aureus, Brazil.

We report characterization of a methicillin-susceptible, vancomycin-resistant bloodstream isolate of Staphylococcus aureus recovered from a patient in Brazil. Emergence of vancomycin resistance in methicillin-susceptible S. aureus would indicate that this resistance trait might be poised to disseminate more rapidly among S. aureus and represents a major public health threat.


July 7, 2019

First detection of Klebsiella variicola producing OXA-181 carbapenemase in fresh vegetable imported from Asia to Switzerland.

The emergence and worldwide spread of carbapenemase-producing Enterobacteriaceae is of great concern to public health services. The aim of this study was to investigate the occurrence of carbapenemase-producing Enterobacteriaceae in fresh vegetables and spices imported from Asia to Switzerland. Twenty-two different fresh vegetable samples were purchased in March 2015 from different retail shops specializing in Asian food. The vegetables included basil leaves, bergamot leaves, coriander, curry leaves, eggplant and okra (marrow). Samples had been imported from Thailand, the Socialist Republic of Vietnam and India. After an initial enrichment step, carbapenemase-producing Enterobacteriaceae were isolated from two carbapenem-containing selective media (SUPERCARBA II and Brilliance CRE Agar). Isolates were screened by PCR for the presence of blaKPC, blaNDM, blaOXA-48-like and blaVIM. An OXA-181-producing Klebsiella variicola was isolated from a coriander sample originating from Thailand/Vietnam. The blaOXA-181 gene was encoded in a 14,027-bp region flanked by two IS26-like elements on a 51-kb IncX3-type plasmid. The results of this study suggest that the international production and trade of fresh vegetables constitute a possible route for the spread of carbapenemase-producing Enterobacteriaceae. The presence of carbapenemase-producing organisms in the food supply is alarming and an important food safety issue.


July 7, 2019

Jitterbug: somatic and germline transposon insertion detection at single-nucleotide resolution.

Transposable elements are major players in genome evolution. Transposon insertion polymorphisms can translate into phenotypic differences in plants and animals and are linked to different diseases including human cancer, making their characterization highly relevant to the study of genome evolution and genetic diseases. Here we present Jitterbug, a novel tool that identifies transposable element insertion sites at single-nucleotide resolution based on the paired-end mapping and clipped-read signatures produced by NGS alignments. Jitterbug can be easily integrated into existing NGS analysis pipelines, using the standard BAM format produced by frequently applied alignment tools (e.g. bwa, bowtie2), with no need to realign reads to a set of consensus transposon sequences. Jitterbug is highly sensitive and able to recall transposon insertions with a very high specificity, as demonstrated by benchmarks in the human and Arabidopsis genomes, and validation using long PacBio reads. In addition, Jitterbug estimates the zygosity of transposon insertions with high accuracy and can also identify somatic insertions. We demonstrate that Jitterbug can identify mosaic somatic transposon movement using sequenced tumor-normal sample pairs and allows for estimating the cancer cell fraction of clones containing a somatic TE insertion. We suggest that the independent methods we use to evaluate performance are a step towards creating a gold standard dataset for benchmarking structural variant prediction tools.
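A clipped-read signature of the kind mentioned above can be read directly from the CIGAR string of an alignment record. The sketch below is not Jitterbug's code. It is an illustration with hypothetical function names and an assumed minimum clip length, showing how a soft-clipped alignment points to a candidate breakpoint at single-base resolution.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def soft_clips(cigar):
    """Return (left_clip, right_clip) soft-clip lengths in bases."""
    # Hard clips (H) may sit outside soft clips, so drop them first.
    core = [item for item in parse_cigar(cigar) if item[1] != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def clip_positions(pos, cigar, min_clip=10):
    """Return candidate breakpoint coordinates for one alignment.

    `pos` is the 1-based leftmost mapped position (the SAM POS field).
    `min_clip` is an assumed threshold, not a value from the paper.
    """
    left, right = soft_clips(cigar)
    # Only M, D, N, = and X consume reference bases.
    ref_len = sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")
    positions = []
    if left >= min_clip:
        positions.append(pos)  # clip starts at the first aligned base
    if right >= min_clip:
        positions.append(pos + ref_len - 1)  # clip follows the last aligned base
    return positions


print(clip_positions(1000, "70M30S"))  # [1069]
print(clip_positions(1000, "30S70M"))  # [1000]
print(clip_positions(1000, "100M"))    # []
```

A real caller would combine many such positions with discordant paired-end evidence before reporting an insertion; a single clipped read is weak evidence on its own.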


July 7, 2019

Complete genome sequence of the Clostridium difficile type strain DSM 1296T.

In this study, we sequenced the complete genome of the Clostridium difficile type strain DSM 1296(T). A combination of single-molecule real-time (SMRT) and Illumina sequencing technology revealed the presence of one chromosome and two extrachromosomal elements, the bacteriophage phiCDIF1296T and a putative plasmid-like structure harboring genes of another bacteriophage. Copyright © 2015 Riedel et al.


July 7, 2019

Completing the human genome: the progress and challenge of satellite DNA assembly.

Genomic studies rely on accurate chromosome assemblies to explore sequence-based models of cell biology, evolution and biomedical disease. However, even the extensively studied human genome has not yet reached a complete, ‘telomere-to-telomere’, chromosome assembly. The largest assembly gaps remain in centromeric regions and acrocentric short arms, sites known to contain megabase-sized arrays of tandem repeats, or satellite DNAs. This review aims to briefly address the progress and challenges of generating correct assemblies of satellite DNA arrays. Although the focus is placed on the human genome, many concepts presented here are applicable to other genomes.

