July 7, 2019

proovread: large-scale high-accuracy PacBio correction through iterative short read consensus.

Today, the base code of DNA is mostly determined through sequencing by synthesis as provided by the Illumina sequencers. Although highly accurate, resulting reads are short, making their analyses challenging. Recently, a new technology, single molecule real-time (SMRT) sequencing, was developed that could address these challenges, as it generates reads of several thousand bases. But, their broad application has been hampered by a high error rate. Therefore, hybrid approaches that use high-quality short reads to correct erroneous SMRT long reads have been developed. Still, current implementations have great demands on hardware, work only in well-defined computing infrastructures and reject a substantial amount of reads. This limits their usability considerably, especially in the case of large sequencing projects. Here we present proovread, a hybrid correction pipeline for SMRT reads, which can be flexibly adapted on existing hardware and infrastructure from a laptop to a high-performance computing cluster. On genomic and transcriptomic test cases covering Escherichia coli, Arabidopsis thaliana and human, proovread achieved accuracies up to 99.9% and outperformed the existing hybrid correction programs. Furthermore, proovread-corrected sequences were longer and the throughput was higher. Thus, proovread combines the most accurate correction results with an excellent adaptability to the available hardware. It will therefore increase the applicability and value of SMRT sequencing. proovread is available at the following URL: http://proovread.bioapps.biozentrum.uni-wuerzburg.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

Insights into the preservation of the homomorphic sex-determining chromosome of Aedes aegypti from the discovery of a male-biased gene tightly linked to the M-locus.

The preservation of a homomorphic sex-determining chromosome in some organisms without transformation into a heteromorphic sex chromosome is a long-standing enigma in evolutionary biology. A dominant sex-determining locus (or M-locus) in an undifferentiated homomorphic chromosome confers the male phenotype in the yellow fever mosquito Aedes aegypti. Genetic evidence suggests that the M-locus is in a nonrecombining region. However, the molecular nature of the M-locus has not been characterized. Using a recently developed approach based on Illumina sequencing of male and female genomic DNA, we identified a novel gene, myo-sex, that is present almost exclusively in the male genome but can sporadically be found in the female genome due to recombination. For simplicity, we define sequences that are primarily found in the male genome as male-biased. Fluorescence in situ hybridization (FISH) on A. aegypti chromosomes demonstrated that the myo-sex probe localized to region 1q21, the established location of the M-locus. Myo-sex is a duplicated myosin heavy chain gene that is highly expressed in the pupa and adult male. Myo-sex shares 83% nucleotide identity and 97% amino acid identity with its closest autosomal paralog, consistent with ancient duplication followed by strong purifying selection. Compared with males, myo-sex is expressed at very low levels in the females that acquired it, indicating that myo-sex may be sexually antagonistic. This study establishes a framework to discover male-biased sequences within a homomorphic sex-determining chromosome and offers new insights into the evolutionary forces that have impeded the expansion of the nonrecombining M-locus in A. aegypti.


July 7, 2019

Dissecting a hidden gene duplication: the Arabidopsis thaliana SEC10 locus.

Repetitive sequences present a challenge for genome sequence assembly, and highly similar segmental duplications may disappear from assembled genome sequences. Having found a surprising lack of observable phenotypic deviations and non-Mendelian segregation in Arabidopsis thaliana mutants in SEC10, a gene encoding a core subunit of the exocyst tethering complex, we examined whether this could be explained by a hidden gene duplication. Re-sequencing and manual assembly of the Arabidopsis thaliana SEC10 (At5g12370) locus revealed that this locus, comprising a single gene in the reference genome assembly, indeed contains two paralogous genes in tandem, SEC10a and SEC10b, and that a sequence segment of 7 kb in length is missing from the reference genome sequence. Differences between the two paralogs are concentrated in non-coding regions, while the predicted protein sequences exhibit 99% identity, differing only by substitution of five amino acid residues and an indel of four residues. Both SEC10 genes are expressed, although varying transcript levels suggest differential regulation. Homozygous T-DNA insertion mutants in either paralog exhibit a wild-type phenotype, consistent with proposed extensive functional redundancy of the two genes. By these observations we demonstrate that recently duplicated genes may remain hidden even in well-characterized genomes, such as that of A. thaliana. Moreover, we show that the use of the existing A. thaliana reference genome sequence as a guide for sequence assembly of new Arabidopsis accessions or related species has at least in some cases led to error propagation.


July 7, 2019

LoRDEC: accurate and efficient long read error correction.

PacBio single molecule real-time sequencing is a third-generation sequencing technique producing long reads, with comparatively lower throughput and higher error rate. Errors include numerous indels and complicate downstream analysis like mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping of short reads on long reads provides sufficient coverage to eliminate up to 99% of errors, however, at the expense of prohibitive running times and considerable amounts of disk and memory space. We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads, and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. In comparison, LoRDEC is at least six times faster and requires at least 93% less memory or disk space than available tools, while achieving comparable accuracy. Availability and implementation: LoRDEC is written in C++, tested on Linux platforms and freely available at http://atgc.lirmm.fr/lordec. © The Author 2014. Published by Oxford University Press.
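
The graph-based idea in this abstract can be illustrated with a toy sketch. It is not LoRDEC's implementation, which is written in C++ and uses a succinct graph representation. The function names `solid_kmers` and `bridge`, the plain Python set standing in for the graph, and the rule of returning the first path found are all assumptions made for illustration. The abstract does not say how a path is chosen; a real corrector would likely compare candidate paths against the erroneous region.

```python
def solid_kmers(short_reads, k):
    """Collect every k-mer seen in the short reads (the de Bruijn graph nodes)."""
    return {read[i:i + k] for read in short_reads for i in range(len(read) - k + 1)}


def bridge(kmers, source, target, max_len):
    """Depth-first search for a sequence that starts at k-mer `source`,
    ends at k-mer `target`, and uses only k-mers present in `kmers`.
    Returns the first such sequence of at most `max_len` bases, or None.
    Assumes k >= 2 and len(source) == len(target) == k."""
    k = len(source)
    stack = [source]
    while stack:
        seq = stack.pop()
        if len(seq) > k and seq.endswith(target):
            return seq
        if len(seq) >= max_len:
            continue  # path too long; give up on this branch
        suffix = seq[-(k - 1):]
        for base in "ACGT":
            if suffix + base in kmers:  # edge exists in the graph
                stack.append(seq + base)
    return None


# Toy data: two accurate short reads covering the true sequence ATGCGTACCTGA.
kmers = solid_kmers(["ATGCGTAC", "GTACCTGA"], k=4)
# A long read whose middle is erroneous but whose ends (ATGC, CTGA) are trusted.
corrected = bridge(kmers, "ATGC", "CTGA", max_len=20)
```

Here the erroneous middle of the long read is replaced by the path between the two trusted k-mers. `max_len` bounds the search so that it terminates even in graphs that contain cycles.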


July 7, 2019

Complete genome of the switchgrass endophyte Enterobacter cloacae P101.

The Enterobacter cloacae complex is genetically very diverse. The increasing number of complete genomic sequences of E. cloacae is helping to determine the exact relationship among members of the complex. E. cloacae P101 is an endophyte of switchgrass (Panicum virgatum) and is closely related to other E. cloacae strains isolated from plants. The P101 genome consists of a 5,369,929 bp chromosome. The chromosome has 5,164 protein-coding regions, 100 tRNA sequences, and 8 rRNA operons.


July 7, 2019

SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information.

The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information is promised to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate our strategy is better capable of producing (nearly) complete bacterial genomes. The current work describes our SSPACE-LongRead software which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances of the PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow to scaffold genomes in a fast and reliable manner.


July 7, 2019

Genome sequences of two carbapenemase-resistant Klebsiella pneumoniae ST258 isolates.

Klebsiella pneumoniae, an ESKAPE group (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species) pathogen, has acquired multiple antibiotic resistance genes and is becoming a serious public health threat. Here, we report the genome sequences of two representative strains of K. pneumoniae from the emerging K. pneumoniae carbapenemase (KPC) outbreak in northeast Ohio belonging to sequence type 258 (ST258) (isolates Kb140 and Kb677, which were isolated from blood and urine, respectively). Both isolates harbor a blaKPC gene, and strain Kb140 carries blaKPC-2, while Kb677 carries blaKPC-3. Copyright © 2014 Ramirez et al.


July 7, 2019

First complete genome sequence of Salmonella enterica subsp. enterica serovar Typhimurium strain ATCC 13311 (NCTC 74), a reference strain of multidrug resistance, as achieved by use of PacBio Single-Molecule Real-Time technology.

We report the first complete genomic sequence of Salmonella enterica subsp. enterica serovar Typhimurium strain ATCC 13311, the leading food-borne pathogen and a reference strain used in drug resistance studies. De novo assembly with PacBio sequencing completed its chromosome and one plasmid. They will accelerate the investigation into multidrug resistance in Salmonella Typhimurium. Copyright © 2014 Terabayashi et al.


July 7, 2019

Enterobacter asburiae strain L1: complete genome and whole genome optical mapping analysis of a quorum sensing bacterium.

Enterobacter asburiae L1 is a quorum sensing bacterium isolated from lettuce leaves. In this study, for the first time, the complete genome of E. asburiae L1 was sequenced using the single molecule real time sequencer (PacBio RSII) and the whole genome sequence was verified by using optical genome mapping (OpGen) technology. In our previous study, E. asburiae L1 has been reported to produce AHLs, suggesting the possibility of virulence factor regulation which is quorum sensing dependent. This evoked our interest to study the genome of this bacterium and here we present the complete genome of E. asburiae L1, which carries the virulence factor gene virK, the N-acyl homoserine lactone-based QS transcriptional regulator gene luxR and the N-acyl homoserine lactone synthase gene which we firstly named easI. The availability of the whole genome sequence of E. asburiae L1 will pave the way for the study of the QS-mediated gene expression in this bacterium. Hence, the importance and functions of these signaling molecules can be further studied in the hope of elucidating the mechanisms of QS-regulation in E. asburiae. To the best of our knowledge, this is the first documentation of both a complete genome sequence and the establishment of the molecular basis of QS properties of E. asburiae.


July 7, 2019

Surveillance of carbapenem-resistant Klebsiella pneumoniae: tracking molecular epidemiology and outcomes through a regional network.

Carbapenem resistance in Gram-negative bacteria is on the rise in the United States. A regional network was established to study microbiological and genetic determinants of clinical outcomes in hospitalized patients with carbapenem-resistant (CR) Klebsiella pneumoniae in a prospective, multicenter, observational study. To this end, predefined clinical characteristics and outcomes were recorded and K. pneumoniae isolates were analyzed for strain typing and resistance mechanism determination. In a 14-month period, 251 patients were included. While most of the patients were admitted from long-term care settings, 28% of them were admitted from home. Hospitalizations were prolonged and complicated. Nonsusceptibility to colistin and tigecycline occurred in isolates from 7 and 45% of the patients, respectively. Most of the CR K. pneumoniae isolates belonged to repetitive extragenic palindromic PCR (rep-PCR) types A and B (both sequence type 258) and carried either blaKPC-2 (48%) or blaKPC-3 (51%). One isolate tested positive for blaNDM-1, a sentinel discovery in this region. Important differences between strain types were noted; rep-PCR type B strains were associated with blaKPC-3 (odds ratio [OR], 294; 95% confidence interval [CI], 58 to 2,552; P < 0.001), gentamicin nonsusceptibility (OR, 24; 95% CI, 8.39 to 79.38; P < 0.001), amikacin susceptibility (OR, 11.0; 95% CI, 3.21 to 42.42; P < 0.001), tigecycline nonsusceptibility (OR, 5.34; 95% CI, 1.30 to 36.41; P = 0.018), a shorter length of stay (OR, 0.98; 95% CI, 0.95 to 1.00; P = 0.043), and admission from a skilled-nursing facility (OR, 3.09; 95% CI, 1.26 to 8.08; P = 0.013). Our analysis shows that (i) CR K. pneumoniae is seen primarily in the elderly long-term care population and that (ii) regional monitoring of CR K. pneumoniae reveals insights into molecular characteristics. This work highlights the crucial role of ongoing surveillance of carbapenem resistance determinants. 
Copyright © 2014, American Society for Microbiology. All Rights Reserved.

