Pacbio reads Archives - Page 26 of 53

July 7, 2019

Genome sequence of Candidatus Nitrososphaera evergladensis from group I.1b enriched from Everglades soil reveals novel genomic features of the ammonia-oxidizing archaea.

The activity of ammonia-oxidizing archaea (AOA) leads to the loss of nitrogen from soil, pollution of water sources and elevated emissions of greenhouse gas. To date, eight AOA genomes are available in the public databases, seven are from the group I.1a of the Thaumarchaeota and only one is from the group I.1b, isolated from hot springs. Many soils are dominated by AOA from the group I.1b, but the genomes of soil representatives of this group have not been sequenced and functionally characterized. The lack of knowledge of metabolic pathways of soil AOA presents a critical gap in understanding their role in biogeochemical cycles. Here, we describe the first complete genome of soil archaeon Candidatus Nitrososphaera evergladensis, which has been reconstructed from metagenomic sequencing of a highly enriched culture obtained from an agricultural soil. The AOA enrichment was sequenced with the high throughput next generation sequencing platforms from Pacific Biosciences and Ion Torrent. The de novo assembly of sequences resulted in one 2.95 Mb contig. Annotation of the reconstructed genome revealed many similarities of the basic metabolism with the rest of sequenced AOA. Ca. N. evergladensis belongs to the group I.1b and shares only 40% of whole-genome homology with the closest sequenced relative Ca. N. gargensis. Detailed analysis of the genome revealed coding sequences that were completely absent from the group I.1a. These unique sequences code for proteins involved in control of DNA integrity, transporters, two-component systems and versatile CRISPR defense system. Notably, genomes from the group I.1b have more gene duplications compared to the genomes from the group I.1a. We suggest that the presence of these unique genes and gene duplications may be associated with the environmental versatility of this group.

July 7, 2019

proovread: large-scale high-accuracy PacBio correction through iterative short read consensus.

Today, the base code of DNA is mostly determined through sequencing by synthesis as provided by the Illumina sequencers. Although highly accurate, resulting reads are short, making their analyses challenging. Recently, a new technology, single molecule real-time (SMRT) sequencing, was developed that could address these challenges, as it generates reads of several thousand bases. But, their broad application has been hampered by a high error rate. Therefore, hybrid approaches that use high-quality short reads to correct erroneous SMRT long reads have been developed. Still, current implementations have great demands on hardware, work only in well-defined computing infrastructures and reject a substantial amount of reads. This limits their usability considerably, especially in the case of large sequencing projects. Here we present proovread, a hybrid correction pipeline for SMRT reads, which can be flexibly adapted on existing hardware and infrastructure from a laptop to a high-performance computing cluster. On genomic and transcriptomic test cases covering Escherichia coli, Arabidopsis thaliana and human, proovread achieved accuracies up to 99.9% and outperformed the existing hybrid correction programs. Furthermore, proovread-corrected sequences were longer and the throughput was higher. Thus, proovread combines the most accurate correction results with an excellent adaptability to the available hardware. It will therefore increase the applicability and value of SMRT sequencing. proovread is available at the following URL: http://proovread.bioapps.biozentrum.uni-wuerzburg.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
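To make the idea of short-read consensus correction concrete, here is a minimal Python sketch. It is not proovread's implementation: the function name `consensus_correct`, the `(offset, sequence)` input format and the `min_votes` threshold are all assumptions made for illustration, and the sketch handles substitutions only (no indels), whereas real SMRT errors are dominated by indels.

```python
from collections import Counter


def consensus_correct(long_read, placed_short_reads, min_votes=2):
    """Correct a long read by per-position majority vote (illustrative only).

    placed_short_reads is a hypothetical list of (offset, sequence) pairs,
    standing in for the output of a short-read mapper. Only substitutions
    are modelled; insertions and deletions are ignored in this sketch.
    """
    # One vote counter per position of the long read.
    votes = [Counter() for _ in long_read]
    for offset, seq in placed_short_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                votes[pos][base] += 1

    corrected = []
    for original, column in zip(long_read, votes):
        if column:
            base, count = column.most_common(1)[0]
            # Trust the consensus only when enough short reads agree.
            corrected.append(base if count >= min_votes else original)
        else:
            # No short-read coverage: keep the original base.
            corrected.append(original)
    return "".join(corrected)


noisy = "ACGTTCGA"  # toy long read with one substitution error at index 3
short_reads = [(0, "ACGAT"), (2, "GATCG"), (3, "ATCGA")]
print(consensus_correct(noisy, short_reads))
```

An iterative scheme, as the title suggests, would repeat mapping and consensus calling on the partially corrected read; the loop is omitted here because the abstract does not describe its details.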

July 7, 2019

Complete genome sequence of Pelosinus sp. strain UFO1 assembled using Single-Molecule Real-Time DNA Sequencing technology.

Pelosinus species can reduce metals such as Fe(III), U(VI), and Cr(VI) and have been isolated from diverse geographical regions. Five draft genome sequences have been published. We report the complete genome sequence for Pelosinus sp. strain UFO1 using only PacBio DNA sequence data and without manual finishing. Copyright © 2014 Brown et al.

July 7, 2019

Characterization of biological pathways associated with a 1.37 Mbp genomic region protective of hypertension in Dahl S rats.

The goal of the present study was to narrow a region of chromosome 13 to only several genes and then apply unbiased statistical approaches to identify molecular networks and biological pathways relevant to blood-pressure salt sensitivity in Dahl salt-sensitive (SS) rats. The analysis of 13 overlapping subcongenic strains identified a 1.37 Mbp region on chromosome 13 that influenced the mean arterial blood pressure by at least 25 mmHg in SS rats fed a high-salt diet. DNA sequencing and analysis filled genomic gaps and provided identification of five genes in this region, Rfwd2, Fam5b, Astn1, Pappa2, and Tnr. A cross-platform normalization of transcriptome data sets obtained from our previously published Affymetrix GeneChip dataset and newly acquired RNA-seq data from renal outer medullary tissue provided 90 observations for each gene. Two Bayesian methods were used to analyze the data: 1) a linear model analysis to assess 243 biological pathways for their likelihood to discriminate blood pressure levels across experimental groups and 2) a Bayesian graphical modeling of pathways to discover genes with potential relationships to the candidate genes in this region. As none of these five genes are known to be involved in hypertension, this unbiased approach has provided useful clues to be experimentally explored. Of these five genes, Rfwd2, the gene most strongly expressed in the renal outer medulla, was notably associated with pathways that can affect blood pressure via renal transcellular Na(+) and K(+) electrochemical gradients and tubular Na(+) transport, mitochondrial TCA cycle and cell energetics, and circadian rhythms. Copyright © 2014 the American Physiological Society.

July 7, 2019

The complete genome sequence of Clostridium indolis DSM 755(T.).

Clostridium indolis DSM 755(T) is a bacterium commonly found in soils and the feces of birds and mammals. Despite its prevalence, little is known about the ecology or physiology of this species. However, close relatives, C. saccharolyticum and C. hathewayi, have demonstrated interesting metabolic potentials related to plant degradation and human health. The genome of C. indolis DSM 755(T) reveals an abundance of genes in functional groups associated with the transport and utilization of carbohydrates, as well as citrate, lactate, and aromatics. Ecologically relevant gene clusters related to nitrogen fixation and a unique type of bacterial microcompartment, the CoAT BMC, are also detected. Our genome analysis suggests hypotheses to be tested in future culture based work to better understand the physiology of this poorly described species.

July 7, 2019

LUMPY: a probabilistic framework for structural variant discovery.

Comprehensive discovery of structural variation (SV) from whole genome sequencing data requires multiple detection signals including read-pair, split-read, read-depth and prior knowledge. Owing to technical challenges, extant SV discovery algorithms either use one signal in isolation, or at best use two sequentially. We present LUMPY, a novel SV discovery framework that naturally integrates multiple SV signals jointly across multiple samples. We show that LUMPY yields improved sensitivity, especially when SV signal is reduced owing to either low coverage data or low intra-sample variant allele frequency. We also report a set of 4,564 validated breakpoints from the NA12878 human genome. https://github.com/arq5x/lumpy-sv.

July 7, 2019

Association mapping, patterns of linkage disequilibrium and selection in the vicinity of the PHYTOCHROME C gene in pearl millet.

Linkage analysis confirmed the association in the region of PHYC in pearl millet. The comparison of genes found in this region suggests that PHYC is the best candidate. Major efforts are currently underway to dissect the phenotype-genotype relationship in plants and animals using existing populations. This method exploits historical recombinations accumulated in these populations. However, linkage disequilibrium sometimes extends over a relatively long distance, particularly in genomic regions containing polymorphisms that have been targets for selection. In this case, many genes in the region could be statistically associated with the trait shaped by the selected polymorphism. Statistical analyses could help in identifying the best candidate genes in such a region where an association is found. In a previous study, we proposed that a fragment of the PHYTOCHROME C gene (PHYC) is associated with flowering time and morphological variations in pearl millet. In the present study, we first performed linkage analyses using three pearl millet F2 families to confirm the presence of a QTL in the vicinity of PHYC. We then analyzed a wider genomic region of ~100 kb around PHYC to pinpoint the gene that best explains the association with the trait in this region. A panel of 90 pearl millet inbred lines was used to assess the association. We used a Markov chain Monte Carlo approach to compare 75 markers distributed along this 100-kb region. We found the best candidate markers on the PHYC gene. Signatures of selection in this region were assessed in an independent data set and pointed to the same gene. These results foster confidence in the likely role of PHYC in phenotypic variation and encourage the development of functional studies.

July 7, 2019

Safety of the surrogate microorganism Enterococcus faecium NRRL B-2354 for use in thermal process validation.

Enterococcus faecium NRRL B-2354 is a surrogate microorganism used in place of pathogens for validation of thermal processing technologies and systems. We evaluated the safety of strain NRRL B-2354 based on its genomic and functional characteristics. The genome of E. faecium NRRL B-2354 was sequenced and found to comprise a 2,635,572-bp chromosome and a 214,319-bp megaplasmid. A total of 2,639 coding sequences were identified, including 45 genes unique to this strain. Hierarchical clustering of the NRRL B-2354 genome with 126 other E. faecium genomes as well as pbp5 locus comparisons and multilocus sequence typing (MLST) showed that the genotype of this strain is most similar to commensal, or community-associated, strains of this species. E. faecium NRRL B-2354 lacks antibiotic resistance genes, and both NRRL B-2354 and its clonal relative ATCC 8459 are sensitive to clinically relevant antibiotics. This organism also lacks, or contains nonfunctional copies of, enterococcal virulence genes including acm, cyl, the ebp operon, esp, gelE, hyl, IS16, and associated phenotypes. It does contain scm, sagA, efaA, and pilA, although either these genes were not expressed or their roles in enterococcal virulence are not well understood. Compared with the clinical strains TX0082 and 1,231,502, E. faecium NRRL B-2354 was more resistant to acidic conditions (pH 2.4) and high temperatures (60°C) and was able to grow in 8% ethanol. These findings support the continued use of E. faecium NRRL B-2354 in thermal process validation of food products.

July 7, 2019

LoRDEC: accurate and efficient long read error correction.

PacBio single molecule real-time sequencing is a third-generation sequencing technique producing long reads, with comparatively lower throughput and higher error rate. Errors include numerous indels and complicate downstream analysis like mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping of short reads on long reads provides sufficient coverage to eliminate up to 99% of errors, but at the expense of prohibitive running times and considerable amounts of disk and memory space. We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads, and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. In comparison, LoRDEC is at least six times faster and requires at least 93% less memory or disk space than available tools, while achieving comparable accuracy. Availability and implementation: LoRDEC is written in C++, tested on Linux platforms and freely available at http://atgc.lirmm.fr/lordec. © The Author 2014. Published by Oxford University Press.
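The de Bruijn graph approach can be illustrated with a small Python sketch. This is not LoRDEC's code: the function names (`solid_kmers`, `successors`, `weak_regions`) and the `min_count` threshold are assumptions for illustration, the graph is held as a plain set of k-mers rather than a succinct structure, and the path search that produces the corrective sequence is left out. The sketch only shows how k-mers from short reads define the graph and how regions of a long read lacking graph support can be located.

```python
from collections import Counter


def solid_kmers(short_reads, k, min_count=2):
    """Return k-mers seen at least min_count times in the short reads.

    These k-mers act as the nodes of an implicit de Bruijn graph.
    """
    counts = Counter(
        read[i:i + k] for read in short_reads for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_count}


def successors(kmer, solid):
    """Neighbours of a node: solid k-mers overlapping it by k-1 bases."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in solid]


def weak_regions(long_read, solid, k):
    """Return [start, end) spans of the long read covered by no solid k-mer."""
    covered = [False] * len(long_read)
    for i in range(len(long_read) - k + 1):
        if long_read[i:i + k] in solid:
            for j in range(i, i + k):
                covered[j] = True

    regions, start = [], None
    for pos, ok in enumerate(covered):
        if not ok and start is None:
            start = pos
        elif ok and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(long_read)))
    return regions


short_reads = ["ACGTTGCATC", "ACGTTGCATC"]  # toy accurate short reads
solid = solid_kmers(short_reads, k=3)
print(weak_regions("ACGTTGAATC", solid, k=3))  # toy long read, error at index 6
print(successors("TTG", solid))
```

A correction step would then search the graph for a path connecting the solid k-mers on either side of each weak region and substitute the path's sequence for that region.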

July 7, 2019

SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information.

The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information promises to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate our strategy is better capable of producing (nearly) complete bacterial genomes. The current work describes our SSPACE-LongRead software, which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances of the PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow genomes to be scaffolded in a fast and reliable manner.
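As a rough illustration of using long reads as a backbone for scaffolding, the following Python sketch counts how many long reads support each pair of neighbouring contigs. It is not the SSPACE-LongRead algorithm: the function name `contig_links`, the `(read_id, contig_id, start_on_read)` alignment tuples and the `min_support` threshold are hypothetical, and contig orientation and gap size estimation are ignored.

```python
from collections import Counter, defaultdict


def contig_links(alignments, min_support=2):
    """Count long reads that place two contigs next to each other.

    alignments is a hypothetical iterable of
    (read_id, contig_id, start_on_read) tuples, standing in for the
    output of aligning contigs against long reads. Strand is ignored.
    """
    # Group contig hits by the long read they fall on.
    per_read = defaultdict(list)
    for read_id, contig_id, start in alignments:
        per_read[read_id].append((start, contig_id))

    support = Counter()
    for hits in per_read.values():
        # Order contigs along the read, then link each adjacent pair.
        ordered = [contig for _, contig in sorted(hits)]
        for left, right in zip(ordered, ordered[1:]):
            if left != right:
                support[(left, right)] += 1

    # Keep only links backed by enough independent long reads.
    return {pair: n for pair, n in support.items() if n >= min_support}


toy_alignments = [
    ("r1", "c1", 0), ("r1", "c2", 500),
    ("r2", "c1", 10), ("r2", "c2", 700),
    ("r3", "c2", 0), ("r3", "c3", 400),
]
print(contig_links(toy_alignments))
```

Requiring several reads per link is one simple way to tolerate spurious alignments from uncorrected long reads; the threshold used here is arbitrary.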

July 7, 2019

A gapless, unambiguous genome sequence of the Enterohemorrhagic Escherichia coli O157:H7 strain EDL933.

Escherichia coli EDL933 is the prototypic strain for enterohemorrhagic E. coli serotype O157:H7, associated with deadly food-borne outbreaks. Because the publicly available sequence of the EDL933 genome has gaps and >6,000 ambiguous base calls, we here present an updated high-quality, unambiguous genome sequence with no assembly gaps. Copyright © 2014 Latif et al.

July 7, 2019

Detecting authorized and unauthorized genetically modified organisms containing vip3A by real-time PCR and next-generation sequencing.

The growing number of biotech crops with novel genetic elements increasingly complicates the detection of genetically modified organisms (GMOs) in food and feed samples using conventional screening methods. Unauthorized GMOs (UGMOs) in food and feed are currently identified through combining GMO element screening with sequencing the DNA flanking these elements. In this study, a specific and sensitive qPCR assay was developed for vip3A element detection based on the vip3Aa20 coding sequences of the recently marketed MIR162 maize and COT102 cotton. Furthermore, SiteFinding-PCR in combination with Sanger, Illumina or Pacific BioSciences (PacBio) sequencing was performed targeting the flanking DNA of the vip3Aa20 element in MIR162. De novo assembly and Basic Local Alignment Search Tool searches were used to mimic UGMO identification. PacBio data resulted in relatively long contigs in the upstream (1,326 nucleotides (nt); 95 % identity) and downstream (1,135 nt; 92 % identity) regions, whereas Illumina data resulted in two smaller contigs of 858 and 1,038 nt with higher sequence identity (>99 % identity). Both approaches outperformed Sanger sequencing, underlining the potential for next-generation sequencing in UGMO identification.

July 7, 2019

Signature gene expression reveals novel clues to the molecular mechanisms of dimorphic transition in Penicillium marneffei.

Systemic dimorphic fungi cause more than one million new infections each year, ranking them among the significant public health challenges currently encountered. Penicillium marneffei is a systemic dimorphic fungus endemic to Southeast Asia. The temperature-dependent dimorphic phase transition between mycelium and yeast is considered crucial for the pathogenicity and transmission of P. marneffei, but the underlying mechanisms are still poorly understood. Here, we re-sequenced P. marneffei strain PM1 using multiple sequencing platforms and assembled the genome using hybrid genome assembly. We determined gene expression levels using RNA sequencing at the mycelial and yeast phases of P. marneffei, as well as during phase transition. We classified 2,718 genes with variable expression across conditions into 14 distinct groups, each marked by a signature expression pattern implicated at a certain stage in the dimorphic life cycle. Genes with the same expression patterns tend to be clustered together on the genome, suggesting orchestrated regulations of the transcriptional activities of neighboring genes. Using qRT-PCR, we validated expression levels of all genes in one of clusters highly expressed during the yeast-to-mycelium transition. These included madsA, a gene encoding MADS-box transcription factor whose gene family is exclusively expanded in P. marneffei. Over-expression of madsA drove P. marneffei to undergo mycelial growth at 37°C, a condition that restricts the wild-type in the yeast phase. Furthermore, analyses of signature expression patterns suggested diverse roles of secreted proteins at different developmental stages and the potential importance of non-coding RNAs in mycelium-to-yeast transition. We also showed that RNA structural transition in response to temperature changes may be related to the control of thermal dimorphism. Together, our findings have revealed multiple molecular mechanisms that may underlie the dimorphic transition in P. marneffei, providing a powerful foundation for identifying molecular targets for mechanism-based interventions.

July 7, 2019

Genome analysis of a major urban malaria vector mosquito, Anopheles stephensi.

Background: Anopheles stephensi is the key vector of malaria throughout the Indian subcontinent and Middle East and an emerging model for molecular and genetic studies of mosquito-parasite interactions. The type form of the species is responsible for the majority of urban malaria transmission across its range. Results: Here, we report the genome sequence and annotation of the Indian strain of the type form of An. stephensi. The 221 Mb genome assembly represents more than 92% of the entire genome and was produced using a combination of 454, Illumina, and PacBio sequencing. Physical mapping assigned 62% of the genome onto chromosomes, enabling chromosome-based analysis. Comparisons between An. stephensi and An. gambiae reveal that the rate of gene order reshuffling on the X chromosome was three times higher than that on the autosomes. An. stephensi has more heterochromatin in pericentric regions but less repetitive DNA in chromosome arms than An. gambiae. We also identify a number of Y-chromosome contigs and BACs. Interspersed repeats constitute 7.1% of the assembled genome while LTR retrotransposons alone comprise more than 49% of the Y contigs. RNA-seq analyses provide new insights into mosquito innate immunity, development, and sexual dimorphism. Conclusions: The genome analysis described in this manuscript provides a resource and platform for fundamental and translational research into a major urban malaria vector. Chromosome-based investigations provide unique perspectives on Anopheles chromosome evolution. RNA-seq analysis and studies of immunity genes offer new insights into mosquito biology and mosquito-parasite interactions.

July 7, 2019

The genomic landscape of the verrucomicrobial methanotroph Methylacidiphilum fumariolicum SolV.

Aerobic methanotrophs can grow in hostile volcanic environments and use methane as their sole source of energy. The discovery of three verrucomicrobial Methylacidiphilum strains has revealed diverse metabolic pathways used by these methanotrophs, including mechanisms through which methane is oxidized. The basis of a complete understanding of these processes and of how these bacteria evolved and are able to thrive in such extreme environments partially resides in the complete characterization of their genome and its architecture. In this study, we present the complete genome sequence of Methylacidiphilum fumariolicum SolV, obtained using Pacific Biosciences single-molecule real-time (SMRT) sequencing technology. The genome assembles to a single 2.5 Mbp chromosome with an average GC content of 41.5%. The genome contains 2,741 annotated genes and 314 functional subsystems including all key metabolic pathways that are associated with Methylacidiphilum strains, including the CBB pathway for CO2 fixation. However, it does not encode the serine cycle and ribulose monophosphate pathways for carbon fixation. Phylogenetic analysis of the particulate methane mono-oxygenase operon separates the Methylacidiphilum strains from other verrucomicrobial methanotrophs. RNA-Seq analysis of cell cultures growing in three different conditions revealed the deregulation of two out of three pmoCAB operons. In addition, genes involved in nitrogen fixation were upregulated in cell cultures growing in nitrogen fixing conditions, indicating the presence of active nitrogenase. Characterization of the global methylation state of M. fumariolicum SolV revealed methylation of adenines and cytosines mainly in the coding regions of the genome. Methylation of adenines was predominantly associated with 5′-m6ACN4GT-3′ and 5′-CCm6AN5CTC-3′ methyltransferase recognition motifs whereas methylated cytosines were not associated with any specific motif. Our findings provide novel insights into the global methylation state of verrucomicrobial methanotroph M. fumariolicum SolV. However, partial conservation of methyltransferases between M. fumariolicum SolV and M. infernorum V4 indicates potential differences in the global methylation state of Methylacidiphilum strains. Unravelling the M. fumariolicum SolV genome and its epigenetic regulation allow for robust characterization of biological processes that are involved in oxidizing methane. In turn, they offer a better understanding of the evolution, the underlying physiological and ecological properties of SolV and other Methylacidiphilum strains.
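The two adenine methylation motifs reported above can be read as simple sequence patterns, where N4 and N5 stand for four and five arbitrary bases. The Python sketch below scans a DNA string for both motifs and reports the position of the adenine that the motif marks as methylated. The names `MOTIFS` and `find_motif_adenines` are illustrative, only the given strand is scanned, and the sketch finds candidate sites by sequence alone; it says nothing about whether a site is actually methylated.

```python
import re

# Each entry: motif label -> (regex, offset of the m6A base within the motif).
# The lookahead lets overlapping occurrences be reported.
MOTIFS = {
    "ACN4GT": (re.compile(r"(?=(AC[ACGT]{4}GT))"), 0),
    "CCAN5CTC": (re.compile(r"(?=(CCA[ACGT]{5}CTC))"), 2),
}


def find_motif_adenines(sequence):
    """Return (motif, 0-based position of the marked adenine) pairs.

    Only the strand given in `sequence` is scanned in this sketch.
    """
    sequence = sequence.upper()
    hits = []
    for name, (pattern, a_offset) in MOTIFS.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start() + a_offset))
    return sorted(hits, key=lambda hit: hit[1])


toy = "TTACGGGGGTAACCATTTTTCTCAA"  # made-up sequence, not from the genome
print(find_motif_adenines(toy))
```

Scanning the reverse complement as well would be needed to cover both strands for the non-palindromic CCAN5CTC motif.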
