with the PacBio Archives

July 7, 2019

Genome sequence of Kosakonia radicincitans strain YD4, a plant growth-promoting rhizobacterium isolated from yerba mate (Ilex paraguariensis St. Hill.).

Kosakonia radicincitans strain YD4 is a rhizospheric isolate from yerba mate (Ilex paraguariensis St. Hill.) with plant growth-promoting effects on this crop. Genes involved in different plant growth-promoting activities are present in this genome, suggesting its potential as a bioinoculant for yerba mate. Copyright © 2015 Bergottini et al.

July 7, 2019

The genome of Dendrobium officinale illuminates the biology of the important traditional Chinese orchid herb.

Dendrobium officinale Kimura et Migo is a traditional Chinese orchid herb that has both ornamental value and a broad range of therapeutic effects. Here, we report the first de novo assembled 1.35 Gb genome sequences for D. officinale by combining the second-generation Illumina HiSeq 2000 and third-generation PacBio sequencing technologies. We found that orchids have a complete inflorescence gene set and have some specific inflorescence genes. We observed gene expansion in gene families related to fungus symbiosis and drought resistance. We analyzed biosynthesis pathways of medicinal components of D. officinale and found extensive duplication of SPS and SuSy genes, which are related to polysaccharide generation, and that the pathway of D. officinale alkaloid synthesis could be extended to generate 16-epivellosimine. The D. officinale genome assembly demonstrates a new approach to deciphering large complex genomes and, as an important orchid species and a traditional Chinese medicine, the D. officinale genome will facilitate future research on the evolution of orchid plants, as well as the study of medicinal components and potential genetic breeding of the dendrobe. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.

July 7, 2019

Complete genome sequences of a clinical isolate and an environmental isolate of Vibrio parahaemolyticus.

Vibrio parahaemolyticus is the leading cause of seafood-borne infections in the United States. We report complete genome sequences for two V. parahaemolyticus strains isolated in 2007, CDC_K4557 and FDA_R31 of clinical and oyster origin, respectively. These two sequences might assist in the investigation of differential virulence of this organism. Copyright © 2015 Lüdeke et al.

July 7, 2019

Complete genome sequence of a carbapenem-resistant extraintestinal pathogenic Escherichia coli strain belonging to the sequence type 131 H30R subclade.

Here, we report the completed genome sequence of a carbapenem-resistant extraintestinal pathogenic Escherichia coli sequence type 131 (ST131) isolate, MNCRE44. The isolate was obtained in 2012 in Minnesota, USA, from a sputum sample from a hospitalized patient with multiple comorbidities, and it belongs to the H30R sublineage. Copyright © 2015 Johnson et al.

July 7, 2019

Genome sequence of the Drosophila melanogaster male-killing Spiroplasma strain MSRO endosymbiont.

Spiroplasmas are helical and motile members of a cell wall-less eubacterial group called Mollicutes. Although all spiroplasmas are associated with arthropods, they exhibit great diversity with respect to both their modes of transmission and their effects on their hosts, ranging from horizontally transmitted pathogens and commensals to endosymbionts that are transmitted transovarially (i.e., from mother to offspring). Here we provide the first genome sequence, along with proteomic validation, of an endosymbiotic inherited Spiroplasma bacterium, the Spiroplasma poulsonii MSRO strain harbored by Drosophila melanogaster. Comparison of the genome content of S. poulsonii with that of horizontally transmitted spiroplasmas indicates that S. poulsonii has lost many metabolic pathways and transporters, demonstrating a high level of interdependence with its insect host. Consistent with genome analysis, experimental studies showed that S. poulsonii metabolizes glucose but not trehalose. Notably, trehalose is more abundant than glucose in Drosophila hemolymph, and the inability to metabolize trehalose may prevent S. poulsonii from overproliferating. Our study identifies putative virulence genes, notably, those for a chitinase, the H2O2-producing glycerol-3-phosphate oxidase, and enzymes involved in the synthesis of the eukaryote-toxic lipid cardiolipin. S. poulsonii also expresses on the cell membrane one functional adhesion-related protein and two divergent spiralin proteins that have been implicated in insect cell invasion in other spiroplasmas. These lipoproteins may be involved in the colonization of the Drosophila germ line, ensuring S. poulsonii vertical transmission. The S. poulsonii genome is a valuable resource to explore the mechanisms of male killing and symbiont-mediated protection, two cardinal features of many facultative endosymbionts. Most insect species, including important disease vectors and crop pests, harbor vertically transmitted endosymbiotic bacteria. These endosymbionts play key roles in their hosts’ fitness, including protecting them against natural enemies and manipulating their reproduction in ways that increase the frequency of symbiont infection. Little is known about the molecular mechanisms that underlie these processes. Here, we provide the first genome draft of a vertically transmitted male-killing Spiroplasma bacterium, the S. poulsonii MSRO strain harbored by D. melanogaster. Analysis of the S. poulsonii genome was complemented by proteomics and ex vivo metabolic experiments. Our results indicate that S. poulsonii has reduced metabolic capabilities and expresses divergent membrane lipoproteins and potential virulence factors that likely participate in Spiroplasma-host interactions. This work fills a gap in our knowledge of insect endosymbionts and provides tools with which to decipher the interaction between Spiroplasma bacteria and their well-characterized host D. melanogaster, which is emerging as a model of endosymbiosis. Copyright © 2015 Paredes et al.

July 7, 2019

Complete genome of Jeotgalibacillus malaysiensis D5(T) consisting of a chromosome and a circular megaplasmid.

Jeotgalibacillus spp. are halophilic bacteria within the family Planococcaceae. No genomes of Jeotgalibacillus spp. have been reported to date, and their metabolic pathways are unknown. How the bacteria survive in hypertonic conditions such as seawater is yet to be discovered. As only a few studies have been conducted on Jeotgalibacillus spp., potential applications of these bacteria are unknown. Here, we present the complete genome of J. malaysiensis D5(T) (=DSM 28777(T) =KCTC 33350(T)), which is invaluable in identifying interesting applications for this genus. Copyright © 2015 Elsevier B.V. All rights reserved.

July 7, 2019

Complete genome sequence of Streptococcus thermophilus SMQ-301, a model strain for phage-host interactions.

Streptococcus thermophilus is used by the dairy industry to manufacture yogurt and several cheeses. Using PacBio and Illumina platforms, we sequenced the genome of S. thermophilus SMQ-301, the host of several virulent phages. The genome is composed of 1,861,792 bp and contains 2,037 genes, 67 tRNAs, and 18 rRNAs. Copyright © 2015 Labrie et al.

July 7, 2019

The Streptomyces leeuwenhoekii genome: de novo sequencing and assembly in single contigs of the chromosome, circular plasmid pSLE1 and linear plasmid pSLE2.

Next Generation DNA Sequencing (NGS) and genome mining of actinomycetes and other microorganisms is currently one of the most promising strategies for the discovery of novel bioactive natural products, potentially revealing novel chemistry and enzymology involved in their biosynthesis. This approach also allows rapid insights into the biosynthetic potential of microorganisms isolated from unexploited habitats and ecosystems, which in many cases may prove difficult to culture and manipulate in the laboratory. Streptomyces leeuwenhoekii (formerly Streptomyces sp. strain C34) was isolated from the hyper-arid high-altitude Atacama Desert in Chile and shown to produce novel polyketide antibiotics. Here we present the de novo sequencing of the S. leeuwenhoekii linear chromosome (8 Mb) and two extrachromosomal replicons, the circular pSLE1 (86 kb) and the linear pSLE2 (132 kb), all in single contigs, obtained by combining Pacific Biosciences SMRT (PacBio) and Illumina MiSeq technologies. We identified the biosynthetic gene clusters for chaxamycin, chaxalactin, hygromycin A and desferrioxamine E, metabolites all previously shown to be produced by this strain (J Nat Prod, 2011, 74:1965) and an additional 31 putative gene clusters for specialised metabolites. As well as gene clusters for polyketides and non-ribosomal peptides, we also identified three gene clusters encoding novel lasso-peptides. The S. leeuwenhoekii genome contains 35 gene clusters apparently encoding the biosynthesis of specialised metabolites, most of them completely novel and uncharacterised. This project has served to evaluate the current state of NGS for efficient and effective genome mining of high GC actinomycetes. The PacBio technology now permits the assembly of actinomycete replicons into single contigs with >99% accuracy. The assembled Illumina sequence permitted not only the correction of omissions found in GC homopolymers in the PacBio assembly (exacerbated by the high GC content of actinomycete DNA) but it also allowed us to obtain the sequences of the termini of the chromosome and of a linear plasmid that were not assembled by PacBio. We propose an experimental pipeline that uses the Illumina assembled contigs, in addition to just the reads, to complement the current limitations of the PacBio sequencing technology and assembly software.

July 7, 2019

Complete genome sequence of Bacillus thuringiensis serovar tolworthi strain Pasteur Institute Standard.

The genome sequence of Bacillus thuringiensis serovar tolworthi strain Pasteur Institute Standard was determined. The genome consists of a 5.9-Mb chromosome and eight plasmids, one of which is linear. The second largest plasmid (293 kb) carries the genes encoding insecticidal proteins. Copyright © 2015 Kanda et al.

July 7, 2019

GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments.

Genome assemblies generated with next-generation sequencing (NGS) reads usually contain a number of gaps. Several tools have recently been developed to close the gaps in these assemblies with NGS reads. Although these gap-closing tools efficiently close the gaps, they entail a high rate of misassembly at gap-closing sites. We have found that the assembly error rates caused by these tools are 20-500-fold higher than the rate of errors introduced into contigs by de novo assemblers. We here describe GMcloser, a tool that accurately closes these gaps with a preassembled contig set or a long read set (i.e. error-corrected PacBio reads). GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap closure. We demonstrate with sequencing data from various organisms that the gap-closing accuracy of GMcloser is 3-100-fold higher than those of other available tools, with similar efficiency. GMcloser and an accompanying tool (GMvalue) for evaluating the assembly and correcting misassemblies except SNPs and short indels in the assembly are available at https://sourceforge.net/projects/gmcloser/. Contact: shunichi.kosugi@riken.jp. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019

Hybrid de novo tandem repeat detection using short and long reads.

As one of the most studied genome rearrangements, tandem repeats have a considerable impact on genetic backgrounds of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads obtained with second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation was tackled by the long reads obtained with third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length came with a significant increase in the error rate. The main objective of current studies on long reads is to handle this high error rate, which reaches up to 16%. In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads and the large length of long reads. Our hybrid algorithm uses the set of short reads for tandem repeat pattern detection based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies. MixTaR is tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of number of tandem repeats detected and the length of their patterns. Our method shows high precision and sensitivity. With low false positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths varying within a significant interval.
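As a rough illustration of why a de Bruijn graph built from short reads can expose tandem repeat patterns, the sketch below builds a toy graph and looks for a simple cycle, since a repeated unit makes the k-mer path loop back on itself. This is not MixTaR's implementation: the function names `de_bruijn_graph` and `cycle_from`, the choice of k, and the example read are all assumptions made for illustration, and real reads would also need error handling and reverse complements.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its prefix (k-1)-mer to its suffix (k-1)-mer.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def cycle_from(graph, start):
    """Follow unambiguous edges from `start`; return the node path if it loops back.

    Returns None when the walk hits a branch, a dead end, or a cycle that
    does not pass through `start`. (Hypothetical helper, illustration only.)
    """
    path = [start]
    seen = {start}
    node = start
    while True:
        successors = graph.get(node, set())
        if len(successors) != 1:
            return None
        node = next(iter(successors))
        if node == start:
            return path
        if node in seen:
            return None
        seen.add(node)
        path.append(node)


if __name__ == "__main__":
    # A read made of the unit "ACG" repeated three times (made-up example).
    graph = de_bruijn_graph(["ACGACGACG"], k=4)
    print(cycle_from(graph, "ACG"))
```

In this toy case the cycle has three nodes, matching the three-base repeat unit; candidate patterns found this way would still need validation against long reads, as the abstract describes.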

July 7, 2019

Finished annotated genome sequence of Burkholderia pseudomallei strain Bp1651, a multidrug-resistant clinical isolate.

Burkholderia pseudomallei strain Bp1651, a human isolate, is resistant to all clinically relevant antibiotics. We report here on the finished genome sequence assembly and annotation of the two chromosomes of this strain. This genome sequence may assist in understanding the mechanisms of antimicrobial resistance for this pathogenic species. Copyright © 2015 Bugrysheva et al.

July 7, 2019

Evaluation and validation of assembling corrected PacBio long reads for microbial genome completion via hybrid approaches.

Despite the ever-increasing output of next-generation sequencing data along with developing assemblers, dozens to hundreds of gaps still exist in de novo microbial assemblies due to uneven coverage and large genomic repeats. Third-generation single-molecule, real-time (SMRT) sequencing technology avoids amplification artifacts and generates kilobase-long reads with the potential to complete microbial genome assembly. However, due to the low accuracy (~85%) of third-generation sequences, a considerable amount of long reads (>50X) are required for self-correction and for subsequent de novo assembly. Recently-developed hybrid approaches, using next-generation sequencing data and as few as 5X long reads, have been proposed to improve the completeness of microbial assembly. In this study we have evaluated the contemporary hybrid approaches and demonstrated that assembling corrected long reads (by runCA) produced the best assembly compared to long-read scaffolding (e.g., AHA, Cerulean and SSPACE-LongRead) and gap-filling (SPAdes). For generating corrected long reads, we further examined long-read correction tools, such as ECTools, LSC, LoRDEC, PBcR pipeline and proovread. We have demonstrated that three microbial genomes including Escherichia coli K12 MG1655, Meiothermus ruber DSM1279 and Pedobacter heparinus DSM2366 were successfully hybrid assembled by runCA into near-perfect assemblies using ECTools-corrected long reads. In addition, we developed a tool, Patch, which implements corrected long reads and pre-assembled contigs as inputs, to enhance microbial genome assemblies. With the additional 20X long reads, short reads of S. cerevisiae W303 were hybrid assembled into 115 contigs using the verified strategy, ECTools + runCA. Patch was subsequently applied to upgrade the assembly to a 35-contig draft genome. Our evaluation of the hybrid approaches shows that assembling the ECTools-corrected long reads via runCA generates near complete microbial genomes, suggesting that genome assembly could benefit from re-analyzing the available hybrid datasets that were not assembled in an optimal fashion.

July 7, 2019

High-coverage sequencing and annotated assemblies of the budgerigar genome.

Parrots belong to a group of behaviorally advanced vertebrates and have an advanced ability of vocal learning relative to other vocal-learning birds. They can imitate human speech, synchronize their body movements to a rhythmic beat, and understand complex concepts of referential meaning to sounds. However, little is known about the genetics of these traits. Elucidating the genetic bases would require whole genome sequencing and a robust assembly of a parrot genome. We present a genomic resource for the budgerigar, an Australian parakeet (Melopsittacus undulatus), the most widely studied parrot species in neuroscience and behavior. We present genomic sequence data that includes over 300× raw read coverage from multiple sequencing technologies and chromosome optical maps from a single male animal. The reads and optical maps were used to create three hybrid assemblies representing some of the largest genomic scaffolds to date for a bird; two of which were annotated based on similarities to reference sets of non-redundant human, zebra finch and chicken proteins, and budgerigar transcriptome sequence assemblies. The sequence reads for this project were in part generated and used for both the Assemblathon 2 competition and the first de novo assembly of a giga-scale vertebrate genome utilizing PacBio single-molecule sequencing. Across several quality metrics, these budgerigar assemblies are comparable to or better than the chicken and zebra finch genome assemblies built from traditional Sanger sequencing reads, and are sufficient to analyze regions that are difficult to sequence and assemble, including those not yet assembled in prior bird genomes, and promoter regions of genes differentially regulated in vocal learning brain regions. This work provides valuable data and material for genome technology development and for investigating the genomics of complex behavioral traits.

July 7, 2019

Evaluation and validation of de novo and hybrid assembly techniques to derive high-quality genome sequences.

To assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences. Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies. All assembly tools except CLC Genomics Workbench are freely available under GNU General Public License. Contact: brownsd@ornl.gov. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
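For readers unfamiliar with the N50 summary statistic mentioned above, the following minimal sketch shows one common way to compute it: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name `n50` and the example contig lengths are illustrative assumptions, not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A higher N50 together with a lower contig count usually indicates a more contiguous assembly, but neither number measures correctness, which is why the study also relied on PCR and Sanger validation.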

