Pacbio reads Archives - Page 24 of 53

July 7, 2019

A synteny-based draft genome sequence of the forage grass Lolium perenne.

Here we report the draft genome sequence of perennial ryegrass (Lolium perenne), an economically important forage and turf grass species that is widely cultivated in temperate regions worldwide. It is classified along with wheat, barley, oats and Brachypodium distachyon in the Pooideae subfamily of the grass family (Poaceae). Transcriptome data were used to identify 28,455 gene models, and we used macro-co-linearity between perennial ryegrass and barley, and synteny within the grass family, to establish a synteny-based linear gene order. The gametophytic self-incompatibility mechanism enables the pistil of a plant to reject self-pollen and therefore promotes out-crossing. We have used the sequence assembly to characterize transcriptional changes in the stigma during pollination with both compatible and incompatible pollen. Characterization of the pollen transcriptome identified homologs to pollen allergens from a range of species, many of which were expressed at very high levels in mature pollen grains and are potentially involved in the self-incompatibility mechanism. The genome sequence provides a valuable resource for future breeding efforts based on genomic prediction, and will accelerate the development of new varieties for more productive grasslands. © 2015 The Authors. The Plant Journal © 2015 John Wiley & Sons Ltd.

July 7, 2019

PAFFT: A new homology search algorithm for third-generation sequencers.

DNA sequencers that can conduct real-time sequencing from a single polymerase molecule are known as third-generation sequencers. Third-generation sequencers can produce reads that are several kilobases long. However, the raw data generated by third-generation sequencers are known to be error-prone. Because of these sequencing errors, it is difficult to identify which genes are homologous to the reads obtained with third-generation sequencers. In this study, a new homology search algorithm, PAFFT, is developed. PAFFT is an extension of MAFFT, an algorithm for multiple sequence alignment. PAFFT detects global homology rather than local homology, so homologous regions can be detected even when the sequencing error rate is high. PAFFT will boost the application of third-generation sequencers. Copyright © 2015 Elsevier Inc. All rights reserved.
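The abstract does not describe PAFFT's own MAFFT-derived procedure. As a generic illustration of what "global" homology means, the sketch below computes a Needleman-Wunsch global alignment score: both sequences are aligned end to end, so each scattered sequencing error lowers the score a little instead of breaking the match into separate local hits. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions for illustration; this is not PAFFT's implementation.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Both sequences are aligned end to end. Illustrative only: the scoring
    values are assumptions, and this is not the PAFFT algorithm.
    """
    m = len(b)
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: a[:i] aligned against an empty prefix of `b` costs i gaps.
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: (mis)match, a[i-1] against a gap, b[j-1] against a gap.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    # Bottom-right cell: both sequences fully consumed.
    return prev[m]


# A read with one deleted base still scores close to its full length.
print(global_alignment_score("ACGTTGCA", "ACGTGCA"))  # 6
```

Here the erroneous read keeps seven matching bases and pays for a single gap, so its score (6) stays close to the maximum possible for a 7-base sequence.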

July 7, 2019

Complete genome of the marine bacterium Wenzhouxiangella marina KCTC 42284(T).

Wenzhouxiangella marina is an obligately aerobic, Gram-negative, non-motile, rod-shaped bacterium that was isolated from the culture broth of the marine microalga Picochlorum sp. 122. Here we report the 3.67 Mb complete genome (65.26% G+C) of W. marina KCTC 42284(T), encoding 3,016 protein-coding genes, 43 tRNAs and one rRNA operon. The genomic information supports multiple horizontal gene transfer (HGT) events in the history of W. marina, possibly involving other marine bacteria co-existing in marine habitats. Evaluation of genomic signatures revealed 19 such HGT-derived genomic islands. Of these, eight were also supported by “genomic context”, that is, the presence of integrase, transposase and tmRNA genes either inside the island or in its near vicinity. The addition of the W. marina genome expands the repertoire of marine bacterial genomic diversity, especially because the strain represents the sole genomic resource of a novel taxonomic family in the bacterial order Chromatiales. Copyright © 2015 Elsevier B.V. All rights reserved.

July 7, 2019

Jitterbug: somatic and germline transposon insertion detection at single-nucleotide resolution.

Transposable elements (TEs) are major players in genome evolution. Transposon insertion polymorphisms can translate into phenotypic differences in plants and animals and are linked to various diseases, including human cancer, making their characterization highly relevant to the study of genome evolution and genetic diseases. Here we present Jitterbug, a novel tool that identifies TE insertion sites at single-nucleotide resolution based on the paired-end mapping and clipped-read signatures produced by NGS alignments. Jitterbug can be easily integrated into existing NGS analysis pipelines, using the standard BAM format produced by commonly used alignment tools (e.g. bwa, bowtie2), with no need to realign reads to a set of consensus transposon sequences. Jitterbug is highly sensitive and recalls transposon insertions with very high specificity, as demonstrated by benchmarks in the human and Arabidopsis genomes and by validation using long PacBio reads. In addition, Jitterbug estimates the zygosity of transposon insertions with high accuracy and can also identify somatic insertions. We demonstrate that Jitterbug can identify mosaic somatic transposon movement using sequenced tumor-normal sample pairs and allows the cancer cell fraction of clones containing a somatic TE insertion to be estimated. We suggest that the independent methods we use to evaluate performance are a step towards creating a gold-standard dataset for benchmarking structural variant prediction tools.
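Jitterbug works directly on BAM alignments; the following is only a simplified, library-free sketch of the clipped-read signature the abstract mentions, not Jitterbug's code. It takes a SAM-style 1-based alignment start and CIGAR string, reports where soft clips meet the reference, and counts reads that share a clip position. The function names, the `min_clip` and `min_support` thresholds and the toy reads are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def clip_breakpoints(pos: int, cigar: str, min_clip: int = 5) -> list[int]:
    """Reference positions (1-based, like SAM POS) where a read is soft-clipped.

    Both cases report the first reference base to the right of the clip
    junction: a leading clip gives the first aligned base, a trailing clip
    gives the base just after the last aligned base.
    """
    # Hard clips carry no sequence, so drop them before looking at the ends.
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    points = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        points.append(pos)
    if ops and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        points.append(pos + ref_len)
    return points


def candidate_sites(reads: list[tuple[int, str]],
                    min_support: int = 2) -> dict[int, int]:
    """Count clip positions across reads; keep those seen in >= min_support reads."""
    counts = Counter()
    for pos, cigar in reads:
        counts.update(clip_breakpoints(pos, cigar))
    return {site: n for site, n in counts.items() if n >= min_support}


# Two toy reads clipped on opposite sides of the same junction, plus one
# fully aligned read that contributes nothing.
reads = [(101, "30S70M"), (31, "70M30S"), (50, "100M")]
print(candidate_sites(reads))  # {101: 2}
```

Reporting both clip directions as "first base right of the junction" is a design choice that lets left- and right-clipped reads from the same insertion pile up on a single coordinate.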

July 7, 2019

The genome and methylome of a beetle with complex social behavior, Nicrophorus vespilloides (Coleoptera: Silphidae).

Testing for conserved and novel mechanisms underlying phenotypic evolution requires a diversity of genomes available for comparison, spanning multiple independent lineages. For example, complex social behavior in insects has been investigated primarily in eusocial lineages, nearly all of which are Hymenoptera. If conserved genomic influences on sociality do exist, we need data from a wider range of taxa that also vary in their levels of sociality. Here, we present the assembled and annotated genome of the subsocial beetle Nicrophorus vespilloides, a species long used to investigate evolutionary questions about complex social behavior. We used this genome to address two questions. First, do aspects of life history, such as using a carcass to breed, predict overlap in gene models more strongly than phylogeny? We found that the overlap in gene models was similar between N. vespilloides and all other insect groups, regardless of life history. Second, like other insects with highly developed social behavior but unlike other beetles, does N. vespilloides have DNA methylation? We found strong evidence for an active DNA methylation system. The distribution of methylation was similar to that of other insects, with exons having the most methylated CpGs. Methylation status appears highly conserved; 85% of the methylated genes in N. vespilloides are also methylated in the hymenopteran Nasonia vitripennis. This genome adds a coleopteran resource for answering questions about the evolution and mechanistic basis of sociality and about the potential role of methylation in social behavior. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

July 7, 2019

Draft genome sequence of Pasteurella multocida isolate P1062, isolated from bovine respiratory disease.

July 7, 2019

High quality draft genomes of the Mycoplasma mycoides subsp. mycoides challenge strains Afadé and B237.

Members of the ‘Mycoplasma mycoides cluster’ are important livestock pathogens worldwide. Mycoplasma mycoides subsp. mycoides is the etiologic agent of contagious bovine pleuropneumonia (CBPP), which is still endemic in many parts of Africa. We report the genome sequences and annotation of two frequently used challenge strains of Mycoplasma mycoides subsp. mycoides, Afadé and B237. The information provided will enable downstream ‘omics’ applications such as proteomics, transcriptomics and reverse vaccinology approaches. Despite the absence of Mycoplasma pneumoniae-like cyto-adhesion-encoding genes, both strains showed protrusions. This phenotype is likely encoded by another set of genes.

July 7, 2019

svviz: a read viewer for validating structural variants.

Visualizing read alignments is the most effective way to validate candidate structural variants (SVs) with existing data. We present svviz, a sequencing read visualizer for SVs that sorts and displays only reads relevant to a candidate SV. svviz works by searching the input BAM file(s) for potentially relevant reads, realigning them against the inferred sequence of the putative variant allele as well as the reference allele, and identifying reads that match one allele better than the other. Separate views of the two alleles are then displayed in a scrollable web browser view, enabling a more intuitive visualization of each allele than the single reference-genome-based view common to most current read browsers. The browser view facilitates examining the evidence for or against a putative variant, estimating zygosity, visualizing affected genomic annotations and manually refining breakpoints. svviz supports data from most modern sequencing platforms. svviz is implemented in Python and freely available from http://svviz.github.io/. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.
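The abstract does not show svviz's own realignment code. The sketch below illustrates the general idea under simple assumptions: each read is aligned in full to both the reference-allele and variant-allele sequences with a fitting alignment (the whole read must align, overhangs on the allele sequence are free), and the read is assigned to whichever allele scores higher by at least a margin. The scoring values, the `margin` of 2, the toy alleles and the function names are hypothetical and not taken from svviz.

```python
def fit_score(read: str, hap: str, match: int = 1,
              mismatch: int = -1, gap: int = -2) -> int:
    """Score the whole read against its best-matching stretch of `hap`.

    Fitting alignment: every read base must be aligned, but unaligned
    overhangs at either end of `hap` are free.
    """
    m = len(hap)
    prev = [0] * (m + 1)  # free leading overhang in `hap`
    for i in range(1, len(read) + 1):
        curr = [i * gap] + [0] * m  # read bases before hap start are gaps
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if read[i - 1] == hap[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return max(prev)  # free trailing overhang in `hap`


def assign_allele(read: str, ref_allele: str, alt_allele: str,
                  margin: int = 2) -> str:
    """Label a read 'ref', 'alt' or 'ambiguous' by comparing fit scores."""
    ref = fit_score(read, ref_allele)
    alt = fit_score(read, alt_allele)
    if alt - ref >= margin:
        return "alt"
    if ref - alt >= margin:
        return "ref"
    return "ambiguous"


# Toy deletion: the alternative allele lacks the CCCC block.
REF = "AAAACCCCGGGGTTTT"
ALT = "AAAAGGGGTTTT"
for read in ["AAAAGGGG", "CCCCGGGG", "GGGGTTTT"]:
    print(read, assign_allele(read, REF, ALT))
```

In this toy case the read spanning the deletion junction is labelled alt, the read through the CCCC block is labelled ref, and the read lying entirely in sequence shared by both alleles stays ambiguous, which mirrors why only reads informative for one allele are useful evidence.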

July 7, 2019

Complete genome sequence of Novosphingobium pentaromativorans US6-1(T).

Novosphingobium pentaromativorans US6-1(T) belongs to the family Sphingomonadaceae. In a phylogenetic analysis based on the 16S rRNA gene sequences of N. pentaromativorans US6-1(T) and nine genome-sequenced strains in the genus Novosphingobium, similarity ranged from 93.9 to 99.9%, with the highest similarity to Novosphingobium sp. PP1Y (99.9%), whereas genome-based ANI values ranged from 70.9 to 93%, with a highest value of 93%. This microorganism was isolated from muddy coastal bay sediments heavily polluted by polycyclic aromatic hydrocarbons (PAHs). It was previously shown to be capable of degrading multiple PAHs, including benzo[a]pyrene. To further understand its PAH biodegradation pathways, the previous draft genome of this microorganism was upgraded to a complete genome using the Illumina MiSeq and PacBio platforms. The genome of strain US6-1(T) is 5,457,578 bp long and comprises the 3,979,506 bp chromosome and five megaplasmids. It contains 5110 protein-coding genes and 82 RNA genes. Here, we provide an analysis of the complete genome sequence, which enables the identification of new characteristics of this strain.

July 7, 2019

Genome sequence and description of the anaerobic lignin-degrading bacterium Tolumonas lignolytica sp. nov.

Tolumonas lignolytica BRL6-1(T) is the type strain of T. lignolytica sp. nov., a proposed novel species of the genus Tolumonas. This strain was isolated from tropical rainforest soils based on its ability to use lignin as a sole carbon source. Cells of Tolumonas lignolytica BRL6-1(T) are mesophilic, non-spore-forming, Gram-negative rods that are oxidase- and catalase-negative. The genome of this isolate was sequenced and assembled into seven unique contigs totaling 3.6 Mbp, enabling the characterization of several putative pathways for lignin breakdown. In particular, we found an extracellular peroxidase involved in lignin depolymerization, as well as several enzymes involved in cleaving the β-aryl ether bond, which is the most abundant linkage between lignin monomers. We also found genes for enzymes involved in the metabolism of ferulic acid, a common product of lignin breakdown. By characterizing the pathways and enzymes employed in the bacterial breakdown of lignin in anaerobic environments, this work should assist in the efficient engineering of biofuel production from lignocellulosic material.

July 7, 2019

De novo assembly of Dekkera bruxellensis: a multi technology approach using short and long-read sequencing and optical mapping.

It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers), it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high-quality draft assemblies is extremely important for yeast genomes, to better characterise major events in their evolutionary history. The aim of this work is two-fold: first, to show how combining different, partly complementary technologies is key to improving assembly quality and correctness; and second, to present a de novo assembly pipeline that we believe will benefit core facility bioinformaticians. To demonstrate both the effectiveness of combining technologies and the simplicity of the pipeline, we present results obtained for the Dekkera bruxellensis genome.

In this work, we used short-read Illumina data and long-read PacBio data, combined with the extremely long-range information from OpGen optical maps, for de novo genome assembly and finishing. We also developed NouGAT, a semi-automated pipeline for read preprocessing, de novo assembly and assembly evaluation, which was instrumental for this work.

We obtained a high-quality draft assembly of a yeast genome, resolved at the chromosomal level. Furthermore, this assembly was corrected for mis-assemblies, as demonstrated by the resolution of a large collapsed repeat and by higher scores from assembly evaluation tools. With the inclusion of PacBio data, we were able to fill about 5% of the optical-mapped genome not covered by the Illumina data.

July 7, 2019

Wham: Identifying structural variants of biological consequence.

Existing methods for identifying structural variants (SVs) from short-read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both SV calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to use SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools (Lumpy, Delly and SoftSearch), and demonstrate Wham’s ability to identify SVs and associate them with phenotypes using data from humans, domestic pigeons and vaccinia virus. Wham and all associated software are released under the MIT License and can be freely downloaded from GitHub (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support, please post questions to https://www.biostars.org/.

July 7, 2019

Complete genome sequence and characterization of the haloacid-degrading Burkholderia caribensis MBA4.

Burkholderia caribensis MBA4 was isolated from soil for its ability to grow on haloacids. This bacterium has a genome size of 9,482,704 bp. Here we report the genome sequence and annotation, together with characteristics of the genome. The complete genome sequence consists of three replicons, comprising 9056 protein-coding genes and 80 RNA genes. Genes responsible for the dehalogenation and uptake of haloacids are arranged in an operon. Dehalogenation of haloacetate produces glycolate, and three glycolate operons were identified. Two of these operons contain an upstream glcC regulator gene. It is likely that the expression of one of these operons responds to haloacetate. Genes responsible for the metabolism of the dehalogenation product of halopropionate were also identified.

July 7, 2019

Evaluation and validation of assembling corrected PacBio long reads for microbial genome completion via hybrid approaches.

Despite the ever-increasing output of next-generation sequencing data and the development of new assemblers, dozens to hundreds of gaps still remain in de novo microbial assemblies because of uneven coverage and large genomic repeats. Third-generation single-molecule, real-time (SMRT) sequencing technology avoids amplification artifacts and generates kilobase-long reads with the potential to complete microbial genome assemblies. However, because of the low accuracy (~85%) of third-generation sequences, high long-read coverage (>50X) is required for self-correction and subsequent de novo assembly. Recently developed hybrid approaches, which use next-generation sequencing data together with as little as 5X long-read coverage, have been proposed to improve the completeness of microbial assemblies. In this study, we evaluated contemporary hybrid approaches and demonstrated that assembling corrected long reads (with runCA) produced the best assemblies compared with long-read scaffolding (e.g., AHA, Cerulean and SSPACE-LongRead) and gap-filling (SPAdes). To generate corrected long reads, we further examined long-read correction tools, including ECTools, LSC, LoRDEC, the PBcR pipeline and proovread. We demonstrated that three microbial genomes, Escherichia coli K12 MG1655, Meiothermus ruber DSM1279 and Pedobacter heparinus DSM2366, were successfully hybrid-assembled by runCA into near-perfect assemblies using ECTools-corrected long reads. In addition, we developed a tool, Patch, which takes corrected long reads and pre-assembled contigs as inputs to improve microbial genome assemblies. With an additional 20X of long reads, short reads of S. cerevisiae W303 were hybrid-assembled into 115 contigs using the verified strategy, ECTools + runCA. Patch was subsequently applied to upgrade the assembly to a 35-contig draft genome.

Our evaluation of the hybrid approaches shows that assembling ECTools-corrected long reads with runCA generates near-complete microbial genomes, suggesting that genome assembly could benefit from re-analyzing available hybrid datasets that were not assembled optimally.

July 7, 2019

Complete genome sequence of the molybdenum-resistant bacterium Bacillus subtilis strain LM 4-2.

Bacillus subtilis LM 4-2, a Gram-positive bacterium, was isolated from a molybdenum mine in Luoyang city. Because of its strong resistance to molybdate and its potential use in the bioremediation of molybdate-polluted areas, we describe the features of this organism, as well as its complete genome sequence and annotation. The genome consists of a circular 4,069,266 bp chromosome with an average GC content of 43.83%, which includes 4149 predicted ORFs and 116 RNA genes. Additionally, 687 transporter-coding and 116 redox protein-coding genes were identified in the LM 4-2 genome.

