July 7, 2019

Understanding the pathogenicity of Burkholderia contaminans, an emerging pathogen in cystic fibrosis.

Several bacterial species of the Burkholderia cepacia complex (Bcc) are feared opportunistic pathogens that cause debilitating lung infections with a high risk of fatal septicemia in cystic fibrosis (CF) patients. However, the pathogenic potential of other Bcc species is still unknown. To elucidate the clinical relevance of Burkholderia contaminans, a species frequently isolated from CF respiratory samples in Ibero-American countries, we aimed to identify its key virulence factors possibly linked to an unfavorable clinical outcome. We performed a genome-wide comparative analysis of two isolates of B. contaminans ST872 from the sputum and blood culture of a female CF patient in Argentina. RNA-seq data showed significant changes in the expression of quorum sensing-regulated virulence factors and of genes involved in motility and chemotaxis. Furthermore, we detected expression changes in a recently described low-oxygen-activated (lxa) locus, which encodes stress-related proteins, and in two clusters responsible for the biosynthesis of the antifungal and hemolytic compounds pyrrolnitrin and occidiofungin. Based on phenotypic assays that confirmed changes in motility and in proteolytic, hemolytic and antifungal activities, we were able to distinguish two phenotypes of B. contaminans that coexisted in the host and entered her bloodstream. Whole-genome sequencing revealed that the sputum and bloodstream isolates (each representing a distinct phenotype) differed by over 1,400 mutations, as a result of a mismatch repair-deficient, hypermutable state of the sputum isolate. The inferred lack of purifying selection against nonsynonymous mutations and the high rate of pseudogenization in the derived isolate indicated limited selective pressure during its evolution in the nutrient-rich, stable CF sputum environment. The present study is the first to examine the genomic and transcriptomic differences between longitudinal isolates of B. contaminans.
The detected activity of a number of putative virulence factors implies that this novel Bcc species is genuinely pathogenic.
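The selection argument above rests on sorting mutations into synonymous changes (same amino acid) and nonsynonymous changes (different amino acid or a gained or lost stop codon). As a minimal sketch, assuming the mutation lies in an in-frame coding sequence and using the standard genetic code, one can classify single-nucleotide changes as follows. The function names are hypothetical and this is not the authors' pipeline; a full dN/dS estimate would additionally normalize by the number of synonymous and nonsynonymous sites, which is omitted here.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Label a base change at 0-based position `pos` of an in-frame coding sequence."""
    cds = cds.upper()
    offset = pos % 3  # position of the change within its codon
    ref_codon = cds[pos - offset:pos - offset + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    # Missense changes and stop gains/losses are all counted as nonsynonymous
    return "nonsynonymous"


def tally_snvs(cds: str, snvs: list[tuple[int, str]]) -> Counter:
    """Count synonymous and nonsynonymous changes for (position, alt_base) pairs."""
    return Counter(classify_snv(cds, pos, alt) for pos, alt in snvs)
```

In the coding sequence ATGGCTTAA, changing position 5 from T to C turns GCT into GCC (both alanine) and is synonymous, whereas changing position 3 from G to A gives ACT (threonine) and is nonsynonymous.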


July 7, 2019

Genome sequence and analysis of the Japanese morning glory Ipomoea nil.

Ipomoea is the largest genus in the family Convolvulaceae. Ipomoea nil (Japanese morning glory) has been used as a model plant to study the genetic basis of floricultural traits, with over 1,500 mutant lines. In the present study, we used second- and third-generation sequencing platforms to produce a draft genome of I. nil with a scaffold N50 of 2.88 Mb (contig N50 of 1.87 Mb), covering 98% of the 750 Mb genome. Scaffolds covering 91.42% of the assembly are anchored to 15 pseudo-chromosomes. The draft genome enabled us to identify and catalogue the Tpn1-family transposons, known to be the major mutagen of I. nil, and to analyse the dwarf gene CONTRACTED, located on the genetic map published in 1956. Comparative genomics suggests that a whole-genome duplication in Convolvulaceae, distinct from the recent Solanaceae event, occurred after the divergence of the two sister families.
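The scaffold N50 quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows; the function name is ours, and conventions differ on whether "half" refers to the assembly length (as here) or to an estimated genome size (the NG50 variant).

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the cumulative sum past half sets the N50
        if running >= half_total:
            return length
```

For lengths 2, 3, 4, 5 and 6 (20 bases in total), the cumulative sum 6 + 5 = 11 first reaches half, so the N50 is 5.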


July 7, 2019

The evolution of orphan regions in genomes of a fungal pathogen of wheat.

Fungal plant pathogens rapidly evolve virulence on resistant hosts through mutations in genes encoding proteins that modulate the host immune responses. The mutational spectrum likely includes chromosomal rearrangements responsible for gains or losses of entire genes. However, the mechanisms creating adaptive structural variation in fungal pathogen populations are poorly understood. We used complete genome assemblies to quantify structural variants segregating in the highly polymorphic fungal wheat pathogen Zymoseptoria tritici. The genetic basis of virulence in Z. tritici is complex, and populations harbor significant genetic variation for virulence; hence, we aimed to determine whether structural variation led to functional differences. We combined single-molecule real-time sequencing, genetic maps and transcriptomics data to generate a fully assembled and annotated genome of the highly virulent field isolate 3D7. Comparative genomics analyses against the complete reference genome IPO323 identified large chromosomal inversions and the complete gain or loss of transposable-element clusters, explaining the extensive chromosomal-length polymorphisms found in this species. Both the 3D7 and IPO323 genomes harbored long tracts of sequence exclusive to one of the two genomes. These orphan regions contained 296 genes unique to the 3D7 genome and not previously known for this species. These orphan genes tended to be organized in clusters and showed evidence of mutational decay. Moreover, the orphan genes were enriched in genes encoding putative effectors and included one of the most upregulated putative effector genes during wheat infection. Our study showed that this pathogen species harbored extensive chromosomal structure polymorphism that may drive the evolution of virulence.
Pathogen outbreak populations often harbor previously unknown genes conferring virulence. Hence, a key puzzle of rapid pathogen evolution is the origin of such evolutionary novelty in genomes. Chromosomal rearrangements and structural variation in pathogen populations likely play a key role. However, identifying such polymorphism is challenging, because most genome-sequencing approaches yield information only about point mutations. We combined long-read technology and genetic maps to assemble the complete genome of a strain of a highly polymorphic fungal pathogen of wheat. Comparisons against the reference genome of the species showed substantial variation in chromosome structure and revealed large regions unique to each assembled genome. These regions were enriched in genes encoding likely effector proteins, which are important components of pathogenicity. Our study showed that pathogen populations harbor extensive polymorphism at the chromosome level and that this polymorphism can be a source of adaptive genetic variation in pathogen evolution. Copyright © 2016 Plissonneau et al.
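Finding sequence exclusive to one genome amounts to taking the complement of the regions that align to the other genome. The sketch below assumes the aligned intervals have already been produced by some whole-genome aligner (not specified here) and are given as half-open coordinates on the genome of interest; the function name and the `min_length` cut-off are illustrative, and this is not the authors' pipeline.

```python
def exclusive_regions(genome_length: int,
                      aligned: list[tuple[int, int]],
                      min_length: int = 1) -> list[tuple[int, int]]:
    """Return half-open intervals of a genome not covered by any aligned interval.

    `aligned` holds half-open (start, end) coordinates, on this genome, of
    segments that align to the other genome. Gaps shorter than `min_length`
    are ignored.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    regions = []
    cursor = 0  # end of the covered prefix seen so far
    for start, end in sorted(aligned):
        # Any stretch between the covered prefix and the next alignment is unaligned
        if start - cursor >= min_length:
            regions.append((cursor, start))
        cursor = max(cursor, end)  # overlapping alignments merge naturally
    if genome_length - cursor >= min_length:
        regions.append((cursor, genome_length))
    return regions
```

For a 100 bp genome with aligned blocks (10, 30), (25, 50) and (70, 90), it returns [(0, 10), (50, 70), (90, 100)].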


July 7, 2019

Complete genome of Vibrio parahaemolyticus FORC014 isolated from the toothfish.

Foodborne illness can be caused by various pathogenic bacteria, such as Staphylococcus aureus, Escherichia coli and Vibrio parahaemolyticus, and can produce severe gastroenteritis symptoms. In this study, we completed the genome sequence of the foodborne pathogen V. parahaemolyticus FORC_014, which was isolated from toothfish suspected of contamination in South Korea. Additionally, we extended our knowledge of the genomic characteristics of the FORC_014 strain through comparative analysis with the complete sequences of other V. parahaemolyticus strains whose genomes have previously been reported.

The complete genome sequence of V. parahaemolyticus FORC_014 was generated by single-molecule, real-time (SMRT) sequencing on the PacBio RS platform. The FORC_014 genome consists of two circular chromosomes (3,241,330 bp for chromosome 1 and 1,997,247 bp for chromosome 2), one plasmid (51,383 bp) and one putative phage sequence (96,896 bp). It contains a total of 4,274 putative protein-coding sequences, 126 tRNA genes and 34 rRNA genes. Furthermore, we found 33 type III secretion system 1 (T3SS1)-related proteins and 15 type III secretion system 2 (T3SS2)-related proteins on chromosome 1. This is the first report of a T3SS2 located on chromosome 1 of V. parahaemolyticus without the thermostable direct hemolysin (tdh) and thermostable direct hemolysin-related hemolysin (trh) genes.

Through investigation of the complete genome sequence of V. parahaemolyticus FORC_014, which differs from previously reported strains, we revealed two type III secretion systems (T3SS1 and T3SS2) located on chromosome 1 that do not include the tdh and trh genes. We also identified several virulence factors carried by our strain, including an iron uptake system, hemolysins and secretion systems. This result suggests that the FORC_014 strain could be responsible for foodborne illness outbreaks.
Our results provide valuable genomic clues that will assist future understanding of virulence at the genomic level and help distinguish between clinical and non-clinical isolates.


July 7, 2019

MICADo – Looking for mutations in targeted PacBio cancer data: an alignment-free method.

Targeted sequencing is commonly used in clinical applications of NGS technology because it provides sufficient sequencing depth in the target genes of interest and thus ensures the best possible downstream analysis. Nevertheless, the accurate discovery and annotation of disease-causing mutations remains a challenging problem even in this favorable context. The difficulty is particularly acute for third-generation sequencing technologies such as PacBio. We present MICADo, a de Bruijn graph-based method implemented in Python that makes it possible to distinguish patient-specific mutations from other alterations in targeted sequencing data from a cohort of patients. MICADo analyses the NGS reads of each sample in the context of the data from the whole cohort in order to capture what is specific to that sample relative to the cohort. MICADo is particularly suited to sequencing data from highly heterogeneous samples, especially data with high rates of non-uniform sequencing errors. It was validated on PacBio sequencing datasets from several patient cohorts. Comparison with two widely used tools, VarScan and GATK, shows that MICADo is more accurate, especially when true mutations have frequencies close to background noise. The source code is available at http://github.com/cbib/MICADo.
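To make the de Bruijn graph idea concrete: each k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and edge counts record how often each transition was observed. The sketch below builds such edge counts and then applies a deliberately naive sample-versus-cohort comparison (edges seen repeatedly in the sample but never in the cohort). The function names, the `min_count` threshold and the "absent from the cohort" rule are illustrative assumptions, not MICADo's actual statistical model; see the repository above for the real implementation.

```python
from collections import Counter


def de_bruijn_edges(reads: list[str], k: int) -> Counter:
    """Count de Bruijn graph edges: each k-mer links its (k-1)-mer prefix to its suffix."""
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def sample_specific_edges(sample_reads: list[str],
                          cohort_reads: list[str],
                          k: int,
                          min_count: int = 2) -> dict[tuple[str, str], int]:
    """Edges seen at least `min_count` times in the sample but never in the cohort."""
    sample = de_bruijn_edges(sample_reads, k)
    cohort = de_bruijn_edges(cohort_reads, k)
    return {edge: n for edge, n in sample.items()
            if n >= min_count and edge not in cohort}
```

Requiring repeated support in the sample is a crude guard against isolated sequencing errors; MICADo's handling of non-uniform error rates is more involved than this.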


July 7, 2019

Exploiting next-generation sequencing to solve the haplotyping puzzle in polyploids: a simulation study.

Haplotypes are the units of inheritance in an organism, and many genetic analyses depend on their precise determination. Methods for haplotyping single individuals use the phasing information available in next-generation sequencing reads, matching overlapping single-nucleotide polymorphisms while penalizing the post hoc nucleotide corrections this requires. Haplotyping diploids is relatively easy, but the complexity of the problem increases drastically for polyploid genomes, which are found both in model organisms and in economically relevant plant and animal species. Although a number of tools are available for haplotyping polyploids, the effects of genomic makeup and sequencing strategy on the accuracy of these methods have not yet been thoroughly evaluated.

We developed the simulation pipeline haplosim to evaluate the performance of three haplotype estimation algorithms for polyploids, HapCompass, HapTree and SDhaP, in settings varying in sequencing approach, ploidy level and genomic diversity, using tetraploid potato as the model. Our results show that sequencing depth is the major determinant of haplotype estimation quality, that 1 kb PacBio circular consensus sequencing reads and Illumina reads with large insert sizes are competitive, and that all methods fail to produce good haplotypes when ploidy levels increase. Of the three methods, HapTree produces the most accurate estimates but also consumes the most resources. There is clearly room for improvement in polyploid haplotyping algorithms.
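"Penalizing post hoc nucleotide corrections" is commonly formalized as the minimum error correction (MEC) criterion: each read is assigned to the haplotype it matches best, and the score is the total number of alleles that would have to be flipped to make every read consistent with its haplotype. The sketch below scores a candidate set of haplotypes under that criterion, with alleles coded as 0 or 1 per SNP. It is the generic objective only; it does not reproduce the specific models of HapCompass, HapTree or SDhaP, and the names are illustrative.

```python
Fragment = dict[int, int]  # SNP index -> allele (0 or 1) observed in one read


def fragment_errors(fragment: Fragment, haplotype: list[int]) -> int:
    """Number of alleles that must be flipped for the read to match this haplotype."""
    return sum(1 for pos, allele in fragment.items() if haplotype[pos] != allele)


def mec_score(fragments: list[Fragment], haplotypes: list[list[int]]) -> int:
    """Minimum error correction: assign each read to its best-matching haplotype."""
    return sum(min(fragment_errors(f, h) for h in haplotypes) for f in fragments)
```

For example, a read covering SNPs 1 to 3 with alleles 1, 1, 1 needs one correction against its best-matching haplotype among the tetraploid haplotypes 0011, 0101, 1100 and 1010.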


July 7, 2019

Collection and storage of HLA NGS genotyping data for the 17th International HLA and Immunogenetics Workshop.

For over 50 years, the International HLA and Immunogenetics Workshops (IHIW) have advanced the fields of histocompatibility and immunogenetics (H&I) through community sharing of technology, experience and reagents, and through the establishment of ongoing collaborative projects. Held in the fall of 2017, the 17th IHIW focused on the application of next-generation sequencing (NGS) technologies to clinical and research goals in the H&I fields. NGS technologies have the potential to allow dramatic insights and advances in these fields, but the scope and sheer quantity of NGS data raise challenges for their analysis, collection, exchange and storage. The 17th IHIW adopted a centralized approach to these issues, and we developed the tools, services and systems needed to capture and manage these NGS data effectively. We worked with NGS platform and software developers to define a set of distinct but equivalent NGS typing reports that record NGS data in a uniform fashion. The 17th IHIW database applied our standards, tools and services to collect, validate and store these structured, multi-platform data in an automated fashion. We have created community resources to enable exploration of the vast store of curated sequence and allele-name data in the IPD-IMGT/HLA Database, with the goal of creating a long-term community resource that integrates these curated data with new NGS sequence and polymorphism data for advanced analyses and applications. Copyright © 2017 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.


July 7, 2019

RepLong: de novo repeat identification using long read sequencing data.

The identification of repetitive elements is important in genome assembly and phylogenetic analyses. Existing de novo repeat identification methods based on short reads are ineffective at identifying long repeats. Because long reads are more likely to span repeat regions completely, they are better suited to recognizing long repeats.

In this study, we propose RepLong, a novel de novo repeat identification method based on PacBio long reads. Because reads mapped to repeat regions overlap extensively with one another, identifying repeat elements is equivalent to discovering consensus overlaps between reads, which can in turn be cast as a community detection problem in the network of read overlaps. In RepLong, we first construct a network of read overlaps from pairwise alignments of the reads, in which each vertex represents a read and each edge indicates a substantial overlap between the two corresponding reads. Second, communities whose internal connectivity is greater than their external connectivity are extracted by network modularity optimization. Finally, representative reads from each community are extracted to form the repeat library. Comparisons with genome-based and short-read-based methods on Drosophila melanogaster and human long-read sequencing data demonstrate the efficiency of RepLong in identifying long repeats. RepLong can handle lower-coverage data and can serve as a complementary solution to existing methods to improve repeat identification on long-read sequencing data.

The RepLong software is freely available at https://github.com/ruiguo-bio/replong. Contact: ywsun@szu.edu.cn or zhuzx@szu.edu.cn. Supplementary data are available at Bioinformatics online.
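The community step above relies on network modularity, which measures how much denser the connections inside communities are than expected by chance. For an undirected, unweighted graph with m edges, Newman's modularity can be written as the sum over communities c of L_c/m - (d_c/2m)^2, where L_c is the number of edges inside c and d_c is the summed degree of its vertices. The sketch below only scores a given partition of a read-overlap graph (read IDs are hypothetical); the optimizer that searches for a high-scoring partition, and the exact variant RepLong uses, are not shown here.

```python
from collections import defaultdict


def modularity(edges: list[tuple[str, str]], community: dict[str, int]) -> float:
    """Newman modularity Q of a partition of an undirected, unweighted graph
    without self-loops: sum over communities of L_c/m - (d_c / 2m)^2."""
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)  # edges with both ends in community c
    degree = defaultdict(int)    # summed degree of the vertices in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles of overlapping reads joined by a single edge, with each triangle as a community, Q = 2 × (3/7 - (7/14)^2) = 5/14, or about 0.357; putting all six reads in one community gives Q = 0.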


July 7, 2019

Genome sequence of Galleria mellonella (greater wax moth).

The larvae of the greater wax moth, Galleria mellonella, are pests of active beehives. In infection biology, these larvae are becoming increasingly attractive as an invertebrate host model. Here, we report the first genome sequence of Galleria mellonella. Copyright © 2018 Lange et al.


July 7, 2019

The ‘gifted’ actinomycete Streptomyces leeuwenhoekii.

Streptomyces leeuwenhoekii strains C34T, C38, C58 and C79 were isolated from a soil sample collected from the Chaxa Lagoon, located in the Salar de Atacama in northern Chile. These streptomycetes produce a variety of new specialised metabolites with antibiotic, anti-cancer and anti-inflammatory activities. Moreover, genome mining performed on two of these strains has revealed the presence of biosynthetic gene clusters with the potential to produce new specialised metabolites. This review focusses on this new clade of Streptomyces strains, summarises the literature and presents new information on strain C34T.


July 7, 2019

Complete genome sequence of Tsukamurella sp. MH1: A wide-chain length alkane-degrading actinomycete.

Tsukamurella sp. strain MH1, which can use a wide range of n-alkanes as its sole carbon source, was isolated from petroleum-contaminated soil (Pitești, Romania), and its complete genome was sequenced. The 4,922,396 bp genome consists of a single circular chromosome with a G + C content of 71.12%, much higher than that of the type strains of this genus (68.4%). Based on 16S rRNA gene sequence similarity, strain MH1 was taxonomically identified as Tsukamurella carboxydivorans. Genome analyses revealed that strain MH1 harbors only one gene encoding an alkB-like hydroxylase, arranged in a complete alkane monooxygenase operon. This is the first complete genome of the species T. carboxydivorans, and it will provide insights into the potential of Tsukamurella sp. MH1 and related strains for the bioremediation of petroleum hydrocarbon-contaminated sites and into the environmental role of these bacteria. Copyright © 2017. Published by Elsevier B.V.
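The G + C content reported above is the percentage of G and C among the bases of the genome. A minimal sketch of that calculation follows; whether the authors excluded ambiguous bases is not stated, so ignoring symbols such as N here is an assumption, and the function name is ours.

```python
def gc_percent(sequence: str) -> float:
    """G + C content as a percentage of unambiguous bases (A, C, G, T).

    Other symbols, such as N, are ignored (an assumed convention).
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / acgt
```

For example, GGCCAT has four G or C bases out of six, giving about 66.67%.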


July 7, 2019

Draft genome assembly of the sheep scab mite, Psoroptes ovis.

Sheep scab, caused by infestation with Psoroptes ovis, is highly contagious, results in intense pruritus, and represents a major welfare and economic concern. Here, we report the first draft genome assembly and gene prediction of P. ovis based on PacBio de novo sequencing. The ~63.2-Mb genome encodes 12,041 protein-coding genes. Copyright © 2018 Burgess et al.


July 7, 2019

The sequenced angiosperm genomes and genome databases.

Angiosperms, the flowering plants, provide essential resources for human life, such as food, energy, oxygen, and materials. They have also promoted the evolution of humans, animals, and the planet Earth. Despite numerous advances in genome reports and sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Drawing on the rapid advances and innovations in database construction over the last few years, we provide a comprehensive review of three major types of angiosperm genome databases: databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. Genome databases for a single species or a clade of species are especially popular with specific groups of researchers, while a regularly updated comprehensive database is more powerful for addressing major scientific questions at the genome scale. Considering the low coverage of flowering plants in any available database, we propose the construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote collaborative studies of important questions in plant biology.


July 7, 2019

Darwin: A genomics co-processor provides up to 15,000× acceleration on long read assembly.

of life in fundamental ways. Genomics data, however, is far outpacing Moore’s Law. Third-generation sequencing technologies produce 100× longer reads than second-generation technologies and reveal a much broader mutation spectrum of disease and evolution. However, these technologies incur prohibitively high computational costs. Over 1,300 CPU hours are required for reference-guided assembly of the human genome (using [47]), and over 15,600 CPU hours are required for de novo assembly [57]. This paper describes “Darwin”, a co-processor for genomic sequence alignment that, without sacrificing sensitivity, provides up to 15,000× speedup over state-of-the-art software for reference-guided assembly of third-generation reads. Darwin achieves this speedup through hardware/algorithm co-design, trading more easily accelerated alignment for less memory-intensive filtering, and by optimizing the memory system for filtering. Darwin combines a hardware-accelerated version of D-SOFT, a novel filtering algorithm, with a hardware-accelerated version of GACT, a novel alignment algorithm. GACT generates near-optimal alignments of arbitrarily long genomic sequences using constant memory for the compute-intensive step. Darwin is adaptable, with tunable speed and sensitivity to match emerging sequencing technologies and to meet the requirements of genomic applications beyond read assembly.
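The abstract names D-SOFT as Darwin's filtering stage but does not describe it. Based on our reading of its published design, seed hits are binned into diagonal bands, and a band becomes a candidate alignment region when its seed hits cover at least a threshold number of distinct query bases; treat that description as an assumption. The software-only sketch below follows it. Function and parameter names are hypothetical, the hardware design is not modelled, and GACT is not shown.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def dsoft_candidates(query: str, index: dict[str, list[int]], k: int,
                     stride: int, bin_size: int, threshold: int) -> list[int]:
    """Return diagonal bands whose seed hits cover >= `threshold` distinct query bases."""
    covered = defaultdict(set)  # band -> query positions covered by seed hits
    for q in range(0, len(query) - k + 1, stride):
        for r in index.get(query[q:q + k], ()):
            band = (r - q) // bin_size  # hits on nearby diagonals share a band
            covered[band].update(range(q, q + k))
    return sorted(band for band, positions in covered.items()
                  if len(positions) >= threshold)
```

Counting distinct covered bases, rather than raw hit counts, keeps many overlapping seeds from a single short match from inflating a band's score. A Python set is the simplest way to express this; a hardware implementation would need fixed-size bookkeeping instead.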

