Bioinformatics Archives - Page 250 of 267

July 7, 2019

Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes.

Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes of this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. After assembling short reads, we determined 387,594 scaffolds as the draft genome sequence (FES_r1.0). The total length of FES_r1.0 was 1,177,687,305 bp, and the N50 of the scaffolds was 25,109 bp. Gene prediction analysis revealed 286,768 coding sequences (CDSs; FES_r1.0_cds) including those related to transposable elements. The total length of FES_r1.0_cds was 212,917,911 bp, and the N50 was 1,101 bp. Of these, the functions of 35,816 CDSs excluding those for transposable elements were annotated by BLAST analysis. To demonstrate the utility of the database, we conducted several test analyses using BLAST and keyword searches. Furthermore, we used the draft genome as a reference sequence for NGS-based markers, and successfully identified novel candidate genes controlling heteromorphic self-incompatibility of buckwheat. The database and draft genome sequence provide a valuable resource that can be used in efforts to develop buckwheat cultivars with superior agronomic traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
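
The N50 figures quoted in this abstract can be reproduced from a list of scaffold lengths. The sketch below uses the usual definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly). The function name and the toy lengths are illustrative assumptions and do not come from the buckwheat study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    # Walk from the longest sequence downwards, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that takes us to half of the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example (hypothetical scaffold lengths, not real data).
toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_scaffolds))
```

Applied to the scaffold lengths of a real assembly, the same function would give the kind of value reported for FES_r1.0, provided the lengths are read from the assembly FASTA first.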

July 7, 2019

Complete genome sequence of Brevibacterium linens BS258, a potential marine Actinobacterium for environmental remediation via microbially induced calcite precipitation

Brevibacterium linens BS258 is a urease-positive actinobacterium isolated from marine sediment of the Yellow Sea, China, which has been shown to have a strong capability for calcite precipitation and bioremediation of heavy metal pollution. Here, we report the complete genome sequence of this strain, which may provide valuable information for environmental remediation, wastewater treatment and atmospheric CO2 sequestration.

July 7, 2019

Exploiting next-generation sequencing to solve the haplotyping puzzle in polyploids: a simulation study.

Haplotypes are the units of inheritance in an organism, and many genetic analyses depend on their precise determination. Methods for haplotyping single individuals use the phasing information available in next-generation sequencing reads, by matching overlapping single-nucleotide polymorphisms while penalizing post hoc nucleotide corrections made. Haplotyping diploids is relatively easy, but the complexity of the problem increases drastically for polyploid genomes, which are found in both model organisms and in economically relevant plant and animal species. Although a number of tools are available for haplotyping polyploids, the effects of the genomic makeup and the sequencing strategy followed on the accuracy of these methods have hitherto not been thoroughly evaluated. We developed the simulation pipeline haplosim to evaluate the performance of three haplotype estimation algorithms for polyploids: HapCompass, HapTree and SDhaP, in settings varying in sequencing approach, ploidy levels and genomic diversity, using tetraploid potato as the model. Our results show that sequencing depth is the major determinant of haplotype estimation quality, that 1 kb PacBio circular consensus sequencing reads and Illumina reads with large insert-sizes are competitive and that all methods fail to produce good haplotypes when ploidy levels increase. Comparing the three methods, HapTree produces the most accurate estimates, but also consumes the most resources. There is clearly room for improvement in polyploid haplotyping algorithms.

July 7, 2019

Microbial metagenomics mock scenario-based sample simulation (M3S3).

Shotgun sequencing is increasingly applied in clinical microbiology for unbiased culture-independent diagnosis. While software solutions for metagenomics proliferate, integration of metagenomics in clinical care requires method standardisation and validation. Virtual metagenomics samples could underpin validation by substituting real samples and thus we sought to develop a novel solution for simulation of metagenomics samples based on user-defined clinical scenarios. We designed the Microbial Metagenomics Mock Scenario-based Sample Simulation (M3S3) workflow, which allows users to generate virtual samples from raw reads or assemblies. The M3S3 output is a mock sample in FASTQ or FASTA format. M3S3 was tested by generating virtual samples for ten challenging infectious disease scenarios, involving a background matrix ‘spiked’ in silico with pathogens including mixtures. Replicate samples (seven per scenario) were used to represent different compositional ratios. Virtual samples were analysed using Taxonomer and Kraken db. The ten challenge scenarios were successfully applied, generating 80 samples. For all tested scenarios, the virtual samples showed sequence compositions as predicted from the user input. Spiked pathogen sequences were identified with the majority of the replicates and most exhibited acceptable abundance (deviation between expected and observed abundance of spiked pathogens), with slight differences observed between software tools. Despite demonstrated proof-of-concept, integration of clinical metagenomics in routine microbiology remains a substantial challenge. M3S3 is capable of producing virtual samples on-demand, simulating a spectrum of clinical diagnostic scenarios of varying complexity. The M3S3 tool can therefore support the development and validation of standardised metagenomics applications. Copyright © 2017. Published by Elsevier Ltd.
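
The idea of spiking a background matrix with pathogen reads in silico can be illustrated with a few lines of Python. This is a minimal sketch of the general technique only, not the M3S3 implementation: the function name `spike_sample`, its parameters, and the representation of reads as (identifier, sequence) tuples are all assumptions made for the example.

```python
import random


def spike_sample(background, pathogen, total_reads, pathogen_fraction, seed=0):
    """Build a mock sample by mixing background and pathogen reads.

    `background` and `pathogen` are lists of (read_id, sequence) tuples.
    `pathogen_fraction` is the desired share of pathogen reads (0 to 1).
    """
    rng = random.Random(seed)  # fixed seed so a replicate can be regenerated
    n_pathogen = round(total_reads * pathogen_fraction)
    n_background = total_reads - n_pathogen
    if n_pathogen > len(pathogen) or n_background > len(background):
        raise ValueError("not enough input reads for the requested mixture")
    # Subsample each source without replacement, then interleave at random.
    mock = rng.sample(background, n_background) + rng.sample(pathogen, n_pathogen)
    rng.shuffle(mock)
    return mock


# Hypothetical toy inputs: 100 background reads and 20 pathogen reads.
background_reads = [("bg%d" % i, "ACGT") for i in range(100)]
pathogen_reads = [("pt%d" % i, "TTGA") for i in range(20)]
mock_sample = spike_sample(background_reads, pathogen_reads, 50, 0.1)
print(len(mock_sample))
```

Varying `pathogen_fraction` and `seed` across calls would give replicate samples with different compositional ratios, which is the kind of input a classifier validation would need.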

July 7, 2019

Collection and storage of HLA NGS genotyping data for the 17th International HLA and Immunogenetics Workshop.

For over 50 years, the International HLA and Immunogenetics Workshops (IHIW) have advanced the fields of histocompatibility and immunogenetics (H&I) via community sharing of technology, experience and reagents, and the establishment of ongoing collaborative projects. Held in the fall of 2017, the 17th IHIW focused on the application of next generation sequencing (NGS) technologies for clinical and research goals in the H&I fields. NGS technologies have the potential to allow dramatic insights and advances in these fields, but the scope and sheer quantity of data associated with NGS raise challenges for their analysis, collection, exchange and storage. The 17th IHIW adopted a centralized approach to these issues, and we developed the tools, services and systems to create an effective system for capturing and managing these NGS data. We worked with NGS platform and software developers to define a set of distinct but equivalent NGS typing reports that record NGS data in a uniform fashion. The 17th IHIW database applied our standards, tools and services to collect, validate and store those structured, multi-platform data in an automated fashion. We have created community resources to enable exploration of the vast store of curated sequence and allele-name data in the IPD-IMGT/HLA Database, with the goal of creating a long-term community resource that integrates these curated data with new NGS sequence and polymorphism data, for advanced analyses and applications. Copyright © 2017 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.

July 7, 2019

ReMILO: reference assisted misassembly detection algorithm using short and long reads.

Contigs assembled from the second generation sequencing short reads may contain misassemblies, and thus complicate downstream analysis or even lead to incorrect analysis results. Fortunately, with more and more sequenced species available, it becomes possible to use the reference genome of a closely related species to detect misassemblies. In addition, long reads of the third generation sequencing technology have been more and more widely used, and can also help detect misassemblies. Here, we introduce ReMILO, a reference assisted misassembly detection algorithm that uses both short reads and PacBio SMRT long reads. ReMILO aligns the initial short reads to both the contigs and reference genome, and then constructs a novel data structure called red-black multipositional de Bruijn graph to detect misassemblies. In addition, ReMILO also aligns the contigs to long reads and finds their differences from the long reads to detect more misassemblies. In our performance test on short read assemblies of human chromosome 14 data, ReMILO can detect 41.8-77.9% extensive misassemblies and 33.6-54.5% local misassemblies. On hybrid short and long read assemblies of S. pastorianus data, ReMILO can also detect 60.6-70.9% extensive misassemblies and 28.6-54.0% local misassemblies. The ReMILO software can be downloaded for free under Artistic License 2.0 from this site: https://github.com/songc001/remilo. Contact: baoe@bjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

July 7, 2019

Microbial sequence typing in the genomic era.

Next-generation sequencing (NGS), also known as high-throughput sequencing, is changing the field of microbial genomics research. NGS allows for a more comprehensive analysis of the diversity, structure and composition of microbial genes and genomes compared to the traditional automated Sanger capillary sequencing at a lower cost. NGS strategies have expanded the versatility of standard and widely used typing approaches based on nucleotide variation in several hundred DNA sequences and a few gene fragments (MLST, MLVA, rMLST and cgMLST). NGS can now accommodate variation in thousands or millions of sequences from selected amplicons to full genomes (WGS, NGMLST and HiMLST). To extract signals from high-dimensional NGS data and make valid statistical inferences, novel analytic and statistical techniques are needed. In this review, we describe standard and new approaches for microbial sequence typing at gene and genome levels and guidelines for subsequent analysis, including methods and computational frameworks. We also present several applications of these approaches to some disciplines, namely genotyping, phylogenetics and molecular epidemiology. Copyright © 2017 Elsevier B.V. All rights reserved.

July 7, 2019

A high throughput screen for active human transposable elements.

Transposable elements (TEs) are mobile genetic sequences that randomly propagate within their host’s genome. This mobility has the potential to affect gene transcription and cause disease. However, TEs are technically challenging to identify, which complicates efforts to assess the impact of TE insertions on disease. Here we present a targeted sequencing protocol and computational pipeline to identify polymorphic and novel TE insertions using next-generation sequencing: TE-NGS. The method simultaneously targets the three subfamilies that are responsible for the majority of recent TE activity (L1HS, AluYa5/8, and AluYb8/9) thereby obviating the need for multiple experiments and reducing the amount of input material required. Here we describe the laboratory protocol and detection algorithm, and a benchmark experiment for the reference genome NA12878. We demonstrate a substantial enrichment for on-target fragments, and high sensitivity and precision to both reference and NA12878-specific insertions. We report 17 previously unreported loci for this individual which are supported by orthogonal long-read evidence, and we identify 1470 polymorphic and novel TEs in 12 additional samples that were previously undocumented in databases of insertion polymorphisms. We anticipate that future applications of TE-NGS alongside exome sequencing of patients with sporadic disease will reduce the number of unresolved cases, and improve estimates of the contribution of TEs to human genetic disease.

July 7, 2019

New high copy tandem repeat in the content of the chicken W chromosome.

The content of repetitive DNA in avian genomes is considerably lower than in other investigated vertebrates. The first descriptions of tandem repeats were based on the results of routine biochemical and molecular biological experiments. Both satellite DNA and interspersed repetitive elements were annotated using a library-based approach and de novo repeat identification in the assembled genome. The development of deep-sequencing methods provides high-quality datasets without preassembly, allowing one to annotate repetitive elements from the unassembled part of genomes. In this work, we search the chicken assembly and annotate high copy number tandem repeats from unassembled short raw reads. The tandem repeat (GGAAA)n has been identified and found to be the second most abundant in the chicken genome after the telomeric repeat (TTAGGG)n. Furthermore, the (GGAAA)n repeat forms expanded arrays on both arms of the chicken W chromosome. Our results highlight the complexity of repetitive sequences and update data on the organization of the W sex chromosome in chicken.
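
Searching raw reads for a tandem repeat such as (GGAAA)n can be sketched with a regular expression. The example below is an illustrative assumption about how such a scan might look, not the method used in the study: it only checks the forward strand and a single phase of the repeat unit, and the function name `longest_repeat_run` is hypothetical.

```python
import re


def longest_repeat_run(read, unit="GGAAA"):
    """Return the largest number of back-to-back copies of `unit` in `read`."""
    # One or more consecutive copies of the unit, matched greedily.
    pattern = re.compile("(?:%s)+" % re.escape(unit))
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(read)]
    # A read without any copy of the unit yields 0.
    return max(runs, default=0)


# Toy read (hypothetical sequence) with three consecutive GGAAA units.
print(longest_repeat_run("TTGGAAAGGAAAGGAAACC"))
```

A fuller scan would also test the reverse complement (TTTCC) and tolerate mismatches, since real satellite arrays are rarely perfect.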

July 7, 2019

Cupriavidus malaysiensis sp. nov., a novel poly(3-hydroxybutyrate-co-4-hydroxybutyrate) accumulating bacterium isolated from the Malaysian environment.

Bacterial classification on the basis of a polyphasic approach was conducted on three poly(3-hydroxybutyrate-co-4-hydroxybutyrate) [P(3HB-co-4HB)] accumulating bacterial strains that were isolated from samples collected from Malaysian environments: Kulim Lake, Sg. Pinang river and Sg. Manik paddy field. The Gram-negative, rod-shaped, motile, non-sporulating and non-fermenting bacteria were shown to belong to the genus Cupriavidus of the Betaproteobacteria on the basis of their 16S rRNA gene sequence analyses. The sequence similarity value with their nearest phylogenetic neighbour, Cupriavidus pauculus LMG3413T, was 98.5%. However, the DNA-DNA hybridization values (8-58%) and ribotyping analysis both enabled these strains to be differentiated from related Cupriavidus species with validly published names. The RiboPrint patterns of the three strains also revealed that the strains were genetically related even though they displayed a clonal diversity. The major cellular fatty acids detected in these strains included C15:0 ISO 2OH/C16:1 ω7c, hexadecanoic (16:0) and cis-11-octadecenoic (C18:1 ω7c). Their G+C contents ranged from 68.0 to 68.6 mol%, and their major isoprenoid quinone was Ubiquinone Q-8. Of these three strains, only strain USMAHM13 (= DSM 25816 = KCTC 32390) was discovered to exhibit yellow pigmentation that is characteristic of the carotenoid family. Their assembled genomes also showed that the three strains were not identical in terms of their genome sizes, which were 7.82, 7.95 and 8.70 Mb for strains USMAHM13, USMAA1020 and USMAA2-4, respectively, and which are slightly larger than that of Cupriavidus necator H16 (7.42 Mb). The average nucleotide identity (ANI) results indicated that the strains were genetically related and the genome pairs belong to the same species. On the basis of the results obtained in this study, the three strains are considered to represent a novel species for which the name Cupriavidus malaysiensis sp. nov. is proposed. The type strain of the species is USMAA1020T (= DSM 19416T = KCTC 32390T).

July 7, 2019

De novo mutations resolve disease transmission pathways in clonal malaria

Detecting de novo mutations in viral and bacterial pathogens enables researchers to reconstruct detailed networks of disease transmission and is a key technique in genomic epidemiology. However, these techniques have not yet been applied to the malaria parasite, Plasmodium falciparum, in which a larger genome, slower generation times, and a complex life cycle make them difficult to implement. Here, we demonstrate the viability of de novo mutation studies in P. falciparum for the first time. Using a combination of sequencing, library preparation, and genotyping methods that have been optimized for accuracy in low-complexity genomic regions, we have detected de novo mutations that distinguish nominally identical parasites from clonal lineages. Despite its slower evolutionary rate compared with bacterial or viral species, de novo mutation can be detected in P. falciparum across timescales of just 1-2 years and evolutionary rates in low-complexity regions of the genome can be up to twice that detected in the rest of the genome. The increased mutation rate allows the identification of separate clade expansions that cannot be found using previous genomic epidemiology approaches and could be a crucial tool for mapping residual transmission patterns in disease elimination campaigns and reintroduction scenarios.

July 7, 2019

Comparative genomic analysis of Lactobacillus plantarum GB-LP4 and identification of evolutionarily divergent genes in high-osmolarity environment.

Lactobacillus plantarum is one of the widely used probiotics, and a large body of research has addressed the effectiveness of this species. However, the differences between previously reported L. plantarum strains and the source of genomic variation among the strains have not been clearly specified. To better understand the molecular basis of L. plantarum in Korean traditional fermentation, we isolated L. plantarum GB-LP4 from a Korean fermented vegetable and conducted whole genome assembly. Using a comparative genomics approach, we identified candidate genes that are expected to have undergone evolutionary acceleration. These genes have been reported to be associated with maintaining homeostasis, which is generally known to help overcome instability in the external environment, including low pH or high osmotic pressure. Here, our results provide an evolutionary relationship between L. plantarum species and elucidate the candidate genes that play a pivotal role in evolutionary acceleration of GB-LP4 in a high-osmolarity environment. This study may provide guidance for further studies on L. plantarum.

July 7, 2019

Complete genomic and transcriptional landscape analysis using third-generation sequencing: a case study of Saccharomyces cerevisiae CEN.PK113-7D.

Completion of eukaryal genomes can be a difficult task given the highly repetitive sequences along the chromosomes and the short read lengths of second-generation sequencing. Saccharomyces cerevisiae strain CEN.PK113-7D, widely used as a model organism and a cell factory, was selected for this study to demonstrate the superior capability of very long sequence reads for de novo genome assembly. We generated long reads using two common third-generation sequencing technologies (Oxford Nanopore Technology (ONT) and Pacific Biosciences (PacBio)) and used short reads obtained using Illumina sequencing for error correction. Assembly of the reads derived from all three technologies resulted in complete sequences for all 16 yeast chromosomes, as well as the mitochondrial chromosome, in one step. Further, we identified three types of DNA methylation (5mC, 4mC and 6mA). Comparison between the reference strain S288C and strain CEN.PK113-7D identified chromosomal rearrangements against a background of similar gene content between the two strains. We identified full-length transcripts through ONT direct RNA sequencing technology. This allows for the identification of transcriptional landscapes, including untranslated regions (UTRs) (5′ UTR and 3′ UTR) as well as differential gene expression quantification. About 91% of the predicted transcripts could be consistently detected across biological replicates grown either on glucose or ethanol. Direct RNA sequencing identified many polyadenylated non-coding RNAs, rRNAs, telomere-RNA, long non-coding RNA and antisense RNA. This work demonstrates a strategy to obtain complete genome sequences and transcriptional landscapes that can be applied to other eukaryal organisms.

July 7, 2019

Complete genomic analysis of multidrug-resistance Pseudomonas aeruginosa Guangzhou-Pae617, the host of megaplasmid pBM413.

We previously described the novel qnrVC6 and blaIMP-45-carrying megaplasmid pBM413. This study aimed to investigate the complete genome of multidrug-resistant P. aeruginosa Guangzhou-Pae617, a clinical isolate from the sputum of a patient who was suffering from respiratory disease in Guangzhou, China. The genome was sequenced using Illumina Hiseq 2500 and PacBio RS II sequencers and assembled de novo using HGAP. The genome was automatically and manually annotated. The genome of P. aeruginosa Guangzhou-Pae617 is 6,430,493 bp containing 5881 predicted genes with an average G + C content of 66.43%. The genome showed high similarity to two newly sequenced P. aeruginosa strains isolated from New York, USA. From the whole genome sequence, we identified a type IV pilin, two large prophages, 15 antibiotic resistance genes, 5 genes involved in the “Infectious diseases” pathways, and 335 virulence factors. The antibiotic resistance and virulence factors in the genome of P. aeruginosa strain Guangzhou-Pae617 were identified by complete genomic analysis. This contributes to further study of the antibiotic resistance mechanism and clinical control of P. aeruginosa. Copyright © 2018 Elsevier Ltd. All rights reserved.
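
An average G + C content like the one reported here is the share of G and C bases among all unambiguous bases of the sequence. The following is a minimal sketch under that assumption; the function name `gc_percent` and the toy sequence are illustrative and are not taken from the study.

```python
def gc_percent(sequence):
    """Return the G+C content of a DNA sequence as a percentage.

    Ambiguous symbols such as N are ignored in both numerator and denominator.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence (hypothetical): four of its six bases are G or C.
print(round(gc_percent("GGCCAT"), 2))
```

For a multi-contig genome the counts would be summed over all records before dividing, rather than averaging per-contig percentages.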

July 7, 2019

Full genome sequence of the Western Reserve strain of vaccinia virus determined by third-generation sequencing.

The vaccinia virus is a large, complex virus belonging to the Poxviridae family. Here, we report the complete, annotated genome sequence of the neurovirulent Western Reserve laboratory strain of this virus, which was sequenced on the Pacific Biosciences RS II and Oxford Nanopore MinION platforms. Copyright © 2018 Prazsák et al.
