de novo assembly Archives

July 7, 2019

High-quality whole-genome sequences for 21 enterotoxigenic Escherichia coli strains generated with PacBio sequencing.

Enterotoxigenic Escherichia coli (ETEC) is an important diarrheagenic pathogen. We report here the high-quality whole-genome sequences of 21 ETEC strains isolated from patients in the United States, international diarrheal surveillance studies, and cruise ship outbreaks.

July 7, 2019

Comparative genomic analysis of Lactobacillus plantarum GB-LP4 and identification of evolutionarily divergent genes in high-osmolarity environment.

Lactobacillus plantarum is one of the widely used probiotics, and a large body of research has examined the effectiveness of this species. However, the differences between previously reported L. plantarum strains and the sources of genomic variation among the strains have not been clearly specified. To better understand the molecular basis of L. plantarum in Korean traditional fermentation, we isolated L. plantarum GB-LP4 from a Korean fermented vegetable and conducted whole-genome assembly. Using a comparative genomics approach, we identified candidate genes that are expected to have undergone evolutionary acceleration. These genes have been reported to be associated with maintaining homeostasis, which is generally known to help cells overcome instability in the external environment, including low pH or high osmotic pressure. Our results describe the evolutionary relationships among L. plantarum strains and point to candidate genes that may play a pivotal role in the evolutionary acceleration of GB-LP4 in a high-osmolarity environment. This study may provide guidance for further studies on L. plantarum.

July 7, 2019

Complete genomic and transcriptional landscape analysis using third-generation sequencing: a case study of Saccharomyces cerevisiae CEN.PK113-7D.

Completion of eukaryal genomes can be a difficult task because of the highly repetitive sequences along the chromosomes and the short read lengths of second-generation sequencing. Saccharomyces cerevisiae strain CEN.PK113-7D, widely used as a model organism and a cell factory, was selected for this study to demonstrate the superior capability of very long sequence reads for de novo genome assembly. We generated long reads using two common third-generation sequencing technologies (Oxford Nanopore Technology (ONT) and Pacific Biosciences (PacBio)) and used short reads obtained using Illumina sequencing for error correction. Assembly of the reads derived from all three technologies resulted in complete sequences for all 16 yeast chromosomes, as well as the mitochondrial chromosome, in one step. Further, we identified three types of DNA methylation (5mC, 4mC and 6mA). Comparison between the reference strain S288C and strain CEN.PK113-7D identified chromosomal rearrangements against a background of similar gene content between the two strains. We identified full-length transcripts through ONT direct RNA sequencing technology. This allows for the identification of transcriptional landscapes, including untranslated regions (UTRs) (5′ UTR and 3′ UTR), as well as differential gene expression quantification. About 91% of the predicted transcripts could be consistently detected across biological replicates grown either on glucose or ethanol. Direct RNA sequencing identified many polyadenylated non-coding RNAs, rRNAs, telomere-RNA, long non-coding RNA and antisense RNA. This work demonstrates a strategy to obtain complete genome sequences and transcriptional landscapes that can be applied to other eukaryal organisms.

July 7, 2019

Whole-genome sequence of Mycoplasma bovis strain Ningxia-1.

A genome sequence of the Mycoplasma bovis Ningxia-1 strain was determined using Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing technology. The strain was isolated from a lesioned calf lung in 2013 in Pengyang, Ningxia, China. The single circular chromosome of 1,033,629 bp differs from other complete Mycoplasma bovis genomes in insertion-like sequences (ISs), integrative conjugative elements (ICEs), lipoproteins (LPs), variable surface lipoproteins (VSPs), pathogenicity islands (PAIs), etc. Copyright © 2018 Sun et al.

July 7, 2019

Complete genome sequences of the plant pathogens Dickeya solani RNS 08.23.3.1.A and Dickeya dianthicola RNS04.9.

Dickeya spp. are bacterial pathogens causing soft-rot and blackleg diseases on a wide range of ornamental plants and crops. In this paper, we announce the PacBio complete genome sequences of the plant pathogens Dickeya solani RNS 08.23.3.1.A (PRI3337) and Dickeya dianthicola RNS04.9. Copyright © 2018 Khayi et al.

July 7, 2019

Complete genomic analysis of multidrug-resistance Pseudomonas aeruginosa Guangzhou-Pae617, the host of megaplasmid pBM413.

We previously described the novel qnrVC6- and blaIMP-45-carrying megaplasmid pBM413. This study aimed to investigate the complete genome of multidrug-resistant P. aeruginosa Guangzhou-Pae617, a clinical isolate from the sputum of a patient who was suffering from respiratory disease in Guangzhou, China. The genome was sequenced using Illumina HiSeq 2500 and PacBio RS II sequencers and assembled de novo using HGAP. The genome was automatically and manually annotated. The genome of P. aeruginosa Guangzhou-Pae617 is 6,430,493 bp and contains 5881 predicted genes, with an average G + C content of 66.43%. The genome showed high similarity to two newly sequenced P. aeruginosa strains isolated from New York, USA. From the whole genome sequence, we identified a type IV pilin, two large prophages, 15 antibiotic resistance genes, 5 genes involved in the “Infectious diseases” pathways, and 335 virulence factors. The antibiotic resistance and virulence factors in the genome of P. aeruginosa strain Guangzhou-Pae617 were identified by complete genomic analysis. This work contributes to further study of antibiotic resistance mechanisms and the clinical control of P. aeruginosa. Copyright © 2018 Elsevier Ltd. All rights reserved.

July 7, 2019

Full genome sequence of the Western Reserve strain of vaccinia virus determined by third-generation sequencing.

The vaccinia virus is a large, complex virus belonging to the Poxviridae family. Here, we report the complete, annotated genome sequence of the neurovirulent Western Reserve laboratory strain of this virus, which was sequenced on the Pacific Biosciences RS II and Oxford Nanopore MinION platforms. Copyright © 2018 Prazsák et al.

July 7, 2019

Assembly, annotation, and comparative genomics in PATRIC, the All Bacterial Bioinformatics Resource Center.

In the “big data” era, research biologists are faced with analyzing new types of data that usually require some level of computational expertise. A number of programs and pipelines exist, but acquiring the expertise to run them, and then understanding the output, can be a challenge. The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) has created an end-to-end analysis platform that allows researchers to take their raw reads, assemble a genome, annotate it, and then use a suite of user-friendly tools to compare it to any public data that is available in the repository. With close to 113,000 bacterial and more than 1000 archaeal genomes, PATRIC creates a unique research experience with “virtual integration” of private and public data. PATRIC contains many diverse tools and functionalities to explore both genome-scale and gene expression data, but the main focus of this chapter is on assembly, annotation, and the downstream comparative analysis functionality that is freely available in the resource.

July 7, 2019

Complete genome sequence of Streptomyces formicae KY5, the formicamycin producer.

Here we report the complete genome of the new species Streptomyces formicae KY5 isolated from Tetraponera fungus-growing ants. S. formicae was sequenced using the PacBio and 454 platforms to generate a single linear chromosome with terminal inverted repeats. Illumina MiSeq sequencing was used to correct base changes resulting from the high error rate associated with PacBio. The genome is 9.6 Mbp, has a GC content of 71.38% and contains 8162 protein coding sequences. Predictive analysis shows this strain encodes at least 45 gene clusters for the biosynthesis of secondary metabolites, including a type 2 polyketide synthase encoding cluster for the antibacterial formicamycins. Streptomyces formicae KY5 is a new, taxonomically distinct Streptomyces species and this complete genome sequence provides an important marker in the genus of Streptomyces. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.

July 7, 2019

Complete genome sequence of Planococcus faecalis AJ003T, the type species of the genus Planococcus and a microbial C30 carotenoid producer.

A novel type strain, Planococcus faecalis AJ003T, isolated from the feces of Antarctic penguins, synthesizes a rare C30 carotenoid, glycosyl-4,4′-diaponeurosporen-4′-ol-4-oic acid. The complete genome of P. faecalis AJ003T comprises a single circular chromosome (3,495,892 bp; 40.9% G + C content). Annotation analysis has revealed 3511 coding DNA sequences and 99 RNAs; seven genes associated with the MEP pathway and five genes involved in the carotenoid pathway have been identified. The functionality and complementation of 4,4′-diapophytoene synthase (CrtM) and two copies of heterologous 4,4′-diapophytoene desaturase (CrtN) involved in carotenoid biosynthesis were analyzed in Escherichia coli. Copyright © 2017 Elsevier B.V. All rights reserved.

July 7, 2019

Complete genome sequence of the highly Mn(II) tolerant Staphylococcus sp. AntiMn-1 isolated from deep-sea sediment in the Clarion-Clipperton Zone.

Staphylococcus sp. AntiMn-1 is a deep-sea bacterium inhabiting seafloor sediment in the Clarion-Clipperton Zone (CCZ) that is highly tolerant to Mn(II) and displays efficient Mn(II) oxidation. Herein, we present the assembly and annotation of its genome. Copyright © 2017 Elsevier B.V. All rights reserved.

July 7, 2019

Complete genome sequence of Flavobacterium kingsejongi WV39, a type species of the genus Flavobacterium and a microbial C40 carotenoid zeaxanthin producer.

A novel species, Flavobacterium kingsejongi WV39, isolated from feces of Antarctic penguins and a type species of the genus Flavobacterium, is yellow because it synthesizes a C40 carotenoid zeaxanthin. The complete genome of F. kingsejongi WV39 is made up of a single circular chromosome (4,224,053 bp, 39.8% G + C content). Annotation analysis revealed 3,955 coding sequences, 72 RNAs (18 rRNA + 54 tRNA), and five genes involved in zeaxanthin biosynthesis. The key gene encoding β-carotenoid hydroxylase (CrtZ), which is the last enzyme in the zeaxanthin biosynthetic pathway, was cloned and subjected to complementary analysis in a heterologous E. coli strain. The CrtZ of F. kingsejongi WV39 showed a higher activity than other reported CrtZs. Copyright © 2017 Elsevier B.V. All rights reserved.

July 7, 2019

RepLong: de novo repeat identification using long read sequencing data.

The identification of repetitive elements is important in genome assembly and phylogenetic analyses. Existing de novo repeat identification methods that rely on short reads are ineffective at identifying long repeats. Since long reads are more likely to cover repeat regions completely, they are more favorable for recognizing long repeats. In this study, we propose a novel de novo repeat element identification method, RepLong, based on PacBio long reads. Given that the reads mapped to repeat regions overlap heavily with each other, the identification of repeat elements is equivalent to the discovery of consensus overlaps between reads, which can be further cast as a community detection problem in the network of read overlaps. In RepLong, we first construct a network of read overlaps based on pair-wise alignment of the reads, where each vertex represents a read and each edge represents a substantial overlap between the corresponding two reads. Second, the communities whose intra-connectivity is greater than their inter-connectivity are extracted based on network modularity optimization. Finally, representative reads in each community are extracted to form the repeat library. Comparison studies on Drosophila melanogaster and human long-read sequencing data with genome-based and short-read-based methods demonstrate the efficiency of RepLong in identifying long repeats. RepLong can handle lower-coverage data and serve as a complementary solution to the existing methods to improve repeat identification performance on long-read sequencing data. The RepLong software is freely available at https://github.com/ruiguo-bio/replong. Contact: ywsun@szu.edu.cn or zhuzx@szu.edu.cn. Supplementary data are available at Bioinformatics online.
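
To make the three steps in this abstract concrete, here is a minimal Python sketch. It is not the RepLong implementation. It assumes overlaps are already available as hypothetical (read_a, read_b, overlap_length) tuples, uses a hypothetical min_overlap threshold to decide what counts as a "substantial" overlap, substitutes a simple greedy pairwise-merge heuristic for the modularity optimization, and picks as representative the read with the most overlaps inside its community. None of these choices are taken from the paper.

```python
from itertools import combinations


def build_overlap_graph(overlaps, min_overlap=500):
    """Step 1: reads are vertices; an edge joins two reads whose overlap
    is at least min_overlap bases (threshold is an assumed parameter)."""
    adj = {}
    for read_a, read_b, length in overlaps:
        if length < min_overlap or read_a == read_b:
            continue
        adj.setdefault(read_a, set()).add(read_b)
        adj.setdefault(read_b, set()).add(read_a)
    return adj


def greedy_modularity_communities(adj):
    """Step 2: start with one community per read and repeatedly merge the
    pair of communities that gives the largest gain in modularity.
    Stops when no merge increases modularity (simple heuristic, O(n^3))."""
    m = sum(len(neigh) for neigh in adj.values()) / 2  # number of edges
    comms = [{v} for v in sorted(adj)]
    if m == 0:
        return comms
    while True:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(comms)), 2):
            # Number of edges running between community i and community j.
            e_ij = sum(1 for u in comms[i] for v in adj[u] if v in comms[j])
            if e_ij == 0:
                continue
            d_i = sum(len(adj[u]) for u in comms[i])  # total degree of i
            d_j = sum(len(adj[u]) for u in comms[j])  # total degree of j
            # Change in modularity Q if i and j are merged.
            gain = e_ij / m - (d_i * d_j) / (2 * m * m)
            if gain > best_gain + 1e-12:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            return comms
        i, j = best_pair
        comms[i] |= comms[j]
        del comms[j]


def pick_representatives(adj, comms):
    """Step 3: per community, take the read with the most overlaps inside
    the community (ties broken by read name) as the repeat library entry."""
    return [min(c, key=lambda r: (-len(adj[r] & c), r)) for c in comms]


if __name__ == "__main__":
    # Toy data: two groups of mutually overlapping reads joined by one edge.
    toy = [
        ("r1", "r2", 900), ("r1", "r3", 800), ("r2", "r3", 850),
        ("r4", "r5", 950), ("r4", "r6", 700), ("r5", "r6", 720),
        ("r3", "r4", 600), ("r1", "r6", 100),  # last overlap is too short
    ]
    graph = build_overlap_graph(toy)
    communities = greedy_modularity_communities(graph)
    print(communities, pick_representatives(graph, communities))
```

The pairwise-merge loop is only practical for small toy graphs; a real read-overlap network would need a scalable community detection method and alignment-derived overlaps.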

July 7, 2019

Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations.

Mutations, the fuel of evolution, are first manifested as rare DNA changes within a population of cells. Although next-generation sequencing (NGS) technologies have revolutionized the study of genomic variation between species and individual organisms, most have limited ability to accurately detect and quantify rare variants among the different genome copies in heterogeneous mixtures of cells or molecules. We describe the technical challenges in characterizing subclonal variants using conventional NGS protocols and the recent development of error correction strategies, both computational and experimental, including consensus sequencing of single DNA molecules. We also highlight major applications for low-frequency mutation detection in science and medicine, describe emerging methodologies and provide our vision for the future of DNA sequencing.

July 7, 2019

Recent progress and prospects for advancing arachnid genomics

Arachnids exhibit tremendous species richness and adaptations of biomedical, industrial, and agricultural importance. Yet genomic resources for arachnids are limited, with the first few spider and scorpion genomes becoming accessible in the last four years. We review key insights from these genome projects, and recommend additional genomes for sequencing, emphasizing taxa of greatest value to the scientific community. We suggest greater sampling of spiders whose genomes are understudied but hold important protein recipes for silk and venom production. We further recommend arachnid genomes to address significant evolutionary topics, including the phenotypic impact of genome duplications. A barrier to high-quality arachnid genomes is the reliance on assemblies based solely on short-read data, which may be overcome by long-range sequencing and other emerging methods.
