Genome assembly Archives - Page 177 of 196

July 7, 2019

Jabba: hybrid error correction for long sequencing reads.

Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented. Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.
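The two building blocks named in this abstract, a de Bruijn graph built from short reads and maximal exact matches used as seeds, can be illustrated with a small sketch. This is not Jabba's implementation: the function names, the brute-force search, and the toy sequences are assumptions made purely for illustration, and practical tools rely on indexed data structures rather than nested loops.

```python
def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph as a dict: (k-1)-mer -> set of successor (k-1)-mers."""
    graph = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge from its prefix to its suffix.
            graph.setdefault(kmer[:-1], set()).add(kmer[1:])
    return graph


def maximal_exact_matches(query, target, min_len):
    """Brute-force MEMs as (query_start, target_start, length) tuples.

    A match is reported only if it cannot be extended to the left or right
    and is at least min_len characters long.
    """
    mems = []
    for i in range(len(query)):
        for j in range(len(target)):
            # Not left-maximal: the same match was already found one step earlier.
            if i > 0 and j > 0 and query[i - 1] == target[j - 1]:
                continue
            length = 0
            while (i + length < len(query) and j + length < len(target)
                   and query[i + length] == target[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


if __name__ == "__main__":
    print(de_bruijn_edges(["ACGTAC"], 3))
    # A toy "long read" with one substitution relative to the target sequence.
    print(maximal_exact_matches("ACGTTACG", "ACGTAACG", 4))
```

In a seed-and-extend scheme, matches like these would serve as anchors between a noisy long read and sequence spelled by the graph; the extension step is omitted here.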

July 7, 2019

A simple thermoplastic substrate containing hierarchical silica lamellae for high-molecular-weight DNA extraction.

An inexpensive, magnetic thermoplastic nanomaterial is developed utilizing a hierarchical layering of micro- and nanoscale silica lamellae to create a high-surface-area and low-shear substrate capable of capturing vast amounts of ultrahigh-molecular-weight DNA. Extraction is performed via a simple 45 min process and is capable of achieving binding capacities up to 1 000 000 times greater than silica microparticles. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

July 7, 2019

Improve homology search sensitivity of PacBio data by correcting frameshifts.

Single-molecule, real-time sequencing (SMRT) developed by Pacific BioSciences produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data has a high sequencing error rate, and most of the errors are insertions or deletions. During alignment-based homology search, insertion or deletion errors in genes cause frameshifts and may lead to only marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments, and the ambiguity will incur errors in structural and functional annotation. Existing frameshift correction tools are designed for data with a much lower error rate and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, we can correct more errors compared with a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/. Contact: yannisun@msu.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
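Why a single insertion is so damaging to a protein-level search can be seen in a toy example. The sketch below is only an illustration of the frameshift effect described in the abstract, not part of Frame-Pro; the helper name `codons` and the example sequence are made up.

```python
def codons(seq, frame=0):
    """Split a nucleotide sequence into complete codons starting at `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


original = "ATGGCCAAGTTT"
# Simulate one spurious inserted base, as an indel sequencing error would.
with_insertion = original[:4] + "A" + original[4:]

if __name__ == "__main__":
    print(codons(original))        # codons of the error-free sequence
    print(codons(with_insertion))  # every codon after the insertion is shifted
```

Only the codon before the insertion survives; everything downstream is read in a different frame, which is why the translated read aligns poorly until the frameshift is corrected.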

July 7, 2019

Information-optimal genome assembly via sparse read-overlap graphs.

In the context of third-generation long-read sequencing technologies, read-overlap-based approaches are expected to play a central role in the assembly step. A fundamental challenge in assembling from a read-overlap graph is that the true sequence corresponds to a Hamiltonian path on the graph, and, under most formulations, the assembly problem becomes NP-hard, restricting practical approaches to heuristics. In this work, we avoid this seemingly fundamental barrier by first setting the computational complexity issue aside, and seeking an algorithm that targets information limits. In particular, we consider a basic feasibility question: when does the set of reads contain enough information to allow unambiguous reconstruction of the true sequence? Based on insights from this information feasibility question, we present an algorithm, the Not-So-Greedy algorithm, to construct a sparse read-overlap graph. Unlike most other assembly algorithms, Not-So-Greedy comes with a performance guarantee: whenever information feasibility conditions are satisfied, the algorithm reduces the assembly problem to an Eulerian path problem on the resulting graph, and can thus be solved in linear time. In practice, this theoretical guarantee translates into assemblies of higher quality. Evaluations on both simulated reads from real genomes and a PacBio Escherichia coli K12 dataset demonstrate that Not-So-Greedy compares favorably with standard string graph approaches in terms of accuracy of the resulting read-overlap graph and contig N50. Available at github.com/samhykim/nsg. Contact: courtade@eecs.berkeley.edu or dntse@stanford.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
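The linear-time claim rests on the fact that an Eulerian path, unlike a Hamiltonian path, can be found by a simple stack-based traversal (Hierholzer's algorithm). The following is a generic sketch of that standard algorithm on a directed graph, not the Not-So-Greedy code; the function name and the toy edge list are assumptions for illustration.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Return a list of vertices that uses every directed edge exactly once,
    or None if no such path exists."""
    if not edges:
        return None
    out_edges = defaultdict(list)
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for u, v in edges:
        out_edges[u].append(v)
        out_deg[u] += 1
        in_deg[v] += 1

    nodes = set(in_deg) | set(out_deg)
    starts = [n for n in nodes if out_deg[n] - in_deg[n] == 1]
    ends = [n for n in nodes if in_deg[n] - out_deg[n] == 1]
    balanced = all(abs(out_deg[n] - in_deg[n]) <= 1 for n in nodes)
    # Degree conditions: at most one start and one end vertex, all others balanced.
    if not balanced or len(starts) > 1 or len(starts) != len(ends):
        return None

    start = starts[0] if starts else edges[0][0]
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if out_edges[u]:
            stack.append(out_edges[u].pop())  # follow an unused edge, O(1)
        else:
            path.append(stack.pop())          # dead end: emit vertex
    path.reverse()
    # If some edges were unreachable, the graph was not connected.
    return path if len(path) == len(edges) + 1 else None


if __name__ == "__main__":
    print(eulerian_path([("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]))
```

Each edge is pushed and popped once, so the running time is proportional to the number of edges.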

July 7, 2019

Genome-guided design of a defined mouse microbiota that confers colonization resistance against Salmonella enterica serovar Typhimurium.

Protection against enteric infections, also termed colonization resistance, results from mutualistic interactions of the host and its indigenous microbes. The gut microbiota of humans and mice is highly diverse and it is therefore challenging to assign specific properties to its individual members. Here, we have used a collection of murine bacterial strains and a modular design approach to create a minimal bacterial community that, once established in germ-free mice, provided colonization resistance against the human enteric pathogen Salmonella enterica serovar Typhimurium (S. Tm). Initially, a community of 12 strains, termed Oligo-Mouse-Microbiota (Oligo-MM(12)), representing members of the major bacterial phyla in the murine gut, was selected. This community was stable over consecutive mouse generations and provided colonization resistance against S. Tm infection, albeit not to the degree of a conventional complex microbiota. Comparative (meta)genome analyses identified functions represented in a conventional microbiome but absent from the Oligo-MM(12). By genome-informed design, we created an improved version of the Oligo-MM community harbouring three facultative anaerobic bacteria from the mouse intestinal bacterial collection (miBC) that provided conventional-like colonization resistance. In conclusion, we have established a highly versatile experimental system that showed efficacy in an enteric infection model. Thus, in combination with exhaustive bacterial strain collections and systems-based approaches, genome-guided design can be used to generate insights into microbe-microbe and microbe-host interactions for the investigation of ecological and disease-relevant mechanisms in the intestine.

July 7, 2019

Identification of a virulence determinant that is conserved in the Jawetz and Heyl biotypes of [Pasteurella] pneumotropica.

[Pasteurella] pneumotropica is a ubiquitous bacterium frequently isolated from laboratory rodents. Although this bacterium causes various diseases in immunosuppressed animals, little is known about major virulence factors and their roles in pathogenicity. To identify virulence factors, we sequenced the genome of [P.] pneumotropica biotype Heyl strain ATCC 12555, and compared the resulting non-contiguous draft genome sequence with the genome of biotype Jawetz strain ATCC 35149. Among the large number of genes encoding virulence-associated factors in both strains were four genes encoding YadA-like proteins, which are known virulence factors that function in host cell adherence and invasion in many pathogens. In this study, we assessed YadA distribution and biological activity as an example of a virulence-associated factor shared by biotypes Jawetz and Heyl. More than half of mouse isolates were found to have at least one of these genes, whereas the majority of rat isolates did not. Autoagglutination activity, and the ability to bind to mouse collagen type IV and mouse fibroblast cells, were significantly higher in YadA-positive than in YadA-negative strains. To conclude, we identified a large number of candidate genes predicted to influence [P.] pneumotropica pathogenesis. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019

Genomic sequencing-based mutational enrichment analysis identifies motility genes in a genetically intractable gut microbe.

A major roadblock to understanding how microbes in the gastrointestinal tract colonize and influence the physiology of their hosts is our inability to genetically manipulate new bacterial species and experimentally assess the function of their genes. We describe the application of population-based genomic sequencing after chemical mutagenesis to map bacterial genes responsible for motility in Exiguobacterium acetylicum, a representative intestinal Firmicutes bacterium that is intractable to molecular genetic manipulation. We derived strong associations between mutations in 57 E. acetylicum genes and impaired motility. Surprisingly, less than half of these genes were annotated as motility-related based on sequence homologies. We confirmed the genetic link between individual mutations and loss of motility for several of these genes by performing a large-scale analysis of spontaneous suppressor mutations. In the process, we reannotated genes belonging to a broad family of diguanylate cyclases and phosphodiesterases to highlight their specific role in motility and assigned functions to uncharacterized genes. Furthermore, we generated isogenic strains that allowed us to establish that Exiguobacterium motility is important for the colonization of its vertebrate host. These results indicate that genetic dissection of a complex trait, functional annotation of new genes, and the generation of mutant strains to define the role of genes in complex environments can be accomplished in bacteria without the development of species-specific molecular genetic tools.

July 7, 2019

DNA extraction protocols for whole-genome sequencing in marine organisms.

The marine environment harbors a large proportion of the total biodiversity on this planet, including the majority of the Earth's different phyla and classes. Studying the genomes of marine organisms can bring interesting insights into genome evolution. Today, almost all marine organismal groups are understudied with respect to their genomes. One potential reason is that extraction of high-quality DNA in sufficient amounts is challenging for many marine species. This is due to high polysaccharide content, polyphenols and other secondary metabolites that inhibit downstream DNA library preparations. Consequently, protocols developed for vertebrates and plants do not always perform well for invertebrates and algae. In addition, many marine species have large population sizes and, as a consequence, highly variable genomes. Thus, to facilitate the sequence read assembly process during genome sequencing, it is desirable to obtain enough DNA from a single individual, which is a challenge in many species of invertebrates and algae. Here, we present DNA extraction protocols for seven marine species (four invertebrates, two algae, and a marine yeast), optimized to provide sufficient DNA quality and yield for de novo genome sequencing projects.

July 7, 2019

Complete genome sequence and transcriptome regulation of the pentose utilizing yeast Sugiyamaella lignohabitans.

Efficient conversion of hexoses and pentoses into value-added chemicals represents one core step for establishing economically feasible biorefineries from lignocellulosic material. While extensive research efforts have recently provided advances in the overall process performance, the quest for new microbial cell factories and novel enzyme sources is still open. As demonstrated recently, the yeast Sugiyamaella lignohabitans (formerly Candida lignohabitans) represents a promising microbial cell factory for the production of organic acids from lignocellulosic hydrolysates. We report here the de novo genome assembly of S. lignohabitans using the Single Molecule Real-Time platform, with gene prediction refined by using RNA-seq. The sequencing revealed a 15.98 Mb genome, subdivided into four chromosomes. By phylogenetic analysis, Blastobotrys (Arxula) adeninivorans and Yarrowia lipolytica were found to be close relatives of S. lignohabitans. Differential gene expression was evaluated in typical growth conditions on glucose and xylose and allowed a first insight into the transcriptional response of S. lignohabitans to different carbon sources and different oxygenation conditions. Novel sequences for enzymes and transporters involved in the central carbon metabolism, and therefore of potential biotechnological interest, were identified. These data open the way for a better understanding of the metabolism of S. lignohabitans and provide resources for further metabolic engineering. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019

Origins of the current seventh cholera pandemic.

Vibrio cholerae has caused seven cholera pandemics since 1817, imposing terror on much of the world, but bacterial strains are currently only available for the sixth and seventh pandemics. The El Tor biotype seventh pandemic began in 1961 in Indonesia, but did not originate directly from the classical biotype sixth-pandemic strain. Previous studies focused mainly on the spread of the seventh pandemic after 1970. Here, we analyze in unprecedented detail the origin, evolution, and transition to pandemicity of the seventh-pandemic strain. We used high-resolution comparative genomic analysis of strains collected from 1930 to 1964, covering the evolution from the first available El Tor biotype strain to the start of the seventh pandemic. We define six stages leading to the pandemic strain and reveal all key events. The seventh pandemic originated from a nonpathogenic strain in the Middle East, first observed in 1897. It subsequently underwent explosive diversification, including the spawning of the pandemic lineage. This rapid diversification suggests that, when first observed, the strain had only recently arrived in the Middle East, possibly from the Asian homeland of cholera. The lineage migrated to Makassar, Indonesia, where it gained the important virulence-associated elements Vibrio seventh pandemic island I (VSP-I), VSP-II, and El Tor type cholera toxin prophage by 1954, and it then became pandemic in 1961 after only 12 additional mutations. Our data indicate that specific niches in the Middle East and Makassar were important in generating the pandemic strain by providing gene sources and the driving forces for genetic events.

July 7, 2019

Genomic insights into a sustained national outbreak of Yersinia pseudotuberculosis.

In 2014, a sustained outbreak of yersiniosis due to Yersinia pseudotuberculosis occurred across all major cities in New Zealand (NZ), with a total of 220 laboratory-confirmed cases, representing one of the largest ever reported outbreaks of Y. pseudotuberculosis. Here, we performed whole genome sequencing of outbreak-associated isolates to produce the largest population analysis to date of Y. pseudotuberculosis, giving us unprecedented capacity to understand the emergence and evolution of the outbreak clone. Multivariate analysis incorporating our genomic and clinical epidemiological data strongly suggested a single point-source contamination of the food chain, with subsequent nationwide distribution of contaminated produce. We additionally uncovered significant diversity in key determinants of virulence, which we speculate may help explain the high morbidity linked to this outbreak.

July 7, 2019

Systems biology-guided biodesign of consolidated lignin conversion

Lignin is the second most abundant biopolymer on Earth, yet its utilization for fungible products is complicated by its recalcitrant nature and remains a major challenge for sustainable lignocellulosic biorefineries. In this study, we used a systems biology approach to reveal the carbon utilization pattern and lignin degradation mechanisms in a unique lignin-utilizing Pseudomonas putida strain (A514). The mechanistic study further guided the design of three functional modules to enable a consolidated lignin bioconversion route. First, P. putida A514 mobilized a dye peroxidase-based enzymatic system for lignin depolymerization. This system could be enhanced by overexpressing a secreted multifunctional dye peroxidase to promote a two-fold enhancement of cell growth on insoluble kraft lignin. Second, A514 employed a variety of peripheral and central catabolism pathways to metabolize aromatic compounds, which can be optimized by overexpressing key enzymes. Third, the β-oxidation of fatty acid was up-regulated, whereas fatty acid synthesis was down-regulated when A514 was grown on lignin and vanillic acid. Therefore, the functional module for polyhydroxyalkanoate (PHA) production was designed to rechannel β-oxidation products. As a result, PHA content reached 73% per cell dry weight (CDW). Further integrating the three functional modules enhanced the production of PHA from kraft lignin and biorefinery waste. Thus, this study elucidated lignin conversion mechanisms in bacteria with potential industrial implications and laid out the concept for engineering a consolidated lignin conversion route.

July 7, 2019

Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes.

Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes of this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. After assembling short reads, we determined 387,594 scaffolds as the draft genome sequence (FES_r1.0). The total length of FES_r1.0 was 1,177,687,305 bp, and the N50 of the scaffolds was 25,109 bp. Gene prediction analysis revealed 286,768 coding sequences (CDSs; FES_r1.0_cds) including those related to transposable elements. The total length of FES_r1.0_cds was 212,917,911 bp, and the N50 was 1,101 bp. Of these, the functions of 35,816 CDSs excluding those for transposable elements were annotated by BLAST analysis. To demonstrate the utility of the database, we conducted several test analyses using BLAST and keyword searches. Furthermore, we used the draft genome as a reference sequence for NGS-based markers, and successfully identified novel candidate genes controlling heteromorphic self-incompatibility of buckwheat. The database and draft genome sequence provide a valuable resource that can be used in efforts to develop buckwheat cultivars with superior agronomic traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
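For readers unfamiliar with the N50 statistic quoted for the scaffolds and CDSs, it is the length of the sequence at which the running total, taken from longest to shortest, first reaches half of all assembled bases. A minimal sketch of the usual definition follows; the function name and the toy lengths are illustrative and are not taken from the buckwheat data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length of
    the sequence at which the cumulative sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("n50 requires at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length


if __name__ == "__main__":
    toy_scaffolds = [80, 70, 50, 40, 30, 20]  # made-up lengths in bp
    print(n50(toy_scaffolds))
```

A higher N50 means that more of the assembly sits in long sequences, which is why the statistic is commonly reported next to the total assembly length.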

July 7, 2019

Clonal dissemination of Pseudomonas aeruginosa sequence type 235 isolates carrying blaIMP-6 and emergence of blaGES-24 and blaIMP-10 on novel genomic islands PAGI-15 and -16 in South Korea.

A total of 431 Pseudomonas aeruginosa clinical isolates were collected from 29 general hospitals in South Korea in 2015. Antimicrobial susceptibility was tested by the disk diffusion method, and MICs of carbapenems were determined by the agar dilution method. Carbapenemase genes were amplified by PCR and sequenced, and the structures of class 1 integrons surrounding the carbapenemase gene cassettes were analyzed by PCR mapping. Multilocus sequence typing (MLST) and pulsed-field gel electrophoresis (PFGE) were performed for strain typing. Whole-genome sequencing was carried out to analyze P. aeruginosa genomic islands (PAGIs) carrying the blaIMP-6, blaIMP-10, and blaGES-24 genes. The rates of carbapenem-nonsusceptible and carbapenemase-producing P. aeruginosa isolates were 34.3% (148/431) and 9.5% (41/431), respectively. IMP-6 was the most prevalent carbapenemase type, followed by VIM-2, IMP-10, and GES-24. All carbapenemase genes were located on class 1 integrons of 6 different types on the chromosome. All isolates harboring carbapenemase genes exhibited genetic relatedness by PFGE (similarity > 80%); moreover, all isolates were identified as sequence type 235 (ST235), with the exception of two ST244 isolates by MLST. The blaIMP-6, blaIMP-10, and blaGES-24 genes were found to be located on two novel PAGIs, designated PAGI-15 and PAGI-16. Our data support the clonal spread of an IMP-6-producing P. aeruginosa ST235 strain, and the emergence of IMP-10 and GES-24 demonstrates the diversification of carbapenemases in P. aeruginosa in Korea. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

July 7, 2019

Complete genome sequence of a psychotrophic Pseudarthrobacter sulfonivorans strain Ar51 (CGMCC 4.7316), a novel crude oil and multi benzene compounds degradation strain.

Pseudarthrobacter sulfonivorans strain Ar51, a psychrotrophic bacterium isolated from the Tibet permafrost of China, can degrade crude oil and multiple benzene compounds efficiently at low temperature. Here we report the complete genome sequence of this bacterium. The complete genome of Pseudarthrobacter sulfonivorans strain Ar51 consists of a circular chromosome of 5.04 Mbp and a circular plasmid of 12.39 kbp. The availability of this genome sequence allows us to investigate the genetic basis of crude oil degradation and adaptation to growth in a nutrient-poor permafrost environment. Copyright © 2016 Elsevier B.V. All rights reserved.
