Hybrid assembly Archives - Page 46 of 49

July 7, 2019

Assembly and transfer of tripartite integrative and conjugative genetic elements.

Integrative and conjugative elements (ICEs) are ubiquitous mobile genetic elements present as “genomic islands” within bacterial chromosomes. Symbiosis islands are ICEs that convert nonsymbiotic mesorhizobia into symbionts of legumes. Here we report the discovery of symbiosis ICEs that exist as three separate chromosomal regions when integrated in their hosts, but through recombination assemble as a single circular ICE for conjugative transfer. Whole-genome comparisons revealed exconjugants derived from nonsymbiotic mesorhizobia received three separate chromosomal regions from the donor Mesorhizobium ciceri WSM1271. The three regions were each bordered by two nonhomologous integrase attachment (att) sites, which together comprised three homologous pairs of attL and attR sites. Sequential recombination between each attL and attR pair produced corresponding attP and attB sites and joined the three fragments to produce a single circular ICE, ICEMcSym(1271). A plasmid carrying the three attP sites was used to recreate the process of tripartite ICE integration and to confirm the role of integrase genes intS, intM, and intG in this process. Nine additional tripartite ICEs were identified in diverse mesorhizobia and transfer was demonstrated for three of them. The transfer of tripartite ICEs to nonsymbiotic mesorhizobia explains the evolution of competitive but suboptimal N2-fixing strains found in Western Australian soils. The unheralded existence of tripartite ICEs raises the possibility that multipartite elements reside in other organisms, but have been overlooked because of their unusual biology. These discoveries reveal mechanisms by which integrases dramatically manipulate bacterial genomes to allow cotransfer of disparate chromosomal regions.

July 7, 2019

Capturing pairwise and multi-way chromosomal conformations using chromosomal walks.

Chromosomes are folded into highly compacted structures to accommodate physical constraints within nuclei and to regulate access to genomic information. Recently, global mapping of pairwise contacts showed that loops anchoring topological domains (TADs) are highly conserved between cell types and species. Whether pairwise loops synergize to form higher-order structures is still unclear. Here we develop a conformation capture assay to study higher-order organization using chromosomal walks (C-walks) that link multiple genomic loci together into proximity chains in human and mouse cells. This approach captures chromosomal structure at varying scales. Inter-chromosomal contacts constitute only 7-10% of the pairs and are restricted by interfacing TADs. About half of the C-walks stay within one chromosome, and almost half of those are restricted to intra-TAD spaces. C-walks that couple 2-4 TADs indicate stochastic associations between transcriptionally active, early replicating loci. Targeted analysis of thousands of 3-walks anchored at highly expressed genes supports pairwise, rather than hub-like, chromosomal topology at active loci. Polycomb-repressed Hox domains are shown by the same approach to enrich for synergistic hubs. Together, the data indicate that chromosomal territories, TADs, and intra-TAD loops are primarily driven by nested, possibly dynamic, pairwise contacts.

July 7, 2019

An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes.

Human genomes are routinely compared against a universal reference. However, this strategy could miss population-specific and personal genomic variations, which may be detected more efficiently using an ethnically relevant or personal reference. Here we report a hybrid assembly of a Korean reference genome (KOREF) for constructing personal and ethnic references by combining sequencing and mapping methods. We also build its consensus variome reference, providing information on millions of variants from 40 additional ethnically homogeneous genomes from the Korean Personal Genome Project. We find that the ethnically relevant consensus reference can be beneficial for efficient variant detection. Systematic comparison of human assemblies shows the importance of assembly quality, suggesting the necessity of new technologies to comprehensively map ethnic and personal genomic structure variations. In the era of large-scale population genome projects, the leveraging of ethnicity-specific genome assemblies as well as the human reference genome will accelerate mapping all human genome diversity.

July 7, 2019

Decay of sexual trait genes in an asexual parasitoid wasp.

Trait loss is a widespread phenomenon with pervasive consequences for a species’ evolutionary potential. The genetic changes underlying trait loss have only been clarified in a small number of cases. None of these studies can identify whether the loss of the trait under study was a result of neutral mutation accumulation or negative selection. This distinction is relatively clear-cut in the loss of sexual traits in asexual organisms. Male-specific sexual traits are not expressed and can only decay through neutral mutations, whereas female-specific traits are expressed and subject to negative selection. We present the genome of an asexual parasitoid wasp and compare it to that of a sexual lineage of the same species. We identify a short-list of 16 genes for which the asexual lineage carries deleterious SNP or indel variants, whereas the sexual lineage does not. Using tissue-specific expression data from other insects, we show that fifteen of these are expressed in male-specific reproductive tissues. Only one deleterious variant was found that is expressed in the female-specific spermathecae, a trait that is heavily degraded and thought to be under negative selection in L. clavipes. Although the phenotypic decay of male-specific sexual traits in asexuals is generally slow compared with the decay of female-specific sexual traits, we show that male-specific traits do indeed accumulate deleterious mutations as expected by theory. Our results provide an excellent starting point for detailed study of the genomics of neutral and selected trait decay.

July 7, 2019

Whole-genome de novo sequencing, combined with RNA-Seq analysis, reveals unique genome and physiological features of the amylolytic yeast Saccharomycopsis fibuligera and its interspecies hybrid.

Genomic studies on fungal species with hydrolytic activity have gained increased attention due to their great biotechnological potential for biomass-based biofuel production. The amylolytic yeast Saccharomycopsis fibuligera has served as a good source of enzymes and genes involved in saccharification. Despite its long history of use in food fermentation and bioethanol production, very little is known about the basic physiology and genomic features of S. fibuligera. We performed whole-genome (WG) de novo sequencing and complete assembly of S. fibuligera KJJ81 and KPH12, two isolates from wheat-based Nuruk in Korea. Intriguingly, the KJJ81 genome (~38 Mb) was revealed as a hybrid between the KPH12 genome (~18 Mb) and another unidentified genome sharing 88.1% nucleotide identity with the KPH12 genome. The seven chromosome pairs of KJJ81 subgenomes exhibit highly conserved synteny, indicating a very recent hybridization event. The phylogeny inferred from WG comparisons showed an early divergence of S. fibuligera before the separation of the CTG and Saccharomycetaceae clades in the subphylum Saccharomycotina. Reconstructed carbon and sulfur metabolic pathways, coupled with RNA-Seq analysis, suggested a marginal Crabtree effect under high glucose and activation of sulfur metabolism toward methionine biosynthesis under sulfur limitation in this yeast. Notably, the lack of sulfate assimilation genes in the S. fibuligera genome reflects a unique phenotype for Saccharomycopsis clades as natural sulfur auxotrophs. Extended gene families, including novel genes involved in saccharification and proteolysis, were identified. Moreover, comparative genome analysis of S. fibuligera ATCC 36309, an isolate from chalky rye bread in Germany, revealed that an interchromosomal translocation occurred in the KPH12 genome before the generation of the KJJ81 hybrid genome. The completely sequenced S. fibuligera genome with high-quality annotation and RNA-Seq analysis establishes an important foundation for functional inference of S. fibuligera in the degradation of fermentation mash. The gene inventory facilitates the discovery of new genes applicable to the production of novel valuable enzymes and chemicals. Moreover, as the first gapless genome assembly in the genus Saccharomycopsis including members with desirable traits for bioconversion, the unique genomic features of S. fibuligera and its hybrid will provide in-depth insights into fungal genome dynamics as evolutionary adaptation.

July 7, 2019

High-quality draft genome sequence of the actinobacterium Nocardia terpenica IFM 0406, producer of the immunosuppressant brasilicardins, using Illumina and PacBio technologies.

The bacterium Nocardia terpenica IFM 0406 is known as the producer of the immunosuppressant brasilicardin A. Here, we report the completely sequenced genome of strain IFM 0406, which facilitates the heterologous expression of the brasilicardin biosynthetic gene cluster but also unveils the intriguing biosynthetic capacity of the strain to produce secondary metabolites. Copyright © 2016 Buchmann et al.

July 7, 2019

Complete genome sequence of the barley pathogen Xanthomonas translucens pv. translucens DSM 18974T (ATCC 19319T).

We report here the complete 4.7-Mb genome sequence of Xanthomonas translucens pv. translucens DSM 18974(T), which causes black chaff disease on barley (Hordeum vulgare). Genome data of this X. translucens type strain will improve our understanding of this bacterial species. Copyright © 2016 Jaenicke et al.

July 7, 2019

Complete genome sequences of six Legionella pneumophila isolates from two collocated outbreaks of Legionnaires’ disease in 2005 and 2008 in Sarpsborg/Fredrikstad, Norway.

Here, we report the complete genome sequences of Legionella pneumophila isolates from two collocated outbreaks of Legionnaires’ disease in 2005 and 2008 in Sarpsborg/Fredrikstad, Norway. One clinical and two environmental isolates were sequenced from each outbreak. The genome of each of the six isolates consisted of a 3.36-Mb chromosome, while the 2005 genomes featured an additional 68-kb episome sharing high sequence similarity with the L. pneumophila Lens plasmid. All six genomes contained multiple mobile genetic elements including novel combinations of type-IVA secretion systems. A comparative genomics study will be launched to resolve the genetic relationship between the L. pneumophila isolates. Copyright © 2016 Dybwad et al.

July 7, 2019

Complete genome anatomy of the emerging potato pathogen Dickeya solani type strain IPO 2222(T).

Several species of the genus Dickeya provoke soft rot and blackleg diseases on a wide range of plants and crops. Dickeya solani has been identified as the causative agent of disease outbreaks on potato culture in Europe for the last decade. Here, we report the complete genome of D. solani IPO 2222(T). Using PacBio and Illumina technologies, a unique circular chromosome of 4,919,833 bp was assembled. The G+C content reaches 56% and the genomic sequence contains 4,059 predicted proteins. The ANI values calculated for D. solani IPO 2222(T) vs. other available D. solani genomes were over 99.9%, indicating a high genetic homogeneity within the D. solani species.

July 7, 2019

The draft genome of whitefly Bemisia tabaci MEAM1, a global crop pest, provides novel insights into virus transmission, host adaptation, and insecticide resistance.

The whitefly Bemisia tabaci (Hemiptera: Aleyrodidae) is among the 100 worst invasive species in the world. As one of the most important crop pests and virus vectors, B. tabaci causes substantial crop losses and poses a serious threat to global food security. We report the 615-Mb high-quality genome sequence of B. tabaci Middle East-Asia Minor 1 (MEAM1), the first genome sequence in the Aleyrodidae family, which contains 15,664 protein-coding genes. The B. tabaci genome is highly divergent from other sequenced hemipteran genomes, sharing no detectable synteny. A number of known detoxification gene families, including cytochrome P450s and UDP-glucuronosyltransferases, are significantly expanded in B. tabaci. Other expanded gene families, including cathepsins, large clusters of tandemly duplicated B. tabaci-specific genes, and phosphatidylethanolamine-binding proteins (PEBPs), were found to be associated with virus acquisition and transmission and/or insecticide resistance, likely contributing to the global invasiveness and efficient virus transmission capacity of B. tabaci. The presence of 142 horizontally transferred genes from bacteria or fungi in the B. tabaci genome, including genes encoding hopanoid/sterol synthesis and xenobiotic detoxification enzymes that are not present in other insects, offers novel insights into the unique biological adaptations of this insect such as polyphagy and insecticide resistance. Interestingly, two adjacent bacterial pantothenate biosynthesis genes, panB and panC, have been co-transferred into B. tabaci and fused into a single gene that has acquired introns during its evolution. The B. tabaci genome contains numerous genetic novelties, including expansions in gene families associated with insecticide resistance, detoxification and virus transmission, as well as numerous horizontally transferred genes from bacteria and fungi. We believe these novelties likely have shaped B. tabaci as a highly invasive polyphagous crop pest and efficient vector of plant viruses. The genome serves as a reference for resolving the B. tabaci cryptic species complex, understanding fundamental biological novelties, and providing valuable genetic information to assist the development of novel strategies for controlling whiteflies and the viruses they transmit.

July 7, 2019

Colib’read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads

With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, needing large amounts of computing resources, and potentially removing or modifying parts of the biological information contained in the data. Our approach proposes to focus directly on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools.

July 7, 2019

Improved assembly of noisy long reads by k-mer validation.

Genome assembly depends critically on read length. Two recent technologies, from Pacific Biosciences (PacBio) and Oxford Nanopore, produce read lengths >20 kb, which yield de novo genome assemblies with vastly greater contiguity than those based on Sanger, Illumina, or other technologies. However, the very high error rates of these two new technologies (~15% per base) make assembly imprecise at repeats longer than the read length and computationally expensive. Here we show that the contiguity and quality of the assembly of these noisy long reads can be significantly improved at a minimal cost, by leveraging the low error rate and low cost of Illumina short reads. Namely, k-mers from the PacBio raw reads that are not present in Illumina reads (which account for ~95% of the distinct k-mers) are deemed sequencing errors and ignored at the seed alignment step. By focusing on the ~5% of k-mers that are error free, read overlap sensitivity is dramatically increased. Of equal importance, the validation procedure can be extended to exclude repetitive k-mers, which prevents read miscorrection at repeats and further improves the resulting assemblies. We tested the k-mer validation procedure using one long-read technology (PacBio) and one assembler (MHAP/Celera Assembler), but it is very likely to yield analogous improvements with alternative long-read technologies and assemblers, such as Oxford Nanopore and BLASR/DALIGNER/Falcon, respectively. © 2016 Carvalho et al.; Published by Cold Spring Harbor Laboratory Press.
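
The k-mer validation idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy reads, and the choice of k = 4 are assumptions made for illustration, and reverse complements and repeat filtering are ignored. The sketch only shows the core filter, which keeps the long-read k-mers that also occur in the accurate short reads.

```python
def kmers(seq, k):
    """Yield every k-mer (substring of length k) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def trusted_kmers(short_reads, k):
    """Collect the set of k-mers observed in the accurate short reads."""
    trusted = set()
    for read in short_reads:
        trusted.update(kmers(read, k))
    return trusted


def validated_seeds(long_read, trusted, k):
    """Return (position, k-mer) pairs of the long read whose k-mer is trusted."""
    return [(i, km) for i, km in enumerate(kmers(long_read, k)) if km in trusted]


# Toy data (hypothetical): the long read carries one substitution
# relative to the sequence ACGTACGATT covered by the short reads.
short_reads = ["ACGTACGA", "GTACGATT"]
noisy_long_read = "ACGTTCGATT"

trusted = trusted_kmers(short_reads, k=4)
seeds = validated_seeds(noisy_long_read, trusted, k=4)
print(seeds)  # [(0, 'ACGT'), (5, 'CGAT'), (6, 'GATT')]
```

The k-mers overlapping the substitution are dropped, so only the three error-free k-mers remain as candidate seeds for overlap detection.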

July 7, 2019

Jabba: hybrid error correction for long sequencing reads.

Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented. Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.
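
To make the notion of a maximal exact match (MEM) seed concrete, here is a brute-force Python sketch. It is an illustration only and does not reproduce Jabba's code: the function name, the single node sequence standing in for a de Bruijn graph node, and the minimum seed length are all assumptions. A match is reported only if it cannot be extended to the left or to the right without hitting a mismatch or the end of a sequence.

```python
def maximal_exact_matches(read, node_seq, min_len):
    """Return (read_pos, node_pos, length) for each MEM of at least min_len."""
    mems = []
    for i in range(len(read)):
        for j in range(len(node_seq)):
            if read[i] != node_seq[j]:
                continue
            # If the previous characters also match, this position lies
            # inside a longer match that starts earlier: not left-maximal.
            if i > 0 and j > 0 and read[i - 1] == node_seq[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of a sequence.
            length = 0
            while (i + length < len(read)
                   and j + length < len(node_seq)
                   and read[i + length] == node_seq[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


# Toy data (hypothetical): the read differs from the node at position 4.
node_seq = "ACGTACGATT"
noisy_read = "ACGTTCGATT"
mems = maximal_exact_matches(noisy_read, node_seq, min_len=4)
print(mems)  # [(0, 0, 4), (5, 5, 5)]
```

The two seeds flank the erroneous base, which is the situation a seed-and-extend step would then bridge. This quadratic scan is only suitable for tiny inputs; practical tools rely on index structures to find such matches efficiently.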

July 7, 2019

Information-optimal genome assembly via sparse read-overlap graphs.

In the context of third-generation long-read sequencing technologies, read-overlap-based approaches are expected to play a central role in the assembly step. A fundamental challenge in assembling from a read-overlap graph is that the true sequence corresponds to a Hamiltonian path on the graph, and, under most formulations, the assembly problem becomes NP-hard, restricting practical approaches to heuristics. In this work, we avoid this seemingly fundamental barrier by first setting the computational complexity issue aside, and seeking an algorithm that targets information limits. In particular, we consider a basic feasibility question: when does the set of reads contain enough information to allow unambiguous reconstruction of the true sequence? Based on insights from this information feasibility question, we present an algorithm, the Not-So-Greedy algorithm, to construct a sparse read-overlap graph. Unlike most other assembly algorithms, Not-So-Greedy comes with a performance guarantee: whenever information feasibility conditions are satisfied, the algorithm reduces the assembly problem to an Eulerian path problem on the resulting graph, and can thus be solved in linear time. In practice, this theoretical guarantee translates into assemblies of higher quality. Evaluations on both simulated reads from real genomes and a PacBio Escherichia coli K12 dataset demonstrate that Not-So-Greedy compares favorably with standard string graph approaches in terms of accuracy of the resulting read-overlap graph and contig N50. Available at github.com/samhykim/nsg. Contact: courtade@eecs.berkeley.edu or dntse@stanford.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
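
The abstract's remark that an Eulerian path problem can be solved in linear time can be illustrated with Hierholzer's algorithm, a standard textbook method. The Python sketch below is not the Not-So-Greedy algorithm and does not build a read-overlap graph; the function name and the toy edge list are assumptions chosen for illustration. It only shows how a path that uses every edge exactly once is found in time proportional to the number of edges.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Return an Eulerian path as a list of nodes, or None if none exists."""
    if not edges:
        return []
    graph = defaultdict(list)
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for u, v in edges:
        graph[u].append(v)
        out_deg[u] += 1
        in_deg[v] += 1

    # Degree conditions: at most one node with one extra outgoing edge
    # (the start), at most one with one extra incoming edge (the end),
    # and every other node balanced.
    nodes = set(out_deg) | set(in_deg)
    diffs = {n: out_deg.get(n, 0) - in_deg.get(n, 0) for n in nodes}
    starts = [n for n, d in diffs.items() if d == 1]
    ends = [n for n, d in diffs.items() if d == -1]
    if any(d not in (-1, 0, 1) for d in diffs.values()):
        return None
    if len(starts) != len(ends) or len(starts) > 1:
        return None

    # Hierholzer's algorithm with an explicit stack: each edge is
    # removed from the adjacency lists exactly once.
    stack = [starts[0] if starts else edges[0][0]]
    path = []
    while stack:
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # If some edges were never reached, the graph is disconnected.
    return path if len(path) == len(edges) + 1 else None


# Toy graph (hypothetical): nodes are 2-mers, edges are overlaps.
edges = [("AC", "CG"), ("CG", "GT"), ("GT", "TA"), ("TA", "AC"), ("AC", "CA")]
path = eulerian_path(edges)
print(path)  # ['AC', 'CG', 'GT', 'TA', 'AC', 'CA']
```

Reading off the first node and then the last character of each subsequent node spells ACGTACA, the only sequence consistent with every overlap in this toy graph.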

July 7, 2019

Systems biology-guided biodesign of consolidated lignin conversion

Lignin is the second most abundant biopolymer on the earth, yet its utilization for fungible products is complicated by its recalcitrant nature and remains a major challenge for sustainable lignocellulosic biorefineries. In this study, we used a systems biology approach to reveal the carbon utilization pattern and lignin degradation mechanisms in a unique lignin-utilizing Pseudomonas putida strain (A514). The mechanistic study further guided the design of three functional modules to enable a consolidated lignin bioconversion route. First, P. putida A514 mobilized a dye peroxidase-based enzymatic system for lignin depolymerization. This system could be enhanced by overexpressing a secreted multifunctional dye peroxidase to promote a two-fold enhancement of cell growth on insoluble kraft lignin. Second, A514 employed a variety of peripheral and central catabolism pathways to metabolize aromatic compounds, which can be optimized by overexpressing key enzymes. Third, the β-oxidation of fatty acid was up-regulated, whereas fatty acid synthesis was down-regulated when A514 was grown on lignin and vanillic acid. Therefore, the functional module for polyhydroxyalkanoate (PHA) production was designed to rechannel β-oxidation products. As a result, PHA content reached 73% per cell dry weight (CDW). Further integrating the three functional modules enhanced the production of PHA from kraft lignin and biorefinery waste. Thus, this study elucidated lignin conversion mechanisms in bacteria with potential industrial implications and laid out the concept for engineering a consolidated lignin conversion route.
