Hybrid assembly Archives

September 22, 2019

RAD sequencing and a hybrid Antarctic fur seal genome assembly reveal rapidly decaying linkage disequilibrium, global population structure and evidence for inbreeding.

Recent advances in high throughput sequencing have transformed the study of wild organisms by facilitating the generation of high quality genome assemblies and dense genetic marker datasets. These resources have the potential to significantly advance our understanding of diverse phenomena at the level of species, populations and individuals, ranging from patterns of synteny through rates of linkage disequilibrium (LD) decay and population structure to individual inbreeding. Consequently, we used PacBio sequencing to refine an existing Antarctic fur seal (Arctocephalus gazella) genome assembly and genotyped 83 individuals from six populations using restriction site associated DNA (RAD) sequencing. The resulting hybrid genome comprised 6,169 scaffolds with an N50 of 6.21 Mb and provided clear evidence for the conservation of large chromosomal segments between the fur seal and dog (Canis lupus familiaris). Focusing on the most extensively sampled population of South Georgia, we found that LD decayed rapidly, reaching the background level by around 400 kb, consistent with other vertebrates but at odds with the notion that fur seals experienced a strong historical bottleneck. We also found evidence for population structuring, with four main Antarctic island groups being resolved. Finally, appreciable variance in individual inbreeding could be detected, reflecting the strong polygyny and site fidelity of the species. Overall, our study contributes important resources for future genomic studies of fur seals and other pinnipeds while also providing a clear example of how high throughput sequencing can generate diverse biological insights at multiple levels of organization. Copyright © 2018 Humble et al.
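The scaffold N50 quoted above is a contiguity summary that recurs throughout the abstracts on this page. As a minimal sketch of how that statistic is commonly computed (the function name `n50` and the toy lengths are illustrative assumptions, not taken from any of the cited studies):

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, not real data):
# total = 15, half = 7.5; 5 + 4 = 9 >= 7.5, so N50 is 4.
print(n50([5, 4, 3, 2, 1]))
```

A higher N50 generally indicates a more contiguous assembly, but the value depends on the total assembly length, so it is most informative when comparing assemblies of the same genome.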

September 22, 2019

Investigating the central metabolism of Clostridium thermosuccinogenes.

Clostridium thermosuccinogenes is a thermophilic anaerobic bacterium able to convert various carbohydrates to succinate and acetate as main fermentation products. Genomes of the four publicly available strains have been sequenced, and the genome of the type strain has been closed. The annotated genomes were used to reconstruct the central metabolism, and enzyme assays were used to validate annotations and to determine cofactor specificity. The genes were identified for the pathways to all fermentation products, as well as for the Embden-Meyerhof-Parnas pathway and the pentose phosphate pathway. Notably, a candidate transaldolase was lacking, and transcriptomics during growth on glucose versus that on xylose did not provide any leads to potential transaldolase genes or alternative pathways connecting the C5 with the C3/C6 metabolism. Enzyme assays showed xylulokinase to prefer GTP over ATP, which could be of importance for engineering xylose utilization in related thermophilic species of industrial relevance. Furthermore, the gene responsible for malate dehydrogenase was identified via heterologous expression in Escherichia coli and subsequent assays with the cell extract, which has proven to be a simple and powerful method for the basal characterization of thermophilic enzymes. IMPORTANCE: Running industrial fermentation processes at elevated temperatures has several advantages, including reduced cooling requirements, increased reaction rates and solubilities, and a possibility to perform simultaneous saccharification and fermentation of a pretreated biomass. Most studies with thermophiles so far have focused on bioethanol production. Clostridium thermosuccinogenes seems an attractive production organism for organic acids, succinic acid in particular, from lignocellulosic biomass-derived sugars. This study provides valuable insights into its central metabolism and GTP and PPi cofactor utilization. Copyright © 2018 American Society for Microbiology.

September 22, 2019

High-quality assembly of the reference genome for scarlet sage, Salvia splendens, an economically important ornamental plant.

Salvia splendens Ker-Gawler, scarlet or tropical sage, is a tender herbaceous perennial widely introduced and seen in public gardens all over the world. With few molecular resources, breeding is still restricted to traditional phenotypic selection, and the genetic mechanisms underlying phenotypic variation remain unknown. Hence, a high-quality reference genome will be very valuable for marker-assisted breeding, genome editing, and molecular genetics. We generated 66 Gb and 37 Gb of raw DNA sequences, respectively, from whole-genome sequencing of a largely homozygous scarlet sage inbred line using Pacific Biosciences (PacBio) single-molecule real-time and Illumina HiSeq sequencing platforms. The PacBio de novo assembly yielded a final genome with a scaffold N50 size of 3.12 Mb and a total length of 808 Mb. The repetitive sequences identified accounted for 57.52% of the genome sequence, and ~54,008 protein-coding genes were predicted collectively with ab initio and homology-based gene prediction from the masked genome. The divergence time between S. splendens and Salvia miltiorrhiza was estimated at 28.21 million years ago (Mya). Moreover, 3,797 species-specific genes and 1,187 expanded gene families were identified for the scarlet sage genome. We provide the first genome sequence and gene annotation for the scarlet sage. The availability of these resources will be of great importance for further breeding strategies, genome editing, and comparative genomics among related species.

September 22, 2019

A graph-based approach to diploid genome assembly.

Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and, therefore, fail to capture the diploid nature of the organism under study. Thus, building an assembler capable of producing accurate and complete diploid assemblies, while being resource-efficient with respect to sequencing costs, is a key challenge to be addressed by the bioinformatics community. We present a novel graph-based approach to diploid assembly, which combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that we require as little as 50× coverage Illumina data and 10× PacBio data to generate accurate and complete assemblies. Additionally, we show that our approach has the ability to detect and phase structural variants. https://github.com/whatshap/whatshap. Supplementary data are available at Bioinformatics online.

September 22, 2019

Comparative analysis reveals unexpected genome features of newly isolated Thraustochytrids strains: on ecological function and PUFAs biosynthesis.

Thraustochytrids are unicellular fungal-like marine protists with ubiquitous existence in marine environments. They are well-known for their ability to produce high-valued omega-3 polyunsaturated fatty acids (ω-3-PUFAs) (e.g., docosahexaenoic acid (DHA)) and hydrolytic enzymes. Thraustochytrid biomass has been estimated to surpass that of bacterioplankton in both coastal and oceanic waters, indicating they have an important role in the microbial food web. Nevertheless, the molecular pathway and regulatory network for PUFAs production and the molecular mechanisms underlying ecological functions of thraustochytrids remain largely unknown. The genomes of two thraustochytrid strains (Mn4 and SW8) with the ability to produce DHA were sequenced and assembled with a hybrid sequencing approach utilizing Illumina short paired-end reads and Pacific Biosciences long reads to generate a highly accurate genome assembly. Phylogenomic and comparative genomic analyses found that DHA-producing thraustochytrid strains were highly similar and possessed similar gene content. Analysis of the conventional fatty acid synthesis (FAS) and the polyketide synthase (PKS) systems for PUFAs production only detected incomplete and fragmentary pathways in the genome of these two strains. Surprisingly, secreted carbohydrate active enzymes (CAZymes) were found to be significantly depleted in the genomes of these two strains as compared to other sequenced relatives. Furthermore, these two strains possess an expanded gene repertoire for signal transduction and self-propelled movement, which could be important for their adaptations to dynamic marine environments. Our results demonstrate the possibility of a third PUFAs synthesis pathway besides previously described FAS and PKS pathways encoded in the genome of these two thraustochytrid strains.
Moreover, the lack of a complete set of hydrolytic enzymatic machinery for degrading plant-derived organic materials suggests that these two DHA-producing strains play an important role as a nutritional source rather than a nutrient producer in the marine microbial food web. Results of this study suggest the existence of two types of saprobic thraustochytrids in the world’s ocean. The first group, which does not produce cellulosic enzymes and lives as a ‘left-over’ scavenger of bacterioplankton, serves as a dietary source for the plankton of higher trophic levels, and the other possesses the capacity to live on detrital organic matter in marine ecosystems.

September 22, 2019

Genomic variation among and within six Juglans species.

Genomic analysis in Juglans (walnuts) is expected to transform the breeding and agricultural production of both nuts and lumber. To that end, we report here the determination of reference sequences for six additional relatives of Juglans regia: Juglans sigillata (also from section Dioscaryon), Juglans nigra, Juglans microcarpa, Juglans hindsii (from section Rhysocaryon), Juglans cathayensis (from section Cardiocaryon), and the closely related Pterocarya stenoptera. While these are ‘draft’ genomes, ranging in size between 640 Mbp and 990 Mbp, their contiguities and accuracies can support powerful annotations of genomic variation that are often the foundation of new avenues of research and breeding. We annotated nucleotide divergence and synteny by creating complete pairwise alignments of each reference genome to the remaining six. In addition, we have re-sequenced a sample of accessions from four Juglans species (including regia). The variation discovered in these surveys comprises a critical resource for experimentation and breeding, as well as a solid complementary annotation. To demonstrate the potential of these resources, the structural and sequence variation in and around the polyphenol oxidase loci, PPO1 and PPO2, were investigated. As reported for other seed crops, variation in this gene is implicated in the domestication of walnuts. The apparently Juglandaceae-specific PPO1 duplicate shows accelerated divergence and an excess of amino acid replacement on the lineage leading to accessions of the domesticated nut crop species, Juglans regia and sigillata. Copyright © 2018 Stevens et al.

September 22, 2019

Genome survey of the freshwater mussel Venustaconcha ellipsiformis (Bivalvia: Unionida) using a hybrid de novo assembly approach.

Freshwater mussels (Bivalvia: Unionida) serve an important role as aquatic ecosystem engineers but are one of the most critically imperilled groups of animals. Here, we used a combination of sequencing strategies to assemble and annotate a draft genome of Venustaconcha ellipsiformis, which will serve as a valuable genomic resource given the ecological value and unique “doubly uniparental inheritance” mode of mitochondrial DNA transmission of freshwater mussels. The genome described here was obtained by combining high-coverage short reads (65× genome coverage of Illumina paired-end and 11× genome coverage of mate-pair sequences) with low-coverage Pacific Biosciences long reads (0.3× genome coverage). Briefly, the final scaffold assembly accounted for a total size of 1.54 Gb (366,926 scaffolds, N50 = 6.5 kb, with 2.3% of “N” nucleotides), representing 86% of the predicted genome size of 1.80 Gb, while over one third of the genome (37.5%) consisted of repeated elements and >85% of the core eukaryotic genes were recovered. Given the repeated genetic bottlenecks of V. ellipsiformis populations as a result of glaciation events, heterozygosity was also found to be remarkably low (0.6%), in contrast to most other sequenced bivalve species. Finally, we reassembled the full mitochondrial genome and found six polymorphic sites with respect to the previously published reference. This resource opens the way to comparative genomics studies to identify genes related to the unique adaptations of freshwater mussels and their distinctive mitochondrial inheritance mechanism.
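The size and coverage figures in this abstract are related by simple arithmetic. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the inputs are the values quoted in the abstract, and the long-read yield it prints is a back-calculation from those values rather than a number reported by the authors.

```python
def assembled_fraction(assembly_size_gb, predicted_genome_gb):
    """Fraction of the predicted genome size represented by the assembly."""
    return assembly_size_gb / predicted_genome_gb


def bases_for_coverage(coverage, genome_size_gb):
    """Sequence yield (in Gb) implied by a given fold-coverage of a genome."""
    return coverage * genome_size_gb


# Values quoted in the abstract: 1.54 Gb assembled, 1.80 Gb predicted.
print(round(assembled_fraction(1.54, 1.80) * 100))  # 86 (percent)

# Back-calculated, not reported: yield implied by 0.3x long-read coverage
# of a 1.80 Gb genome.
print(round(bases_for_coverage(0.3, 1.80), 2))  # 0.54 (Gb)
```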

September 22, 2019

Nine draft genome sequences of Claviceps purpurea s.lat., including C. arundinis, C. humidiphila, and C. cf. spartinae, pseudomolecules for the pitch canker pathogen Fusarium circinatum, draft genome of Davidsoniella eucalypti, Grosmannia galeiformis, Quambalaria eucalypti, and Teratosphaeria destructans.

This genome announcement includes draft genomes from Claviceps purpurea s.lat., including C. arundinis, C. humidiphila and C. cf. spartinae. The draft genomes of Davidsoniella eucalypti, Quambalaria eucalypti and Teratosphaeria destructans, all three important eucalyptus pathogens, are presented. The insect associate Grosmannia galeiformis is also described. The pine pathogen genome of Fusarium circinatum has been assembled into pseudomolecules, based on additional sequence data and by harnessing the known synteny within the Fusarium fujikuroi species complex. This new assembly of the F. circinatum genome provides 12 pseudomolecules that correspond to the haploid chromosome number of F. circinatum. These are comparable to other chromosomal assemblies within the FFSC and will enable more robust genomic comparisons within this species complex.

September 22, 2019

Potential survival and pathogenesis of a novel strain, Vibrio parahaemolyticus FORC_022, isolated from a soy sauce marinated crab by genome and transcriptome analyses.

Vibrio parahaemolyticus can cause gastrointestinal illness through consumption of seafood. Despite frequent food-borne outbreaks of V. parahaemolyticus, only 19 strains have been subjected to complete whole-genome analysis. In this study, a novel strain of V. parahaemolyticus, designated FORC_022 (Food-borne pathogen Omics Research Center_022), was isolated from soy sauce marinated crabs, and its genome and transcriptome were analyzed to elucidate the pathogenic mechanisms. FORC_022 did not include major virulence factors of thermostable direct hemolysin (tdh) and TDH-related hemolysin (trh). However, FORC_022 showed high cytotoxicity and had several V. parahaemolyticus islands (VPaIs) and other virulence factors, such as various secretion systems (types I, II, III, IV, and VI), in comparative genome analysis with CDC_K4557 (the most similar strain) and RIMD2210633 (genome island marker strain). FORC_022 harbored additional virulence genes, including accessory cholera enterotoxin, zona occludens toxin, and tight adhesion (tad) locus, compared with CDC_K4557. In addition, the O3 serotype-specific gene and the marker gene of the pandemic O3:K6 serotype (toxRS) were detected in FORC_022. The expression levels of genes involved in adherence and carbohydrate transport were high, whereas those of genes involved in motility, arginine biosynthesis, and proline metabolism were low after exposure to crabs. Moreover, the virulence factors of the type III secretion system, tad locus, and thermolabile hemolysin were overexpressed. Therefore, the risk of foodborne illness may be high following consumption of FORC_022-contaminated crab. These results provided molecular information regarding the survival and pathogenesis of V. parahaemolyticus FORC_022 strain in contaminated crab and may have applications in food safety.

September 22, 2019

A reference genome of the Chinese hamster based on a hybrid assembly strategy.

Accurate and complete genome sequences are essential in biotechnology to facilitate genome-based cell engineering efforts. The current genome assemblies for Cricetulus griseus, the Chinese hamster, are fragmented and replete with gap sequences and misassemblies, consistent with most short-read-based assemblies. Here, we completely resequenced C. griseus using single molecule real time sequencing and merged this with Illumina-based assemblies. This generated a more contiguous and complete genome assembly than either technology alone, reducing the number of scaffolds by >28-fold, with 90% of the sequence in the 122 longest scaffolds. Most genes are now found in single scaffolds, including up- and downstream regulatory elements, enabling improved study of noncoding regions. With >95% of the gap sequence filled, important Chinese hamster ovary cell mutations have been detected in draft assembly gaps. This new assembly will be an invaluable resource for continued basic and pharmaceutical research. © 2018 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals, Inc.

September 22, 2019

Optical and physical mapping with local finishing enables megabase-scale resolution of agronomically important regions in the wheat genome.

Numerous scaffold-level sequences for wheat are now being released and, in this context, we report on a strategy for improving the overall assembly to a level comparable to that of the human genome. Using chromosome 7A of wheat as a model, sequence-finished megabase-scale sections of this chromosome were established by combining a new independent assembly using a bacterial artificial chromosome (BAC)-based physical map, BAC pool paired-end sequencing, chromosome-arm-specific mate-pair sequencing and Bionano optical mapping with the International Wheat Genome Sequencing Consortium RefSeq v1.0 sequence and its underlying raw data. The combined assembly results in 18 super-scaffolds across the chromosome. The value of finished genome regions is demonstrated for two approximately 2.5 Mb regions associated with yield and the grain quality phenotype of fructan carbohydrate grain levels. In addition, the 50 Mb centromere region analysis incorporates cytological data highlighting the importance of non-sequence data in the assembly of this complex genome region. Sufficient genome sequence information is shown to now be available for the wheat community to produce sequence-finished releases of each chromosome of the reference genome. The high-level completion identified that an array of seven fructosyl transferase genes underpins grain quality and that yield attributes are affected by five F-box-only-protein-ubiquitin ligase domain and four root-specific lipid transfer domain genes. The completed sequence also includes the centromere.

September 22, 2019

Complete genome of streamlined marine actinobacterium Pontimonas salivibrio strain CL-TW6T adapted to coastal planktonic lifestyle.

Pontimonas salivibrio strain CL-TW6T (=KCCM 90105 = JCM18206) was characterized as the type strain of a new genus within the Actinobacterial family Microbacteriaceae. It was isolated from a coastal marine environment in which members of Microbacteriaceae have not been previously characterized. The genome of P. salivibrio CL-TW6T was a single chromosome of 1,760,810 bp. Genomes of this small size are typically found in bacteria growing slowly in oligotrophic zones and said to be streamlined. Phylogenetic analysis showed it to represent a lineage originating in the Microbacteriaceae radiation occurring before the snowball Earth glaciations, and to have a closer relationship with some streamlined bacteria known through metagenomic data. Several genomic characteristics typical of streamlined bacteria are found: %G + C is lower than in non-streamlined members of the phylum; there are a minimal number of rRNA and tRNA genes, fewer paralogs in most gene families, and only two sigma factors; there is a noticeable absence of some nonessential metabolic pathways, including polyketide synthesis and catabolism of some amino acids. There was no indication of any phage genes or plasmids; however, a system of active insertion elements was present. P. salivibrio appears to be unusual in having polyrhamnose-based cell wall oligosaccharides instead of mycolic acid or teichoic acid-based oligosaccharides. Oddly, it conducts sulfate assimilation apparently for sulfating cell wall components, but not for synthesizing amino acids. One gene family of which it has more members, rather than fewer, is the toxin/antitoxin systems, which are thought to down-regulate growth during nutrient deprivation or other stressful conditions. Because of the relatively small number of paralogs and its relationship to the heavily characterized Mycobacterium tuberculosis, we were able to heavily annotate the genome of P. salivibrio CL-TW6T.
Its streamlined status and relationship to streamlined metagenomic constructs make it an important reference genome for study of the streamlining concept. The final evolutionary trajectory of CL-TW6T was to adapt to growth in a non-oligotrophic coastal zone. To understand that adaptive process, we give a thorough accounting of gene content, contrasting with both oligotrophic streamlined bacteria and large genome bacteria, and distinguishing between genes derived by vertical and horizontal descent.

September 22, 2019

First draft genome assembly of the Argane tree (Argania spinosa)

Background: The Argane tree (Argania spinosa L. Skeels) is an endemic tree of southwestern Morocco that plays an important socioeconomic and ecologic role for a dense human population in an arid zone. Several studies confirmed the importance of this species as a food and feed source and as a resource for both pharmaceutical and cosmetic compounds. Unfortunately, the argane tree ecosystem is facing significant threats from environmental changes (global warming, over-population) and over-exploitation. Limited research has been conducted, however, on argane tree genetics and genomics, which hinders its conservation and genetic improvement. Methods: Here, we present a draft genome assembly of A. spinosa. A reliable reference genome of A. spinosa was created using a hybrid de novo assembly approach combining short and long sequencing reads. Results: In total, 144 Gb Illumina HiSeq reads and 7.2 Gb PacBio reads were produced and assembled. The final draft genome comprises 75 327 scaffolds totaling 671 Mb with an N50 of 49 916 kb. The draft assembly is close to the genome size estimated by k-mers distribution and covers 89% of complete and 4.3 % of partial Arabidopsis orthologous groups in BUSCO. Conclusion: The A. spinosa genome will be useful for assessing biodiversity leading to efficient conservation of this endangered endemic tree. Furthermore, the genome may enable genome-assisted cultivar breeding, and provide a better understanding of important metabolic pathways and their underlying genes for both cosmetic and pharmacological purposes.

September 22, 2019

Variant O89 O-antigen of E. coli is associated with group 1 capsule loci and multidrug resistance.

Bacterial surface polysaccharides play significant roles in fitness and virulence. In Gram-negative bacteria such as Escherichia coli, major surface polysaccharides are lipopolysaccharide (LPS) and capsule, representing O- and K-antigens, respectively. There are multiple combinations of O:K types, many of which are well-characterized and can be related to ecotype or pathotype. In this investigation, we have identified a novel O:K permutation resulting through a process of major genome reorganization in a clade of E. coli. A multidrug-resistant, extended-spectrum β-lactamase (ESBL)-producing strain – E. coli 26561 – represented a prototype of strains combining a locus variant of O89 and group 1 capsular polysaccharide. Specifically, the variant O89 locus in this strain was truncated at gnd, flanked by insertion sequences and located between nfsB and ybdK, and we apply the term O89m for this variant. The prototype lacked colanic acid and O-antigen loci between yegH and hisI, with this tandem polysaccharide locus being replaced with a group 1 capsule (G1C) locus which, rather than being a recognized E. coli capsule type, matched the Klebsiella K10 capsule type. A genomic survey identified more than 200 E. coli strains which possessed the O89m locus variant with one of a variety of G1C types. Isolates from our collection with the combination of O89m and G1C all displayed a mucoid phenotype, and E. coli 26561 was unusual in exhibiting a mucoviscous phenotype more recognized as a characteristic among Klebsiella strains. Despite the locus truncation and novel location, all O89m:G1C strains examined showed a ladder pattern typifying smooth LPS and also showed high molecular weight, alcian blue-staining polysaccharide in cellular and/or extra-cellular fractions. Expression of both O-antigen and capsule biosynthesis loci was confirmed in prototype strain 26561 through quantitative proteome analysis.
Further in silico exploration of more than 200 E. coli strains possessing the O89m:G1C combination identified a very high prevalence of multidrug resistance (MDR) – 85% possessed resistance to three or more antibiotic classes and a high proportion (58%) of these carried ESBL and/or carbapenemase. The increasing isolation of O89m:G1C isolates from extra-intestinal infection sites suggests that these represent an emergent clade of invasive, MDR E. coli.

September 22, 2019

Draft genome assembly of the invasive cane toad, Rhinella marina.

The cane toad (Rhinella marina, formerly Bufo marinus) is a species native to Central and South America that has spread across many regions of the globe. Cane toads are known for their rapid adaptation and deleterious impacts on native fauna in invaded regions. However, despite an iconic status, there are major gaps in our understanding of cane toad genetics. The availability of a genome would help to close these gaps and accelerate cane toad research. We report a draft genome assembly for R. marina, the first of its kind for the Bufonidae family. We used a combination of long-read Pacific Biosciences RS II and short-read Illumina HiSeq X sequencing to generate 359.5 Gb of raw sequence data. The final hybrid assembly of 31,392 scaffolds was 2.55 Gb in length with a scaffold N50 of 168 kb. BUSCO analysis revealed that the assembly included full length or partial fragments of 90.6% of tetrapod universal single-copy orthologs (n = 3950), illustrating that the gene-containing regions have been well assembled. Annotation predicted 25,846 protein coding genes with similarity to known proteins in Swiss-Prot. Repeat sequences were estimated to account for 63.9% of the assembly. The R. marina draft genome assembly will be an invaluable resource that can be used to further probe the biology of this invasive species. Future analysis of the genome will provide insights into cane toad evolution and enrich our understanding of their interplay with the ecosystem at large.
