Long-read assembly Archives

September 22, 2019

The opium poppy genome and morphinan production.

Morphinan-based painkillers are derived from opium poppy (Papaver somniferum L.). We report a draft of the opium poppy genome, with 2.72 gigabases assembled into 11 chromosomes with contig N50 and scaffold N50 of 1.77 and 204 megabases, respectively. Synteny analysis suggests a whole-genome duplication at ~7.8 million years ago and ancient segmental or whole-genome duplication(s) that occurred before the Papaveraceae-Ranunculaceae divergence 110 million years ago. Syntenic blocks representative of phthalideisoquinoline and morphinan components of a benzylisoquinoline alkaloid cluster of 15 genes provide insight into how this cluster evolved. Paralog analysis identified P450 and oxidoreductase genes that combined to form the STORR gene fusion essential for morphinan biosynthesis in opium poppy. Thus, gene duplication, rearrangement, and fusion events have led to evolution of specialized metabolic products in opium poppy. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
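The contig and scaffold N50 values quoted above are summary statistics of assembly contiguity. As a minimal sketch (the function name `n50` and the toy lengths are illustrative assumptions, not data from the paper), N50 can be computed as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Walk the lengths from longest to shortest and return the length at
    which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units), not real data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because the statistic depends only on the length distribution, contig N50 and scaffold N50 are computed the same way and differ only in which set of sequences is passed in.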

September 22, 2019

Assembling the genome of the African wild rice Oryza longistaminata by exploiting synteny in closely related Oryza species.

The African wild rice species Oryza longistaminata has several beneficial traits compared to cultivated rice species, such as resistance to biotic stresses, clonal propagation via rhizomes, and increased biomass production. To facilitate breeding efforts and functional genomics studies, we de novo assembled a high-quality, haploid-phased genome. Here, we present our assembly, with a total length of 351 Mb, of which 92.2% was anchored onto 12 chromosomes. We detected 34,389 genes and 38.1% of the genome consisted of repetitive content. We validated our assembly by a comparative linkage analysis and by examining well-characterized gene families. This genome assembly will be a useful resource to exploit beneficial alleles found in O. longistaminata. Our results also show that it is possible to generate a high-quality, functionally complete rice genome assembly from moderate SMRT read coverage by exploiting synteny in a closely related Oryza species.

September 22, 2019

The genomic basis of color pattern polymorphism in the Harlequin ladybird.

Many animal species comprise discrete phenotypic forms. A common example in natural populations of insects is the occurrence of different color patterns, which has motivated a rich body of ecological and genetic research [1-6]. The occurrence of dark, i.e., melanic, forms displaying discrete color patterns is found across multiple taxa, but the underlying genomic basis remains poorly characterized. In numerous ladybird species (Coccinellidae), the spatial arrangement of black and red patches on adult elytra varies wildly within species, forming strikingly different complex color patterns [7, 8]. In the harlequin ladybird, Harmonia axyridis, more than 200 distinct color forms have been described, which classic genetic studies suggest result from allelic variation at a single, unknown, locus [9, 10]. Here, we combined whole-genome sequencing, population-based genome-wide association studies, gene expression, and functional analyses to establish that the transcription factor Pannier controls melanic pattern polymorphism in H. axyridis. We show that pannier is necessary for the formation of melanic elements on the elytra. Allelic variation in pannier leads to protein expression in distinct domains on the elytra and thus determines the distinct color patterns in H. axyridis. Recombination between pannier alleles may be reduced by a highly divergent sequence of ~170 kb in the cis-regulatory regions of pannier, with a 50 kb inversion between color forms. This most likely helps maintain the distinct alleles found in natural populations. Thus, we propose that highly variable discrete color forms can arise in natural populations through cis-regulatory allelic variation of a single gene. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.

September 22, 2019

Characterization of the lytic bacteriophage phiEaP-8 effective against both Erwinia amylovora and Erwinia pyrifoliae causing severe diseases in apple and pear.

Bacteriophages, bacteria-infecting viruses, have been recently reconsidered as a biological control tool for preventing bacterial pathogens. Erwinia amylovora and E. pyrifoliae cause fire blight and black shoot blight disease in apple and pear, respectively. In this study, the bacteriophage phiEaP-8 was isolated from apple orchard soil and could efficiently and specifically kill both E. amylovora and E. pyrifoliae. This bacteriophage belongs to the Podoviridae family. Whole genome analysis revealed that phiEaP-8 carries a 75,929 bp genomic DNA with 78 coding sequences and 5 tRNA genes. Genome comparison showed that phiEaP-8 has only 85% identity to known bacteriophages at the DNA level. PhiEaP-8 retained lytic activity up to 50°C, within a pH range from 5 to 10, and under 365 nm UV light. Based on these characteristics, the bacteriophage phiEaP-8 is novel and carries potential to control both E. amylovora and E. pyrifoliae in apple and pear.

September 22, 2019

Repeat elements organise 3D genome structure and mediate transcription in the filamentous fungus Epichloë festucae.

Structural features of genomes, including the three-dimensional arrangement of DNA in the nucleus, are increasingly seen as key contributors to the regulation of gene expression. However, studies on how genome structure and nuclear organisation influence transcription have so far been limited to a handful of model species. This narrow focus limits our ability to draw general conclusions about the ways in which three-dimensional structures are encoded, and to integrate information from three-dimensional data to address a broader gamut of biological questions. Here, we generate a complete and gapless genome sequence for the filamentous fungus, Epichloë festucae. We use Hi-C data to examine the three-dimensional organisation of the genome, and RNA-seq data to investigate how Epichloë genome structure contributes to the suite of transcriptional changes needed to maintain symbiotic relationships with the grass host. Our results reveal a genome in which very repeat-rich blocks of DNA with discrete boundaries are interspersed by gene-rich sequences that are almost repeat-free. In contrast to other species reported to date, the three-dimensional structure of the genome is anchored by these repeat blocks, which act to isolate transcription in neighbouring gene-rich regions. Genes that are differentially expressed in planta are enriched near the boundaries of these repeat-rich blocks, suggesting that their three-dimensional orientation partly encodes and regulates the symbiotic relationship formed by this organism.

September 22, 2019

Complete genome sequence and characterization of linezolid-resistant Enterococcus faecalis clinical isolate KUB3006 carrying a cfr(B)-transposon on its chromosome and optrA-plasmid.

Linezolid (LZD) has become one of the most important antimicrobial agents for infections caused by gram-positive bacteria, including those caused by Enterococcus species. LZD-resistant (LR) genetic features include mutations in 23S rRNA/ribosomal proteins, a plasmid-borne 23S rRNA methyltransferase gene cfr, and ribosomal protection genes (optrA and poxtA). Recently, a cfr gene variant, cfr(B), was identified in a Tn6218-like transposon (Tn) in a Clostridioides difficile isolate. Here, we isolated an LR Enterococcus faecalis clinical isolate, KUB3006, from a urine specimen of a patient with urinary tract infection during hospitalization in 2017. Comparative and whole-genome analyses were performed to characterize the genetic features and overall antimicrobial resistance genes in E. faecalis isolate KUB3006. Complete genome sequencing of KUB3006 revealed that it carried cfr(B) on a chromosomal Tn6218-like element. Surprisingly, this Tn6218-like element was almost (99%) identical to that of C. difficile Ox3196, which was isolated from a human in the UK in 2012, and to that of Enterococcus faecium 5_Efcm_HA-NL, which was isolated from a human in the Netherlands in 2012. An additional oxazolidinone and phenicol resistance gene, optrA, was also identified on a plasmid. KUB3006 is sequence type (ST) 729, suggesting that it is a minor ST that has not been reported previously and is unlikely to be a high-risk E. faecalis lineage. In summary, LR E. faecalis KUB3006 possesses a notable Tn6218-like-borne cfr(B) and a plasmid-borne optrA. This finding raises further concerns regarding the potential declining effectiveness of LZD treatment in the future.

September 22, 2019

A complete Cannabis chromosome assembly and adaptive admixture for elevated cannabidiol (CBD) content

Cannabis has been cultivated for millennia with distinct cultivars providing either fiber and grain or tetrahydrocannabinol. Recent demand for cannabidiol rather than tetrahydrocannabinol has favored the breeding of admixed cultivars with extremely high cannabidiol content. Despite several draft Cannabis genomes, the genomic structure of cannabinoid synthase loci has remained elusive. A genetic map derived from a tetrahydrocannabinol/cannabidiol segregating population and a complete chromosome assembly from a high-cannabidiol cultivar together resolve the linkage of cannabidiolic and tetrahydrocannabinolic acid synthase gene clusters which are associated with transposable elements. High-cannabidiol cultivars appear to have been generated by integrating hemp-type cannabidiolic acid synthase gene clusters into a background of marijuana-type cannabis. Quantitative trait locus mapping suggests that overall drug potency, however, is associated with other genomic regions needing additional study.

September 22, 2019

A continuous genome assembly of the corkwing wrasse (Symphodus melops).

The wrasses (Labridae) are one of the most successful and species-rich families of the Perciformes order of teleost fish. Its members display great morphological diversity, and occupy distinct trophic levels in coastal waters and coral reefs. The cleaning behaviour displayed by some wrasses, such as corkwing wrasse (Symphodus melops), is of particular interest for the salmon aquaculture industry to combat and control sea lice infestation as an alternative to chemicals and pharmaceuticals. There are still few genome assemblies available within this fish family for comparative and functional studies, despite the rapid increase in genome resources generated during the past years. Here, we present a highly continuous genome assembly of the corkwing wrasse using PacBio SMRT sequencing (x28.8) followed by error correction with paired-end Illumina data (x132.9). The present genome assembly consists of 5040 contigs (N50 = 461,652 bp) and a total size of 614 Mbp, of which 8.5% of the genome sequence encode known repeated elements. The genome assembly covers 94.21% of highly conserved genes across ray-finned fish species. We find evidence for increased copy numbers specific for corkwing wrasse possibly highlighting diversification and adaptive processes in gene families including N-linked glycosylation (ST8SIA6) and stress response kinases (HIPK1). By comparative analyses, we discover that de novo repeats, often not properly investigated during genome annotation, encode hundreds of immune-related genes. This new genomic resource, together with the ballan wrasse (Labrus bergylta), will allow for in-depth comparative genomics as well as population genetic analyses for the understudied wrasses. Copyright © 2018 Elsevier Inc. All rights reserved.

September 22, 2019

The unique evolution of the pig LRC, a single KIR but expansion of LILR and a novel Ig receptor family.

The leukocyte receptor complex (LRC) encodes numerous immunoglobulin (Ig)-like receptors involved in innate immunity. These include the killer-cell Ig-like receptors (KIR) and the leukocyte Ig-like receptors (LILR) which can be polymorphic and vary greatly in number between species. Using the recent long-read genome assembly, Sscrofa11.1, we have characterized the porcine LRC on chromosome 6. We identified a ~197-kb region containing numerous LILR genes that were missing in previous assemblies. Out of 17 such LILR genes and fragments, six encode functional proteins, of which three are inhibitory and three are activating, while the majority of pseudogenes had the potential to encode activating receptors. Elsewhere in the LRC, between FCAR and GP6, we identified a novel gene that encodes two Ig-like domains and a long inhibitory intracellular tail. Comparison with two other porcine assemblies revealed a second, nearly identical, non-functional gene encoding a short intracellular tail with ambiguous function. These novel genes were found in a diverse range of mammalian species, including a pseudogene in humans, and typically consist of a single long-tailed receptor and a variable number of short-tailed receptors. Using porcine transcriptome data, both the novel inhibitory gene and the LILR were highly expressed in peripheral blood, while the single KIR gene, KIR2DL1, was either very poorly expressed or not at all. These observations are a prerequisite for improved understanding of immune cell functions in the pig and other species.

September 22, 2019

Computational tools to unmask transposable elements.

A substantial proportion of the genome of many species is derived from transposable elements (TEs). Moreover, through various self-copying mechanisms, TEs continue to proliferate in the genomes of most species. TEs have contributed numerous regulatory, transcript and protein innovations and have also been linked to disease. However, notwithstanding their demonstrated impact, many genomic studies still exclude them because their repetitive nature results in various analytical complexities. Fortunately, a growing array of methods and software tools are being developed to cater for them. This Review presents a summary of computational resources for TEs and highlights some of the challenges and remaining gaps to perform comprehensive genomic analyses that do not simply ‘mask’ repeats.

September 22, 2019

The genomic architecture and molecular evolution of ant odorant receptors.

The massive expansions of odorant receptor (OR) genes in ant genomes are notable examples of rapid genome evolution and adaptive gene duplication. However, the molecular mechanisms leading to gene family expansion remain poorly understood, partly because available ant genomes are fragmentary. Here, we present a highly contiguous, chromosome-level assembly of the clonal raider ant genome, revealing the largest known OR repertoire in an insect. While most ant ORs originate via local tandem duplication, we also observe several cases of dispersed duplication followed by tandem duplication in the most rapidly evolving OR clades. We found that areas of unusually high transposable element density (TE islands) were depauperate in ORs in the clonal raider ant, and found no evidence for retrotransposition of ORs. However, OR loci were enriched for transposons relative to the genome as a whole, potentially facilitating tandem duplication by unequal crossing over. We also found that ant OR genes are highly AT-rich compared to other genes. In contrast, in flies, OR genes are dispersed and largely isolated within the genome, and we find that fly ORs are not AT-rich. The genomic architecture and composition of ant ORs thus show convergence with the unrelated vertebrate ORs rather than the related fly ORs. This might be related to the greater gene numbers and/or potential similarities in gene regulation between ants and vertebrates as compared to flies. © 2018 McKenzie and Kronauer; Published by Cold Spring Harbor Laboratory Press.

September 22, 2019

Comparative genomic and methylome analysis of non-virulent D74 and virulent Nagasaki Haemophilus parasuis isolates.

Haemophilus parasuis is a respiratory pathogen of swine and the etiological agent of Glässer’s disease. H. parasuis isolates can exhibit different virulence capabilities ranging from lethal systemic disease to subclinical carriage. To identify genomic differences between phenotypically distinct strains, we obtained the closed whole-genome sequence annotation and genome-wide methylation patterns for the highly virulent Nagasaki strain and for the non-virulent D74 strain. Evaluation of the virulence-associated genes contained within the genomes of D74 and Nagasaki led to the discovery of a large number of toxin-antitoxin (TA) systems within both genomes. Five predicted hemolysins were identified as unique to Nagasaki and seven putative contact-dependent growth inhibition toxin proteins were identified only in strain D74. Assessment of all potential vtaA genes revealed thirteen present in the Nagasaki genome and three in the D74 genome. Subsequent evaluation of the predicted protein structure revealed that none of the D74 VtaA proteins contain a collagen triple helix repeat domain. Additionally, the predicted protein sequence for two D74 VtaA proteins is substantially longer than any predicted Nagasaki VtaA proteins. Fifteen methylation sequence motifs were identified in D74 and fourteen methylation sequence motifs were identified in Nagasaki using SMRT sequencing analysis. Only one of the methylation sequence motifs was observed in both strains indicative of the diversity between D74 and Nagasaki. Subsequent analysis also revealed diversity in the restriction-modification systems harbored by D74 and Nagasaki. The collective information reported in this study will aid in the development of vaccines and intervention strategies to decrease the prevalence and disease burden caused by H. parasuis.

September 22, 2019

Correcting palindromes in long reads after whole-genome amplification.

Next-generation sequencing requires sufficient DNA to be available. If limited, whole-genome amplification is applied to generate additional amounts of DNA. Such amplification often results in many chimeric DNA fragments, in particular artificial palindromic sequences, which limit the usefulness of long sequencing reads. Here, we present Pacasus, a tool for correcting such errors. Two datasets show that it markedly improves read mapping and de novo assembly, yielding results similar to those that would be obtained with non-amplified DNA. With Pacasus, long-read technologies become available for sequencing targets with very small amounts of DNA, such as single cells or even single chromosomes.
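To make the notion of an artificial palindromic read concrete, the sketch below flags a read whose second half is the reverse complement of its first half and keeps only one arm. This is an illustrative toy, not the Pacasus algorithm: the function names, the position-by-position comparison, and the `min_score` threshold are all assumptions made for the example. Real long reads contain insertions and deletions, so a practical tool would need an alignment-based comparison rather than the exact positional check used here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def palindrome_score(read):
    """Fraction of positions in the first half of the read whose mirror
    position (counted from the read end) holds the complementary base.
    A perfect palindromic chimera X + reverse_complement(X) scores 1.0."""
    half = len(read) // 2
    if half == 0:
        return 0.0
    matches = sum(
        1 for i in range(half) if COMPLEMENT.get(read[i]) == read[-1 - i]
    )
    return matches / half


def split_palindrome(read, min_score=0.9):
    """Return only the first arm of a read that looks palindromic
    (hypothetical threshold), otherwise return the read unchanged."""
    if palindrome_score(read) >= min_score:
        return read[: (len(read) + 1) // 2]
    return read


# Hypothetical example: build a chimeric read from one arm and its mirror.
arm = "ACGGTTCA"
chimera = arm + reverse_complement(arm)
print(palindrome_score(chimera), split_palindrome(chimera))
```

The toy only handles palindromes centred on the middle of the read; an off-centre junction would need a search over candidate junction positions.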

September 22, 2019

Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies.

Recent developments in third-gen long read sequencing and diploid-aware assemblers have resulted in the rapid release of numerous reference-quality assemblies for diploid genomes. However, assembly of highly heterozygous genomes is still problematic when regional heterogeneity is so high that haplotype homology is not recognised during assembly. This results in regional duplication rather than consolidation into allelic variants and can cause issues with downstream analysis, for example variant discovery, or haplotype reconstruction using the diploid assembly with unpaired allelic contigs. A new pipeline, Purge Haplotigs, was developed specifically for third-gen sequencing-based assemblies to automate the reassignment of allelic contigs, and to assist in the manual curation of genome assemblies. The pipeline uses a draft haplotype-fused assembly or a diploid assembly, read alignments, and repeat annotations to identify allelic variants in the primary assembly. The pipeline was tested on a simulated dataset and on four recent diploid (phased) de novo assemblies from third-generation long-read sequencing, and compared with a similar tool. After processing with Purge Haplotigs, haploid assemblies were less duplicated with minimal impact on genome completeness, and diploid assemblies had more pairings of allelic contigs. Purge Haplotigs improves the haploid and diploid representations of third-gen sequencing based genome assemblies by identifying and reassigning allelic contigs. The implementation is fast and scales well with large genomes, and it is less likely to over-purge repetitive or paralogous elements compared to alignment-only based methods. The software is available at https://bitbucket.org/mroachawri/purge_haplotigs under a permissive MIT licence.

September 22, 2019

Antibiotic-resistant indicator bacteria in irrigation water: High prevalence of extended-spectrum beta-lactamase (ESBL)-producing Escherichia coli.

Irrigation water is a major source of fresh produce contamination with undesired microorganisms including antibiotic-resistant bacteria (ARB), and contaminated fresh produce can transfer ARB to the consumer especially when consumed raw. Nevertheless, no legal guidelines exist so far regulating quality of irrigation water with respect to ARB. We therefore examined irrigation water from major vegetable growing areas for occurrence of antibiotic-resistant indicator bacteria Escherichia coli and Enterococcus spp., including extended-spectrum β-lactamase (ESBL)-producing E. coli and vancomycin-resistant Enterococcus spp. Occurrence of ARB strains was compared to total numbers of the respective species. We categorized water samples according to total numbers and found that categories with higher total E. coli or Enterococcus spp. numbers generally had an increased proportion of respective ARB-positive samples. We further detected high prevalence of ESBL-producing E. coli with eight positive samples of thirty-six (22%), while two presumptive vancomycin-resistant Enterococcus spp. were vancomycin-susceptible in confirmatory tests. In disk diffusion assays all ESBL-producing E. coli were multidrug-resistant (n = 21) and whole-genome sequencing of selected strains revealed a multitude of transmissible resistance genes (ARG), with blaCTX-M-1 (4 of 11) and blaCTX-M-15 (3 of 11) as the most frequent ESBL genes. Overall, the increased occurrence of indicator ARB with increased total indicator bacteria suggests that the latter might be a suitable estimate for presence of respective ARB strains. Finally, the high prevalence of ESBL-producing E. coli with transmissible ARG emphasizes the need to establish legal critical values and monitoring guidelines for ARB in irrigation water.
