HiFi sequencing on the PacBio Sequel II System enables complete microbial community profiling of complex metagenomic samples using whole genome shotgun sequences. With HiFi sequencing, highly accurate long reads overcome the challenges posed by the presence of intergenic and extragenic repeat elements in microbial genomes, thus greatly improving phylogenetic profiling and sequence assembly. Recent improvements in library construction protocols enable HiFi sequencing starting from as little as 5 ng of input DNA. Here, we demonstrate comparative analyses of a control sample of known composition and a human fecal sample from varying amounts of input genomic DNA (1 µg, 200 ng, 5 ng), and present the corresponding library preparation workflows for standard, low input, and Ultra-Low methods. We demonstrate that the metagenome assembly, taxonomic assignment, and gene finding analyses are comparable across all methods for both samples, providing access to HiFi sequencing even for DNA-limited sample types.
Domestication of clonally propagated crops such as pineapple from South America was hypothesized to be a ‘one-step operation’. We sequenced the genome of Ananas comosus var. bracteatus CB5 and assembled 513 Mb into 25 chromosomes with 29,412 genes. Comparison of the genomes of CB5, F153 and MD2 elucidated the genomic basis of fiber production, color formation, sugar accumulation and fruit maturation. We also resequenced 89 Ananas genomes. Cultivars ‘Smooth Cayenne’ and ‘Queen’ exhibited ancient and recent admixture, while ‘Singapore Spanish’ supported a one-step operation of domestication. We identified 25 selective sweeps, including a strong sweep containing a pair of tandemly duplicated bromelain inhibitors. Four candidate genes for self-incompatibility were linked in F153, but were not functional in self-compatible CB5. Our findings support the coexistence of sexual recombination and a one-step operation in the domestication of clonally propagated crops. This work guides the exploration of sexual and asexual domestication trajectories in other clonally propagated crops.
Development of high-throughput sequencing techniques has greatly benefited our understanding of microbial ecology; yet the methods producing short reads suffer from limited species-level resolution and uncertainty of identification. Here we optimize PacBio-based metabarcoding protocols covering the Internal Transcribed Spacer (ITS) region and partial Small Subunit (SSU) of the rRNA gene for species-level identification of all eukaryotes, with a specific focus on Fungi (including Glomeromycota) and Stramenopila (particularly Oomycota). Based on tests on composite soil samples and mock communities, we propose the most suitable degenerate primers, ITS9munngs + ITS4ngsUni, for eukaryotes and selected groups therein, and discuss the pros and cons of long read-based identification of eukaryotes.
Defining transgene insertion sites and off-target effects of homology-based gene silencing informs the use of functional genomics tools in Phytophthora infestans.
DNA transformation and homology-based transcriptional silencing are frequently used to assess gene function in Phytophthora. Since unplanned side-effects of these tools are not well-characterized, we used P. infestans to study plasmid integration sites and whether knockdowns caused by homology-dependent silencing spread to other genes. Insertions occurred both in gene-dense and gene-sparse regions but disproportionately near the 5′ ends of genes, which disrupted native coding sequences. Microhomology at the recombination site between plasmid and chromosome was common. Studies of transformants silenced for twelve different gene targets indicated that neighbors within 500 nt were often co-silenced, regardless of whether hairpin or sense constructs were employed and of the direction of transcription of the target. However, cis-spreading of silencing did not occur in all transformants obtained with the same plasmid. Genome-wide studies indicated that unlinked genes with partial complementarity with the silencing-inducing transgene were not usually down-regulated. We learned that hairpin or sense transgenes were not co-silenced with the target in all transformants, which informs how screens for silencing should be performed. We conclude that transformation and gene silencing can be reliable tools for functional genomics in Phytophthora but must be used carefully, especially by testing for the spread of silencing to genes flanking the target.
Motivation: Third-generation sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies to assess these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used. Results: In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment which includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research.
The ADEP Biosynthetic Gene Cluster in Streptomyces hawaiiensis NRRL 15010 Reveals an Accessory clpP Gene as a Novel Antibiotic Resistance Factor.
The increasing threat posed by multiresistant bacterial pathogens necessitates the discovery of novel antibacterials with unprecedented modes of action. ADEP1, a natural compound produced by Streptomyces hawaiiensis NRRL 15010, is the prototype for a new class of acyldepsipeptide (ADEP) antibiotics. ADEP antibiotics deregulate the proteolytic core ClpP of the bacterial caseinolytic protease, thereby exhibiting potent antibacterial activity against Gram-positive bacteria, including multiresistant pathogens. ADEP1 and derivatives, here collectively called ADEP, have been previously investigated for their antibiotic potency against different species, structure-activity relationship, and mechanism of action; however, knowledge on the biosynthesis of the natural compound and producer self-resistance has remained elusive. In this study, we identified and analyzed the ADEP biosynthetic gene cluster in S. hawaiiensis NRRL 15010, which comprises two NRPSs, genes necessary for the biosynthesis of (4S,2R)-4-methylproline, and a type II polyketide synthase (PKS) for the assembly of highly reduced polyenes. While no resistance factor could be identified within the gene cluster itself, we discovered an additional clpP homologous gene (named clpPADEP) located further downstream of the biosynthetic genes, separated from the biosynthetic gene cluster by several transposable elements. Heterologous expression of ClpPADEP in three ADEP-sensitive Streptomyces species proved its role in conferring ADEP resistance, thereby revealing a novel type of antibiotic resistance determinant.
IMPORTANCE Antibiotic acyldepsipeptides (ADEPs) represent a promising new class of potent antibiotics and, at the same time, are valuable tools to study the molecular functioning of their target, ClpP, the proteolytic core of the bacterial caseinolytic protease. Here, we present a straightforward purification procedure for ADEP1 that yields substantial amounts of the pure compound in a time- and cost-efficient manner, which is a prerequisite to conveniently study the antimicrobial effects of ADEP and the operating mode of bacterial ClpP machineries in diverse bacteria. Identification and characterization of the ADEP biosynthetic gene cluster in Streptomyces hawaiiensis NRRL 15010 enables future bioinformatics screenings for similar gene clusters and/or subclusters to find novel natural compounds with specific substructures. Most strikingly, we identified a cluster-associated clpP homolog (named clpPADEP) as an ADEP resistance gene. ClpPADEP constitutes a novel bacterial resistance factor that alone is necessary and sufficient to confer high-level ADEP resistance to Streptomyces across species. Copyright © 2019 American Society for Microbiology.
Whole-Genome Sequence of an Isogenic Haploid Strain, Saccharomyces cerevisiae IR-2idA30(MATa), Established from the Industrial Diploid Strain IR-2.
We present the draft genome sequence of an isogenic haploid strain, IR-2idA30(MATa), established from Saccharomyces cerevisiae IR-2. Assembly of long reads and previously obtained contigs from the genome of diploid IR-2 resulted in 50 contigs, and the variations and sequencing errors were corrected by short reads. Copyright © 2019 Fujimori et al.
Whole Genome Sequencing and Analysis of Chlorimuron-Ethyl Degrading Bacteria Klebsiella pneumoniae 2N3.
Klebsiella pneumoniae 2N3 is a strain of Gram-negative bacteria that can degrade chlorimuron-ethyl and grow with chlorimuron-ethyl as the sole nitrogen source. The complete genome of Klebsiella pneumoniae 2N3 was sequenced using third generation high-throughput DNA sequencing technology. The genomic size of strain 2N3 was 5.32 Mb with a GC content of 57.33% and a total of 5156 coding genes and 112 non-coding RNAs predicted. Two hydrolases encoded by open reading frames (ORFs) 0934 and 0492 were predicted and experimentally confirmed by gene knockout to be involved in the degradation of chlorimuron-ethyl. The ΔORF 0934, ΔORF 0492, and wild type (WT) strains reached their highest growth rates after 8-10 hours of incubation. The degradation rates of chlorimuron-ethyl by both ΔORF 0934 and ΔORF 0492 decreased in comparison to the WT during the first 8 hours in culture by 25.60% and 24.74%, respectively, while strains ΔORF 0934, ΔORF 0492, and the WT reached their highest degradation rates of chlorimuron-ethyl at 36 hours, namely 74.56%, 90.53%, and 95.06%, respectively. This study provides scientific evidence to support the application of Klebsiella pneumoniae 2N3 in bioremediation to control environmental pollution.
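For readers unfamiliar with the GC content statistic reported above, the following is a minimal Python sketch of how such a percentage can be computed from a nucleotide sequence. The function name `gc_content` and the toy sequence are hypothetical illustrations, not the authors' pipeline or data; ambiguous bases such as N are assumed to be excluded from the denominator.

```python
def gc_content(sequence):
    """Return the GC percentage of a DNA sequence.

    Assumption for this sketch: only unambiguous bases (A, C, G, T)
    count toward the denominator, so N and other symbols are ignored.
    """
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / called


# Made-up toy sequence: 6 of its 10 bases are G or C.
toy_sequence = "ATGCGCGGTA"
print(f"GC content: {gc_content(toy_sequence):.2f}%")
```

On a real assembly the same calculation would be applied to the concatenated contigs or summed per contig.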
Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assembly. This tool utilizes long reads generated from TGS sequencing platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed a higher number of gaps faster and with a lower error rate and a much lower memory usage than two existing, state-of-the-art tools. The tool filled more gaps when using raw reads than when using error-corrected reads. It is applicable to gaps in assemblies produced by different approaches and from large and complex genomes. After performing gap-closure using this tool, the contig N50 size of the human CHM1 genome was improved from 143 kb to 19 Mb, a 132-fold increase. We also closed the gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size was increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. A proposed and optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome. LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. A proposed hybrid assembly including this tool promises reference-grade assemblies.
The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.
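The contig N50 values quoted in the abstract above follow the usual definition: the length of the contig at which the cumulative length, counted from the longest contig downward, first reaches half of the total assembly length. The Python sketch below illustrates that definition and the fold-change arithmetic; the function name `n50` and the toy contig lengths are hypothetical and are not part of LR_Gapcloser.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to at least
        # half of the total length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Made-up contig lengths (total 300); 80 + 70 reaches half of the total.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print("toy N50:", n50(toy_contigs))

# Fold change between the two N50 values reported in the abstract
# (143 kb before gap closure, 19 Mb after).
fold = 19_000_000 / 143_000
print(f"fold increase: {fold:.1f}")
```

Because N50 depends only on the sorted contig lengths, closing a gap that joins two contigs can raise it sharply even when the total assembly size barely changes.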
Intercellular communication is required for trap formation in the nematode-trapping fungus Duddingtonia flagrans.
Nematode-trapping fungi (NTF) are a large and diverse group of fungi, which may switch from a saprotrophic to a predatory lifestyle if nematodes are present. Different fungi have developed different trapping devices, ranging from adhesive cells to constricting rings. After trapping, fungal hyphae penetrate the worm, secrete lytic enzymes and form a hyphal network inside the body. We sequenced the genome of Duddingtonia flagrans, a biotechnologically important NTF used to control nematode populations in fields. The 36.64 Mb genome encodes 9,927 putative proteins, among which are more than 638 predicted secreted proteins. Most secreted proteins are lytic enzymes, but more than 200 were classified as small secreted proteins (< 300 amino acids). 117 putative effector proteins were predicted, suggesting interkingdom communication during the colonization. As a first step to analyze the function of such proteins or other phenomena at the molecular level, we developed a transformation system, established the fluorescent proteins GFP and mCherry, adapted an assay to monitor protein secretion, and established gene-deletion protocols using homologous recombination or CRISPR/Cas9. One putative virulence effector protein, PefB, was transcriptionally induced during the interaction. We show that the mature protein is able to be imported into nuclei in Caenorhabditis elegans cells. In addition, we studied trap formation and show that cell-to-cell communication is required for ring closure. The availability of the genome sequence and the establishment of many molecular tools will open new avenues to studying this biotechnologically relevant nematode-trapping fungus.
Pathogenic yeasts and fungi are an increasing global healthcare burden, but discovery of novel antifungal agents is slow. The mycoparasitic yeast Saccharomycopsis schoenii was recently demonstrated to be able to kill the emerging multi-drug resistant yeast pathogen Candida auris. However, the molecular mechanisms involved in the predatory activity of S. schoenii have not been explored. To this end, we de novo sequenced, assembled and annotated a draft genome of S. schoenii. Using proteomics, we confirmed that Saccharomycopsis yeasts have reassigned the CTG codon and translate CTG into serine instead of leucine. Further, we confirmed an absence of all genes from the sulfate assimilation pathway in the genome of S. schoenii, and detected the expansion of several gene families, including aspartic proteases. Using Saccharomyces cerevisiae as a model prey cell, we homed in on the timing and nutritional conditions under which S. schoenii kills prey cells. We found that a general nutrition limitation, not a specific methionine deficiency, triggered predatory activity. Nevertheless, by means of genome-wide transcriptome analysis we observed dramatic responses to methionine deprivation, which were alleviated when S. cerevisiae was available as prey, and therefore postulate that S. schoenii acquired methionine from its prey cells. During predation, both proteomic and transcriptomic analyses revealed that S. schoenii highly upregulated and translated aspartic protease genes, probably used to break down prey cell walls. With these fundamental insights into the predatory behavior of S. schoenii, we open the way for further exploitation of this yeast as a biocontrol agent and/or source of novel antifungal agents.
Finding the needle in a haystack: Mapping antifungal drug resistance in fungal pathogen by genomic approaches.
Fungi are ubiquitous on earth and are essential for the maintenance of the global ecological equilibrium. Despite providing benefits to living organisms, they can also target specific hosts and inflict damage. These fungal pathogens are known to affect, for example, plants and mammals and thus reduce crop production necessary to sustain food supply and cause mortality in humans and animals. Designing defenses against these fungi is essential for the control of food resources and human health. As far as fungal pathogens are concerned, the principal option has been the use of antifungal agents, also called fungicides when they are used in the environment.
Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, there are limited open-source tools available. Error rates, particularly for inversions and fusions across chromosomes, remain higher than with alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.
A High-Quality Grapevine Downy Mildew Genome Assembly Reveals Rapidly Evolving and Lineage-Specific Putative Host Adaptation Genes.
Downy mildews are obligate biotrophic oomycete pathogens that cause devastating plant diseases on economically important crops. Plasmopara viticola is the causal agent of grapevine downy mildew, a major disease in vineyards worldwide. We sequenced the genome of Pl. viticola with PacBio long reads and obtained a new 92.94 Mb assembly with high contiguity (359 scaffolds for an N50 of 706.5 kb) due to a better resolution of repeat regions. This assembly presented a high level of gene completeness, recovering 1,592 genes encoding secreted proteins involved in plant-pathogen interactions. Plasmopara viticola had a two-speed genome architecture, with secreted protein-encoding genes preferentially located in gene-sparse, repeat-rich regions and evolving rapidly, as indicated by pairwise dN/dS values. We also used short reads to assemble the genome of Plasmopara muralis, a closely related species infecting grape ivy (Parthenocissus tricuspidata). The lineage-specific proteins identified by comparative genomics analysis included a large proportion of RxLR cytoplasmic effectors and, more generally, genes with high dN/dS values. We identified 270 candidate genes under positive selection, including several genes encoding transporters and components of the RNA machinery potentially involved in host specialization. Finally, the Pl. viticola genome assembly generated here will allow the development of robust population genomics approaches for investigating the mechanisms involved in adaptation to biotic and abiotic selective pressures in this species. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
The Reference Genome Sequence of Scutellaria baicalensis Provides Insights into the Evolution of Wogonin Biosynthesis.
Scutellaria baicalensis Georgi is important in Chinese traditional medicine where preparations of dried roots, “Huang Qin,” are used for liver and lung complaints and as complementary cancer treatments. We report a high-quality reference genome sequence for S. baicalensis where 93% of the 408.14-Mb genome has been assembled into nine pseudochromosomes with a super-N50 of 33.2 Mb. Comparison of this sequence with those of closely related species in the order Lamiales, Sesamum indicum and Salvia splendens, revealed that a specialized metabolic pathway for the synthesis of 4′-deoxyflavone bioactives evolved in the genus Scutellaria. We found that the gene encoding a specific cinnamate coenzyme A ligase likely obtained its new function following recent mutations, and that four genes encoding enzymes in the 4′-deoxyflavone pathway are present as tandem repeats in the genome of S. baicalensis. Further analyses revealed that gene duplications, segmental duplication, gene amplification, and point mutations coupled to gene neo- and subfunctionalizations were involved in the evolution of 4′-deoxyflavone synthesis in the genus Scutellaria. Our study not only provides significant insight into the evolution of specific flavone biosynthetic pathways in the mint family, Lamiaceae, but also will facilitate the development of tools for enhancing bioactive productivity by metabolic engineering in microbes or by molecular breeding in plants. The reference genome of S. baicalensis is also useful for improving the genome assemblies for other members of the mint family and offers an important foundation for decoding the synthetic pathways of bioactive compounds in medicinal plants. Copyright © 2019 The Authors. Published by Elsevier Inc. All rights reserved.