Development of high-throughput sequencing techniques has greatly benefited our understanding of microbial ecology; yet the methods producing short reads suffer from limited species-level resolution and uncertainty of identification. Here we optimize PacBio-based metabarcoding protocols covering the Internal Transcribed Spacer (ITS) region and partial Small Subunit (SSU) of the rRNA gene for species-level identification of all eukaryotes, with a specific focus on Fungi (including Glomeromycota) and Stramenopila (particularly Oomycota). Based on tests on composite soil samples and mock communities, we propose the most suitable degenerate primers, ITS9munngs + ITS4ngsUni, for eukaryotes and selected groups therein, and discuss the pros and cons of long read-based identification of eukaryotes.
Chromulinavorax destructans, a pathogen of microzooplankton that provides a window into the enigmatic candidate phylum Dependentiae.
Members of the major candidate phylum Dependentiae (a.k.a. TM6) are widespread across diverse environments from showerheads to peat bogs; yet, with the exception of two isolates infecting amoebae, they are only known from metagenomic data. The limited knowledge of their biology indicates that they have a long evolutionary history of parasitism. Here, we present Chromulinavorax destructans (Strain SeV1), the first isolate of this phylum to infect a representative from a widespread and ecologically significant group of heterotrophic flagellates, the microzooplankter Spumella elongata (Strain CCAP 955/1). Chromulinavorax destructans has a reduced 1.2 Mb genome that is so specialized for infection that it shows no evidence of complete metabolic pathways, but encodes an extensive transporter system for importing nutrients and energy in the form of ATP from the host. Its replication causes extensive reorganization and expansion of the mitochondrion, effectively surrounding the pathogen, consistent with its dependency on the host for energy. Nearly half (44%) of the inferred proteins contain signal sequences for secretion, including many without recognizable similarity to proteins of known function, as well as 98 copies of proteins with an ankyrin-repeat domain; ankyrin repeats are known effectors of host modulation, suggesting the presence of an extensive host-manipulation apparatus. These observations help to cement members of this phylum as widespread and diverse parasites infecting a broad range of eukaryotic microbes.
Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data.
Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms. © The Author 2017. Published by Oxford University Press.
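The abstract above does not name the metrics used to compare assemblers. As a hedged illustration only, the sketch below computes N50, a contiguity statistic commonly reported for genome assemblies; the choice of N50 and the function name `n50` are assumptions and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    # Only reached when the input is empty.
    raise ValueError("contig_lengths must contain at least one contig")


if __name__ == "__main__":
    # Toy contig lengths (hypothetical, not data from the study).
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A single contiguity number says nothing about correctness, so in practice it would be read alongside completeness and accuracy measures.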
A New Species of the γ-Proteobacterium Francisella, F. adeliensis Sp. Nov., Endocytobiont in an Antarctic Marine Ciliate and Potential Evolutionary Forerunner of Pathogenic Species.
The study of the draft genome of an Antarctic marine ciliate, Euplotes petzi, revealed foreign sequences of bacterial origin belonging to the γ-proteobacterium Francisella, a genus that includes pathogenic and environmental species. TEM and FISH analyses confirmed the presence of a Francisella endocytobiont in E. petzi. This endocytobiont was isolated and found to be a new species, named F. adeliensis sp. nov. F. adeliensis grows well at wide ranges of temperature, salinity, and carbon dioxide concentrations, implying that it may colonize new organisms living in deeply diversified habitats. The F. adeliensis genome includes the igl and pdp gene sets (pdpC and pdpE excepted) of the Francisella pathogenicity island needed for intracellular growth. Consistent with an ancient symbiotic lifestyle, the genome also contains a single insertion-sequence element. However, it lacks genes for the biosynthesis of essential amino acids such as cysteine, lysine, methionine, and tyrosine. In a genome-based phylogenetic tree, F. adeliensis forms a new early branching clade, basal to the evolution of pathogenic species. The correlations of this clade with the other clades raise doubts about a genuine free-living nature of the environmental Francisella species isolated from natural and man-made environments, and suggest that F. adeliensis be viewed as a pioneer in the Francisella colonization of eukaryotic organisms.
Nephromyces encodes a urate metabolism pathway and predicted peroxisomes, demonstrating that these are not ancient losses of apicomplexans.
The phylum Apicomplexa is a quintessentially parasitic lineage, whose members infect a broad range of animals. One exception to this may be the apicomplexan genus Nephromyces, which has been described as having a mutualistic relationship with its host. Here we analyze transcriptome data from Nephromyces and its parasitic sister taxon, Cardiosporidium, revealing an ancestral purine degradation pathway thought to have been lost early in apicomplexan evolution. The predicted localization of many of the purine degradation enzymes to peroxisomes, and the in silico identification of a full set of peroxisome proteins, indicate that loss of both features in other apicomplexans occurred multiple times. The degradation of purines is thought to play a key role in the unusual relationship between Nephromyces and its host. Transcriptome data confirm previous biochemical results of a functional pathway for the utilization of uric acid as a primary nitrogen source for this unusual apicomplexan.
The single-celled ciliate Paramecium bursaria is an indispensable model for investigating endosymbiosis between protists and green-algal symbionts. To elucidate the mechanism of this type of endosymbiosis, we combined PacBio and Illumina sequencing to assemble a high-quality and near-complete macronuclear genome of P. bursaria. The genomic characteristics and phylogenetic analyses indicate that P. bursaria is the basal clade of the Paramecium genus. Through comparative genomic analyses with its close relatives, we found that P. bursaria encodes more genes related to nitrogen metabolism and mineral absorption, but encodes fewer genes involved in oxygen binding and N-glycan biosynthesis. A comparison of the transcriptomic profiles of P. bursaria with and without endosymbiotic Chlorella showed differential expression of a wide range of metabolic genes. We selected the 32 most differentially expressed genes for RNA interference experiments in P. bursaria, and found that P. bursaria can regulate the abundance of its symbionts through glutamine supply. This study provides novel insights into Paramecium evolution and will extend our knowledge of the molecular mechanism for the induction of endosymbiosis between P. bursaria and green algae.
Genetic exchange enables parasites to rapidly transform disease phenotypes and exploit new host populations. Trypanosoma cruzi, the parasitic agent of Chagas disease and a public health concern throughout Latin America, has for decades been presumed to exchange genetic material rarely and without classic meiotic sex. We present compelling evidence from 45 genomes sequenced from southern Ecuador that T. cruzi in fact maintains truly sexual, panmictic groups that can occur alongside others that remain highly clonal after past hybridization events. These groups with divergent reproductive strategies appear genetically isolated despite possible co-occurrence in vectors and hosts. We propose biological explanations for the fine-scale disconnectivity we observe and discuss the epidemiological consequences of flexible reproductive modes. Our study reinvigorates the hunt for the site of genetic exchange in the T. cruzi life cycle, provides tools to define the genetic determinants of parasite virulence, and reforms longstanding theory on clonality in trypanosomatid parasites.
Genome assembly of Nannochloropsis oceanica provides evidence of host nucleus overthrow by the symbiont nucleus during speciation.
The species of the genus Nannochloropsis are unique in their maintenance of a nucleus-plastid continuum throughout their cell cycle, non-motility and asexual reproduction. These characteristics should be reflected in their gene assemblages (genomes). Here we show that N. oceanica has a genome of 29.3 Mb consisting of 32 pseudochromosomes and containing 7,330 protein-coding genes, and that the host nucleus may have been overthrown by an ancient red algal symbiont nucleus during speciation through secondary endosymbiosis. In addition, N. oceanica has lost its flagella and its abilities to undergo meiosis and sexual reproduction, and adopted a genome reduction strategy during speciation. We propose that N. oceanica emerged through the active fusion of a host protist and a photosynthesizing ancient red alga, and that the symbiont nucleus became dominant over the host nucleus while the chloroplast was wrapped by two layers of endoplasmic reticulum. Our findings provide evidence of an alternative speciation pathway in eukaryotes.
Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation.
We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.
Second-generation, high-throughput sequencing methods have greatly improved our understanding of the ecology of soil microorganisms, yet the short barcodes (< 500 bp) provide limited taxonomic and phylogenetic information for species discrimination and taxonomic assignment. Here, we utilized the third-generation Pacific Biosciences (PacBio) RSII and Sequel instruments to evaluate the suitability of full-length internal transcribed spacer (ITS) barcodes and longer rRNA gene amplicons for metabarcoding Fungi, Oomycetes and other eukaryotes in soil samples. Metabarcoding revealed multiple errors and biases: Taq polymerase substitution errors and mis-incorporated indels when sequencing homopolymers constitute major errors; sequence length biases occur during PCR, library preparation, loading to the sequencing instrument and quality filtering; primer-template mismatches bias the taxonomic profile when using regular and highly degenerate primers. The RSII and Sequel platforms enable the sequencing of amplicons up to 3000 bp, but the sequence quality remains slightly inferior to Illumina sequencing, especially in longer amplicons. The full ITS barcode and flanking rRNA small subunit gene greatly improve taxonomic identification at the species and phylum levels, respectively. We conclude that PacBio sequencing provides a viable alternative for metabarcoding of organisms that are of relatively low diversity, that require a > 500-bp barcode for reliable identification, or when phylogenetic approaches are intended. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
Comparison of the mitochondrial genomes and steady state transcriptomes of two strains of the trypanosomatid parasite, Leishmania tarentolae.
U-insertion/deletion RNA editing is a post-transcriptional mitochondrial RNA modification phenomenon required for viability of trypanosomatid parasites. Small guide RNAs encoded mainly by the thousands of catenated minicircles contain the information for this editing. We analyzed by NGS technology the mitochondrial genomes and transcriptomes of two strains, the old lab UC strain and the recently isolated LEM125 strain. PacBio sequencing provided complete minicircle sequences which avoided the assembly problem of short reads caused by the conserved regions. Minicircles were identified by a characteristic size, the presence of three short conserved sequences, a region of inherently bent DNA and the presence of single gRNA genes at a fairly defined location. The LEM125 strain contained over 114 minicircles encoding different gRNAs and the UC strain only ~24 minicircles. Some LEM125 minicircles contained no identifiable gRNAs. Approximate copy numbers of the different minicircle classes in the network were determined by the number of PacBio CCS reads that assembled to each class. Mitochondrial RNA libraries from both strains were mapped against the minicircle and maxicircle sequences. Small RNA reads mapped to the putative gRNA genes but also to multiple regions outside the genes on both strands and large RNA reads mapped in many cases over almost the entire minicircle on both strands. These data suggest that minicircle transcription is complete and bidirectional, with 3′ processing yielding the mature gRNAs. Steady state RNAs in varying abundances are derived from all maxicircle genes, including portions of the repetitive divergent region. The relative extents of editing in both strains correlated with the presence of a cascade of cognate gRNAs. These data should provide the foundation for a deeper understanding of this dynamic genetic system as well as the evolutionary variation of editing in different strains.
Molecular characterization of eukaryotic algal communities in the tropical phyllosphere based on real-time sequencing of the 18S rDNA gene.
Foliicolous algae are a common occurrence in tropical forests. They are referable to a few simple morphotypes (unicellular, sarcinoid-like or filamentous), which makes their morphology of limited usefulness for taxonomic studies and species diversity assessments. The relationship between algal communities and their host phyllosphere is not clear. In order to obtain a more accurate assessment, we used single molecule real-time sequencing of the 18S rDNA gene to characterize the eukaryotic algal community in an area of South-western China. We annotated 2922 OTUs belonging to five classes: Ulvophyceae, Trebouxiophyceae, Chlorophyceae, Dinophyceae and Eustigmatophyceae. Novel clades formed by large numbers of green algal sequences were detected in the order Trentepohliales (Ulvophyceae) and the Watanabea clade (Trebouxiophyceae), suggesting that these foliicolous communities may be substantially more diverse than so far appreciated and require further research. Species in Trentepohliales, the Watanabea clade and the Apatococcus clade were detected as the core members of the phyllosphere community studied. Communities from different host trees and sampling sites were not significantly different in terms of OTU composition. However, the communities of Musa and Ravenala differed significantly from those of other host plants at the genus level, since they were dominated by Trebouxiophycean epiphytes. The cryptic diversity of eukaryotic algae, especially Chlorophytes, in the tropical phyllosphere is very high. The community structure at the species level has no significant relationship with either host phyllosphere or location. The core algal community in the tropical phyllosphere consists of members of Trentepohliales, the Watanabea clade and the Apatococcus clade. Our study provides a large number of novel 18S rDNA sequences that will be useful for unravelling the cryptic diversity of phyllosphere eukaryotic algae and for comparisons with similar future studies on this type of community.
Molecular genetic diversity and characterization of conjugation genes in the fish parasite Ichthyophthirius multifiliis.
Ichthyophthirius multifiliis is the etiologic agent of “white spot”, a commercially important disease of freshwater fish. As a parasitic ciliate, I. multifiliis infects numerous host species across a broad geographic range. Although Ichthyophthirius outbreaks are difficult to control, recent sequencing of the I. multifiliis genome has revealed a number of potential metabolic pathways for therapeutic intervention, along with likely vaccine targets for disease prevention. Nonetheless, major gaps exist in our understanding of both the life cycle and population structure of I. multifiliis in the wild. For example, conjugation has never been described in this species, and it is unclear whether I. multifiliis undergoes sexual reproduction, despite the presence of a germline micronucleus. In addition, no good methods exist to distinguish strains, leaving phylogenetic relationships between geographic isolates completely unresolved. Here, we compared nucleotide sequences of SSUrDNA, mitochondrial NADH dehydrogenase subunit I and cox-1 genes, and 14 somatic SNP sites from nine I. multifiliis isolates obtained from four different states in the US since 1995. The mitochondrial sequences effectively distinguished the isolates from one another and divided them into at least two genetically distinct groups. Furthermore, none of the nine isolates shared the same composition of the 14 somatic SNP sites, suggesting that I. multifiliis undergoes sexual reproduction at some point in its life cycle. Finally, compared to the well-studied free-living ciliates Tetrahymena thermophila and Paramecium tetraurelia, I. multifiliis has lost 38% and 29%, respectively, of 16 experimentally confirmed conjugation-related genes, indicating that mechanistic differences in sexual reproduction are likely to exist between I. multifiliis and other ciliate species. Copyright © 2015 Elsevier Inc. All rights reserved.
A comprehensive fungi-specific 18S rRNA gene sequence primer toolkit suited for diverse research issues and sequencing platforms.
Several fungi-specific primers target the 18S rRNA gene sequence, one of the prominent markers for fungal classification. The design of most primers dates back decades. Since then, the number of sequences in public databases has increased, leading to the discovery of new fungal groups and changes in fungal taxonomy. However, no reevaluation of primers has been carried out and relevant information on most primers is missing. With this study, we aimed to develop an 18S rRNA gene sequence primer toolkit allowing an easy selection of the best primer pair appropriate for different sequencing platforms, research aims (biodiversity assessment versus isolate classification) and target groups. We performed an intensive literature search, reshuffled existing primers into new pairs, and designed new Illumina primers and annealing-blocking oligonucleotides. A final number of 439 primer pairs were subjected to in silico PCRs. The best primer pairs were selected and experimentally tested. The most promising primer pair with a small amplicon size, nu-SSU-1333-5′/nu-SSU-1647-3′ (FF390/FR-1), was successful in describing fungal communities by Illumina sequencing. Results were confirmed by a simultaneous metagenomics and eukaryote-specific primer approach. Co-amplification occurred in all sample types but was effectively reduced by blocking oligonucleotides. The compiled data revealed an enormous diversity among fungal 18S rRNA gene primer pairs in terms of fungal coverage, phylum spectrum and co-amplification. Therefore, the primer pair has to be carefully selected to fulfill the requirements of the individual research project. The presented primer toolkit offers comprehensive lists of 164 primers, 439 primer combinations, 4 blocking oligonucleotides, and top primer pairs, holding all relevant information including primer characteristics and performance to facilitate primer pair selection.
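The in silico PCR step mentioned above can be illustrated with a minimal sketch. It assumes exact matching of degenerate primers (IUPAC codes expanded to regular-expression character classes) against a single template strand, with no mismatches allowed; the abstract does not say which tool or mismatch settings were used. The function names and the toy sequences are hypothetical and are not the FF390/FR-1 primers.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regex pattern string."""
    return "".join(IUPAC[base] for base in primer.upper())


def in_silico_pcr(template, forward, reverse):
    """Return the predicted amplicon (primers included), or None.

    The forward primer is searched on the given strand; the reverse
    primer is searched as its reverse complement downstream of it.
    """
    template = template.upper()
    fwd = re.search(primer_regex(forward), template)
    if fwd is None:
        return None
    downstream = template[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return template[fwd.start():fwd.end() + rev.end()]


if __name__ == "__main__":
    # Toy template and primers (hypothetical sequences).
    print(in_silico_pcr("TTACGTAGGGCCCAAATTTGCATCC", "ACGTR", "TGCAAW"))
```

Running such a function over a reference database and counting the templates that yield an amplicon would give a rough coverage estimate for each primer pair, under the exact-match assumption stated above.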
Advances in sequencing technologies continue to provide unprecedented opportunities to characterize microbial communities. For example, the Pacific Biosciences Single Molecule Real-Time (SMRT) platform has emerged as a unique approach harnessing DNA polymerase activity to sequence template molecules, enabling long reads at low costs. With the aim to simultaneously classify and enumerate in situ microbial populations, we developed a quantitative SMRT (qSMRT) approach that involves the addition of exogenous standards to quantify ribosomal amplicons derived from environmental samples. The V7-9 regions of 18S SSU rDNA were targeted and quantified from protistan community samples collected in the Ross Sea during the Austral summer of 2011. We used three standards of different length and optimized conditions to obtain accurate quantitative retrieval across the range of expected amplicon sizes, a necessary criterion for analyzing taxonomically diverse 18S rDNA molecules from natural environments. The ability to concurrently identify and quantify microorganisms in their natural environment makes qSMRT a powerful, rapid and cost-effective approach for defining ecosystem diversity and function. Copyright © 2017 Elsevier B.V. All rights reserved.
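The idea of using an exogenous standard for quantification can be sketched as simple proportional scaling. This is an assumption-laden illustration, not the published qSMRT procedure: it assumes a single standard added at a known copy number and equal amplification and sequencing efficiency for the standard and the targets, whereas the study used three standards of different length. The function name `estimate_copies` and all numbers are hypothetical.

```python
def estimate_copies(taxon_reads, standard_reads, standard_copies_added):
    """Scale per-taxon read counts to estimated amplicon copy numbers.

    taxon_reads: dict mapping taxon name -> read count
    standard_reads: reads assigned to the spiked-in standard
    standard_copies_added: known number of standard copies added
    """
    if standard_reads <= 0:
        raise ValueError("the standard must be detected to calibrate counts")
    # Copies represented by one read, inferred from the standard.
    copies_per_read = standard_copies_added / standard_reads
    return {taxon: reads * copies_per_read
            for taxon, reads in taxon_reads.items()}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    reads = {"taxon_a": 200, "taxon_b": 50}
    print(estimate_copies(reads, standard_reads=100,
                          standard_copies_added=1000.0))
```

With several standards of different length, one would expect a length-dependent calibration rather than a single scaling factor, which is why the abstract stresses accurate retrieval across the range of amplicon sizes.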