September 22, 2019

PacBio metabarcoding of Fungi and other eukaryotes: errors, biases and perspectives.

Second-generation, high-throughput sequencing methods have greatly improved our understanding of the ecology of soil microorganisms, yet the short barcodes (< 500 bp) provide limited taxonomic and phylogenetic information for species discrimination and taxonomic assignment. Here, we utilized the third-generation Pacific Biosciences (PacBio) RSII and Sequel instruments to evaluate the suitability of full-length internal transcribed spacer (ITS) barcodes and longer rRNA gene amplicons for metabarcoding Fungi, Oomycetes and other eukaryotes in soil samples. Metabarcoding revealed multiple errors and biases: Taq polymerase substitution errors and mis-incorporating indels in sequencing homopolymers constitute major errors; sequence length biases occur during PCR, library preparation, loading to the sequencing instrument and quality filtering; primer-template mismatches bias the taxonomic profile when using regular and highly degenerate primers. The RSII and Sequel platforms enable the sequencing of amplicons up to 3000 bp, but the sequence quality remains slightly inferior to Illumina sequencing, especially in longer amplicons. The full ITS barcode and flanking rRNA small subunit gene greatly improve taxonomic identification at the species and phylum levels, respectively. We conclude that PacBio sequencing provides a viable alternative for metabarcoding of organisms that are of relatively low diversity, require > 500-bp barcode for reliable identification or when phylogenetic approaches are intended. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.


September 22, 2019

An environmental bacterial taxon with a large and distinct metabolic repertoire.

Cultivated bacteria such as actinomycetes are a highly useful source of biomedically important natural products. However, such ‘talented’ producers represent only a minute fraction of the entire, mostly uncultivated, prokaryotic diversity. The uncultured majority is generally perceived as a large, untapped resource of new drug candidates, but so far it is unknown whether taxa containing talented bacteria indeed exist. Here we report the single-cell- and metagenomics-based discovery of such producers. Two phylotypes of the candidate genus ‘Entotheonella’ with genomes of greater than 9 megabases and multiple, distinct biosynthetic gene clusters co-inhabit the chemically and microbially rich marine sponge Theonella swinhoei. Almost all bioactive polyketides and peptides known from this animal were attributed to a single phylotype. ‘Entotheonella’ spp. are widely distributed in sponges and belong to an environmental taxon proposed here as candidate phylum ‘Tectomicrobia’. The pronounced bioactivities and chemical uniqueness of ‘Entotheonella’ compounds provide significant opportunities for ecological studies and drug discovery.


September 22, 2019

The genome of an underwater architect, the caddisfly Stenopsyche tienmushanensis Hwang (Insecta: Trichoptera).

Caddisflies (Insecta: Trichoptera) are a highly adapted freshwater group of insects split from a common ancestor with Lepidoptera. They are the most diverse (>16,000 species) of the strictly aquatic insect orders and are widely employed as bio-indicators in water quality assessment and monitoring. Among the numerous adaptations to aquatic habitats, caddisfly larvae use silk and materials from the environment (e.g., stones, sticks, leaf matter) to build composite structures such as fixed retreats and portable cases. Understanding how caddisflies have adapted to aquatic habitats will help explain the evolution and subsequent diversification of the group. We sequenced a retreat-builder caddisfly Stenopsyche tienmushanensis Hwang and assembled a high-quality genome from both Illumina and Pacific Biosciences (PacBio) sequencing. In total, 601.2 M Illumina reads (90.2 Gb) and 16.9 M PacBio subreads (89.0 Gb) were generated. The 451.5 Mb assembled genome has a contig N50 of 1.29 Mb, has a longest contig of 4.76 Mb, and covers 97.65% of the 1,658 insect single-copy genes as assessed by Benchmarking Universal Single-Copy Orthologs. The genome comprises 36.76% repetitive elements. A total of 14,672 predicted protein-coding genes were identified. The genome revealed gene expansions in specific groups of the cytochrome P450 family and olfactory binding proteins, suggesting potential genomic features associated with pollutant tolerance and mate finding. In addition, the complete gene complex of the highly repetitive H-fibroin, the major protein component of caddisfly larval silk, was assembled. We report the draft genome of Stenopsyche tienmushanensis, the highest-quality caddisfly genome so far. The genome information will be an important resource for the study of caddisflies and may shed light on the evolution of aquatic insects.
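
For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name `n50` and the toy contig lengths are illustrative assumptions, not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if running * 2 >= total:
            return length


# Hypothetical toy assembly (lengths in bp), for illustration only.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

In the toy example the total is 20 bp; the two longest contigs (6 + 5 = 11 bp) already cover half of it, so the N50 is 5.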


September 22, 2019

Subaerial biofilms on granitic historic buildings: microbial diversity and development of phototrophic multi-species cultures.

Microbial communities of natural subaerial biofilms developed on granitic historic buildings of a World Heritage Site (Santiago de Compostela, NW Spain) were characterized and cultured in liquid BG11 medium. Environmental barcoding through next-generation sequencing (Pacific Biosciences) revealed that the biofilms were mainly composed of species of Chlorophyta (green algae) and Ascomycota (fungi) commonly associated with rock substrata. Richness and diversity were higher for the fungal than for the algal assemblages and fungi showed higher heterogeneity among samples. Cultures derived from natural biofilms showed the establishment of stable microbial communities mainly composed of Chlorophyta and Cyanobacteria. Although most taxa found in these cultures were not common in the original biofilms, they are likely common pioneer colonizers of building stone surfaces, including granite. Stable phototrophic multi-species cultures of known microbial diversity were thus obtained and their reliability to emulate natural colonization on granite should be confirmed in further experiments.


September 22, 2019

Rapid infectious disease identification by next-generation DNA sequencing.

Currently, there is a critical need to rapidly identify infectious organisms in clinical samples. Next-Generation Sequencing (NGS) could surmount the deficiencies of culture-based methods; however, there are no standardized, automated programs to process NGS data. To address this deficiency, we developed the Rapid Infectious Disease Identification (RIDI™) system. The system requires minimal guidance, which reduces operator errors. The system is compatible with the three major NGS platforms. It automatically interfaces with the sequencing system, detects its data format, configures the analysis type, applies appropriate quality control, and analyzes the results. Sequence information is characterized using both the NCBI database and RIDI™ specific databases. RIDI™ was designed to identify high probability sequence matches and more divergent matches that could represent different or novel species. We challenged the system using defined American Type Culture Collection (ATCC) reference standards of 27 species, both individually and in varying combinations. The system was able to rapidly detect known organisms in <12 h with multi-sample throughput. The system accurately identifies 99.5% of the DNA sequence reads at the genus-level and 75.3% at the species-level in reference standards. It has a limit of detection of 146 cells/ml in simulated clinical samples, and is also able to identify the components of polymicrobial samples with 16.9% discrepancy at the genus-level and 31.2% at the species-level. Thus, the system's effectiveness may exceed current methods, especially in situations where culture methods could produce false negatives or where rapid results would influence patient outcomes. Copyright © 2016 Elsevier B.V. All rights reserved.


September 22, 2019

Is there foul play in the leaf pocket? The metagenome of floating fern Azolla reveals endophytes that do not fix N2 but may denitrify.

Dinitrogen fixation by Nostoc azollae residing in specialized leaf pockets supports prolific growth of the floating fern Azolla filiculoides. To evaluate contributions by further microorganisms, the A. filiculoides microbiome and nitrogen metabolism in bacteria persistently associated with Azolla ferns were characterized. A metagenomic approach was taken complemented by detection of N2O released and nitrogen isotope determinations of fern biomass. Ribosomal RNA genes in sequenced DNA of natural ferns, their enriched leaf pockets and water filtrate from the surrounding ditch established that bacteria of A. filiculoides differed entirely from surrounding water and revealed species of the order Rhizobiales. Analyses of seven cultivated Azolla species confirmed persistent association with Rhizobiales. Two distinct nearly full-length Rhizobiales genomes were identified in leaf-pocket-enriched samples from ditch-grown A. filiculoides. Their annotation revealed genes for denitrification but not N2-fixation. 15N2 incorporation was active in ferns with N. azollae but not in ferns without. N2O was not detectably released from surface-sterilized ferns with the Rhizobiales. N2-fixing N. azollae, we conclude, dominated the microbiome of Azolla ferns. The persistent but less abundant heterotrophic Rhizobiales bacteria possibly contributed to lowering O2 levels in leaf pockets but did not release detectable amounts of the strong greenhouse gas N2O. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.


September 22, 2019

Moving beyond microbiome-wide associations to causal microbe identification.

Microbiome-wide association studies have established that numerous diseases are associated with changes in the microbiota. These studies typically generate a long list of commensals implicated as biomarkers of disease, with no clear relevance to disease pathogenesis. If the field is to move beyond correlations and begin to address causation, an effective system is needed for refining this catalogue of differentially abundant microbes and to allow subsequent mechanistic studies. Here we demonstrate that triangulation of microbe-phenotype relationships is an effective method for reducing the noise inherent in microbiota studies and enabling identification of causal microbes. We found that gnotobiotic mice harbouring different microbial communities exhibited differential survival in a colitis model. Co-housing of these mice generated animals that had hybrid microbiotas and displayed intermediate susceptibility to colitis. Mapping of microbe-phenotype relationships in parental mouse strains and in mice with hybrid microbiotas identified the bacterial family Lachnospiraceae as a correlate for protection from disease. Using directed microbial culture techniques, we discovered Clostridium immunis, a previously unknown bacterial species from this family, that, when administered to colitis-prone mice, protected them against colitis-associated death. To demonstrate the generalizability of our approach, we used it to identify several commensal organisms that induce intestinal expression of an antimicrobial peptide. Thus, we have used microbe-phenotype triangulation to move beyond the standard correlative microbiome study and identify causal microbes for two completely distinct phenotypes. Identification of disease-modulating commensals by microbe-phenotype triangulation may be more broadly applicable to human microbiome studies.


September 22, 2019

Comprehensive exploration of the rumen microbial ecosystem with advancements in metagenomics

Ruminant farming and its environmental impact have long remained an economic concern. Metagenomics unravels the vast structural and functional diversity of the rumen microbial community that plays a major role in animal nutrition. Here, we summarize rumen metagenomic studies that have enhanced the knowledge of rumen microbe dynamics, subsequently leading to the development of better feed strategies to improve livestock production and reduce methane emissions.


September 22, 2019

Molecular characterization of eukaryotic algal communities in the tropical phyllosphere based on real-time sequencing of the 18S rDNA gene.

Foliicolous algae are a common occurrence in tropical forests. They are referable to a few simple morphotypes (unicellular, sarcinoid-like or filamentous), which makes their morphology of limited usefulness for taxonomic studies and species diversity assessments. The relationship between the algal community and its host phyllosphere was not clear. In order to obtain a more accurate assessment, we used single molecule real-time sequencing of the 18S rDNA gene to characterize the eukaryotic algal community in an area of South-western China. We annotated 2922 OTUs belonging to five classes, Ulvophyceae, Trebouxiophyceae, Chlorophyceae, Dinophyceae and Eustigmatophyceae. Novel clades formed by large numbers of sequences of green algae were detected in the order Trentepohliales (Ulvophyceae) and the Watanabea clade (Trebouxiophyceae), suggesting that these foliicolous communities may be substantially more diverse than so far appreciated and require further research. Species in Trentepohliales, Watanabea clade and Apatococcus clade were detected as the core members in the phyllosphere community studied. Communities from different host trees and sampling sites were not significantly different in terms of OTU composition. However, the communities of Musa and Ravenala differed from other host plants significantly at the genus level, since they were dominated by Trebouxiophycean epiphytes. The cryptic diversity of eukaryotic algae, especially Chlorophytes, in the tropical phyllosphere is very high. The community structure at species-level has no significant relationship either with host phyllosphere or locations. The core algal community in the tropical phyllosphere consists of members from Trentepohliales, Watanabea clade and Apatococcus clade. Our study provided a large amount of novel 18S rDNA sequences that will be useful to unravel the cryptic diversity of phyllosphere eukaryotic algae and for comparisons with similar future studies on this type of community.


September 22, 2019

Exploring the genome and transcriptome of the cave nectar bat Eonycteris spelaea with PacBio long-read sequencing.

In the past two decades, bats have emerged as an important model system to study host-pathogen interactions. More recently, it has been shown that bats may also serve as a new and excellent model to study aging, inflammation, and cancer, among other important biological processes. The cave nectar bat or lesser dawn bat (Eonycteris spelaea) is known to be a reservoir for several viruses and intracellular bacteria. It is widely distributed throughout the tropics and subtropics from India to Southeast Asia and pollinates several plant species, including the culturally and economically important durian in the region. Here, we report the whole-genome and transcriptome sequencing, followed by subsequent de novo assembly, of the E. spelaea genome solely using the Pacific Biosciences (PacBio) long-read sequencing platform. The newly assembled E. spelaea genome is 1.97 Gb in length and consists of 4,470 sequences with a contig N50 of 8.0 Mb. Identified repeat elements covered 34.65% of the genome, and 20,640 unique protein-coding genes with 39,526 transcripts were annotated. We demonstrated that the PacBio long-read sequencing platform alone is sufficient to generate a comprehensive de novo assembled genome and transcriptome of an important bat species. These results will provide useful insights and act as a resource to expand our understanding of bat evolution, ecology, physiology, immunology, viral infection, and transmission dynamics.


September 22, 2019

Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data.

The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens. Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology in concordance with read trimming and ambiguous read reassignment. Second, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity. Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available at: http://sourceforge.net/projects/pathoscope/.


September 22, 2019

A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system.

Establishing the time since death is critical in every death investigation, yet existing techniques are susceptible to a range of errors and biases. For example, forensic entomology is widely used to assess the postmortem interval (PMI), but errors can range from days to months. Microbes may provide a novel method for estimating PMI that avoids many of these limitations. Here we show that postmortem microbial community changes are dramatic, measurable, and repeatable in a mouse model system, allowing PMI to be estimated within approximately 3 days over 48 days. Our results provide a detailed understanding of bacterial and microbial eukaryotic ecology within a decomposing corpse system and suggest that microbial community data can be developed into a forensic tool for estimating PMI. DOI:http://dx.doi.org/10.7554/eLife.01104.001.


September 22, 2019

Survey of Ixodes pacificus ticks in California reveals a diversity of microorganisms and a novel and widespread Anaplasmataceae species.

Ixodes pacificus ticks can harbor a wide range of human and animal pathogens. To survey the prevalence of tick-borne known and putative pathogens, we tested 982 individual adult and nymphal I. pacificus ticks collected throughout California between 2007 and 2009 using a broad-range PCR and electrospray ionization mass spectrometry (PCR/ESI-MS) assay designed to detect a wide range of tick-borne microorganisms. Overall, 1.4% of the ticks were found to be infected with Borrelia burgdorferi, 2.0% were infected with Borrelia miyamotoi and 0.3% were infected with Anaplasma phagocytophilum. In addition, 3.0% were infected with Babesia odocoilei. About 1.2% of the ticks were co-infected with more than one pathogen or putative pathogen. In addition, we identified a novel Anaplasmataceae species that we characterized by sequencing of its 16S rRNA, groEL, gltA, and rpoB genes. Sequence analysis indicated that this organism is phylogenetically distinct from known Anaplasma species with its closest genetic near neighbors coming from Asia. The prevalence of this novel Anaplasmataceae species was as high as 21% at one site, and it was detected in 4.9% of ticks tested statewide. Based upon this genetic characterization we propose that this organism be called ‘Candidatus Cryptoplasma californiense’. Knowledge of this novel microbe will provide awareness for the community about the breadth of the I. pacificus microbiome, the concept that this bacterium could be more widely spread, and an opportunity to explore whether this bacterium also contributes to human or animal disease burden.


September 22, 2019

Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts.

Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTR, C. arabica ESTs, and Rfam left 1213 sequences without hits, which are potential novel genes in coffee. Longer UTRs were captured, especially in the 5′ UTRs, facilitating the identification of upstream open reading frames. The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated. LRS technology reveals the limitations of previous studies. It provides an important tool to produce a reference transcriptome including more of the diversity of full-length transcripts to help understand the biology and support the genetic improvement of polyploid species such as coffee. © The Authors 2017. Published by Oxford University Press.


September 22, 2019

Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering

BACKGROUND: High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into operational taxonomic units (OTUs) is widely used for microbiome profiling. However, clustering of short 16S rRNA gene reads into biologically meaningful OTUs is challenging, in part because nucleotide variation along the 16S rRNA gene is only partially captured by short reads. The recent emergence of long-read platforms, such as single-molecule real-time (SMRT) sequencing from Pacific Biosciences, offers the potential for improved taxonomic and phylogenetic profiling. Here, we evaluate the performance of long- and short-read 16S rRNA gene sequencing using simulated and experimental data, followed by OTU inference using computational pipelines based on heuristic and complete-linkage hierarchical clustering. RESULTS: In simulated data, long-read sequencing was shown to improve OTU quality and decrease variance. We then profiled 40 human gut microbiome samples using a combination of Illumina MiSeq and Blautia-specific SMRT sequencing, further supporting the notion that long reads can identify additional OTUs. We implemented a complete-linkage hierarchical clustering strategy using a flexible computational pipeline, tailored specifically for PacBio circular consensus sequencing (CCS) data that outperforms heuristic methods in most settings: https://github.com/oscar-franzen/oclust/. CONCLUSION: Our data demonstrate that long reads can improve OTU inference; however, the choice of clustering algorithm and associated clustering thresholds has significant impact on performance.
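
To make the clustering idea in the abstract above concrete, here is a minimal, illustrative sketch of complete-linkage hierarchical clustering applied to OTU picking. It is not the oclust implementation: the function names, the simple mismatch-fraction distance on equal-length (pre-aligned) sequences, and the default threshold are all assumptions chosen for brevity. Under complete linkage, the distance between two clusters is the largest pairwise distance between their members, so every pair of sequences within a resulting OTU is within the threshold.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_otus(seqs, threshold=0.03):
    """Group sequence indices into OTUs by naive complete-linkage clustering.

    Repeatedly merges the two closest clusters, where cluster distance is
    the maximum pairwise distance between members, and stops once the
    closest pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(seqs))]

    def linkage(c1, c2):
        # Complete linkage: the worst-case distance between the clusters.
        return max(p_distance(seqs[i], seqs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        dist, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if dist > threshold:
            break
        # a < b is guaranteed by combinations, so deleting b is safe.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy reads, for illustration only.
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG"]
print(complete_linkage_otus(reads, threshold=0.15))
```

This naive version recomputes all cluster distances on every merge and is only suitable for small inputs; as the abstract notes, the choice of clustering algorithm and threshold strongly affects the resulting OTUs.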

