DNA metabarcoding is widely used to study prokaryotic and eukaryotic microbial diversity. Technological constraints limit most studies to marker lengths below 600 base pairs (bp). Longer sequencing reads of several thousand bp are now possible with third-generation sequencing. Increased marker lengths provide greater taxonomic resolution and allow for phylogenetic methods of classification, but longer reads may be subject to higher rates of sequencing error and chimera formation. In addition, most bioinformatics tools for DNA metabarcoding were designed for short reads and are therefore unsuitable for long-read data. Here, we used Pacific Biosciences circular consensus sequencing (CCS) to DNA-metabarcode environmental samples using a ca. 4,500 bp marker that included most of the eukaryote SSU and LSU rRNA genes and the complete ITS region. We developed an analysis pipeline that reduced error rates to levels comparable to short-read platforms. Validation using a mock community indicated that our pipeline detected 98% of chimeras de novo. We recovered 947 OTUs from water and sediment samples from a natural lake, 848 of which could be classified to phylum, 397 to genus and 330 to species. By allowing for the simultaneous use of three databases (UNITE, SILVA and RDP LSU), long-read DNA metabarcoding provided better taxonomic resolution than any single marker. We foresee the use of long reads enabling the cross-validation of reference sequences and the synthesis of rRNA gene databases. The universal nature of the rRNA operon and our recovery of >100 nonfungal OTUs indicate that long-read DNA metabarcoding holds promise for studies of eukaryotic diversity more broadly.
© 2018 John Wiley & Sons Ltd.
Journal: Molecular Ecology Resources