A paper in BioMed Central’s journal Biotechnology for Biofuels demonstrates how finished microbial genomes generated with Single Molecule, Real-Time (SMRT®) Sequencing are making an impact on the biotechnology industry.
The publication, “Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia,” comes from scientists at Oak Ridge National Laboratory, the University of Tennessee, and New Zealand-based biofuels company LanzaTech. Lead authors Steven Brown and Shilpa Nagaraju, together with their colleagues, used PacBio® sequencing to generate a finished genome sequence for a complex class III microbe that previously could not be assembled to closure.
Clostridium autoethanogenum (strain JA1-1; DSM 10061) is an acetogen that can ferment waste gases such as carbon monoxide into biofuels and commodity chemicals, which makes it of considerable interest to the biotech industry. Its genome consists of a single chromosome of about 4.3 Mb with a very low GC content of just 31%. The strain is categorized as a class III microbe, meaning it is difficult to assemble because of its high repeat content, prophage, and multiple copies of the rRNA gene operons. Before this study, only a 100-contig draft genome assembly had been published.
In this project, the authors used several sequencing technologies in an attempt to improve on that draft assembly. Short-read technologies did not succeed. “Assemblies based upon shorter read DNA technologies were confounded by the large number [of] repeats and their size, which in the case of the rRNA gene operons were ~5 kb,” the scientists report.
But it was a different story when they tried SMRT Sequencing on the PacBio RS II. “Remarkably, one PacBio library preparation and two single molecule real-time sequencing (SMRT) cells produced sufficient sequence such that it could be assembled into one contiguous DNA fragment that represented the DSM 10061 genome,” the authors write. “This is one of the first de novo sequenced genomes we are aware of that has been closed without manual finishing or additional data, despite the complexity of the C. autoethanogenum genome.”
Comparing the PacBio assembly with earlier efforts based on short-read technologies, the scientists found many fully sequenced genes that had been missed entirely or only partially covered in the draft assemblies. Some of these genes proved important for understanding the organism’s detailed metabolism and enabled a more comprehensive comparison with a closely related Clostridium strain. The team also used several analysis tools to test the accuracy of the final assembly. One of these checked for collapsed repeats, finding none in the PacBio assembly but several in assemblies built from short-read data. Another assessment found that while the previously published 100-contig draft genome predicted a single copy of the 16S rRNA gene, the PacBio assembly predicted nine copies, the same number of rRNA gene clusters found in a closely related Clostridium strain. The authors note that these very large repetitive regions likely contributed to the inability of short-read technologies to fully sequence the organism.
Another major finding was the presence of a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) system in this microbial strain. CRISPRs are prokaryotic DNA loci that retain a record of past infections by phages and plasmids, providing immunity against these mobile genetic elements. Closely related strains lack the CRISPR system, and other related strains used in industrial fermentation that lack CRISPR systems have proven susceptible to bacteriophage infection. In the paper, the scientists reason that the presence of a CRISPR system may make this strain particularly well suited to industrial-scale fermentation of biotech products.
The authors add, “The relatively low cost to generate the PacBio data (approximately US$1,500) and the outcome of this study support the assertion [that] this technology will be valuable in future studies where a complete genome sequence is important and for complex genomes that contain large repeat elements.”