New Initiative to Generate 5,000 High-Quality Microbial Genomes for Chinese Database
Wednesday, August 21, 2019
An ambitious project to sequence 5,000 microbial genomes was jointly initiated by a consortium of 10 institutions across China, including Nankai University, China CDC, Academy of Military Medical Science, Third Institute of Oceanography-Ministry of Natural Resources, South China Sea Institute of Oceanology-CAS, China National Center for Food Safety Risk Assessment, Shandong University, Tianjin University of Science & Technology, East China University of Science and Technology, and Tianjin Biochip Corporation (TBC).
TBC, a PacBio service provider in China, has led the sequencing phase of the project, which is expected to be completed by the end of 2019. We recently sat down with Sun Yamin, general manager of TBC, to learn more about the project.
What’s the difference between the Prokaryotes 5,000 Complete Genomes Project (P5KCGP) and other microbial sequencing projects?
Previous microbial genome projects were scattered and typically based on one researcher’s own interests and directions. As a result, many common microbial species’ genomes have been sequenced repeatedly, while less commonly studied microbial species have still not been sequenced at all.
The current microbial genome database has an obvious species imbalance, and many microbial genomes are available only as low-quality scaffolds. Our goal is to create a genomic database that covers a much broader array of microbial diversity, including pathogenic microorganisms, food safety microbes, marine microbes, and terrestrial resource microbes.
By the completion of the project, we expect to add at least 500 new microbial genomes that are not currently found in the NCBI database. Our goal is to submit a high-quality, closed genome with no gaps for each of the 5,000 microbes included in our project. To achieve this goal, we chose the PacBio Sequel System as our sequencing platform, as SMRT Sequencing technology combines long read lengths, high accuracy, and a lack of GC content bias.
At present, only the Sequel System can meet our project requirements, given the challenges presented by many bacterial genomes. With the latest version 3.0 reagents, the average read length of 22 kb on the Sequel System is sufficient to span repeats, which can be more than a dozen kilobases long in some bacterial genomes. In addition, we have seen GC content of up to 70% in the microbial samples we've sequenced. Even so, assembly can be accomplished easily with PacBio data.
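For readers unfamiliar with the term, GC content is the fraction of bases in a sequence that are G or C. A minimal Python sketch of that calculation follows. The function name and the toy sequence are illustrative assumptions and are not part of the project's pipeline.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases among the A/C/G/T bases.

    Illustrative helper (hypothetical name); ambiguous bases such as N
    are ignored rather than counted.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / total


# Toy sequence chosen for illustration: 7 of its 10 bases are G or C.
toy = "GCGCGCGATA"
print(f"{gc_content(toy):.0%}")  # prints 70%
```

A real genome would be read from a FASTA file and would typically be millions of bases long, but the ratio is computed the same way.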
What is the significance of the P5KCGP project?
While microbial genomes were the first to be sequenced by scientists, the sum of all microbial sequencing data worldwide is less than the amount of data produced by a single laboratory that performs human genome sequencing. The genomes of microorganisms are relatively small, but given the enormous species and functional diversity of microorganisms in nature, microbial genomics has not been given sufficient attention. For pathogenic and foodborne microorganisms in particular, it is important to have reference-quality genomes.
What challenges has the P5KCGP project encountered?
1) Sample collection. On average, each partner needs to provide 400-500 microorganisms. Since our goal is to include bacterial species that are rare in nature, it can take a long time to isolate and grow samples.
2) Controlling costs. Generating closed microbial genomes requires more resources than simply coming up with a bunch of draft genomes. To manage sequencing costs, we have succeeded in multiplexing 16 microbial samples on each SMRT Cell 1M by optimizing the library preparation process.
3) Dealing with difficult-to-sequence microbes. Microorganisms live in diverse habitats, and some live in extreme environments that require quite high GC content in their genomes. Such genomes are more difficult to sequence.
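The effect of the 16-sample multiplexing mentioned in point 2 can be illustrated with some quick arithmetic. The sketch below is only a rough lower bound: it assumes every SMRT Cell 1M is filled with exactly 16 samples and that no sample has to be rerun, neither of which is stated in the interview. The function name is hypothetical.

```python
import math


def smrt_cells_needed(genomes: int, samples_per_cell: int) -> int:
    """Minimum number of cells, assuming every cell is filled to capacity.

    Hypothetical helper for illustration; reruns and partly filled
    cells are not modeled.
    """
    if genomes < 0 or samples_per_cell <= 0:
        raise ValueError("genomes must be >= 0 and samples_per_cell > 0")
    return math.ceil(genomes / samples_per_cell)


print(smrt_cells_needed(5000, 16))  # prints 313
print(smrt_cells_needed(5000, 1))   # prints 5000 (no multiplexing)
```

Under these assumptions, multiplexing cuts the number of cells needed for 5,000 genomes from 5,000 to 313.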
What groundwork does this project lay for future research efforts?
With the much more complete survey of microbial genomes made available through this sequencing project, we want to better understand how microbes that are widely distributed in nature have evolved and adapted to diverse environments. In addition, rare microorganisms living in extreme environments often have potential industrial value. Two examples sequenced through this project are Methylacidiphilum infernorum isolate V4, an extremely acidophilic methanotroph, and Geobacillus thermodenitrificans.