The SARS-CoV-2 global pandemic has highlighted the utility of pathogen surveillance pipelines that provide comprehensive genomic information, giving public health scientists a more complete view of the spread and characteristics of circulating pathogens. Beyond COVID-19, there is strong public health interest in expanding high-resolution surveillance to other infectious diseases.
Highly accurate, long HiFi reads produced by the PacBio Sequel IIe System have brought new levels of contiguity, completeness, and accuracy to large genome assembly. HiFi reads are similarly beneficial for microbial genome assembly, as higher-quality assemblies enhance our ability to investigate foodborne illnesses and monitor antimicrobial resistance. However, obstacles in the library preparation workflow, cost, and the recovery of small plasmids have limited their adoption in public health. Here, we introduce a new library prep workflow and assembly algorithm based on HiFi reads that together enable a high-throughput, end-to-end solution for microbial genome assembly. The new workflow consolidates steps, eliminates the need for strict size selection, shortens the total time to 6 hours, and enables library prep automation. The assembly algorithm resolves repeats using strict read-to-read overlaps, which the accuracy of HiFi reads makes possible. It uses a two-stage approach that first assembles chromosomes and then recovers short, high-coverage plasmids.
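To illustrate why strict overlaps help with repeats, the following toy sketch requires an exact suffix-to-prefix match between two reads. It is not the assembly algorithm described here: the function name `strict_overlap`, the `min_len` threshold, and the example sequences are all hypothetical, and a real assembler would use indexed overlap detection rather than this brute-force scan. The point is only that when reads are accurate enough to demand exact agreement, a single base difference between two repeat copies is enough to keep their reads apart.

```python
def strict_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest exact match between a suffix of `a` and a
    prefix of `b` that is at least `min_len` long, or 0 if none exists.

    Hypothetical helper for illustration only.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    longest_possible = min(len(a), len(b))
    # Try the longest candidate overlap first and shrink until one matches.
    for k in range(longest_possible, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


# Made-up reads: `left` ends inside repeat copy 1.
left = "TTGACC" + "AAGGCCTTAAGG"
# `right_same` starts in the same repeat copy.
right_same = "AAGGCCTTAAGG" + "CATG"
# `right_other` starts in a second copy that differs by one base (C -> A).
right_other = "AAGGCATTAAGG" + "TTCA"

overlap_same = strict_overlap(left, right_same, min_len=8)
overlap_other = strict_overlap(left, right_other, min_len=8)
print(overlap_same, overlap_other)
```

With noisier reads, the one-base difference between the two copies would be indistinguishable from sequencing error, and an error-tolerant overlapper would join `left` to either read.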
To evaluate the method, we generated a pool of HiFi libraries from 96 microbial samples with a combined genome size of 375 Mb. The protocol was evaluated with microbes relevant to pathogen surveillance, including common foodborne pathogens (Listeria, Salmonella) and species often seen in hospital settings (Klebsiella, Staphylococcus). The microbes represent a range of genome sizes, assembly complexity, GC content, chromosome counts, and plasmid content. DNA samples were sheared to 7 to 10 kb, prepared as barcoded libraries, pooled, and sequenced on one SMRT Cell 8M on the Sequel IIe System.
Reference-quality de novo microbial assemblies with 5 contigs or fewer were achieved for all samples. Typical chromosome assembly quality was Q50, measured as concordance to reference assemblies. Nearly all plasmids were recovered, including those shorter than 5 kb, which are often lost in workflows with strict size selection. Taken together, the new method provides a high-throughput, cost-effective approach for routinely generating reference-quality microbial genomes in a public health environment as part of a pathogen surveillance program.
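For readers less familiar with quality values, the sketch below converts between a Q score and a per-base error rate. It assumes the standard Phred-scale convention, Q = -10 log10(1 - concordance), which may differ in detail from how concordance was computed for these assemblies. The function names are hypothetical, and the 5 Mb genome size is an illustrative round number rather than a value from this study.

```python
import math


def error_rate_from_q(q: float) -> float:
    """Per-base error rate implied by a Phred-scale quality value."""
    return 10 ** (-q / 10)


def q_from_error_rate(error_rate: float) -> float:
    """Phred-scale quality value implied by a per-base error rate."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


def expected_differences(genome_bp: int, q: float) -> float:
    """Expected number of discordant bases in an assembly of `genome_bp` bases."""
    return genome_bp * error_rate_from_q(q)


q50_error_rate = error_rate_from_q(50)               # about 1 error per 100,000 bases
q50_concordance = 1 - q50_error_rate                 # about 99.999%
diffs_in_5mb = expected_differences(5_000_000, 50)   # illustrative 5 Mb genome
print(q50_error_rate, q50_concordance, diffs_in_5mb)
```

Under this convention, Q50 corresponds to roughly one discordant base per 100,000, or on the order of 50 differences across a hypothetical 5 Mb chromosome.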