Video Poster: Long-read sequencing of the SARS-CoV-2 genome and the human immune repertoire
COVID-19 is caused by infection with SARS-CoV-2, a member of the coronavirus family. Complete and accurate sequencing of the SARS-CoV-2 genome enables discovery and epidemiological tracing of mutations that may be important for antiviral and vaccine research. A complementary approach, sequencing patients’ immune repertoires, allows detection of neutralizing antibodies and an understanding of variation in the adaptive immune response. PacBio’s SMRT Sequencing uses circular consensus sequencing to generate long, highly accurate (HiFi) reads. We find that a tiled multiplex PCR approach with ~1-2 kb amplicons achieves a balanced tradeoff between ease of library preparation and robustness to RNA quality. When reads are mapped against the reference genome, SNPs can be called accurately at coverage as low as 20-fold, meaning that 500+ patient samples can be multiplexed and sequenced on a single SMRT Cell 8M. Higher coverage is required to identify minor variants. Among the variant callers we tested, PacBio’s own MinorVariant (originally developed for HIV quasi-species detection) had the highest sensitivity and specificity. In addition to sequencing the entire 30 kb viral genome, we validated two alternative methods: sequencing only the full-length spike (S) gene as a single 4 kb amplicon, and using probe hybridization to enrich for viral RNAs. The former approach can be used to monitor S gene mutations in a high-throughput manner, while the latter is robust to sample quality and can readily identify sub-genomic viral RNA fusion events.
Sequencing the host immune response in recovered patients is another focal point of COVID-19 research. Here, we show that, using a single SMRT Cell 8M with an off-the-shelf Takara SMARTer Human BCR IgG IgM H/K/L Profiling Kit, PacBio data identify largely the same clonal types as matching Illumina libraries. However, an alternative IgG subtype primer generates a >600 bp amplicon, which exceeds the capabilities of paired-end (PE) assembly and enables the co-determination of IgG subclass information. Furthermore, the high accuracy of HiFi reads eliminates the need for IMGT database-assisted assembly; all sequences are generated de novo. Our work demonstrates the feasibility of using PacBio sequencing on RNA from recovered patients’ peripheral blood lymphocytes to identify neutralizing antibodies.
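The multiplexing figure quoted above follows from simple read-count arithmetic. The Python sketch below is illustrative only and is not part of the authors' pipeline. The function names are hypothetical, and the 1.5 kb amplicon length is an assumed midpoint of the ~1-2 kb range. The sketch ignores amplicon overlap, uneven amplicon yield, barcode imbalance, and reads lost to filtering, so real runs would need headroom beyond these numbers. It makes no assumption about the HiFi read yield of a SMRT Cell 8M.

```python
def reads_per_sample(genome_bp: int, amplicon_bp: int, coverage: int) -> int:
    """Reads needed for one sample to reach the given mean depth.

    Each read is assumed to span one full amplicon, so the total bases
    required (genome_bp * coverage) are divided by the amplicon length.
    Integer ceiling division avoids floating-point rounding.
    """
    total_bases = genome_bp * coverage
    return -(-total_bases // amplicon_bp)


def reads_for_cohort(n_samples: int, genome_bp: int, amplicon_bp: int, coverage: int) -> int:
    """Total reads needed when n_samples are pooled on one run."""
    return n_samples * reads_per_sample(genome_bp, amplicon_bp, coverage)


if __name__ == "__main__":
    # Values from the abstract: 30 kb genome, 20-fold coverage, 500 samples.
    # The 1,500 bp amplicon length is an assumption for illustration.
    per_sample = reads_per_sample(genome_bp=30_000, amplicon_bp=1_500, coverage=20)
    cohort = reads_for_cohort(n_samples=500, genome_bp=30_000, amplicon_bp=1_500, coverage=20)
    print(per_sample)  # 400
    print(cohort)      # 200000
```

Under these assumptions, each sample needs a few hundred full-length amplicon reads, and 500 samples need on the order of a few hundred thousand reads in total. Minor-variant detection, which the abstract notes requires higher coverage, scales the same way through the coverage parameter.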