As the flurry of research around the SARS-CoV-2 virus continues at an unprecedented pace, scientists are beginning to tackle some of the more complex immunological responses with the help of Single Molecule, Real-Time (SMRT) sequencing.
Hundreds of people tuned in live to a special May 7 webinar, “Understanding SARS-CoV-2 and host immune response to COVID-19 with PacBio sequencing.”
Meredith Ashby, Director of Microbial Genomics at PacBio, described some of the resources that PacBio and our users are generating to help labs that use SMRT Sequencing technology to investigate SARS-CoV-2 and COVID-19.
These include two microbial sequencing protocols — the Mt. Sinai 1.5 kb and 2 kb amplicon protocol and the Eden 2.5 kb protocol — as well as guidelines for target-based capture using IDT probes and M13 tag barcoding.
PacBio scientists have been verifying these protocols and rebalancing primers to improve evenness of coverage, Ashby said. Results and additional resources are being continuously added to a resource page on our website, she added. Since the webinar, PacBio scientists have finished optimizing the Eden protocol for SMRT Sequencing. Modified primers that enable barcoding have been validated. In addition, the modified workflow now has two multiplex PCR reactions, as opposed to the 14 individual PCR reactions in the original version. Finally, the primer concentrations were rebalanced to reduce coverage variance to approximately 8-fold, enabling higher multiplexing of samples.
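As a rough illustration of what an "8-fold" figure can mean, the sketch below assumes it refers to the ratio between the best-covered and worst-covered amplicon in a sample. That interpretation, the function name `fold_variation`, and the depth values are all assumptions made for this example; they are not taken from the webinar or from PacBio's protocol.

```python
def fold_variation(mean_coverage_by_amplicon):
    """Return the ratio of the best-covered to the worst-covered amplicon."""
    depths = list(mean_coverage_by_amplicon.values())
    if not depths:
        raise ValueError("no amplicons supplied")
    lowest = min(depths)
    if lowest <= 0:
        raise ValueError("an amplicon with zero coverage makes the ratio undefined")
    return max(depths) / lowest


# Hypothetical mean depths per amplicon, invented for illustration only.
example_depths = {"amp_01": 400, "amp_02": 1600, "amp_03": 3200, "amp_04": 800}
print(fold_variation(example_depths))  # 8.0
```

The lower this ratio, the less sequencing has to be spent on rescuing the weakest amplicon, which is why more even coverage allows more samples to be multiplexed per run.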
Personal profiling: HLA Typing
To fend off the enormous range of pathogens our bodies are exposed to, our immune systems maintain an agile arsenal. Part of it is the human leukocyte antigens, or HLA. These molecules interact with our T cell receptors to activate and modulate immune responses, and the HLA loci are some of the most polymorphic regions in the human genome.
HLA alleles are hugely diverse across the population, and each individual carries their own mix of them. Melissa Laird-Smith (@lissagoingviral), assistant director of technology development at the Icahn School of Medicine at Mount Sinai in New York City, discussed why it is so important to understand personal HLA profiles, and how complicated it is to do so.
She focused on two particular classes of HLA alleles. Class I, expressed by most nucleated cells, presents antigens to CD8+ T cells and can trigger cytotoxic T cell activity to clear any pathogen considered foreign, or ‘non self’. Class II, expressed by macrophages, dendritic cells and B cells, presents antigens to CD4+ T cells, triggering ‘helper’ responses such as cytokine production and antibody production by B cells.
This activity relies on very specific binding, however: like puzzle pieces, HLA alleles fit only certain matching antigens.
“If an HLA allele-matched antigen combination is not available, T cells cannot recognize and activate a response,” Laird-Smith said. “So this idea that specific HLA alleles are important for mounting the appropriate immune response is critically important when you’re thinking about how HLA modulation can control response to viral infection.”
In terms of COVID-19, the range of responses observed in individual patients suggests these immune interactions play a critical role, and that elucidating them will be crucial to determining what it will take to protect us via a vaccine, she said.
She then shared details of the high-resolution HLA typing protocol the Mt. Sinai team developed. Combining PacBio sequencing with the commercially available and validated HLA NGSGo kit offered by GenDx, the Mt. Sinai protocol generates full-length gene amplicons, ranging from 3 to 6 kb, from 200 ng of high-quality genomic DNA.
“The advantage of long-read sequencing is that you go around a smaller amplicon multiple times, which allows you to collapse these reads and generate a highly accurate, intramolecular consensus, or HiFi read,” Laird-Smith said.
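The idea of collapsing repeated passes into one accurate read can be illustrated with a toy majority vote. This is only a sketch: it assumes the passes are already aligned and of equal length, and it is not the algorithm PacBio software uses to build HiFi reads, which relies on alignment and a statistical error model. The function name and sequences are invented for the example.

```python
from collections import Counter


def majority_consensus(passes):
    """Column-wise majority vote over equal-length, pre-aligned passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be aligned to the same length")
    # zip(*passes) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Hypothetical passes over one molecule; three of them carry a single random error.
passes = ["ACGTTGCA", "ACGTAGCA", "ACCTAGCA", "ACGTAGGA", "ACGTAGCA"]
print(majority_consensus(passes))  # ACGTAGCA
```

Because the errors fall at different positions in different passes, each one is outvoted, which is the intuition behind accuracy improving with the number of passes.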
“When you map these HiFi reads to the full-length sequence of the HLA molecules, you get coverage that allows you to phase the two alleles at each position within an individual without imputation or bioinformatic reconstruction,” she added.
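To see why reads that span a whole gene make phasing direct, consider the minimal sketch below. It assumes error-free reads that all cover the same coordinates and a known list of heterozygous sites; the function name, the sequences and the positions are hypothetical and serve only to illustrate the principle.

```python
from collections import Counter


def phase_by_read_support(reads, het_positions):
    """Return the two most frequently observed base combinations at the given sites."""
    observed = Counter(
        "".join(read[pos] for pos in het_positions) for read in reads
    )
    return [haplotype for haplotype, _ in observed.most_common(2)]


# Hypothetical full-length reads from one locus (invented sequences).
reads = [
    "AACGTTAG",  # allele 1
    "AACGTTAG",
    "AACGTTAG",
    "ATCGTCAG",  # allele 2
    "ATCGTCAG",
]
het_positions = [1, 5]  # 0-based sites where the two alleles differ
print(phase_by_read_support(reads, het_positions))  # ['AT', 'TC']
```

Since every read carries both sites, the pairing of variants on each allele is observed rather than inferred. With reads shorter than the distance between the sites, that pairing would have to be imputed or reconstructed.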
Analyzing antibodies: IGH haplotype diversity
Immunoglobulins are also critical components of the immune system that are highly variable between individuals.
Sequencing has enabled scientists to interrogate variation in the expressed antibodies within human populations, but what hasn’t been as readily explored is how haplotype diversity within the antibody gene regions themselves contributes to the story, said Corey Watson (@cwatson29), assistant professor at the University of Louisville School of Medicine.
“A lack of genomic resources has hindered our ability to understand the role of IG genetic variation in phenotypic data and disease,” Watson said. “If we can understand some of these genetic determinants to the repertoire, then we can better define — or maybe even predict — what a repertoire might look like in a given person, and understand better how these signatures associate with functional responses.”
The immunoglobulin heavy chain (IGH) gene loci, located on chromosome 14, are extremely complex and poorly characterized at both the genomic and population level.
In addition to functional genes, these regions also include pseudogenes. Across the entire V, D and J region, there are about 130 to 150 genes within a given haplotype, making the locus incredibly repetitive. In fact, more than 50% of the sequence within this 1 Mb region falls within a segmental duplication. There are also many structural and copy number variants. These can be quite large, ranging from 9 to 60 kb, and can contain multiple genes. This means that any given haplotype can differ from another by tens of genes.
“This is a challenge when thinking about doing genomics, particularly when thinking about how we utilize existing reference assemblies,” Watson said. “In the case of the two reference assemblies that we have currently, both of these missed genes, or don’t include genes that are known to occur in the human population.”
Yet even though we now know that these regions are complex and vary across the population, we have a poor set of tools with which to interrogate them, Watson said. This can have far-reaching impacts, he added.
Microarray technologies, for example, often do a poor job of tagging variation within the IGH loci. In one study, fewer than 50% of variants were tagged effectively.
“The implications of this, you could imagine, especially for immune system disorders, would be great, given the number of GWASes that have been conducted over the years,” Watson said.
To address the problem, Watson’s lab has:
- Done benchmarking in haploid cell lines with orthogonal BAC assembly data
- Used this data to develop a probe set for robust target capture of the IGH loci for PacBio HiFi sequencing
- Developed IGenotyper, a pipeline for IGH locus assembly, variant calling and annotation
- Begun conducting targeted IGH sequencing at scale for comprehensive high-throughput genotyping
“One reference simply is not enough,” Watson said. “Our goal is to create a pipeline that can generate haplotype-specific assemblies leveraging PacBio HiFi reads. Just by leveraging the longer PacBio reads, we are able to gain access to parts of the locus that previously were not being interrogated very well or accurately.”