The COVID-19 pandemic has brought a sudden urgency to virus research and led many of us to dig more deeply into all the tools available for characterizing viral genomes, from RT-PCR to DNA sequencing. For all their outsized impact on human health, viruses have remarkably small and simple genomes, some just a few thousand bases in length, and most lacking any repetitive structures. With such tidy genomes, you may wonder, why would scientists want to sequence them with a long-read technology like PacBio HiFi reads?
While it is true that most viral genomes do not require long reads for assembly, viruses exist as populations within infected hosts, and long reads are a powerful tool for fully characterizing these populations. Depending on the mutation rate of the virus, the population can have one dominant variant with only a very small proportion of rare variants, or it can be composed of a highly diverse set of closely related variants, called a quasispecies.
Highly accurate, single-molecule sequencing allows researchers to fully characterize all the variants within a viral population, as opposed to just the dominant variant. Oftentimes this more detailed view of a complex population of viruses reveals important aspects of biology, including how a viral infection evolves over time or in response to therapeutics.
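To make the idea of a viral population concrete, here is a minimal Python sketch of how variant frequencies might be tallied once each full-length read is treated as one viral genome. It is an illustration only, not the workflow used in any of the studies below: the function names and the toy reads are hypothetical, and it assumes error-free reads that all span the same region, so identical strings represent the same variant. Shannon entropy is included as one common way to summarize how diverse the population is.

```python
from collections import Counter
from math import log2


def haplotype_frequencies(reads):
    """Return {sequence: fraction of reads}, most common variant first."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.most_common()}


def shannon_entropy(freqs):
    """Diversity in bits: 0 for a single variant, higher for mixed populations."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


# Toy population (made-up sequences): one dominant variant, two rare ones.
reads = ["ACGTACGT"] * 8 + ["ACGTACGA"] + ["ACCTACGT"]

freqs = haplotype_frequencies(reads)
dominant = max(freqs, key=freqs.get)
minor = {seq: f for seq, f in freqs.items() if seq != dominant}

print("dominant variant:", dominant, freqs[dominant])
print("minor variants:", minor)
print("entropy (bits):", round(shannon_entropy(freqs), 3))
```

A consensus-only view of this toy population would report just the dominant sequence, while the per-read tally also keeps the two variants present at 10% each.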
Here are some examples of how scientists have used SMRT Sequencing to explore viral genomes, and the highlights of their work.
Influenza: While the influenza virus continues to evade efforts to produce a universal vaccine, HiFi sequencing has given scientists a clearer picture of the dynamics of flu virus evolution. In one study, researchers sequenced multiple samples from a single patient with a two-year recurrent infection and revealed in great detail how the virus adapted in response to flu treatments. In another population-scale study tracking a flu pandemic in Hong Kong, PacBio sequencing revealed that transmission was enabled in part by minor strain variants.
In addition, scientists have used long reads to analyze large deletions in the influenza genome that characterize viruses incapable of replicating. Most recently, a study combining PacBio long reads and single-cell sequencing gave a comprehensively detailed view of how influenza mutations evolved over the course of a typical infection, revealing both point mutations and indels.
Hepatitis C virus: Among the most significant challenges in HCV treatment is drug resistance. To better understand how resistance arises, scientists have deployed SMRT sequencing to study HCV evolution in individuals who failed to respond to antiviral therapy. By using HiFi sequencing to obtain single reads encompassing entire clones, they were able to detail how multidrug-resistant variants arose from low-abundance, drug-resistant clones present at baseline.
In another example, researchers generated long-read data to produce full-length HCV envelope sequences, which allowed them to track the transmission path for a sexually transmitted cluster of HCV infections. They also reported how viral genetic diversity changed over the course of an infection, appearing low during the acute stage but increasing over time.
HIV: An ongoing area of research in HIV is how the virus evolves and persists in patients on long-term antiretroviral therapy (ART). With SMRT sequencing, scientists have generated full-length viral genome sequences from proviruses to study what proportion of the latent reservoir is replication competent, and what types of mutations are favored in this reservoir under ART. A separate effort involved analyzing proviral sequences in the brain and other tissues to understand HIV-associated dementia. HiFi reads allowed the researchers to create a detailed phylogenetic tree of all the variants within an individual, and revealed that variants in the brain were distinct in important ways and absent from other parts of the body.
SARS-CoV-2: Recently, researchers at Mt. Sinai published a study using genetic drift in the SARS-CoV-2 virus to determine when and how the virus arrived in New York City. They found that the virus had arrived from Europe and the West Coast multiple times, though not from China, and that it arrived significantly earlier than previously recognized. In another study, Dr. Eli Boritz and colleagues at the National Institute of Allergy and Infectious Diseases developed a method to sequence a 6.1 kb SARS-CoV-2 amplicon encompassing the S, E, and M genes, allowing them to chart the interplay between viral evolution and the host immune response over time.
To learn more about how to sequence viral genomes, explore our COVID-19 sequencing tools and resources or review our sample prep and analysis workflows for resolving viral populations.
Explore other posts in the Sequencing 101 series:
The evolution of DNA sequencing tools
Introduction to PacBio sequencing and the Sequel II system
From DNA to discovery — the steps of SMRT sequencing
Looking beyond the single reference genome to a pangenome for every species
Understanding accuracy in DNA sequencing
The value of sequencing full-length RNA transcripts
Ploidy, haplotypes, and phasing — how to get more from your sequencing data
DNA extraction — tips, kits, and protocols
Video: Sequencing 101: how long-read sequencing improves access to genetic information