Background: To better understand the relationships among HIV-1 viruses in linked transmission pairs, we sequenced several samples representing HIV transmission pairs from the Zambia Emory HIV Research Project (Lusaka, Zambia) using Single Molecule, Real-Time (SMRT) Sequencing. Methods: Single molecules were sequenced as full-length (9.6 kb) amplicons directly from PCR products without shearing. This resulted in multiple, fully phased, complete HIV-1 genomes for each patient. We examined samples prepared by Single Genome Amplification (SGA), as well as samples containing complex mixtures of genomes. We detail mathematical techniques used in viral variant subspecies identification, including clustering distance metrics and mutual information, which were used to derive multiple de novo full-length genome sequences for each patient. Whole-genome consensus estimates were made for each sample. Genome reads were clustered using a simple distance metric on aligned reads. Appropriate thresholds were chosen to yield distinct clusters of HIV-1 genomes within samples. Mutual information between columns in the genome alignments was used to measure dependence. In silico mixtures of reads from the SGA samples were made to simulate samples containing exactly controlled complex mixtures of genomes, and our clustering methods were applied to these complex mixtures. Results: SMRT Sequencing data contained multiple full-length (>9 kb) continuous reads for each sample. Simple whole-genome consensus estimates easily identified transmission pairs. Clustering of genome reads showed diversity differences between samples, allowing characterization of the quasi-species diversity comprising the patient viral populations across the full genome. Mutual information identified possible dependencies of different positions across the full HIV-1 genome. The SGA consensus genomes agreed with prior Sanger sequencing.
Our clustering methods assigned reads to their correct originating genomes for the synthetic SGA mixtures. Conclusions: SMRT Sequencing yields long-read sequencing results from individual DNA molecules with a rapid time-to-result. These attributes make it a useful tool for continuous monitoring of viral populations. The single-molecule nature of the sequencing method allows us to estimate variant subspecies and relative abundances by counting methods. The results open up the potential for reference-agnostic and cost-effective full-genome sequencing of HIV-1.
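The clustering and mutual-information steps described above can be illustrated with a minimal sketch. The abstract does not specify the distance metric, the clustering algorithm, or the thresholds, so everything here is an assumption: the metric is a plain Hamming fraction on aligned reads, the clustering is single-linkage with a fixed threshold, and the function names (`hamming_fraction`, `cluster_reads`, `column_mutual_information`) are hypothetical.

```python
from collections import Counter
from math import log2


def hamming_fraction(a, b):
    """Fraction of aligned positions at which two equal-length reads differ."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_reads(reads, threshold):
    """Single-linkage clustering (an assumed choice, not the authors' method).

    A read joins every existing cluster that holds at least one read within
    `threshold`; clusters bridged by the new read are merged.
    Returns a list of clusters, each a sorted list of read indices.
    """
    clusters = []
    for i, read in enumerate(reads):
        hits = [c for c in clusters
                if any(hamming_fraction(read, reads[j]) <= threshold for j in c)]
        merged = [i]
        for c in hits:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return clusters


def column_mutual_information(reads, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(reads)
    col_i = Counter(r[i] for r in reads)
    col_j = Counter(r[j] for r in reads)
    joint = Counter((r[i], r[j]) for r in reads)
    # I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * log2(c * n / (col_i[a] * col_j[b]))
               for (a, b), c in joint.items())


if __name__ == "__main__":
    # Toy alignment: two haplotypes that differ at columns 0 and 4.
    reads = ["AAAAAAAA", "CAAACAAA", "AAAAAAAA", "CAAACAAA"]
    print(cluster_reads(reads, threshold=0.2))
    print(column_mutual_information(reads, 0, 4))
```

In this toy alignment, columns 0 and 4 always change together, so their mutual information is high, while a column that never varies carries no information about any other column. Relative abundances can then be read off as cluster sizes divided by the total read count, in the spirit of the counting approach mentioned in the conclusions.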
Background: Genotypic testing of chronic viral infections is an important part of patient therapy and requires assays capable of detecting the entire spectrum of viral mutations. Single Molecule, Real-Time (SMRT) sequencing offers several advantages over other sequencing technologies, including superior resolution of mixed populations and long read lengths capable of spanning entire viral protein coding regions. We examined detection sensitivity of SMRT sequencing using a mixture of HIV-1 RT gene coding regions containing single NNRTI mutations. Methodology: SMRTbell templates were prepared from PCR products generated from a prospective reference material being developed by BC Center of Excellence for HIV/AIDS. The reference material contained a mixture of fifteen infectious viruses, each carrying a single NNRTI resistance mutation (viz. V90I, K101E, K103N, V108I, E138A/G/K/Q, V179D, Y181C, Y188C, G190A/S, M230L and P236L) built upon the HIV-1LAI molecular clone. Templates were sequenced on the PacBio RS II to obtain single-molecule long reads using P4/C2 chemistry and a 180-minute movie collection without stage start. The relative abundances of the mutant viruses were then estimated using codon-aware analysis methods. Results: Sequencing of these templates produced average read lengths of 5.0 kb, comprising 40,000-fold coverage across the entire amplicon per SMRT Cell. All the expected mutations in the mixture of mutant viruses were accurately identified. Estimated frequencies of NNRTI variants ranged from 0.5% to 12.5%. Conclusions: Codon analysis revealed a number of variants across the amplicon with highly consistent results across SMRT Cells. From a single SMRT Cell, variants were accurately and reliably detected down to 0.5% with simple analyses. Long polymerase reads and high-accuracy reads make it possible to call variants from just a few molecules.
SMRT Sequencing can identify species comprising a mixed viral population, with granularity and low cost of consumables allowing for smaller multiplexing of samples and first-in-first-out processing.
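A codon-aware frequency estimate of the kind mentioned above can be sketched as follows. The abstract does not describe the actual analysis method, so this is only an illustration under stated assumptions: reads are already aligned in frame to the RT coding region, codons with gaps or ambiguous bases are skipped, and the function name `codon_variant_frequencies` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def codon_variant_frequencies(reads, codon_index):
    """Amino-acid frequencies at one codon across in-frame aligned reads.

    reads       -- nucleotide strings aligned in frame (assumption)
    codon_index -- 0-based codon position within the aligned region
    Codons that contain gaps or ambiguous bases are ignored.
    """
    start = 3 * codon_index
    counts = Counter()
    for read in reads:
        amino = CODON_TABLE.get(read[start:start + 3].upper())
        if amino is not None:
            counts[amino] += 1
    total = sum(counts.values())
    return {amino: n / total for amino, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Toy data: 199 wild-type reads (AAA = K) and 1 mutant read (AAC = N).
    reads = ["ATGAAA"] * 199 + ["ATGAAC"]
    print(codon_variant_frequencies(reads, codon_index=1))
```

Counting whole codons rather than individual bases means that two different nucleotide changes encoding the same amino-acid substitution are pooled, and a sequencing error in one base of a codon is not automatically scored as a resistance mutation.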
Background: The use of next-generation sequencing (NGS) to examine circulating HIV env variants has been limited due to env’s length (2.6 kb), extensive indel polymorphism, GC deficiency, and long homopolymeric regions. We developed and standardized protocols for isolation, RT-PCR amplification, single molecule real-time (SMRT) sequencing, and haplotype analysis of circulating HIV-1 env variants to evaluate viral diversity in primary infection. Methodology: HIV RNA was extracted from 7 blood plasma samples (1 mL) collected from 5 subjects (one individual sampled and sequenced at 3 time points) in the San Diego Primary Infection Cohort between 3 and 33 months after their estimated date of infection (EDI). Median viral load per sample was 50,118 HIV RNA copies/mL (range: 22,387-446,683). Full-length (3.2 kb) env amplicons were constructed into SMRTbell templates without shearing, and sequenced on the PacBio RS II using P4/C2 chemistry and a 180-minute movie collection without stage start. To examine viral diversity in each sample, we determined haplotypes by clustering circular consensus sequences (CCS) and reconstructing a cluster consensus sequence using a partial order alignment approach. We measured sample diversity both as the mean pairwise distance among reads and as the fraction of reads containing indel polymorphisms. Results: We collected a median of 8,775 CCS reads per SMRT Cell (range: 4,243-12,234). A median of 7 haplotypes per subject (range: 1-55) were inferred at baseline. For the one subject with longitudinal samples analyzed, we observed an increasing number of distinct haplotypes (8 to 55 haplotypes over the course of 30 months), and an increasing mean pairwise distance among reads (from 0.8% to 1.6%, Tamura-Nei 93). We also observed significant indel polymorphism, with 16% of reads from one sample later in infection (33 months post-EDI) exhibiting deletions of more than 10% of env with respect to the reference strain, HXB2.
Conclusions: This study developed a standardized NGS procedure (PacBio SMRT) to deep sequence full-length HIV RNA env variants from the circulating viral population, achieving good coverage, confirming low env diversity during primary infection that increased over time, and revealing significant indel polymorphism that highlights structural variation as important to env evolution. The long, accurate reads greatly simplified downstream bioinformatics analyses, especially haplotype phasing, increasing our confidence in the results. The sequencing methodology and analysis tools developed here could be successfully applied to any area for which full-length HIV env analysis would be useful.
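The two diversity measures used above, mean pairwise distance among reads and the fraction of reads carrying large deletions, can be sketched as below. This is a simplified illustration, not the study's pipeline: the study reports Tamura-Nei 93 distances, whereas this sketch substitutes the uncorrected p-distance for brevity, and it assumes reads are already aligned to a reference with `-` marking gaps. All function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected proportion of differing sites, ignoring gapped positions.

    Note: this is a stand-in for Tamura-Nei 93, which additionally corrects
    for multiple substitutions and unequal base frequencies.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_pairwise_distance(reads):
    """Average p-distance over all unordered pairs of aligned reads."""
    dists = [p_distance(a, b) for a, b in combinations(reads, 2)]
    return sum(dists) / len(dists) if dists else 0.0


def fraction_with_large_deletion(aligned_reads, aligned_ref, min_fraction=0.10):
    """Fraction of reads deleting more than `min_fraction` of the reference."""
    ref_len = sum(c != "-" for c in aligned_ref)

    def deleted_bases(read):
        # A deletion is a gap in the read opposite a real reference base.
        return sum(r == "-" and c != "-" for r, c in zip(read, aligned_ref))

    flagged = sum(deleted_bases(r) > min_fraction * ref_len for r in aligned_reads)
    return flagged / len(aligned_reads)


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    reads = ["ACGTACGTAC", "ACGTACGTAT", "AC--ACGTAC", "ACGTACGTAC"]
    print(mean_pairwise_distance(reads))
    print(fraction_with_large_deletion(reads, ref))
```

The all-pairs loop is quadratic in the number of reads, so for thousands of CCS reads per SMRT Cell one would likely subsample pairs or compute distances between cluster consensus sequences instead.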
Background: HIV-1 proviruses in peripheral blood mononuclear cells (PBMCs) are thought to be an important reservoir of HIV-1 infection. Given that this pool represents an archival library, it can be used to study virus evolution and CD4+ T cell survival. Accurate study of this pool is burdened by difficulties encountered in sequencing a full-length proviral genome, typically accomplished by assembling overlapping pieces and imputing the full genome. Methodology: Cryopreserved PBMCs collected from a total of 8 HIV+ patients from 1997-2001 were used for genomic DNA extraction. Patients had been receiving cART for 2-8 years at the time samples were obtained. Seven patients had pVL >50 copies/mL (mean: 312,282, range: 18,372-683,400) and 1 had pVL <50. Genomic DNA was subjected to limiting dilution prior to amplification of near-full-length genomes by a newly developed nested PCR. The predicted size of the PCR product was 9.0 kb, spanning from the 5’ LTR through the 3’ LTR. Single molecules were sequenced as near-full-length amplicons directly from PCR products without shearing using commercially available P4-C2 reagents and standard protocols on a PacBio RS II instrument. Quality of the genomes was validated by clonal positive controls and synthetic mixtures. Results: Near-full-length provirus genome sequences were successfully obtained from all 8 patients as continuous long reads from single molecules. PacBio sequencing required approximately 10% of the PCR product needed for Sanger sequencing and generated 325 MB per 3-hour run including 1,800 full-length intact genome reads on average. One patient’s sample was not at a limiting dilution and analysis revealed multiple subspecies. For 8 near-full-length provirus genomes derived from the other 7 patients, large internal deletions were noted in 2 proviruses; APOBEC-mediated hypermutations were seen in 2 proviruses; and 4 proviruses appeared to be intact genomes.
All of the defective proviruses showed a complete absence of resistance mutations in either RT or protease, even after 2-8 years of cART. In contrast, all of the intact proviruses contained evidence of ART resistance-associated mutations, suggesting that they represented relatively recent variants. Conclusions: Combining a novel protocol for full-length limiting dilution amplification of proviruses with PacBio SMRT sequencing allowed for the generation of near-full-length genomes with good quality and an ability to detect minor variants at the 1-10% level. Preliminary data analyses suggest that defective proviruses may represent archival variants that persist long-term in host cells, while intact proviruses within the PBMC pool showing evidence of active virus replication may represent more recent variants.
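The limiting-dilution step above rests on Poisson statistics: if templates are distributed randomly across PCR wells, the fraction of positive wells determines how likely it is that a positive well started from a single provirus. The abstract reports no well counts, so the sketch below is a generic illustration of that reasoning rather than the authors' calculation, and the function name `single_template_probability` is hypothetical.

```python
from math import exp, log


def single_template_probability(positive_fraction):
    """Probability that a positive well contains exactly one template.

    Assumes templates per well follow a Poisson distribution with mean lam,
    so P(well is negative) = exp(-lam) and lam = -ln(1 - positive_fraction).
    """
    if not 0.0 < positive_fraction < 1.0:
        raise ValueError("positive_fraction must be strictly between 0 and 1")
    lam = -log(1.0 - positive_fraction)
    # P(exactly one template | at least one template)
    return lam * exp(-lam) / (1.0 - exp(-lam))


if __name__ == "__main__":
    for fraction in (0.1, 0.3, 0.5):
        print(fraction, round(single_template_probability(fraction), 3))
```

The lower the fraction of positive wells, the more likely each amplicon derives from one molecule. A sample amplified above the limiting dilution can therefore yield several subspecies in one reaction, as was seen for one patient here.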
Background: Understanding the co-evolution of HIV populations and broadly neutralizing antibody (bNAb) lineages may inform vaccine design. Novel long-read, next-generation sequencing methods allow, for the first time, full-length deep sequencing of HIV env populations. Methods: We longitudinally examined env populations (12 time points) in a subtype A infected individual from the IAVI primary infection cohort (Protocol C) who developed bNAbs (62% ID50>50 on a diverse panel of 105 viruses) targeting the V1/V2 region. We developed a Pacific Biosciences single molecule, real-time sequencing protocol to deeply sequence full-length env from HIV RNA. Bioinformatics tools were developed to align env sequences, infer phylogenies, and interrogate escape dynamics of key residues and glycosylation sites. PacBio env sequences were compared to env sequences generated through amplification and cloning. Env dynamics were interpreted in the context of the development of a V1/V2-targeting bNAb lineage isolated from the donor. Results: We collected a median of 6,799 high-quality full-length env sequences per time point (median per-base accuracy of 99.7%). A phylogeny inferred with PacBio and 100 cloned env sequences (10 time points) found cloned env sequences evenly distributed among PacBio sequences. Phylogenetic analyses also revealed a potential transient intra-clade superinfection visible as a minority variant (~5%) at 9 months post-infection (MPI), and peaking in prevalence at 12 MPI (~64%), just preceding the development of heterologous neutralization. Viral escape from the bNAb lineage was evident at V2 positions 160, 166, 167, 169 and 181 (HXB2 numbering), exhibiting several distinct escape pathways by 40 MPI. Conclusions: Our PacBio full-length env sequencing method allowed unprecedented characterization of env dynamics and revealed an intra-clade superinfection that was not detected through conventional methods.
The importance of superinfection in the development of this donor’s V1/V2-directed bNAb lineage is under investigation. Longitudinal full-length env deep sequencing allows accurate phylogenetic inference, provides a detailed picture of escape dynamics in epitope regions, and can identify minority variants, all of which may prove useful for understanding how env evolution can drive the development of antibody breadth.
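Interrogating escape dynamics at key residues and glycosylation sites, as described above, can be illustrated with a short sketch. The study's own bioinformatics tools are not described in the abstract, so this is a hypothetical illustration: it assumes translated, aligned Env sequences grouped by time point, a known alignment column for the residue of interest (mapping HXB2 numbering to columns is left out), and the standard N-X-[S/T] sequon definition (X not proline) for potential N-linked glycosylation sites.

```python
import re
from collections import Counter

# Potential N-linked glycosylation sequon: N, any residue except P, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def glycosylation_sites(protein):
    """0-based start positions of sequons in an ungapped protein sequence."""
    return [m.start() for m in SEQUON.finditer(protein)]


def residue_frequencies_over_time(samples, column):
    """Per-time-point amino-acid frequencies at one alignment column.

    samples -- dict mapping a time point (e.g. months post-infection)
               to a list of aligned protein sequences (assumed input shape)
    column  -- 0-based alignment column of the residue of interest
    """
    result = {}
    for timepoint, sequences in samples.items():
        counts = Counter(seq[column] for seq in sequences)
        total = sum(counts.values())
        result[timepoint] = {aa: n / total for aa, n in counts.items()}
    return result


if __name__ == "__main__":
    print(glycosylation_sites("ANCSNPTNNST"))
    toy = {9: ["NK", "NK", "NE", "NK"], 12: ["NE", "NE", "NK", "NE"]}
    print(residue_frequencies_over_time(toy, column=1))
```

Because each full-length read is a single molecule, the same per-read tabulation can be extended to pairs of positions to ask whether escape mutations at, say, two V2 sites occur on the same genome.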