June 1, 2021  |  

Complete HIV-1 genomes from single molecules: Diversity estimates in two linked transmission pairs using clustering and mutual information.

We sequenced complete HIV-1 genomes from single molecules using Single Molecule, Real-Time (SMRT) Sequencing and derived de novo full-length genome sequences. SMRT Sequencing yields long reads from individual DNA molecules with a rapid time-to-result. These attributes make it a useful tool for continuous monitoring of viral populations. The single-molecule nature of the sequencing method allows us to estimate variant subspecies and relative abundances by counting methods. We detail mathematical techniques used in viral variant subspecies identification, including clustering distance metrics and mutual information. Sequencing was performed to better understand the relationships between the specific sequences of transmitted viruses in linked transmission pairs. Samples representing HIV transmission pairs were selected from the Zambia Emory HIV Research Project (Lusaka, Zambia) and sequenced. We examined Single Genome Amplification (SGA) prepped samples and samples containing complex mixtures of genomes. Whole-genome consensus estimates were made for each sample. Genome reads were clustered using a simple distance metric on aligned reads. Appropriate thresholds were chosen to yield distinct clusters of HIV genomes within samples. Mutual information between columns in the genome alignments was used to measure dependence. In silico mixtures of reads from the SGA samples were made to simulate samples containing exactly controlled complex mixtures of genomes, and our clustering methods were applied to these complex mixtures. SMRT Sequencing data contained multiple full-length (greater than 9 kb) continuous reads for each sample. Simple whole-genome consensus estimates easily identified transmission pairs. The clustering of the genome reads showed diversity differences between the samples, allowing us to characterize the diversity of the individual quasi-species comprising the patient viral populations across the full genome. 
Mutual information identified possible dependencies between different positions across the full HIV-1 genome. The SGA consensus genomes agreed with prior Sanger sequencing. Our clustering methods correctly assigned reads to their originating genomes for the synthetic SGA mixtures. The results open up the potential for reference-agnostic and cost-effective full-genome sequencing of HIV-1.
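
The abstract does not specify the distance metric or the thresholds, so the following is only a minimal sketch of the idea, assuming Hamming distance on equal-length aligned reads and single-linkage grouping. The function names, the threshold value, and the toy reads are all hypothetical and are not taken from the study.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching columns between two equal-length aligned reads."""
    if len(a) != len(b):
        raise ValueError("aligned reads must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, threshold):
    """Single-linkage clustering: reads within `threshold` mismatches are merged."""
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(reads)), 2):
        if hamming(reads[i], reads[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first.
    return sorted(clusters.values(), key=len, reverse=True)


# Toy aligned reads (hypothetical data, far shorter than a 9 kb genome).
reads = ["ACGTACGT", "ACGTACGA", "TTGTACCC", "TTGTACCC", "ACGTACGT"]
clusters = cluster_reads(reads, threshold=1)

# Because each read comes from one molecule, cluster size divided by the
# total read count gives a counting-based estimate of relative abundance.
abundances = [len(c) / len(reads) for c in clusters]
```

With single-molecule reads, the same grouping applied to an in silico mixture of known genomes gives a direct check on whether reads are assigned back to their source.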


June 1, 2021  |  

Rapid sequencing of HIV-1 genomes as single molecules from simple and complex samples.

Background: To better understand the relationships among HIV-1 viruses in linked transmission pairs, we sequenced several samples representing HIV transmission pairs from the Zambia Emory HIV Research Project (Lusaka, Zambia) using Single Molecule, Real-Time (SMRT) Sequencing. Methods: Single molecules were sequenced as full-length (9.6 kb) amplicons directly from PCR products without shearing. This resulted in multiple, fully-phased, complete HIV-1 genomes for each patient. We examined Single Genome Amplification (SGA) prepped samples, as well as samples containing complex mixtures of genomes. We detail mathematical techniques used in viral variant subspecies identification, including clustering distance metrics and mutual information, which were used to derive multiple de novo full-length genome sequences for each patient. Whole genome consensus estimates for each sample were made. Genome reads were clustered using a simple distance metric on aligned reads. Appropriate thresholds were chosen to yield distinct clusters of HIV-1 genomes within samples. Mutual information between columns in the genome alignments was used to measure dependence. In silico mixtures of reads from the SGA samples were made to simulate samples containing exactly controlled complex mixtures of genomes and our clustering methods were applied to these complex mixtures. Results: SMRT Sequencing data contained multiple full-length (>9 kb) continuous reads for each sample. Simple whole-genome consensus estimates easily identified transmission pairs. Clustering of genome reads showed diversity differences between samples, allowing characterization of the quasi-species diversity comprising the patient viral populations across the full genome. Mutual information identified possible dependencies of different positions across the full HIV-1 genome. The SGA consensus genomes agreed with prior Sanger sequencing. 
Our clustering methods correctly assigned reads to their originating genomes for the synthetic SGA mixtures. Conclusions: SMRT Sequencing yields long reads from individual DNA molecules with a rapid time-to-result. These attributes make it a useful tool for continuous monitoring of viral populations. The single-molecule nature of the sequencing method allows us to estimate variant subspecies and relative abundances by counting methods. The results open up the potential for reference-agnostic and cost-effective full-genome sequencing of HIV-1.
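
As an illustration of how mutual information between two alignment columns can measure dependence, here is a small sketch using the standard plug-in estimate from observed frequencies. It assumes gap-free, equal-length aligned reads; the helper names and the toy alignment are hypothetical, and the study's actual estimator is not described in the abstract.

```python
from collections import Counter
from math import log2


def column(reads, index):
    """Extract one alignment column as a list of symbols."""
    return [read[index] for read in reads]


def mutual_information(col_x, col_y):
    """Plug-in mutual information (in bits) between two alignment columns."""
    n = len(col_x)
    if n == 0 or n != len(col_y):
        raise ValueError("columns must be non-empty and of equal length")
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy alignment (hypothetical): columns 0 and 1 always change together,
# while column 2 varies independently of column 0.
reads = ["AAC", "AAG", "GGC", "GGG"]
mi_linked = mutual_information(column(reads, 0), column(reads, 1))
mi_independent = mutual_information(column(reads, 0), column(reads, 2))
```

Columns that co-vary across reads score high and unrelated columns score near zero. With few reads the plug-in estimate is biased upward, so a real analysis would need a significance check.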


June 1, 2021  |  

A comparison of 454 GS FLX Ti and PacBio RS in the context of characterizing HIV-1 intra-host diversity.

PacBio 2013 User Group Meeting Presentation Slides: Lance Hepler from UC San Diego’s Center for AIDS Research used the PacBio RS to study intra-host diversity in HIV-1. He compared PacBio’s performance to that of the 454® sequencer, the platform he and his team had previously used. Hepler noted that, in general, there was strong agreement between the platforms; where results differed, he said that PacBio data had significantly better reproducibility and accuracy. “PacBio does not suffer from local coverage loss post-processing, whereas 454 has homopolymer problems,” he noted. Hepler said they are moving away from using 454 in favor of the PacBio system.


June 1, 2021  |  

Characterization of NNRTI mutations in HIV-1 RT using Single Molecule, Real-Time (SMRT) Sequencing.

Background: Genotypic testing of chronic viral infections is an important part of patient therapy and requires assays capable of detecting the entire spectrum of viral mutations. Single Molecule, Real-Time (SMRT) Sequencing offers several advantages over other sequencing technologies, including superior resolution of mixed populations and long read lengths capable of spanning entire viral protein coding regions. We examined the detection sensitivity of SMRT Sequencing using a mixture of HIV-1 RT gene coding regions containing single NNRTI mutations. Methodology: SMRTbell templates were prepared from PCR products generated from a prospective reference material being developed by the BC Center of Excellence for HIV/AIDS. The material contained a mixture of fifteen infectious viruses, each carrying a single NNRTI resistance mutation (viz., V90I, K101E, K103N, V108I, E138A/G/K/Q, V179D, Y181C, Y188C, G190A/S, M230L and P236L) built upon the HIV-1LAI molecular clone. Templates were sequenced on the PacBio RS II to obtain single-molecule long reads with P4/C2 chemistry and 180-minute movie collection without stage start. The relative abundances of the mutant viruses were then estimated using codon-aware analysis methods. Results: Sequencing of these templates produced average read lengths of 5.0 kb, comprising 40,000-fold coverage across the entire amplicon per SMRT Cell. All the expected mutations in the mixture of mutant viruses were accurately identified. Estimated frequencies of NNRTI variants ranged from 0.5% to 12.5%. Conclusions: Codon analysis revealed a number of variants across the amplicon with highly consistent results across SMRT Cells. From a single SMRT Cell, variants were accurately and reliably detected down to 0.5% with simple analyses. Long polymerase reads and high-accuracy reads make it possible to call variants from just a few molecules. 
SMRT Sequencing can identify species comprising a mixed viral population, with granularity and low cost of consumables allowing for smaller multiplexing of samples and first-in-first-out processing.
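
The abstract does not describe the codon-aware analysis itself. As a rough sketch of what counting at the codon level can look like, the example below translates one codon per read and tallies amino-acid frequencies. It assumes reads are already aligned in frame to the amplicon; the function name and the toy reads are hypothetical.

```python
from collections import Counter

# Standard genetic code, enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_frequencies(reads, codon_index):
    """Amino-acid frequencies at one codon across in-frame aligned reads."""
    counts = Counter()
    start = 3 * codon_index
    for read in reads:
        codon = read[start:start + 3].upper()
        amino_acid = CODON_TABLE.get(codon)  # None for gaps, Ns, truncated codons
        if amino_acid is not None:
            counts[amino_acid] += 1
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


# Toy data (hypothetical): seven reads with AAA (K) and one with AAC (N)
# at the second codon, mimicking a K-to-N substitution at 12.5%.
toy_reads = ["GTAAAAGGA"] * 7 + ["GTAAACGGA"]
freqs = codon_frequencies(toy_reads, codon_index=1)
```

Counting whole codons rather than single bases keeps the three nucleotides of a substitution linked on the same molecule, which is what long single-molecule reads make possible.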


June 1, 2021  |  

Full-length HIV-1 env deep sequencing in a donor with broadly neutralizing V1/V2 antibodies.

Background: Understanding the co-evolution of HIV populations and broadly neutralizing antibodies (bNAbs) may inform vaccine design. Novel long-read, next-generation sequencing methods allow, for the first time, full-length deep sequencing of HIV env populations. Methods: We longitudinally examined HIV-1 env populations (12 time points) in a subtype A infected individual from the IAVI primary infection cohort (Protocol C) who developed bNAbs (62% ID50>50 on a diverse panel of 105 viruses) targeting the V1/V2 loop region. We developed a PacBio single molecule, real-time sequencing protocol to deeply sequence full-length env from HIV RNA. Bioinformatics tools were developed to align env sequences, infer phylogenies, and interrogate escape dynamics of key residues and glycosylation sites. PacBio env sequences were compared to env sequences generated through amplification and cloning. Env dynamics and viral escape motif evolution were interpreted in the context of the development of V1/V2-targeting broadly neutralizing antibodies. Results: We collected a median of 6799 (range: 1770-14727) high-quality full-length HIV env circular consensus sequences (CCS) per SMRT Cell, per time point. Using only CCS reads consisting of 6 or more passes over the HIV env insert (= 16 kb read length) ensured that our median per-base accuracy was 99.7%. A phylogeny inferred with PacBio and 100 cloned env sequences (10 time points) found the cloned sequences evenly distributed among PacBio sequences. Viral escape from the V1/V2-targeted bNAbs was evident at V2 positions 160, 166, 167, 169 and 181 (HxB2 numbering), exhibiting several distinct escape pathways by 40 months post-infection. Conclusions: Our PacBio full-length env sequencing method allowed an unprecedented view of, and ability to characterize, HIV-1 env dynamics throughout the first four years of infection. 
Longitudinal full-length env deep sequencing allows accurate phylogenetic inference, provides a detailed picture of escape dynamics in epitope regions, and can identify minority variants, all of which will prove critical for increasing our understanding of how env evolution drives the development of antibody breadth.
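
The bioinformatics tools mentioned above are not described in detail. As one hedged illustration of interrogating glycosylation sites, the sketch below scans translated env sequences for the canonical N-linked glycosylation sequon (N-X-S/T, where X is any residue except proline) and reports how often a site is present at a given alignment position. The function names and toy sequences are hypothetical, sequences are assumed gap-free around the site, and mapping alignment positions to HxB2 numbering is not shown.

```python
import re

# N followed by any residue except P, then S or T. The lookahead keeps
# overlapping sequons (e.g. "NNTT") from being missed.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(protein):
    """0-based positions of the asparagine in each N-X-[S/T] sequon."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


def sequon_frequency(aligned_proteins, position):
    """Fraction of sequences carrying a sequon that starts at `position`."""
    if not aligned_proteins:
        return 0.0
    hits = sum(position in sequon_positions(p) for p in aligned_proteins)
    return hits / len(aligned_proteins)


# Toy examples (hypothetical peptides, not env sequences from the study).
positions = sequon_positions("MNGTNPSANKS")
freq = sequon_frequency(["MNGT", "MNGA", "MNGS", "MDGT"], position=1)
```

Running `sequon_frequency` once per time point gives a simple trajectory of gain or loss of a potential glycosylation site across a longitudinal sample.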


June 1, 2021  |  

Full-length env deep sequencing in a donor with broadly neutralizing V1/V2 antibodies.

Background: Understanding the co-evolution of HIV populations and broadly neutralizing antibody (bNAb) lineages may inform vaccine design. Novel long-read, next-generation sequencing methods allow, for the first time, full-length deep sequencing of HIV env populations. Methods: We longitudinally examined env populations (12 time points) in a subtype A infected individual from the IAVI primary infection cohort (Protocol C) who developed bNAbs (62% ID50>50 on a diverse panel of 105 viruses) targeting the V1/V2 region. We developed a Pacific Biosciences single molecule, real-time sequencing protocol to deeply sequence full-length env from HIV RNA. Bioinformatics tools were developed to align env sequences, infer phylogenies, and interrogate escape dynamics of key residues and glycosylation sites. PacBio env sequences were compared to env sequences generated through amplification and cloning. Env dynamics were interpreted in the context of the development of a V1/V2-targeting bNAb lineage isolated from the donor. Results: We collected a median of 6799 high quality full-length env sequences per timepoint (median per-base accuracy of 99.7%). A phylogeny inferred with PacBio and 100 cloned env sequences (10 time points) found cloned env sequences evenly distributed among PacBio sequences. Phylogenetic analyses also revealed a potential transient intra-clade superinfection visible as a minority variant (~5%) at 9 months post-infection (MPI), and peaking in prevalence at 12MPI (~64%), just preceding the development of heterologous neutralization. Viral escape from the bNAb lineage was evident at V2 positions 160, 166, 167, 169 and 181 (HxB2 numbering), exhibiting several distinct escape pathways by 40MPI. Conclusions: Our PacBio full-length env sequencing method allowed unprecedented characterization of env dynamics and revealed an intra-clade superinfection that was not detected through conventional methods. 
The importance of superinfection in the development of this donor’s V1/V2-directed bNAb lineage is under investigation. Longitudinal full-length env deep sequencing allows accurate phylogenetic inference, provides a detailed picture of escape dynamics in epitope regions, and can identify minority variants, all of which may prove useful for understanding how env evolution can drive the development of antibody breadth.


June 1, 2021  |  

Cogent: Reconstructing the coding genome from full-length transcriptome sequences

For highly complex and large genomes, producing a well-annotated genome may be computationally challenging and costly, yet the study of alternative splicing events and gene annotations usually relies on the existence of a genome. Long-read sequencing technology provides new opportunities to sequence full-length cDNAs, avoiding the computational challenges that short-read transcript assembly brings. The use of single molecule, real-time sequencing from Pacific Biosciences to sequence transcriptomes (the Iso-Seq™ method), which produces de novo, high-quality, full-length transcripts, has revealed an astonishing amount of alternative splicing in eukaryotic species. With the Iso-Seq method, it is now possible to reconstruct the transcribed regions of the genome using just the transcripts themselves. We present Cogent, a tool for finding gene families and reconstructing the coding genome in the absence of a reference genome. Cogent uses k-mer similarities to first partition the transcripts into different gene families. Then, for each gene family, the transcripts are used to build a splice graph. Cogent identifies bubbles resulting from sequencing errors, minor variants, and exon skipping events, and attempts to resolve each splice graph down to the minimal set of reconstructed contigs. We apply Cogent to a cuttlefish Iso-Seq dataset, for which there is a highly fragmented, Illumina-based draft genome assembly and little annotation. We show that Cogent successfully discovers gene families and can reconstruct the coding region of gene loci. The reconstructed contigs can then be used to visualize alternative splicing events, identify minor variants, and even improve genome assemblies.
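
To make the first step concrete, here is a minimal sketch of partitioning transcripts by k-mer similarity. It is not Cogent's implementation: it assumes Jaccard similarity on k-mer sets with single-linkage grouping, and the function names, the value of k, the similarity cutoff, and the toy transcripts are all hypothetical.

```python
def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def partition_transcripts(transcripts, k=4, min_similarity=0.3):
    """Group transcripts whose k-mer sets are similar (single linkage)."""
    names = list(transcripts)
    profiles = {name: kmers(transcripts[name], k) for name in names}
    family = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while family[name] != name:
            family[name] = family[family[name]]
            name = family[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(profiles[a], profiles[b]) >= min_similarity:
                family[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


# Toy transcripts (hypothetical): two isoforms sharing a long prefix,
# plus one unrelated sequence.
transcripts = {
    "isoform_a": "ATGGCCAAGTTCGGATTACC",
    "isoform_b": "ATGGCCAAGTTCGG",
    "other_gene": "CTCTCTGAGAGACACA",
}
families = partition_transcripts(transcripts)
```

Each resulting family would then be the input for building a splice graph. The all-pairs comparison here is quadratic in the number of transcripts, so a real dataset would need an indexed or sketch-based similarity search.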


June 1, 2021  |  

Characterizing haplotype diversity at the immunoglobulin heavy chain locus across human populations using novel long-read sequencing and assembly approaches

The human immunoglobulin heavy chain locus (IGH) remains among the most understudied regions of the human genome. Recent efforts have shown that haplotype diversity within IGH is elevated and exhibits population specific patterns; for example, our re-sequencing of the locus from only a single chromosome uncovered >100 Kb of novel sequence, including descriptions of six novel alleles, and four previously unmapped genes. Historically, this complex locus architecture has hindered the characterization of IGH germline single nucleotide, copy number, and structural variants (SNVs; CNVs; SVs), and as a result, there remains little known about the role of IGH polymorphisms in inter-individual antibody repertoire variability and disease. To remedy this, we are taking a multi-faceted approach to improving existing genomic resources in the human IGH region. First, from whole-genome and fosmid-based datasets, we are building the largest and most ethnically diverse set of IGH reference assemblies to date, by employing PacBio long-read sequencing combined with novel algorithms for phased haplotype assembly. In total, our effort will result in the characterization of >15 phased haplotypes from individuals of Asian, African, and European descent, to be used as a representative reference set by the genomics and immunogenetics community. Second, we are utilizing this more comprehensive sequence catalogue to inform the design and analysis of novel targeted IGH genotyping assays. Standard targeted DNA enrichment methods (e.g., exome capture) are currently optimized for the capture of only very short (100’s of bp) DNA segments. Our platform uses a modified bench protocol to pair existing capture-array technologies with the enrichment of longer fragments of DNA, enabling the use of PacBio sequencing of DNA segments up to 7 Kb. 
This substantial increase in contiguity disambiguates many of the complex repeated structures inherent to the locus, while yielding the base pair fidelity required to call SNVs. Together these resources will establish a stronger framework for further characterizing IGH genetic diversity and facilitate IGH genomic profiling in the clinical and research settings, which will be key to fully understanding the role of IGH germline variation in antibody repertoire development and disease.


June 1, 2021  |  

Assessing diversity and clonal variation of Australia’s grapevine germplasm: Curating the FALCON-Unzip Chardonnay de novo genome assembly

Until recently, only two genome assemblies were publicly available for grapevine, both of Vitis vinifera L. cv. Pinot Noir (PN). The best available PN genome assembly (Jaillon et al. 2007) is highly fragmented and is not representative of the genome complexity typical of wine-grape cultivars in the field. To assess the genetic complexities of Chardonnay grapevine, assembly of a new de novo reference genome was needed. Here we describe a draft assembly using PacBio SMRT Sequencing data and PacBio’s new phased diploid genome assembler FALCON-Unzip (Chin et al. 2016).


April 21, 2020  |  

The replication-competent HIV-1 latent reservoir is primarily established near the time of therapy initiation.

Although antiretroviral therapy (ART) is highly effective at suppressing HIV-1 replication, the virus persists as a latent reservoir in resting CD4+ T cells during therapy. This reservoir forms even when ART is initiated early after infection, but the dynamics of its formation are largely unknown. The viral reservoirs of individuals who initiate ART during chronic infection are generally larger and genetically more diverse than those of individuals who initiate therapy during acute infection, consistent with the hypothesis that the reservoir is formed continuously throughout untreated infection. To determine when viruses enter the latent reservoir, we compared sequences of replication-competent viruses from resting peripheral CD4+ T cells from nine HIV-positive women on therapy to viral sequences circulating in blood collected longitudinally before therapy. We found that, on average, 71% of the unique viruses induced from the post-therapy latent reservoir were most genetically similar to viruses replicating just before ART initiation. This proportion is far greater than would be expected if the reservoir formed continuously and was always long lived. We conclude that ART alters the host environment in a way that allows the formation or stabilization of most of the long-lived latent HIV-1 reservoir, which points to new strategies targeted at limiting the formation of the reservoir around the time of therapy initiation. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.


April 21, 2020  |  

A Highly Unusual V1 Region of Env in an Elite Controller of HIV Infection.

HIV elite controllers represent a remarkable minority of patients who maintain normal CD4+ T-cell counts and low or undetectable viral loads for decades in the absence of antiretroviral therapy. To examine the possible contribution of virus attenuation to elite control, we obtained a primary HIV-1 isolate from an elite controller who had been infected for 19 years, the last 10 of which were in the absence of antiretroviral therapy. Full-length sequencing of this isolate revealed a highly unusual V1 domain in Envelope (Env). The V1 domain in this HIV-1 strain was 49 amino acids, placing it in the top 1% of lengths among the 6,112 Env sequences in the Los Alamos National Laboratory online database. Furthermore, it included two additional N-glycosylation sites and a pair of cysteines suggestive of an extra disulfide loop. Virus with this Env retained good infectivity and replicative capacity; however, analysis of recombinant viruses suggested that other sequences in Env were adapted to accommodate the unusual V1 domain. While the long V1 domain did not confer resistance to neutralization by monoclonal antibodies of the V1/V2-glycan-dependent class, it did confer resistance to neutralization by monoclonal antibodies of the V3-glycan-dependent class. Our findings support results in the literature that suggest a role for long V1 regions in shielding HIV-1 from recognition by V3-directed broadly neutralizing antibodies. In the case of the elite controller described here, it seems likely that selective pressures from the humoral immune system were responsible for driving the highly unusual polymorphisms present in this HIV-1 Envelope. IMPORTANCE: Elite controllers have long provided an avenue for researchers to reveal mechanisms underlying control of HIV-1. While the role of host genetic factors in facilitating elite control is well known, the possibility of infection by attenuated strains of HIV-1 has been much less studied. 
Here we describe an unusual viral feature found in an elite controller of HIV-1 infection and demonstrate its role in conferring escape from monoclonal antibodies of the V3-glycan class. Our results suggest that extreme variation may be needed by HIV-1 to escape neutralization by some antibody specificities. Copyright © 2019 Silver et al.


April 21, 2020  |  

Long-read amplicon denoising.

Long-read next-generation amplicon sequencing shows promise for studying complete genes or genomes from complex and diverse populations. Current long-read sequencing technologies have challenging error profiles, hindering data processing and incorporation into downstream analyses. Here we consider the problem of how to reconstruct, free of sequencing error, the true sequence variants and their associated frequencies from PacBio reads. Called ‘amplicon denoising’, this problem has been extensively studied for short-read sequencing technologies, but current solutions do not always successfully generalize to long reads with high indel error rates. We introduce two methods: one that runs nearly instantly and is very accurate for medium-length reads and high template coverage, and another, slower method that is more robust when reads are very long or coverage is lower. On two Mock Virus Community datasets with ground truth, each sequenced on a different PacBio instrument, and on a number of simulated datasets, we compare our two approaches to each other and to existing algorithms. We outperform all tested methods in accuracy, with competitive run times even for our slower method, successfully discriminating templates that differ by just a single nucleotide. Julia implementations of Fast Amplicon Denoising (FAD) and Robust Amplicon Denoising (RAD), and a webserver interface, are freely available. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


April 21, 2020  |  

Current advances in HIV vaccine preclinical studies using Macaque models.

The macaque simian or simian/human immunodeficiency virus (SIV/SHIV) challenge model has been widely used to inform and guide human vaccine trials. Substantial advances have been made recently in the application of the repeated-low-dose (RLD) challenge approach to assess SIV/SHIV vaccine efficacies (VE). Some candidate HIV vaccines have shown protective effects in preclinical studies using the macaque SIV/SHIV model, but the model’s true predictive value for screening potential HIV vaccine candidates needs to be evaluated further. Here, we review key parameters used in the RLD approach and discuss their relevance for evaluating VE to improve preclinical studies of candidate HIV vaccines. Crown Copyright © 2019. Published by Elsevier Ltd. All rights reserved.

