Background: HIV-1 proviruses in peripheral blood mononuclear cells (PBMCs) are thought to be an important reservoir of HIV-1 infection. Given that this pool represents an archival library, it can be used to study virus evolution and CD4+ T cell survival. Accurate study of this pool is burdened by difficulties encountered in sequencing a full-length proviral genome, typically accomplished by assembling overlapping pieces and imputing the full genome. Methodology: Cryopreserved PBMCs collected from a total of 8 HIV+ patients from 1997-2001 were used for genomic DNA extraction. Patients had been receiving cART for 2-8 years at the time samples were obtained. 7 patients had pVL >50 copies/mL (mean: 312,282, range: 18,372-683,400) and 1 had pVL <50. Genomic DNA was subjected to limiting dilution prior to amplification of near-full-length genomes by a newly developed nested PCR. The predicted size of the PCR product was 9.0 kb, spanning from the 5’ LTR through the 3’ LTR. Single molecules were sequenced as near-full-length amplicons directly from PCR products without shearing using commercially available P4-C2 reagents and standard protocols on a PacBio RS II instrument. Quality of the genomes was validated by clonal positive controls and synthetic mixtures. Results: Near-full-length provirus genome sequences were successfully obtained from all 8 patients as continuous long reads from single molecules. PacBio sequencing required approximately 10% of the PCR product needed for Sanger sequencing and generated 325 MB per 3-hour run including 1,800 full-length intact genome reads on average. One patient’s sample was not at a limiting dilution and analysis revealed multiple subspecies. For 8 near-full-length provirus genomes derived from the other 7 patients, large internal deletions were noted in 2 proviruses; APOBEC-mediated hypermutations were seen in 2 proviruses; and 4 proviruses appeared to be intact genomes. 
All of the defective proviruses showed a complete absence of resistance mutations in either RT or protease, even after 2-8 years of cART. On the contrary, all of the intact proviruses contained evidence of ART-resistance associated mutations suggesting that they represented relatively recent variants. Conclusions: Combining a novel protocol for full-length limiting dilution amplification of proviruses with PacBio SMRT sequencing allowed for the generation of near-full-length genomes with good quality and an ability to detect minor variants at the 1-10% level. Preliminary data analyses suggest that defective proviruses may represent archival variants that persist long-term in host cells, while intact proviruses within the PBMC pool showing evidence of active virus replication may represent more recent variants.
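The limiting-dilution step in the abstract above rests on standard Poisson reasoning that the text does not spell out. The following is a minimal sketch, assuming template molecules land in PCR wells independently; the function name and the example fractions are illustrative and are not taken from the study.

```python
import math

def single_template_probability(positive_fraction: float) -> float:
    """P(exactly one template | well is PCR-positive), assuming Poisson loading."""
    if not 0.0 < positive_fraction < 1.0:
        raise ValueError("positive_fraction must be strictly between 0 and 1")
    # P(well is negative) = exp(-lam), so lam = -ln(1 - positive_fraction)
    lam = -math.log(1.0 - positive_fraction)
    # P(k = 1) divided by P(k >= 1)
    return lam * math.exp(-lam) / positive_fraction

# Illustrative fractions of positive wells (hypothetical values)
for frac in (0.1, 0.3, 0.6):
    print(f"{frac:.0%} positive wells: {single_template_probability(frac):.3f}")
```

Under this assumption, a lower fraction of positive wells gives a higher chance that each product derives from a single molecule, which is consistent with the observation that a sample not at limiting dilution contained multiple subspecies.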
Background: The HIV-1 proviral reservoir is incredibly stable, even while undergoing antiretroviral therapy, and is seen as the major barrier to HIV-1 eradication. Identifying and comprehensively characterizing this reservoir will be critical to achieving an HIV cure. Historically, this has been a tedious and labor intensive process, requiring high-replicate single-genome amplification reactions, or overlapping amplicons that are then reconstructed into full-length genomes by algorithmic imputation. Here, we present a deep sequencing and analysis method able to determine the exact identity and relative abundances of near-full-length HIV genomes from samples containing mixtures of genomes without shearing or complex bioinformatic reconstruction. Methods: We generated clonal near-full-length (~9 kb) amplicons derived from single genome amplification (SGA) of primary proviral isolates or PCR of well-documented control strains. These clonal products were mixed at various abundances and sequenced as near-full-length (~9 kb) amplicons without shearing. Each mixture yielded many near-full-length HIV-1 reads. Mathematical analysis techniques resolved the complex mixture of reads into estimates of distinct near-full-length viral genomes with their relative abundances. Results: Single Molecule, Real-Time (SMRT) Sequencing data contained near-full-length (~9 kb) continuous reads for each sample including some runs with greater than 10,000 near-full-length-genome reads in a three-hour sequencing run. Our methods correctly recapitulated exactly the originating genomes at a single-base resolution and their relative abundances in both mixtures of clonal controls and SGAs, and these results were validated using independent sequencing methods. Correct resolution was achieved even when genomes differed only by a single base. Minor abundances of 5% were reliably detected. 
Conclusions: SMRT Sequencing yields long-read sequencing results from individual DNA molecules with a rapid time-to-result. The single-molecule, full-length nature of this sequencing method allows us to estimate variant subspecies and relative abundances with single-nucleotide resolution. This method allows for reference-agnostic and cost-effective full-genome sequencing of HIV-1, which could both further our understanding of latent infection and support the development of novel and improved tools for quantifying HIV provirus, which will be critical to curing HIV.
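The abstract above does not specify its "mathematical analysis techniques". As a deliberately simplified stand-in, the sketch below estimates relative abundances by counting identical full-length reads and discarding sequences below a minimum fraction. It assumes error-free reads, which real data would not satisfy; the function name, threshold and toy sequences are hypothetical.

```python
from collections import Counter

def estimate_abundances(reads, min_fraction=0.05):
    """Return {sequence: fraction of all reads} for sequences at or above min_fraction."""
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        seq: n / total
        for seq, n in counts.items()
        if n / total >= min_fraction
    }

# Toy mixture: a major genome, a minor variant differing by one base,
# and a single read treated as noise (all sequences are made up).
toy_reads = ["ACGTA"] * 36 + ["ACGTT"] * 3 + ["ACGAA"] * 1
print(estimate_abundances(toy_reads))
```

Because each read covers the whole amplicon, two genomes that differ at a single base still produce distinct read sequences, which is why simple counting can separate them in this idealized setting.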
An improved circular consensus algorithm with an application to detect HIV-1 Drug Resistance Associated Mutations (DRAMs)
Scientists who require confident resolution of heterogeneous populations across complex regions have been unable to transition to short-read sequencing methods. They continue to depend on Sanger sequencing despite its cost and time inefficiencies. Here we present a redesigned algorithm that allows the generation of circular consensus sequences (CCS) from individual SMRT Sequencing reads. With this new algorithm, dubbed CCS2, it is possible to reach high quality across longer insert lengths at a lower cost and higher throughput than Sanger sequencing. We applied CCS2 to the characterization of the HIV-1 K103N drug-resistance associated mutation in both clonal and patient samples. This particular DRAM has previously proved to be clinically relevant, but challenging to characterize due to regional sequence context. First, a mutation was introduced into the third nucleotide of codon 103 (A>C substitution) of the RT gene on a pNL4-3 backbone by site-directed mutagenesis. Regions spanning ~1.3 kb were PCR amplified from both the non-mutated and mutant (K103N) plasmids, and were sequenced individually and as a 50:50 mixture. Additionally, the proviral reservoir of a subject with known dates of virologic failure of an Efavirenz-based regimen and with documented emergence of drug-resistant (K103N) viremia was sequenced at several time points as a proof-of-concept study to determine the kinetics of retention and decay of K103N. Sequencing data were analyzed using the new CCS2 algorithm, which uses a fully-generative probabilistic model of our SMRT Sequencing process to polish consensus sequences to high accuracy. With CCS2, we are able to achieve a per-read empirical quality of QV30 (99.9% accuracy) at 19X coverage. A total of ~5000 1.3 kb consensus sequences with a collective empirical quality of ~QV40 (99.99%) were obtained for each sample. 
We demonstrate a 0% miscall rate in both unmixed control samples, and estimate a 48:52 frequency for the K103N mutation in the mixed (50:50) plasmid sample, consistent with data produced by orthogonal platforms. Additionally, the K103N escape variant was only detected in proviral samples from time points subsequent (19%) to the emergence of drug resistant viremia. This tool might be used to monitor the HIV reservoir for stable evolutionary changes throughout infection.
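The QV figures quoted above follow the standard Phred scale, where QV = -10·log10(error probability). A minimal sketch of the conversion, with hypothetical function names:

```python
import math

def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to per-base accuracy."""
    return 1.0 - 10 ** (-qv / 10)

def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (strictly below 1.0) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1.0 - accuracy)

# The two quality levels mentioned in the abstract
for qv in (30, 40):
    print(f"QV{qv} corresponds to accuracy {qv_to_accuracy(qv):.4%}")
```

On this scale, QV30 corresponds to one expected error per 1,000 bases and QV40 to one per 10,000, matching the 99.9% and 99.99% values given in the text.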
PacBio Sequencing is characterized by very long sequence reads (averaging > 10,000 bases), lack of GC-bias, and high consensus accuracy. These features have allowed the method to provide a new…
In this Labroots webinar, Meredith Ashby, Director of Microbial Genomics at PacBio, describes the utility of highly accurate long-read sequencing, known as HiFi sequencing, to understand the SARS-CoV-2 viral genome….
Background: Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions in their entirety with higher continuity and accuracy than is possible with other methods. Results: We used trio binning to assemble reference genomes for two species from a single individual using an interspecies cross of yak (Bos grunniens) and cattle (Bos taurus). The high heterozygosity inherent to interspecies hybrids allowed us to confidently assign >99% of long reads from the F1 offspring to parental bins using unique k-mers from parental short reads. Both the maternal (yak) and paternal (cattle) assemblies contain over one third of the acrocentric chromosomes, including the two largest chromosomes, in single haplotigs. Conclusions: These haplotigs are the first vertebrate chromosome arms to be assembled gap-free and fully phased, and the first time assemblies for two species have been created from a single individual. Both assemblies are the most continuous currently available for non-model vertebrates. Abbreviations: Mb, megabases; kb, kilobases; MYA, millions of years ago; MHC, major histocompatibility complex; SMRT, single molecule real time.
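The read-classification idea behind trio binning, as described above, can be sketched as follows. This is a toy illustration only: it assumes error-free reads on a single strand, and a real implementation would also need to handle sequencing errors, reverse complements and differing k-mer set sizes. All function names and sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat

def bin_read(read, mat_only, pat_only, k):
    """Assign an offspring long read to the parent whose unique k-mers it shares most."""
    ks = kmers(read, k)
    m, p = len(ks & mat_only), len(ks & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"

# Toy example with made-up sequences
mat_only, pat_only = parent_specific_kmers(["AAAACCCC"], ["AAAAGGGG"], k=4)
for read in ("AACCCC", "AAGGGG", "AAAA"):
    print(read, bin_read(read, mat_only, pat_only, k=4))
```

The sketch also shows why high heterozygosity helps: the more the parental genomes differ, the more parent-specific k-mers each offspring read is likely to contain.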
The replication-competent HIV-1 latent reservoir is primarily established near the time of therapy initiation.
Although antiretroviral therapy (ART) is highly effective at suppressing HIV-1 replication, the virus persists as a latent reservoir in resting CD4+ T cells during therapy. This reservoir forms even when ART is initiated early after infection, but the dynamics of its formation are largely unknown. The viral reservoirs of individuals who initiate ART during chronic infection are generally larger and genetically more diverse than those of individuals who initiate therapy during acute infection, consistent with the hypothesis that the reservoir is formed continuously throughout untreated infection. To determine when viruses enter the latent reservoir, we compared sequences of replication-competent viruses from resting peripheral CD4+ T cells from nine HIV-positive women on therapy to viral sequences circulating in blood collected longitudinally before therapy. We found that, on average, 71% of the unique viruses induced from the post-therapy latent reservoir were most genetically similar to viruses replicating just before ART initiation. This proportion is far greater than would be expected if the reservoir formed continuously and was always long lived. We conclude that ART alters the host environment in a way that allows the formation or stabilization of most of the long-lived latent HIV-1 reservoir, which points to new strategies targeted at limiting the formation of the reservoir around the time of therapy initiation. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
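The comparison described above, matching each reservoir virus to the pre-therapy time point with the most similar sequences, can be illustrated with a simple nearest-neighbor sketch. This uses Hamming distance on pre-aligned, equal-length sequences as a stand-in; the abstract does not state the distance or phylogenetic method used, and the function names, time-point labels and sequences here are hypothetical.

```python
def hamming(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def closest_timepoint(reservoir_seq, pre_art):
    """Return the label of the pre-ART time point holding the nearest sequence.

    pre_art maps a time-point label to a list of aligned sequences.
    Ties are broken by the alphabetically smallest label.
    """
    candidates = (
        (hamming(reservoir_seq, seq), label)
        for label, seqs in pre_art.items()
        for seq in seqs
    )
    return min(candidates)[1]

# Toy data (made-up sequences and labels)
pre_art = {
    "year1": ["AAAAAAAA", "AAAAAAAT"],
    "year5_pre_ART": ["AAGGAAAA", "AAGGAATA"],
}
reservoir = ["AAGGAAAA", "AAGGAATT", "AAAAAAAA"]
labels = [closest_timepoint(s, pre_art) for s in reservoir]
fraction_late = labels.count("year5_pre_ART") / len(labels)
print(labels, round(fraction_late, 2))
```

A summary figure like the 71% quoted in the abstract is, in this framing, the share of reservoir sequences whose nearest match falls at the last time point before therapy.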
The macaque simian or simian/human immunodeficiency virus (SIV/SHIV) challenge model has been widely used to inform and guide human vaccine trials. Substantial advances have been made recently in the application of the repeated-low-dose challenge (RLD) approach to assess SIV/SHIV vaccine efficacies (VE). Some candidate HIV vaccines have shown protective effects in preclinical studies using the macaque SIV/SHIV model but the model’s true predictive value for screening potential HIV vaccine candidates needs to be evaluated further. Here, we review key parameters used in the RLD approach and discuss their relevance for evaluating VE to improve preclinical studies of candidate HIV vaccines. Crown Copyright © 2019. Published by Elsevier Ltd. All rights reserved.
Vertebrate genomes contain a record of retroviruses that invaded the germlines of ancestral hosts and are passed to offspring as endogenous retroviruses (ERVs). ERVs can impact host function since they contain the necessary sequences for expression within the host. Dogs are an important system for the study of disease and evolution, yet no substantiated reports of infectious retroviruses in dogs exist. Here, we utilized Illumina whole genome sequence data to assess the origin and evolution of a recently active gammaretroviral lineage in domestic and wild canids. We identified numerous recently integrated loci of a canid-specific ERV-Fc sublineage within Canis, including 58 insertions that were absent from the reference assembly. Insertions were found throughout the dog genome including within and near gene models. By comparison of orthologous occupied sites, we characterized element prevalence across 332 genomes including all nine extant canid species, revealing evolutionary patterns of ERV-Fc segregation among species as well as subpopulations. Sequence analysis revealed common disruptive mutations, suggesting a predominant form of ERV-Fc spread by trans complementation of defective proviruses. ERV-Fc activity included multiple circulating variants that infected canid ancestors from the last 20 million to within 1.6 million years, with recent bursts of germline invasion in the sublineage leading to wolves and dogs.
Codon swapping of zinc finger nucleases confers expression in primary cells and in vivo from a single lentiviral vector.
Zinc finger nucleases (ZFNs) are promising tools for genome editing for biotechnological as well as therapeutic purposes. Delivery remains a major issue impeding targeted genome modification. Lentiviral vectors are highly efficient for delivering transgenes into cell lines, primary cells and into organs, such as the liver. However, the reverse transcription of lentiviral vectors leads to recombination of homologous sequences, as found between and within ZFN monomers. We used a codon swapping strategy to both drastically disrupt sequence identity between ZFN monomers and to reduce sequence repeats within a monomer sequence. We constructed lentiviral vectors encoding codon-swapped ZFNs or unmodified ZFNs from a single mRNA transcript. Cell lines, primary hepatocytes and newborn rats were used to evaluate the efficacy of integrative-competent (ICLV) and integrative-deficient (IDLV) lentiviral vectors to deliver ZFNs into target cells. We reduced total identity between ZFN monomers from 90.9% to 61.4% and showed that a single ICLV allowed efficient expression of functional ZFNs targeting the rat UGT1A1 gene after codon-swapping, leading to much higher ZFN activity in cell lines (up to 7-fold increase compared to unmodified ZFNs and 60% activity in C6 cells), as compared to plasmid transfection or a single ICLV encoding unmodified ZFN monomers. Off-target analysis located several active sites for the 5-finger UGT1A1-ZFNs. Furthermore, we reported for the first time successful ZFN-induced targeted DNA double-strand breaks in primary cells (hepatocytes) and in vivo (liver) after delivery of a single IDLV encoding two ZFNs. These results demonstrate that a codon-swapping approach allowed a single lentiviral vector to efficiently express ZFNs and should stimulate the use of this viral platform for ZFN-mediated genome editing of primary cells, for both ex vivo and in vivo applications.
One of the most crucial steps in the life cycle of a retrovirus is the integration of the viral DNA (vDNA) copy of the RNA genome into the genome of an infected host cell. Integration provides for efficient viral gene expression as well as for the segregation of viral genomes to daughter cells upon cell division. Some integrated viruses are not well expressed, and cells latently infected with human immunodeficiency virus type 1 (HIV-1) can resist the action of potent antiretroviral drugs and remain dormant for decades. Intensive research has been dedicated to understanding the catalytic mechanism of integration, as well as the viral and cellular determinants that influence integration site distribution throughout the host genome. In this review, we summarize the evolution of techniques that have been used to recover and map retroviral integration sites, from the early days that first indicated that integration could occur in multiple cellular DNA locations, to current technologies that map upwards of millions of unique integration sites from single in vitro integration reactions or cell culture infections. We further review important insights gained from the use of such mapping techniques, including the monitoring of cell clonal expansion in patients treated with retrovirus-based gene therapy vectors, or patients with acquired immune deficiency syndrome (AIDS) on suppressive antiretroviral therapy (ART). These insights span from integrase (IN) enzyme sequence preferences within target DNA (tDNA) at the sites of integration, to the roles of host cellular proteins in mediating global integration distribution, to the potential relationship between genomic location of vDNA integration site and retroviral latency.
Short hairpin (sh)RNAs delivered by recombinant adeno-associated viruses (rAAVs) are valuable tools to study gene function in vivo and a promising gene therapy platform. Our data show that incorporation of shRNA transgenes into rAAV constructs reduces vector yield and produces a population of truncated and defective genomes. We demonstrate that sequences with hairpins or hairpin-like structures drive the generation of truncated AAV genomes through a polymerase redirection mechanism during viral genome replication. Our findings reveal the importance of genomic secondary structure when optimizing viral vector designs. We also discovered that shDNAs could be adapted to act as surrogate mutant inverted terminal repeats (mTRs), sequences that were previously thought to be required for functional self-complementary AAV vectors. The use of shDNAs as artificial mTRs opens the door to engineering a new generation of AAV vectors with improved potency, genetic stability, and safety for both preclinical studies and human gene therapy. Published by Elsevier Inc.
Viruses of the subfamily Orthoretrovirinae are defined by the ability to reverse transcribe an RNA genome into DNA that integrates into the host cell genome during the intracellular virus life cycle. Exogenous retroviruses (XRVs) are horizontally transmitted between host individuals, with disease outcome depending on interactions between the retrovirus and the host organism. When retroviruses infect germ line cells of the host, they may become endogenous retroviruses (ERVs), which are permanent elements in the host germ line that are subject to vertical transmission. These ERVs sometimes remain infectious and can themselves give rise to XRVs. This review integrates recent developments in the phylogenetic classification of retroviruses and the identification of retroviral receptors to elucidate the origins and evolution of XRVs and ERVs. We consider whether ERVs may recurrently pressure XRVs to shift receptor usage to sidestep ERV interference. We discuss how related retroviruses undergo alternative fates in different host lineages after endogenization, with koala retrovirus (KoRV) receiving notable interest as a recent invader of its host germ line. KoRV is heritable but also infectious, which provides insights into the early stages of germ line invasions as well as XRV generation from ERVs. The relationship of KoRV to primate and other retroviruses is placed in the context of host biogeography and the potential role of bats and rodents as vectors for interspecies viral transmission. Combining studies of extant XRVs and “fossil” endogenous retroviruses in koalas and other Australasian species has broadened our understanding of the evolution of retroviruses and host-retrovirus interactions. Copyright © 2017 American Society for Microbiology.
HIV-1 infection of primary CD4(+) T cells regulates the expression of specific HERV-K (HML-2) elements.
Endogenous retroviruses (ERVs) occupy extensive regions of the human genome. Although many of these retroviral elements have lost their ability to replicate, those whose insertion took place more recently, such as the HML-2 group of HERV-K elements, still retain intact open reading frames and the capacity to produce certain viral RNA and/or proteins. Transcription of these ERVs is, however, tightly regulated by dedicated epigenetic control mechanisms. Nonetheless, it has been reported that some pathologic states, such as viral infections and certain cancers, coincide with ERV expression, suggesting transcriptional reawakening is possible. HML-2 elements are reportedly induced during HIV-1 infection, but the conserved nature of these elements has, until recently, rendered their expression profiling problematic. Here, we provide comprehensive HERV-K HML-2 expression profiles specific for productively HIV-1 infected primary human CD4(+) T cells. We combined enrichment of HIV-1 infected cells using a reporter virus expressing a surface reporter for gentle and efficient purification with long-read Single Molecule Real-Time sequencing. We show that three HML-2 proviruses, 6q25.1, 8q24.3, and 19q13.42, are up-regulated on average between 3- and 5-fold in HIV-1 infected CD4(+) T cells. One provirus, HML-2 12q24.33, in contrast, was repressed in the presence of active HIV replication. In conclusion, this report identifies the HERV-K HML-2 loci whose expression profiles differ upon HIV-1 infection in primary human CD4(+) T cells. These data will help pave the way for further studies on the influence of endogenous retroviruses on HIV-1 replication. Importance: Endogenous retroviruses inhabit large portions of our genome. Although they are mainly inert, some of the evolutionarily younger members maintain the ability to express both RNA and proteins. 
We have developed an approach using long-read SMRT sequencing that gives us the ability to obtain detailed and accurate HERV-K HML-2 expression profiles. We have now applied this approach to study HERV-K expression in the presence and absence of productive HIV-1 infection of primary human CD4(+) T cells. In addition to using SMRT sequencing, our strategy also includes the magnetic selection of the infected cells so that levels of background expression due to uninfected cells are kept at a minimum. The results in this manuscript provide the blueprint for in-depth studies of the interactions of the authentic upregulated HERV-K HML-2 elements and HIV-1. Copyright © 2017 American Society for Microbiology.
Dynamic regulation of HIV-1 mRNA populations analyzed by single-molecule enrichment and long-read sequencing.
Alternative RNA splicing greatly expands the repertoire of proteins encoded by genomes. Next-generation sequencing (NGS) is attractive for studying alternative splicing because of the efficiency and low cost per base, but short reads typical of NGS only report mRNA fragments containing one or few splice junctions. Here, we used single-molecule amplification and long-read sequencing to study the HIV-1 provirus, which is only 9700 bp in length, but encodes nine major proteins via alternative splicing. Our data showed that the clinical isolate HIV-1(89.6) produces at least 109 different spliced RNAs, including a previously unappreciated ~1 kb class of messages, two of which encode new proteins. HIV-1 message populations differed between cell types, longitudinally during infection, and among T cells from different human donors. These findings open a new window on a little studied aspect of HIV-1 replication, suggest therapeutic opportunities and provide advanced tools for the study of alternative splicing.