Background: HIV-1 proviruses in peripheral blood mononuclear cells (PBMCs) are considered an important reservoir of HIV-1 infection. Given that this pool represents an archival library, it can be used to study virus evolution and CD4+ T cell survival. Accurate study of this pool is burdened by difficulties encountered in sequencing a full-length proviral genome, typically accomplished by assembling overlapping pieces and imputing the full genome. Methodology: Cryopreserved PBMCs collected from a total of 8 HIV+ patients from 1997-2001 were used for genomic DNA extraction. Patients had been receiving cART for 2-8 years at the time samples were obtained. Seven patients had pVL >50 copies/mL (mean: 312,282, range: 18,372-683,400) and 1 had pVL <50. Genomic DNA was subjected to limiting dilution prior to amplification of near-full-length genomes by a newly developed nested PCR. The predicted size of the PCR product was 9.0 kb, spanning from the 5’ LTR through the 3’ LTR. Single molecules were sequenced as near-full-length amplicons directly from PCR products without shearing using commercially available P4-C2 reagents and standard protocols on a PacBio RS II instrument. Quality of the genomes was validated by clonal positive controls and synthetic mixtures. Results: Near-full-length provirus genome sequences were successfully obtained from all 8 patients as continuous long reads from single molecules. PacBio sequencing required approximately 10% of the PCR product needed for Sanger sequencing and generated 325 MB per 3-hour run including 1,800 full-length intact genome reads on average. One patient’s sample was not at a limiting dilution and analysis revealed multiple subspecies. For 8 near-full-length provirus genomes derived from the other 7 patients, large internal deletions were noted in 2 proviruses; APOBEC-mediated hypermutations were seen in 2 proviruses; and 4 proviruses appeared to be intact genomes. 
All of the defective proviruses showed a complete absence of resistance mutations in either RT or protease, even after 2-8 years of cART. In contrast, all of the intact proviruses contained evidence of ART-resistance-associated mutations, suggesting that they represented relatively recent variants. Conclusions: Combining a novel protocol for full-length limiting dilution amplification of proviruses with PacBio SMRT sequencing allowed for the generation of near-full-length genomes with good quality and an ability to detect minor variants at the 1-10% level. Preliminary data analyses suggest that defective proviruses may represent archival variants that persist long-term in host cells, while intact proviruses within the PBMC pool showing evidence of active virus replication may represent more recent variants.
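The limiting-dilution step in the abstract above rests on Poisson statistics: the fewer wells that amplify, the more likely each positive well started from a single template. The following is a minimal sketch in Python, assuming templates are distributed independently across wells. The function names and the example fractions are illustrative and are not taken from the study.

```python
import math


def mean_templates_per_well(fraction_positive: float) -> float:
    """Poisson mean (lambda) implied by the fraction of PCR-positive wells."""
    if not 0.0 < fraction_positive < 1.0:
        raise ValueError("fraction_positive must be strictly between 0 and 1")
    # P(no template) = exp(-lambda)  =>  lambda = -ln(1 - fraction_positive)
    return -math.log(1.0 - fraction_positive)


def prob_single_template(fraction_positive: float) -> float:
    """Probability that a positive well started from exactly one template."""
    lam = mean_templates_per_well(fraction_positive)
    # P(k = 1 | k >= 1) = lambda * exp(-lambda) / (1 - exp(-lambda))
    return lam * math.exp(-lam) / (1.0 - math.exp(-lam))


if __name__ == "__main__":
    # Hypothetical fractions of positive wells, chosen only for illustration.
    for f in (0.1, 0.3, 0.6):
        p = prob_single_template(f)
        print(f"{f:.0%} positive wells -> P(single template) = {p:.3f}")
```

Under this model, a higher fraction of positive wells lowers the chance that an amplicon is clonal, which is consistent with a sample that was not at limiting dilution yielding multiple subspecies.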
Background: The HIV-1 proviral reservoir is incredibly stable, even during antiretroviral therapy, and is seen as the major barrier to HIV-1 eradication. Identifying and comprehensively characterizing this reservoir will be critical to achieving an HIV cure. Historically, this has been a tedious and labor-intensive process, requiring high-replicate single-genome amplification reactions, or overlapping amplicons that are then reconstructed into full-length genomes by algorithmic imputation. Here, we present a deep sequencing and analysis method able to determine the exact identity and relative abundances of near-full-length HIV genomes from samples containing mixtures of genomes without shearing or complex bioinformatic reconstruction. Methods: We generated clonal near-full-length (~9 kb) amplicons derived from single genome amplification (SGA) of primary proviral isolates or PCR of well-documented control strains. These clonal products were mixed at various abundances and sequenced as near-full-length (~9 kb) amplicons without shearing. Each mixture yielded many near-full-length HIV-1 reads. Mathematical analysis techniques resolved the complex mixture of reads into estimates of distinct near-full-length viral genomes with their relative abundances. Results: Single Molecule, Real-Time (SMRT) Sequencing data contained near-full-length (~9 kb) continuous reads for each sample, including some runs with greater than 10,000 near-full-length-genome reads in a three-hour sequencing run. Our methods exactly recapitulated the originating genomes at single-base resolution and their relative abundances in both mixtures of clonal controls and SGAs, and these results were validated using independent sequencing methods. Correct resolution was achieved even when genomes differed only by a single base. Minor abundances of 5% were reliably detected. 
Conclusions: SMRT Sequencing yields long-read sequencing results from individual DNA molecules with a rapid time-to-result. The single-molecule, full-length nature of this sequencing method allows us to estimate variant subspecies and relative abundances with single-nucleotide resolution. This method allows for reference-agnostic and cost-effective full-genome sequencing of HIV-1, which could both further our understanding of latent infection and help develop novel and improved tools for quantifying HIV provirus, which will be critical to curing HIV.
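The abstract above does not specify its mathematical analysis, so the following is only a simplified stand-in for the idea of resolving a mixture of full-length reads into distinct genomes with relative abundances. It is a minimal sketch in Python, assuming the reads have already been error-corrected so that identical genomes give identical strings. The function name `estimate_abundances`, the `min_fraction` cutoff, and the toy sequences are all hypothetical.

```python
from collections import Counter


def estimate_abundances(reads, min_fraction=0.05):
    """Group identical full-length reads and return {sequence: fraction}.

    Sequences whose fraction of all reads falls below min_fraction are
    dropped as likely noise (an assumed cutoff, not one from the study).
    """
    if not reads:
        return {}
    total = len(reads)
    counts = Counter(reads)
    return {
        seq: n / total
        for seq, n in counts.most_common()
        if n / total >= min_fraction
    }


if __name__ == "__main__":
    # Toy "genomes": the major and minor variants differ by a single base.
    major = "ACGTACGTAC"
    minor = "ACGTACGTAA"
    noise = "ACGTTCGTAC"
    reads = [major] * 36 + [minor] * 3 + [noise] * 1
    for seq, frac in estimate_abundances(reads).items():
        print(seq, f"{frac:.3f}")
```

Because each read spans the whole genome, variants that differ at a single position remain separate groups; real data would first need consensus building or error modelling, which this sketch omits.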
Endogenous pararetroviruses (EPRVs) are characterized in several plant genomes and their biological effects have been reported. In this study, hundreds of EPRV segments were identified in six Citrinae genomes. A total of 1034 EPRV segments were identified in the genomes of sweet orange, 2036 in pummelo, 598 in clementine mandarin, 752 in Ichang papeda, 2060 in citron and 245 in atalantia. Genomic analysis indicated that EPRV segments tend to cluster as hot spots in the genomes, particularly on chromosomes 2 and 5. Large numbers of simple repeats and transposable elements were identified in the 2-kb flanking regions of the EPRV segments. Comparative genomic analysis and PCR experiments showed that there are highly conserved EPRV segments and species-specific EPRV segments among the Citrinae genomes. Phylogenetic analysis suggested that the integration events of EPRVs could initiate in a common progenitor of Citrinae species and repeatedly occur during the Citrinae divergence. Copyright © 2018 Elsevier B.V. All rights reserved.
The macaque simian or simian/human immunodeficiency virus (SIV/SHIV) challenge model has been widely used to inform and guide human vaccine trials. Substantial advances have been made recently in the application of the repeated-low-dose (RLD) challenge approach to assess SIV/SHIV vaccine efficacies (VE). Some candidate HIV vaccines have shown protective effects in preclinical studies using the macaque SIV/SHIV model, but the model’s true predictive value for screening potential HIV vaccine candidates needs to be evaluated further. Here, we review key parameters used in the RLD approach and discuss their relevance for evaluating VE to improve preclinical studies of candidate HIV vaccines. Crown Copyright © 2019. Published by Elsevier Ltd. All rights reserved.
Geminiviruses cause damaging diseases in several important crop species. However, limited progress has been made in developing crop varieties resistant to these highly diverse DNA viruses. Recently, the bacterial CRISPR-Cas9 system has been transferred to plants to target and confer immunity to geminiviruses. In this study, we use CRISPR-Cas9 interference in the staple food crop cassava with the aim of engineering resistance to African cassava mosaic virus, a member of a widespread and important family (Geminiviridae) of plant-pathogenic DNA viruses. Our results show that the CRISPR system fails to confer effective resistance to the virus during glasshouse inoculations. Further, we find that between 33 and 48% of edited virus genomes evolve a conserved single-nucleotide mutation that confers resistance to CRISPR-Cas9 cleavage. We also find that in the model plant Nicotiana benthamiana the replication of the novel, mutant virus is dependent on the presence of the wild-type virus. Our study highlights the risks associated with CRISPR-Cas9 virus immunity in eukaryotes given that the mutagenic nature of the system generates viral escapes in a short time period. Our in-depth analysis of virus populations also represents a template for future studies analyzing virus escape from anti-viral CRISPR transgenics. This is especially important for informing regulation of such actively mutagenic applications of CRISPR-Cas9 technology in agriculture.
Vertebrate genomes contain a record of retroviruses that invaded the germlines of ancestral hosts and are passed to offspring as endogenous retroviruses (ERVs). ERVs can impact host function since they contain the necessary sequences for expression within the host. Dogs are an important system for the study of disease and evolution, yet no substantiated reports of infectious retroviruses in dogs exist. Here, we utilized Illumina whole genome sequence data to assess the origin and evolution of a recently active gammaretroviral lineage in domestic and wild canids. We identified numerous recently integrated loci of a canid-specific ERV-Fc sublineage within Canis, including 58 insertions that were absent from the reference assembly. Insertions were found throughout the dog genome including within and near gene models. By comparison of orthologous occupied sites, we characterized element prevalence across 332 genomes including all nine extant canid species, revealing evolutionary patterns of ERV-Fc segregation among species as well as subpopulations. Sequence analysis revealed common disruptive mutations, suggesting a predominant form of ERV-Fc spread by trans complementation of defective proviruses. ERV-Fc activity included multiple circulating variants that infected canid ancestors from the last 20 million to within 1.6 million years, with recent bursts of germline invasion in the sublineage leading to wolves and dogs.
AAV-mediated delivery of zinc finger nucleases targeting hepatitis B virus inhibits active replication.
Despite an existing effective vaccine, hepatitis B virus (HBV) remains a major public health concern. There are effective suppressive therapies for HBV, but they remain expensive and inaccessible to many, and not all patients respond well. Furthermore, HBV can persist as genomic covalently closed circular DNA (cccDNA) that remains in hepatocytes even during otherwise effective therapy and facilitates rebound in patients after treatment has stopped. Therefore, the need for an effective treatment that targets active and persistent HBV infections remains. As a novel approach to treat HBV, we have targeted the HBV genome for disruption to prevent viral reactivation and replication. We generated 3 zinc finger nucleases (ZFNs) that target sequences within the HBV polymerase, core and X genes. Upon the formation of ZFN-induced DNA double strand breaks (DSB), imprecise repair by non-homologous end joining leads to mutations that inactivate HBV genes. We delivered HBV-specific ZFNs using self-complementary adeno-associated virus (scAAV) vectors and tested their anti-HBV activity in HepAD38 cells. HBV-ZFNs efficiently disrupted HBV target sites by inducing site-specific mutations. Cytotoxicity was seen with one of the ZFNs. scAAV-mediated delivery of a ZFN targeting HBV polymerase resulted in complete inhibition of HBV DNA replication and production of infectious HBV virions in HepAD38 cells. This effect was sustained for at least 2 weeks following only a single treatment. Furthermore, high specificity was observed for all ZFNs, as negligible off-target cleavage was seen via high-throughput sequencing of 7 closely matched potential off-target sites. These results show that HBV-targeted ZFNs can efficiently inhibit active HBV replication and suppress the cellular template for HBV persistence, making them promising candidates for eradication therapy.
One of the most crucial steps in the life cycle of a retrovirus is the integration of the viral DNA (vDNA) copy of the RNA genome into the genome of an infected host cell. Integration provides for efficient viral gene expression as well as for the segregation of viral genomes to daughter cells upon cell division. Some integrated viruses are not well expressed, and cells latently infected with human immunodeficiency virus type 1 (HIV-1) can resist the action of potent antiretroviral drugs and remain dormant for decades. Intensive research has been dedicated to understanding the catalytic mechanism of integration, as well as the viral and cellular determinants that influence integration site distribution throughout the host genome. In this review, we summarize the evolution of techniques that have been used to recover and map retroviral integration sites, from the early days that first indicated that integration could occur in multiple cellular DNA locations, to current technologies that map upwards of millions of unique integration sites from single in vitro integration reactions or cell culture infections. We further review important insights gained from the use of such mapping techniques, including the monitoring of cell clonal expansion in patients treated with retrovirus-based gene therapy vectors, or patients with acquired immune deficiency syndrome (AIDS) on suppressive antiretroviral therapy (ART). These insights span from integrase (IN) enzyme sequence preferences within target DNA (tDNA) at the sites of integration, to the roles of host cellular proteins in mediating global integration distribution, to the potential relationship between genomic location of vDNA integration site and retroviral latency.
Viruses of the subfamily Orthoretrovirinae are defined by the ability to reverse transcribe an RNA genome into DNA that integrates into the host cell genome during the intracellular virus life cycle. Exogenous retroviruses (XRVs) are horizontally transmitted between host individuals, with disease outcome depending on interactions between the retrovirus and the host organism. When retroviruses infect germ line cells of the host, they may become endogenous retroviruses (ERVs), which are permanent elements in the host germ line that are subject to vertical transmission. These ERVs sometimes remain infectious and can themselves give rise to XRVs. This review integrates recent developments in the phylogenetic classification of retroviruses and the identification of retroviral receptors to elucidate the origins and evolution of XRVs and ERVs. We consider whether ERVs may recurrently pressure XRVs to shift receptor usage to sidestep ERV interference. We discuss how related retroviruses undergo alternative fates in different host lineages after endogenization, with koala retrovirus (KoRV) receiving notable interest as a recent invader of its host germ line. KoRV is heritable but also infectious, which provides insights into the early stages of germ line invasions as well as XRV generation from ERVs. The relationship of KoRV to primate and other retroviruses is placed in the context of host biogeography and the potential role of bats and rodents as vectors for interspecies viral transmission. Combining studies of extant XRVs and “fossil” endogenous retroviruses in koalas and other Australasian species has broadened our understanding of the evolution of retroviruses and host-retrovirus interactions. Copyright © 2017 American Society for Microbiology.
Here, we present the complete genome sequence of a porcine endogenous retrovirus determined by Pacific Biosciences sequencing. A comparison of the genome of this isolate with those of other strains revealed the operation of a mechanism resulting in the selective accumulation of G and C bases in the viral DNA. Copyright © 2017 Szucs et al.
HIV-1 infection of primary CD4(+) T cells regulates the expression of specific HERV-K (HML-2) elements.
Endogenous retroviruses (ERVs) occupy extensive regions of the human genome. Although many of these retroviral elements have lost their ability to replicate, those whose insertion took place more recently, such as the HML-2 group of HERV-K elements, still retain intact open reading frames and the capacity to produce certain viral RNA and/or proteins. Transcription of these ERVs is, however, tightly regulated by dedicated epigenetic control mechanisms. Nonetheless, it has been reported that some pathologic states, such as viral infections and certain cancers, coincide with ERV expression, suggesting transcriptional reawakening is possible. HML-2 elements are reportedly induced during HIV-1 infection, but the conserved nature of these elements has, until recently, rendered their expression profiling problematic. Here, we provide comprehensive HERV-K HML-2 expression profiles specific for productively HIV-1 infected primary human CD4(+) T cells. We combined enrichment of HIV-1 infected cells using a reporter virus expressing a surface reporter for gentle and efficient purification with long-read Single Molecule Real-Time sequencing. We show that three HML-2 proviruses, 6q25.1, 8q24.3, and 19q13.42, are up-regulated on average between 3- and 5-fold in HIV-1 infected CD4(+) T cells. One provirus, HML-2 12q24.33, in contrast, was repressed in the presence of active HIV replication. In conclusion, this report identifies the HERV-K HML-2 loci whose expression profiles differ upon HIV-1 infection in primary human CD4(+) T cells. These data will help pave the way for further studies on the influence of endogenous retroviruses on HIV-1 replication. Importance: Endogenous retroviruses inhabit large portions of our genome. Although they are mainly inert, some of the evolutionarily younger members maintain the ability to express both RNA and proteins. 
We have developed an approach using long-read SMRT sequencing that enables us to obtain detailed and accurate HERV-K HML-2 expression profiles. We have now applied this approach to study HERV-K expression in the presence and absence of productive HIV-1 infection of primary human CD4(+) T cells. In addition to using SMRT sequencing, our strategy also includes the magnetic selection of the infected cells so that levels of background expression due to uninfected cells are kept at a minimum. The results in this manuscript provide the blueprint for in-depth studies of the interactions of the authentic upregulated HERV-K HML-2 elements and HIV-1. Copyright © 2017 American Society for Microbiology.
Dynamic regulation of HIV-1 mRNA populations analyzed by single-molecule enrichment and long-read sequencing.
Alternative RNA splicing greatly expands the repertoire of proteins encoded by genomes. Next-generation sequencing (NGS) is attractive for studying alternative splicing because of the efficiency and low cost per base, but short reads typical of NGS only report mRNA fragments containing one or few splice junctions. Here, we used single-molecule amplification and long-read sequencing to study the HIV-1 provirus, which is only 9700 bp in length, but encodes nine major proteins via alternative splicing. Our data showed that the clinical isolate HIV-1(89.6) produces at least 109 different spliced RNAs, including a previously unappreciated ~1 kb class of messages, two of which encode new proteins. HIV-1 message populations differed between cell types, longitudinally during infection, and among T cells from different human donors. These findings open a new window on a little studied aspect of HIV-1 replication, suggest therapeutic opportunities and provide advanced tools for the study of alternative splicing.
Gene activity in primary T cells infected with HIV89.6: intron retention and induction of genomic repeats.
HIV infection has been reported to alter cellular gene activity, but published studies have commonly assayed transformed cell lines and lab-adapted HIV strains, yielding inconsistent results. Here we carried out a deep RNA-Seq analysis of primary human T cells infected with the low passage HIV isolate HIV89.6. Seventeen percent of cellular genes showed altered activity 48 h after infection. In a meta-analysis including four other studies, our data differed from studies of HIV infection in cell lines but showed more parallels with infections of primary cells. We found a global trend toward retention of introns after infection, suggestive of a novel cellular response to infection. HIV89.6 infection was also associated with activation of several human endogenous retroviruses (HERVs) and retrotransposons, of interest as possible novel antigens that could serve as vaccine targets. The most highly activated group of HERVs was a subset of the ERV-9. Analysis showed that activation was associated with a particular variant of ERV-9 long terminal repeats that contains an indel near the U3-R border. These data also allowed quantification of >70 splice forms of the HIV89.6 RNA and specified the main types of chimeric HIV89.6-host RNAs. Comparison to over 100,000 integration site sequences from the same infected cell populations allowed quantification of authentic versus artifactual chimeric reads, showing that 5′ read-in, splicing out of HIV89.6 from the D4 donor and 3′ read-through were the most common HIV89.6-host cell chimeric RNA forms. Analysis of RNA abundance after infection of primary T cells with the low passage HIV89.6 isolate disclosed multiple novel features of HIV-host interactions, notably intron retention and induction of transcription of retrotransposons and endogenous retroviruses.
Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics.
Short-read massively parallel sequencing has emerged as a standard diagnostic tool in the medical setting. However, short-read technologies have inherent limitations such as GC bias, difficulties mapping to repetitive elements, trouble discriminating paralogous sequences, and difficulties in phasing alleles. Long-read single-molecule sequencers resolve these obstacles. Moreover, they offer higher consensus accuracies and can detect epigenetic modifications from native DNA. The first commercially available long-read single-molecule platform was the RS system based on PacBio’s single molecule real-time (SMRT) sequencing technology, which has since evolved into their RSII and Sequel systems. Here we summarize how SMRT sequencing is revolutionizing constitutional, reproductive, cancer, microbial and viral genetic testing. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
The diversity and complexity of the human brain are widely assumed to be encoded within a constant genome. Somatic gene recombination, which changes germline DNA sequences to increase molecular diversity, could theoretically alter this code but has not been documented in the brain, to our knowledge. Here we describe recombination of the Alzheimer’s disease-related gene APP, which encodes amyloid precursor protein, in human neurons, occurring mosaically as thousands of variant ‘genomic cDNAs’ (gencDNAs). gencDNAs lacked introns and ranged from full-length cDNA copies of expressed, brain-specific RNA splice variants to myriad smaller forms that contained intra-exonic junctions, insertions, deletions, and/or single nucleotide variations. DNA in situ hybridization identified gencDNAs within single neurons that were distinct from wild-type loci and absent from non-neuronal cells. Mechanistic studies supported neuronal ‘retro-insertion’ of RNA to produce gencDNAs; this process involved transcription, DNA breaks, reverse transcriptase activity, and age. Neurons from individuals with sporadic Alzheimer’s disease showed increased gencDNA diversity, including eleven mutations known to be associated with familial Alzheimer’s disease that were absent from healthy neurons. Neuronal gene recombination may allow ‘recording’ of neural activity for selective ‘playback’ of preferred gene variants whose expression bypasses splicing; this has implications for cellular diversity, learning and memory, plasticity, and diseases of the human brain.