Epstein-Barr virus (EBV) is a ubiquitous human pathogen that can cause several types of lymphoma and carcinoma. Like other herpesviruses, EBV has diversified both through co-evolution with its host and through genetic exchange between virus strains. Sequence analysis of the EBV genome is unusually challenging because of the large number and length of repeat regions within the virus. Here we describe the sequence assembly and analysis of the large internal repeat of EBV (IR1, or BamW repeats) from over 70 strains.

Diversity of the latency protein EBNA-LP resides predominantly within the exons downstream of IR1. The integrity of the putative BWRF1 ORF is retained in over 80% of strains, and deletions truncating IR1 always spare BWRF1. Conserved regions include the IR1 latency promoter (Wp), one zone upstream of BWRF1, and two zones within BWRF1.

IR1 is heterogeneous in 70% of strains. This heterogeneity arises from sequence exchange between strains as well as from spontaneous mutation, with inter-strain recombination more common in tumour-derived viruses. This genetic exchange often incorporates regions of <1 kb, and allelic gene conversion changes the frequency of small regions within the repeat, but not close to its flanks. These observations suggest that IR1, and by extension EBV, diversifies through both recombination and breakpoint repair, while concerted evolution of IR1 is driven by gene conversion of small regions. Finally, the prototype EBV strain B95-8 contains four non-consensus variants within a single IR1 repeat unit, including a stop codon in EBNA-LP. Repairing IR1 improves EBNA-LP levels and the quality of transformation by the B95-8 BAC.

IMPORTANCE Epstein-Barr virus (EBV) infects the majority of the world population but causes illness in only a small minority. Nevertheless, over 1% of cancers worldwide are attributable to EBV.
Recent sequencing projects investigating virus diversity, to determine whether different strains have different disease impacts, have excluded repetitive regions because they are technically more challenging. Here we analyse the sequence of the largest repeat in EBV (IR1). We first characterised the variation in the protein sequences encoded across IR1. By studying the variation within the repeat of each strain, we identified a mutation in the main laboratory strain of EBV that impairs virus function. We also suggest that tumour-associated viruses may be more likely to contain DNA mixed from two strains. Patterns of this mixing suggest that sequences can spread between strains (and also within the repeat) when sequence from another strain (or repeat unit) is copied to repair DNA damage. Copyright © 2017 Ba abdullah et al.
Journal: Journal of Virology