Pacific Biosciences’ SMRT sequencing method was used to extend the sequence of HLA-A*02:13. © 2019 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
HLA*LA implements a new graph alignment model for HLA type inference, based on the projection of linear alignments onto a variation graph. It enables accurate HLA type inference from whole-genome (99% accuracy) and whole-exome (93% accuracy) Illumina data; from long-read Oxford Nanopore and Pacific Biosciences data (98% accuracy for whole-genome and targeted data); and from genome assemblies. Computational requirements for a typical sample vary between 0.7 and 14 CPU hours. HLA*LA is implemented in C++ and Perl and is freely available as a bioconda package or from https://github.com/DiltheyLab/HLA-LA (GPL v3). Supplementary data are available online. © The Author(s) 2019. Published by Oxford University Press.
HIV elite controllers represent a remarkable minority of patients who maintain normal CD4+ T-cell counts and low or undetectable viral loads for decades in the absence of antiretroviral therapy. To examine the possible contribution of virus attenuation to elite control, we obtained a primary HIV-1 isolate from an elite controller who had been infected for 19 years, the last 10 of which were in the absence of antiretroviral therapy. Full-length sequencing of this isolate revealed a highly unusual V1 domain in Envelope (Env). The V1 domain in this HIV-1 strain was 49 amino acids long, placing it in the top 1% of lengths among the 6,112 Env sequences in the Los Alamos National Laboratory online database. Furthermore, it included two additional N-glycosylation sites and a pair of cysteines suggestive of an extra disulfide loop. Virus with this Env retained good infectivity and replicative capacity; however, analysis of recombinant viruses suggested that other sequences in Env were adapted to accommodate the unusual V1 domain. While the long V1 domain did not confer resistance to neutralization by monoclonal antibodies of the V1/V2-glycan-dependent class, it did confer resistance to neutralization by monoclonal antibodies of the V3-glycan-dependent class. Our findings support results in the literature suggesting a role for long V1 regions in shielding HIV-1 from recognition by V3-directed broadly neutralizing antibodies. In the case of the elite controller described here, it seems likely that selective pressures from the humoral immune system drove the highly unusual polymorphisms present in this HIV-1 Envelope.
IMPORTANCE Elite controllers have long provided an avenue for researchers to reveal mechanisms underlying control of HIV-1. While the role of host genetic factors in facilitating elite control is well known, the possibility of infection by attenuated strains of HIV-1 has been much less studied.
Here we describe an unusual viral feature found in an elite controller of HIV-1 infection and demonstrate its role in conferring escape from monoclonal antibodies of the V3-glycan class. Our results suggest that extreme variation may be needed by HIV-1 to escape neutralization by some antibody specificities. Copyright © 2019 Silver et al.
In the past several years, single-molecule sequencing platforms, such as those by Pacific Biosciences and Oxford Nanopore Technologies, have become available to researchers and are currently being tested for clinical applications. They offer exceptionally long reads that permit direct sequencing through regions of the genome inaccessible or difficult to analyze by short-read platforms. This includes disease-causing long repetitive elements, regions of extreme GC content, and complex gene loci. Similarly, these platforms enable characterization of structural variation at previously unparalleled resolution and direct detection of epigenetic marks in native DNA. Here, we review how these technologies are opening up new clinical avenues that are being applied to pathogenic microorganisms and viruses, constitutional disorders, pharmacogenomics, cancer, and more. Copyright © 2018 Elsevier Ltd. All rights reserved.
In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence-resolved 99,604 insertions, deletions, and inversions, including 2,238 (1.6 Mbp) that are shared among all discovery genomes and an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms that the most common SVs in unique euchromatin are now sequence-resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes, with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci, improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity. Copyright © 2018 Elsevier Inc. All rights reserved.
Next generation sequencing characterizes HLA diversity in a registry population from the Netherlands.
Next-generation DNA sequencing is used to determine the HLA-A, -B, -C, -DRB1, -DRB3/4/5, and -DQB1 assignments of 1009 unrelated volunteers for the unrelated donor registry in The Netherlands. The analysis characterizes all HLA exons and introns for class I alleles; at least exons 2 to 3 for HLA-DRB1; and exons 2 to 6 for HLA-DQB1. Of the distinct alleles present, 229 are class I and 71 are class II; 36 of these alleles are novel. The majority (approximately 98%) of the cumulative allele frequency at each locus is contributed by alleles that appear three or more times. Alleles encoding protein variation outside of the antigen recognition domains account for 0.6% of the class I assignments and 5.3% of the class II assignments. © 2019 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Genomic variation and strain-specific functional adaptation in the human gut microbiome during early life.
The human gut microbiome matures towards the adult composition during the first years of life and is implicated in early immune development. Here, we investigate the effects of microbial genomic diversity on gut microbiome development using integrated early childhood data sets collected in the DIABIMMUNE study in Finland, Estonia and Russian Karelia. We show that gut microbial diversity is associated with household location and linear growth of children. Single nucleotide polymorphism- and metagenomic assembly-based strain tracking revealed large and highly dynamic microbial pangenomes, especially in the genus Bacteroides, in which we identified evidence of variability deriving from Bacteroides-targeting bacteriophages. Our analyses revealed functional consequences of strain diversity; only 10% of Finnish infants harboured Bifidobacterium longum subsp. infantis, a subspecies specialized in human milk metabolism, whereas Russian infants commonly maintained a probiotic Bifidobacterium bifidum strain in infancy. Groups of bacteria contributing to diverse, characterized metabolic pathways converged to highly subject-specific configurations over the first two years of life. This longitudinal study extends the current view of early gut microbial community assembly based on strain-level genomic variation.
Next-generation HLA typing of 382 International Histocompatibility Working Group reference B-lymphoblastoid cell lines: Report from the 17th International HLA and Immunogenetics Workshop.
Extended molecular characterization of HLA genes in the IHWG reference B-lymphoblastoid cell lines (B-LCLs) was one of the major goals of the 17th International HLA and Immunogenetics Workshop (IHIW). Although reference B-LCLs have been examined extensively in previous workshops, high-resolution typing had not been completed for all the classical class I and class II HLA genes. To address this, we conducted a single-blind study in which select panels of B-LCL genomic DNA samples were distributed to multiple laboratories for HLA genotyping by next-generation sequencing (NGS) methods. Identical cell panels comprising 24 and 346 samples were distributed and typed by at least four laboratories in order to derive accurate consensus HLA genotypes. Overall concordance rates calculated at both 2- and 4-field allele-level resolutions ranged from 90.4% to 100%. Concordance for the class I genes ranged from 91.7% to 100%, whereas concordance for class II genes was variable, with the lowest observed at HLA-DRB3 (84.2%). At the maximum allele-level resolution, 78 B-LCLs were defined as homozygous for all 11 loci. We identified 11 novel exon polymorphisms in the entire cell panel. A comparison of the B-LCL NGS HLA genotypes with the HLA genotypes catalogued in the IPD-IMGT/HLA Database Cell Repository revealed an overall allele match of 68.4%. Typing discrepancies between the two datasets were mostly due to lower-resolution historical typing methods, which left incomplete HLA genotypes for some samples listed in the IPD-IMGT/HLA Database Cell Repository. Our approach of multiple-laboratory NGS HLA typing of the B-LCLs has provided accurate genotyping data. The data generated by the tremendous collaborative efforts of the 17th IHIW participants are useful for updating the current cell and sequence databases and will be a valuable resource for future studies. Copyright © 2019. Published by Elsevier Inc.
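The abstract states that each sample was typed by at least four laboratories to derive a consensus genotype, but it does not say how the consensus was formed. The minimal Python sketch below assumes a simple strict-majority vote over unordered allele pairs; the function names, the `min_labs` parameter and the majority rule are illustrative assumptions, not the workshop's documented procedure.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Genotype = Tuple[str, str]

def normalize(genotype: Genotype) -> Genotype:
    """Treat a genotype as an unordered allele pair so (x, y) == (y, x)."""
    first, second = sorted(genotype)
    return (first, second)

def consensus_genotype(calls: Sequence[Genotype], min_labs: int = 4) -> Optional[Genotype]:
    """Return the genotype reported by a strict majority of labs (assumed rule).

    Returns None when fewer than `min_labs` labs typed the sample or when no
    genotype wins more than half of the calls (the sample stays unresolved).
    """
    if len(calls) < min_labs:
        return None
    votes = Counter(normalize(call) for call in calls)
    genotype, count = votes.most_common(1)[0]
    return genotype if 2 * count > len(calls) else None

# Four labs type one sample at HLA-A; one lab disagrees on the second allele.
calls = [
    ("A*02:01", "A*01:01"),
    ("A*01:01", "A*02:01"),
    ("A*01:01", "A*02:01"),
    ("A*01:01", "A*03:01"),
]
print(consensus_genotype(calls))  # ('A*01:01', 'A*02:01')
```

Requiring a strict majority rather than a plurality leaves tied samples unresolved instead of picking one call arbitrarily, which is one conservative way to keep a reference panel accurate.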
Our understanding of sequence variation in the HLA-DPB1 gene is largely restricted to the hypervariable antigen recognition domain (ARD) encoded by exon 2. Here, we employed a redundant sequencing strategy combining long-read and short-read data to accurately phase and characterise in full length the majority of common and well-documented (CWD) DPB1 alleles, as well as alleles with an observed frequency of at least 0.0006% in our predominantly European sample set. We generated 664 DPB1 sequences, comprising 279 distinct allelic variants. This allows us to present the most comprehensive analysis to date of the nature and extent of DPB1 sequence variation. The full-length sequence analysis revealed the existence of two highly diverged allele clades. These clades correlate with the rs9277534 A→G variant, a known expression marker located in the 3′-UTR. The two clades are fully differentiated by 174 fixed polymorphisms throughout a 3.6 kb stretch at the 3′-end of DPB1. The region upstream of this differentiation zone is characterised by increasingly shared variation between the clades. The low-expression A clade comprises 59% of the distinct allelic sequences, including the three most frequent DPB1 alleles by far, DPB1*04:01, DPB1*02:01 and DPB1*04:02. Alleles in the A clade show reduced nucleotide diversity with an excess of rare variants when compared to the high-expression G clade. This pattern is consistent with a scenario of recent proliferation of A-clade alleles. The full-length characterisation of all but the rarest DPB1 alleles will benefit the application of NGS for DPB1 genotyping and provides a helpful framework for a deeper understanding of high- and low-expression alleles and their implications in the context of unrelated haematopoietic stem-cell transplantation. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Recipients receiving better HLA-matched hematopoietic cell transplantation grafts, uncovered by a novel HLA typing method, have superior survival: A retrospective study
HLA matching at an allelic-level resolution for volunteer unrelated donor (VUD) hematopoietic cell transplantation (HCT) results in improved survival and fewer post-transplant complications. Limitations in typing technologies used for the hyperpolymorphic HLA genes have meant that variations outside of the antigen recognition domain (ARD) have not been previously characterized in HCT. Our aim was to explore the extent of diversity outside of the ARD and determine the impact of this diversity on transplant outcome. Eight hundred ninety-one VUD-HCT donors and their recipients transplanted for a hematologic malignancy in the United Kingdom were retrospectively HLA typed at an ultra-high resolution (UHR) for HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1 using next-generation sequencing technology. Matching was determined at full gene level for HLA class I and at a coding DNA sequence level for HLA class II genes. The HLA matching status changed in 29.1% of pairs after UHR HLA typing. The 12/12 UHR HLA-matched patients had significantly improved 5-year overall survival when compared with those believed to be 12/12 HLA matches based on their original HLA typing but found to be mismatched after UHR HLA typing (54.8% versus 30.1%, P = .022). Survival was also significantly better in 12/12 UHR HLA-matched patients when compared with those with any degree of mismatch at this level of resolution (55.1% versus 40.1%, P = .005). This study shows that better HLA matching, found when typing is done at UHR that includes exons outside of the ARD, introns, and untranslated regions, can significantly improve outcomes for recipients of a VUD-HCT for a hematologic malignancy and should be prospectively performed at donor selection.
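HLA allele names carry colon-separated fields (for example, A*02:01:01:01). Later fields resolve progressively finer differences: the third field marks synonymous coding changes and the fourth marks non-coding changes. This is why a pair that looks 12/12 matched at two fields can become mismatched after UHR typing. The minimal Python sketch below counts donor/recipient allele matches after truncating each allele to a chosen number of fields. It assumes that "full gene level" matching can be approximated by comparing four fields for class I, and that "coding DNA sequence level" can be approximated by comparing three fields for class II. That mapping, the function names and the allele names in the example are illustrative assumptions, not the study's method. Novel variants without an official name would also be missed by name comparison alone.

```python
from collections import Counter
from typing import Dict, Sequence

def truncate(allele: str, fields: int) -> str:
    """Keep the gene name plus the first `fields` colon-separated fields."""
    gene, _, name = allele.partition("*")
    return gene + "*" + ":".join(name.split(":")[:fields])

def locus_matches(donor: Sequence[str], recipient: Sequence[str], fields: int) -> int:
    """Count shared alleles (0, 1 or 2) at one locus, compared at `fields` fields."""
    d = Counter(truncate(a, fields) for a in donor)
    r = Counter(truncate(a, fields) for a in recipient)
    return sum((d & r).values())

def match_score(donor: Dict[str, Sequence[str]], recipient: Dict[str, Sequence[str]],
                resolution: Dict[str, int]) -> int:
    """Sum allele matches over loci; six loci give a score out of 12."""
    return sum(locus_matches(donor[locus], recipient[locus], fields)
               for locus, fields in resolution.items())

# Illustrative (hypothetical) allele names; the recipient differs from the
# donor only in the fourth field of one HLA-A allele.
donor = {
    "A": ("A*02:01:01:01", "A*01:01:01:01"),
    "B": ("B*07:02:01:01", "B*08:01:01:01"),
    "C": ("C*07:02:01:01", "C*07:01:01:01"),
    "DRB1": ("DRB1*15:01:01", "DRB1*03:01:01"),
    "DQB1": ("DQB1*06:02:01", "DQB1*02:01:01"),
    "DPB1": ("DPB1*04:01:01", "DPB1*04:01:01"),
}
recipient = dict(donor, A=("A*02:01:01:02", "A*01:01:01:01"))

two_field = {locus: 2 for locus in donor}
uhr = {"A": 4, "B": 4, "C": 4, "DRB1": 3, "DQB1": 3, "DPB1": 3}

print(match_score(donor, recipient, two_field))  # 12
print(match_score(donor, recipient, uhr))        # 11
```

Using a multiset intersection (`Counter & Counter`) handles homozygous loci correctly. For example, a DPB1*04:01 homozygous pair scores 2 only if both sides carry two copies.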
The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.
A systematic review of the Trypanosoma cruzi genetic heterogeneity, host immune response and genetic factors as plausible drivers of chronic chagasic cardiomyopathy.
Chagas disease is a complex tropical pathology caused by the kinetoplastid Trypanosoma cruzi. This parasite displays massive genetic diversity and has been classified by international consensus into at least six Discrete Typing Units (DTUs) that are broadly distributed across the American continent. The main clinical manifestation of the disease is chronic chagasic cardiomyopathy (CCC), which is lethal in infected individuals. However, one intriguing feature is that only 30-40% of infected individuals develop CCC. Some authors have suggested that the immune response, host genetic factors, virulence factors and even the massive genetic heterogeneity of T. cruzi are responsible for this clinical pattern. To date, no conclusive data explain why only a fraction of infected individuals develop CCC. Therefore, we conducted a systematic review analysing host genetic factors, immune response, cytokine production, virulence factors and the plausible association between parasite DTUs and CCC. The epidemiological and clinical implications are discussed herein.
Quality control project of NGS HLA genotyping for the 17th International HLA and Immunogenetics Workshop.
The 17th International HLA and Immunogenetics Workshop (IHIW) organizers conducted a Pilot Study (PS), in which 13 laboratories (15 groups) participated, to assess the performance of the various sequencing library preparation protocols, NGS platforms and software in use prior to the workshop. The organizers sent 50 cell lines to each of the 15 groups, scored the 15 independently generated sets of NGS HLA genotyping data, and generated “consensus” HLA genotypes for each of the 50 cell lines. Proficiency Testing (PT) was subsequently organized using four sets of 24 cell lines, selected from 48 of the 50 PS cell lines, to validate the quality of NGS HLA typing data from the 34 participating IHIW laboratories. Completion of the PT program with a minimum score of 95% concordance at the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 loci satisfied the requirements to submit NGS HLA typing data for the 17th IHIW projects. Together, these PS and PT efforts constituted the 17th IHIW Quality Control project. Overall PT concordance rates for HLA-A, HLA-B, HLA-C, HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRB1, HLA-DRB3, HLA-DRB4 and HLA-DRB5 were 98.1%, 97.0%, 98.1%, 99.0%, 98.6%, 98.8%, 97.6%, 96.0%, 99.1%, 90.0% and 91.7%, respectively. Across all loci, most of the discordance was due to allele dropout. The high cost of NGS HLA genotyping per experiment likely prevented the retyping of initially failed HLA loci. Despite the high HLA genotype concordance rates achieved by the software, there remains room for improvement in the assembly of more accurate consensus DNA sequences by NGS HLA genotyping software. Copyright © 2019 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
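The proficiency-testing rule above (a minimum of 95% concordance against the consensus genotypes) and the observation that most discordance came from allele dropout can be illustrated with per-allele scoring. The minimal Python sketch below scores one lab at one locus. The abstract does not state whether concordance was counted per allele or per genotype, or whether the threshold was applied per locus or pooled. The per-allele, per-locus scoring shown here and all function names are illustrative assumptions, not the IHIW scoring scheme.

```python
from collections import Counter
from typing import Dict, Sequence

PASS_THRESHOLD = 0.95  # minimum concordance stated in the abstract

def matched_alleles(observed: Sequence[str], expected: Sequence[str]) -> int:
    """Alleles shared by a lab call and the consensus (multiset intersection)."""
    return sum((Counter(observed) & Counter(expected)).values())

def pt_score(lab: Dict[str, Sequence[str]], consensus: Dict[str, Sequence[str]]) -> float:
    """Fraction of consensus alleles the lab reproduced at one locus (assumed scheme).

    A missing sample scores 0; allele dropout (a heterozygote reported as
    homozygous) scores 1 of 2 alleles for that sample.
    """
    matched = total = 0
    for sample, expected in consensus.items():
        matched += matched_alleles(lab.get(sample, ()), expected)
        total += len(expected)
    return matched / total if total else 0.0

def passes_pt(lab: Dict[str, Sequence[str]], consensus: Dict[str, Sequence[str]],
              threshold: float = PASS_THRESHOLD) -> bool:
    """True if the lab's concordance meets the threshold."""
    return pt_score(lab, consensus) >= threshold

consensus = {"s1": ("A*01:01", "A*02:01"), "s2": ("A*03:01", "A*03:01")}
lab = {"s1": ("A*01:01", "A*01:01"), "s2": ("A*03:01", "A*03:01")}  # dropout on s1
print(pt_score(lab, consensus), passes_pt(lab, consensus))  # 0.75 False
```

Scoring per allele rather than per whole genotype gives partial credit for dropout, so a lab that loses one allele is penalized less than one that misses both.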
Despite recent breakthroughs in the treatment of hepatitis C virus (HCV) infection, we have limited understanding of how virus diversity generated within individuals impacts the evolution and spread of HCV variants at the population scale. Addressing this gap is important for identifying the main sources of disease transmission and evaluating the risk of drug-resistance mutations emerging and disseminating in a population. We have undertaken a high-resolution analysis of HCV within-host evolution in 4 individuals coinfected with human immunodeficiency virus 1 (HIV-1). We used long-read, deep-sequenced data of the full-length HCV envelope glycoprotein, longitudinally sampled from acute to chronic HCV infection, to investigate the underlying viral population and evolutionary dynamics. We found statistical support for population structure maintaining the within-host HCV genetic diversity in 3 out of 4 individuals. We also report the first population genetic estimate of the within-host recombination rate for HCV (0.28 × 10⁻⁷ recombination/site/year), which is considerably lower than that estimated for HIV-1 and than the overall nucleotide substitution rate estimated during HCV infection. Our findings indicate that population structure and strong genetic linkage shape within-host HCV evolutionary dynamics. These results will guide future investigation of potential HCV drug-resistance adaptation during infection and at the population scale. © The Author(s) 2019. Published by Oxford University Press for the Infectious Diseases Society of America.
Construction of full-length Japanese reference panel of class I HLA genes with single-molecule, real-time sequencing.
Human leukocyte antigen (HLA) is a gene complex known for its exceptional diversity across populations, its importance in organ and blood stem cell transplantation, and the associations of specific alleles with various diseases. We constructed a Japanese reference panel of class I HLA genes (ToMMo HLA panel), comprising a distinct set of HLA-A, HLA-B, HLA-C, and HLA-H alleles, by single-molecule, real-time (SMRT) sequencing of 208 individuals included in the 1070 whole-genome Japanese reference panel (1KJPN). For high-quality allele reconstruction, we developed a novel pipeline, the Primer-Separation Assembly and Refinement Pipeline (PSARP), which uses SMRT sequencing and additional short-read data. The panel consisted of 139 alleles, all of which were extended from known IPD-IMGT/HLA sequences; 40 of them contained novel variants, and together they captured more than 96.5% of the allelic diversity in 1KJPN. These newly available sequences would be important resources for research and clinical applications, including high-resolution HLA typing, genetic association studies, and analyses of cis-regulatory elements.