PacBio 2013 User Group Meeting Presentation Slides: Lisbeth Guethlein from Stanford University School of Medicine looked at highly repetitive and variable immune regions of the orangutan genome. Guethlein reported that “PacBio managed to accomplish in a week what I have been working on for a couple years” (with Sanger sequencing), and the results were concordant. “Long story short, I was a happy customer.”
Unique haplotype structure determination in human genome using Single Molecule, Real-Time (SMRT) Sequencing of targeted full-length fosmids.
Determination of unique individual haplotypes is an essential first step toward understanding how identical genotypes having different phases lead to different biological interpretations of function, phenotype, and disease. Genome-wide methods for identifying individual genetic variation have been limited in their ability to acquire phased, extended, and complete genomic sequences that are long enough to assemble haplotypes with high confidence. We explore a recombineering approach for isolation and sequencing of a tiling of targeted fosmids to capture interesting regions from the human genome. Each individual fosmid contains a large genomic fragment (~35 kb) that is sequenced with long-read SMRT technology to generate contiguous long reads. These long reads can be easily de novo assembled for targeted haplotype resolution within an individual’s genome. The P5-C3 chemistry for SMRT Sequencing generated contiguous, full-length fosmid sequences of 30 to 40 kb in a single read, allowing assembly of resolved haplotypes with minimal data processing. The phase preserved in fosmid clones spanned at least two heterozygous variant loci, providing the essential detail of precise haplotype structures. We show complete assembly of haplotypes for various targeted loci, including the complex haplotypes of the KIR locus (~150 to 200 kb) and conserved extended haplotypes (CEHs) of the MHC region. This method is easily applicable to other regions of the human genome, as well as other genomes.
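The idea that each fosmid preserves phase across at least two heterozygous loci can be illustrated with a toy sketch. This is not the authors' pipeline: the function name `phase_fosmids`, the dictionary representation of variant calls, and the greedy assignment rule are all assumptions made for illustration. It assumes error-free calls and fosmids ordered along the locus.

```python
def phase_fosmids(fosmids):
    """Greedily group fosmids into haplotypes (hypothetical toy model).

    fosmids: list of dicts mapping a heterozygous position to the allele
    observed on that clone, ordered along the locus.
    Returns (haplotypes, assignment), where assignment[i] is the index of
    the haplotype that fosmid i was placed on.
    """
    haplotypes = []   # each haplotype is a dict {position: allele}
    assignment = []
    for calls in fosmids:
        placed = None
        for idx, hap in enumerate(haplotypes):
            shared = set(hap) & set(calls)
            # A fosmid joins a haplotype only if it overlaps at least one
            # heterozygous site and agrees at every shared site.
            if shared and all(hap[pos] == calls[pos] for pos in shared):
                placed = idx
                break
        if placed is None:
            # No compatible haplotype yet: start a new one.
            haplotypes.append(dict(calls))
            placed = len(haplotypes) - 1
        else:
            # Extend the haplotype with the new sites this fosmid covers.
            haplotypes[placed].update(calls)
        assignment.append(placed)
    return haplotypes, assignment


if __name__ == "__main__":
    # Four made-up fosmids covering three heterozygous positions.
    clones = [
        {100: "A", 200: "G"},
        {100: "T", 200: "C"},
        {200: "G", 300: "T"},
        {200: "C", 300: "A"},
    ]
    haps, groups = phase_fosmids(clones)
    print(groups)
    print(haps)
```

In this simplified model, a fosmid that shares no heterozygous site with any existing haplotype starts a new group, which is why a tiling with overlapping heterozygous loci is needed to extend phase across a whole region.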
Complete resequencing of extended genomic regions using fosmid target capture and single molecule real-time (SMRT) long read sequencing technology.
A longstanding goal of genomic analysis is the identification of causal genetic factors contributing to disease. While the common disease/common variant hypothesis has been tested in many genome-wide association studies, few advancements in identifying causal variation have been realized, and instead recent findings point away from common variants towards aggregate rare variants as causal. A challenge is obtaining complete phased genomic sequences over extended genomic regions from sufficient numbers of cases and controls to identify all variation potentially causal of a disease. To address this, we modified methods for targeted DNA isolation using fosmid technology and single-molecule, long-sequence-read generation that combine for complete, haplotype-resolved resequencing across extended genomic subregions. As proof of principle, we validated the approach by resequencing four 800 kbp segments that span a major histocompatibility complex (MHC) common extended haplotype (CEH) associated with disease. The data revealed the extent of conservation, exposing a near identity among four DR4 CEHs over conserved regions, detailing rare variation and measuring sequence accuracy. In a second test, we sequenced the complete KIR haplotypes from 8 individuals within a specific timeframe and cost. Single molecule long-read sequencing technology generated contiguous full-length fosmid sequences of 30 to 40 kb in a single read, allowing assembly of resolved haplotypes with very little data processing. All of the sequences produced from these projects were contiguous and phased, with accuracy above 99.99%. The results demonstrated that cost-effective scale-up is possible to generate scores to hundreds of phased chromosomal sequences of extended lengths that can encompass genomic regions associated with disease.
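To put the reported accuracy in more familiar terms, the short sketch below converts a per-base accuracy into a Phred-scaled quality value and an expected error count. The function names are hypothetical, and the calculation assumes errors are independent and uniformly distributed, which the abstract does not state.

```python
import math


def phred_quality(accuracy):
    """Phred-scaled quality: Q = -10 * log10(per-base error rate)."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def expected_errors(accuracy, length_bp):
    """Expected number of erroneous bases in a sequence of length_bp."""
    return (1.0 - accuracy) * length_bp


if __name__ == "__main__":
    accuracy = 0.9999  # the lower bound quoted in the abstract
    print(round(phred_quality(accuracy), 2))
    for length_bp in (40_000, 800_000):  # one fosmid, one 800 kbp segment
        print(length_bp, round(expected_errors(accuracy, length_bp), 2))
```

Under these assumptions, 99.99% accuracy corresponds to roughly Q40, or about 4 errors per 40 kb fosmid and about 80 per 800 kbp segment. Because the abstract reports accuracy above 99.99%, these error counts are upper bounds.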
Assembly of complete KIR haplotypes from a diploid individual by the direct sequencing of full-length fosmids.
We show that linearizing and directly sequencing full-length fosmids simplifies the assembly problem such that it is possible to unambiguously assemble individual haplotypes for the highly repetitive 100-200 kb killer Ig-like receptor (KIR) gene loci of chromosome 19. A tiling of targeted fosmids can be used to clone extended lengths of genomic DNA, 100s of kb in length, but repeat complexity in regions of particular interest, such as the KIR locus, means that sequence assembly of pooled samples into complete haplotypes is difficult and in many cases impossible. The current maximum read length generated by SMRT Sequencing exceeds the length of a 40 kb fosmid; it is therefore possible to span an entire fosmid in one sequencing read. Shearing, sequencing and assembling fosmids in a shotgun approach is prone to errors when the underlying sequence is highly repetitive. We show that it is possible to directly sequence linearized fosmids and generate a high-quality consensus by simple alignment, removing the need for an error-prone assembly step. The high-quality sequence of complete fosmids can then be tiled into full haplotypes. We demonstrate the method on DNA samples from a number of individuals and fully recover the sequence of both haplotypes from a pool of KIR fosmids. The ability to haplotype and sequence complex immunogenetic regions will bring exciting opportunities to explore the evolution of disease associations of the immune sub-genome. This simple and robust approach can be scaled up, allowing a complex genomic region to be sequenced at a population level. We expect such sequencing to be valuable in disease association research.
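The tiling step described above can be sketched as a merge of overlapping sequences. This is a minimal illustration, not the method used in the study: the names `overlap_length` and `tile`, the exact-match overlap rule, and the assumption that fosmid consensus sequences arrive already ordered along one haplotype are all simplifications introduced here.

```python
def overlap_length(left, right, min_overlap):
    """Longest suffix of `left` that exactly matches a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def tile(fosmids, min_overlap=3):
    """Merge ordered, overlapping fosmid consensus sequences into one contig."""
    contig = fosmids[0]
    for nxt in fosmids[1:]:
        size = overlap_length(contig, nxt, min_overlap)
        if size == 0:
            raise ValueError("no overlap found; the tiling path is broken")
        # Append only the part of the next fosmid that extends the contig.
        contig += nxt[size:]
    return contig


if __name__ == "__main__":
    # Toy sequences standing in for ~40 kb fosmid consensus sequences.
    ordered = ["ACGTTGCA", "TGCAGGCT", "GGCTAAAC"]
    print(tile(ordered))
```

Real fosmid overlaps span kilobases and tolerate residual errors, so a production workflow would use an aligner rather than exact string matching. The sketch only shows why full-length, high-accuracy fosmid sequences reduce haplotype reconstruction to a simple ordering and merging problem.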
The complex immune regions of the genome, including MHC and KIR, contain large copy number variants (CNVs), a high density of genes, hyper-polymorphic gene alleles, and conserved extended haplotypes (CEHs) with enormous linkage disequilibrium (LD). This level of complexity and the inherent biases of short-read sequencing make it challenging to extract immune region haplotype information from reference-reliant, shotgun sequencing and GWAS methods. As NGS-based genome and exome sequencing and SNP arrays have become routine for population studies, numerous efforts are being made to develop software to extract and/or impute the immune gene information from these datasets. Despite these efforts, the fine mapping of causal variants of immune genes for their well-documented association with cancer, drug-induced hypersensitivity and immune-related diseases has been slower than expected. This has in many ways limited our understanding of the mechanisms leading to immune disease. In the present work, we demonstrate the advantages of long reads delivered by SMRT Sequencing for assembling complete haplotypes of MHC and KIR gene clusters, as well as calling correct genotypes of the genes contained within them. All the genotype information is detected at allele level with full phasing information across SNP-poor regions. Genotypes were called correctly from targeted gene amplicons, haplotypes, as well as from a completely assembled 5 Mb contig of the MHC region from a de novo assembly of whole genome shotgun data. The de novo analysis pipeline used in all these approaches allowed for reference-free analysis without imputation, a key for interrogation without prior knowledge about ethnic backgrounds. These methods are thus easily adoptable for previously uncharacterized human or non-human species.
The killer immunoglobulin-like receptor (KIR) genes belong to the immunoglobulin superfamily and are widely studied due to the critical role they play in coordinating the innate immune response to infection and disease. Highly accurate, contiguous, long reads, like those generated by SMRT Sequencing, when combined with target-enrichment protocols, provide a straightforward strategy for generating complete de novo assembled KIR haplotypes. We have explored two different methods to capture the KIR region: one using fosmid clones and one using Nimblegen capture.
Whole gene sequencing of KIR-3DL1 with SMRT Sequencing and the distribution of allelic variants in different ethnic groups
The killer-cell immunoglobulin-like receptor (KIR) gene family is involved in immune modulation during viral infection, autoimmune disease and allogeneic stem cell transplantation. Most studies of KIR gene diversity and its impact on transplant outcome are performed with gene presence/absence assays. However, it is well known that KIR gene allelic variations have biological significance. Allele-level typing of KIR genes has been very challenging until recently due to the homologous nature of those genes and their very long intronic sequences. SMRT (Single Molecule Real-Time) Sequencing generates long reads averaging 10 to 15 kb and allows us to obtain in-phase long sequence reads. We have developed a PCR assay for SMRT Sequencing on the PacBio RS II platform in our lab for 3DL1 whole gene sequencing. This approach allows us to obtain allele-level typing for 3DL1 genes and could serve as a model to type other KIR genes at the allelic level.
Dan Geraghty explains that while there have been decades’ worth of studies associating the genetics of the major histocompatibility complex (MHC), and the highly polymorphic HLA class 1 and 2…
AGBT PacBio Workshop: High-throughput HLA class I whole gene and HLA class II long range typing on PacBio RSII and Sequel Platforms
In a talk at AGBT 2017, Histogenetics CEO Nezih Cereb reported on how SMRT Sequencing is allowing his team to produce full-length, phased sequences for HLA alleles, which are important…
Report from the Eleventh Killer Immunoglobulin-like Receptor (KIR) Workshop: Novel insights on KIR polymorphism, ligand recognition, expression and function.
The Eleventh Killer Immunoglobulin-like Receptor (KIR) Workshop was held in Camogli (Genoa, Italy) in October 2018. This congress brought together 113 participants working in the KIR field. Fifty-eight studies were presented, the majority of which included unpublished data. Thus, the KIR Workshop, by allowing people to meet and share their knowledge and experience in a friendly atmosphere, still represents a special event for fruitful discussion and exchange of novel breakthroughs, results, and ideas. In this report, we summarize all the scientific contributions, highlighting the most recent advances in the KIR field. Forty abstracts presented at the KIR Workshop are published in this issue. © 2019 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
The killer-cell Ig-like receptors (KIR) form a multigene entity involved in modulating immune responses through interactions with MHC class I molecules. The complexity of the KIR cluster is reflected by, for instance, abundant levels of allelic polymorphism, gene copy number variation, and stochastic expression profiles. The current transcriptome study involving human and macaque families demonstrates that KIR family members are also subjected to differential levels of alternative splicing, and this seems to be gene dependent. Alternative splicing may result in the partial or complete skipping of exons, or the partial inclusion of introns, as documented at the transcription level. This post-transcriptional process can generate multiple isoforms from a single KIR gene, which diversifies the characteristics of the encoded proteins. For example, alternative splicing could modify ligand interactions, cellular localization, signaling properties, and the number of extracellular domains of the receptor. In humans, we observed abundant splicing for KIR2DL4, and to a lesser extent in the lineage III KIR genes. All experimentally documented splice events are substantiated by in silico splicing strength predictions. To a similar extent, alternative splicing is observed in rhesus macaques, a species that shares a close evolutionary relationship with humans. Splicing profiles of Mamu-KIR1D and Mamu-KIR2DL04 displayed a great diversity, whereas Mamu-KIR3DL20 (lineage V) is consistently spliced to generate a homolog of human KIR2DL5 (lineage I). The latter case represents an example of convergent evolution. Although just a single KIR splice event is shared between humans and macaques, the splicing mechanisms are similar, and the predicted consequences are comparable. 
In conclusion, alternative splicing adds an additional layer of complexity to the KIR gene system in primates, and results in a wide structural and functional variety of KIR receptors and their isoforms, which may play a role in health and disease.
Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics.
Short read massive parallel sequencing has emerged as a standard diagnostic tool in the medical setting. However, short read technologies have inherent limitations such as GC bias, difficulties mapping to repetitive elements, trouble discriminating paralogous sequences, and difficulties in phasing alleles. Long read single molecule sequencers resolve these obstacles. Moreover, they offer higher consensus accuracies and can detect epigenetic modifications from native DNA. The first commercially available long read single molecule platform was the RS system based on PacBio’s single molecule real-time (SMRT) sequencing technology, which has since evolved into their RSII and Sequel systems. Here we capsulize how SMRT sequencing is revolutionizing constitutional, reproductive, cancer, microbial and viral genetic testing. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
The killer-cell Ig-like receptors (KIRs) play a central role in immune recognition in infection, pregnancy, and transplantation through their interactions with MHC class I molecules. KIR genes display abundant copy number variation as well as high levels of polymorphism. As a result, it is challenging to characterize this structurally dynamic region. KIR haplotypes have been analyzed in different species using conventional characterization methods, such as Sanger sequencing and Roche/454 pyrosequencing. However, these methods are time-consuming and often fail to define complete haplotypes, or do not reach allele-level resolution. In addition, most analyses were performed on genomic DNA, and thus were lacking substantial information about transcription and its corresponding modifications. In this paper, we present a single-molecule real-time sequencing approach, using the Pacific Biosciences Sequel platform to characterize the KIR transcriptomes in human and rhesus macaque (Macaca mulatta) families. This high-resolution approach allowed the identification of novel Mamu-KIR alleles, the extension of reported allele sequences, and the determination of human and macaque KIR haplotypes. In addition, multiple recombinant KIR genes were discovered, all located on contracted haplotypes, which were likely the result of chromosomal rearrangements. The relatively high number of contracted haplotypes discovered might be indicative of selection on small KIR repertoires and/or novel fusion gene products. This next-generation method provides an improved high-resolution characterization of the KIR cluster in humans and macaques, which eventually may aid in a better understanding and interpretation of KIR allele-associated diseases, as well as the immune response in transplantation and reproduction. Copyright © 2018 by The American Association of Immunologists, Inc.
Improved full-length killer cell immunoglobulin-like receptor transcript discovery in Mauritian cynomolgus macaques.
Killer cell immunoglobulin-like receptors (KIRs) modulate disease progression of pathogens including HIV, malaria, and hepatitis C. Cynomolgus and rhesus macaques are widely used as nonhuman primate models to study human pathogens, and so considerable effort has been put into characterizing their KIR genetics. However, previous studies have relied on cDNA cloning and Sanger sequencing, which lack the throughput of current sequencing platforms. In this study, we present a high-throughput, full-length allele discovery method utilizing Pacific Biosciences circular consensus sequencing (CCS). We also describe a new approach to Macaque Exome Sequencing (MES) and the development of the Rhexome1.0, an adapted target capture reagent that includes macaque-specific capture probe sets. By using sequence reads generated by whole genome sequencing (WGS) and MES to inform primer design, we were able to increase the sensitivity of KIR allele discovery. We demonstrate this increased sensitivity by defining nine novel alleles within a cohort of Mauritian cynomolgus macaques (MCM), a geographically isolated population with restricted KIR genetics that was thought to be completely characterized. Finally, we describe an approach to genotyping KIRs directly from sequence reads generated using WGS/MES. The findings presented here expand our understanding of KIR genetics in MCM by associating new genes with all eight KIR haplotypes and demonstrating the existence of at least one KIR3DS gene associated with every haplotype.
KIR3DL01 upregulation on gut natural killer cells in response to SIV infection of KIR- and MHC class I-defined rhesus macaques.
Natural killer cells provide an important early defense against viral pathogens and are regulated in part by interactions between highly polymorphic killer-cell immunoglobulin-like receptors (KIRs) on NK cells and their MHC class I ligands on target cells. We previously identified MHC class I ligands for two rhesus macaque KIRs: KIR3DL01 recognizes Mamu-Bw4 molecules and KIR3DL05 recognizes Mamu-A1*002. To determine how these interactions influence NK cell responses, we infected KIR3DL01+ and KIR3DL05+ macaques with and without defined ligands for these receptors with SIVmac239, and monitored NK cell responses in peripheral blood and lymphoid tissues. NK cell responses in blood were broadly stimulated, as indicated by rapid increases in the CD16+ population during acute infection and sustained increases in the CD16+ and CD16-CD56- populations during chronic infection. Markers of proliferation (Ki-67), activation (CD69 & HLA-DR) and antiviral activity (CD107a & TNFα) were also widely expressed, but began to diverge during chronic infection, as reflected by sustained CD107a and TNFα upregulation by KIR3DL01+, but not by KIR3DL05+ NK cells. Significant increases in the frequency of KIR3DL01+ (but not KIR3DL05+) NK cells were also observed in tissues, particularly in the gut-associated lymphoid tissues, where this receptor was preferentially upregulated on CD56+ and CD16-CD56- subsets. These results reveal broad NK cell activation and dynamic changes in the phenotypic properties of NK cells in response to SIV infection, including the enrichment of KIR3DL01+ NK cells in tissues that support high levels of virus replication.