The long read lengths of PacBio’s SMRT Sequencing enable detection of linked mutations across multiple kilobases of sequence. This feature is particularly useful in the context of protein engineering, where large numbers of similar constructs are generated routinely to explore the effects of mutations on function and stability. We have developed a PCR-based barcoded sequencing method to generate high-quality, full-length sequence data for batches of constructs generated in a common backbone. Individual barcodes are coupled to primers targeting a common region of the vector of interest. The amplified products are pooled into a single DNA library, and sequencing data are clustered by barcode to generate multi-molecule consensus sequences for each construct present in the pool. As a proof-of-concept dataset, we have generated a library of 384 randomly mutated variants of the Phi29 DNA polymerase, a 575 amino acid protein encoded by a 1.7 kb gene. These variants were amplified with a set of barcoded primers, and the resulting library was sequenced on a single SMRT Cell. The data produced sequences that were completely concordant with independent Sanger sequencing, for a 100% accurate reconstruction of the set of clones.
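The clustering-by-barcode step described above can be illustrated with a small sketch. This is not the PacBio pipeline: it assumes, purely for illustration, that each read starts with an exact copy of its barcode and that reads within a group are already aligned and of equal length, so a per-position majority vote can stand in for multi-molecule consensus calling. The barcode sequences, sample names, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def demultiplex(reads, barcodes):
    """Group reads by the barcode found at the start of each read.

    `barcodes` maps barcode sequence -> sample name (hypothetical names).
    The barcode is trimmed off; reads with no matching prefix are dropped.
    """
    groups = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                groups[sample].append(read[len(barcode):])
                break
    return dict(groups)

def majority_consensus(reads):
    """Per-position majority vote over equal-length, pre-aligned reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be pre-aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

# Toy data: two barcoded constructs plus one read with no known barcode.
barcodes = {"ACGT": "clone_1", "TTAG": "clone_2"}
reads = [
    "ACGTGATTACA", "ACGTGATTACA", "ACGTGACTACA",  # clone_1, one read with an error
    "TTAGCCGGA", "TTAGCCGGA", "TTAGCTGGA",        # clone_2, one read with an error
    "GGGGAAAA",                                   # unassigned, dropped
]

groups = demultiplex(reads, barcodes)
consensus = {sample: majority_consensus(rs) for sample, rs in groups.items()}
print(consensus)
```

Real long reads contain insertions and deletions, so a production workflow would align reads before calling consensus; the majority vote here only shows why several molecules per barcode can outvote a single-read error.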
HLA sequencing using SMRT Technology – High resolution and high throughput HLA genotyping in a clinical setting
Sequence-based typing (SBT) is considered the gold standard method for HLA typing. Current SBT methods are laborious and prone to phase ambiguity problems and genotyping uncertainties. As a result, the NGS community is rapidly seeking to remedy these challenges and to produce high-resolution, high-throughput HLA sequencing conducive to a clinical setting. Today, second-generation NGS technologies are limited in their ability to yield the full-length HLA sequences required for adequate phasing and identification of novel alleles. Here we present the use of single molecule, real-time (SMRT) sequencing as a means of determining full-length/long HLA sequences. Moreover, we demonstrate the scalability of this method through multiplexing approaches and determine HLA genotyping calls using the third-party GenDx NGSengine® software.
Long Amplicon Analysis: Highly accurate, full-length, phased, allele-resolved gene sequences from multiplexed SMRT Sequencing data.
The correct phasing of genetic variations is a key challenge for many applications of DNA sequencing. Allele-level resolution is strongly preferred for histocompatibility sequencing where recombined genes can exhibit different compatibilities than their parents. In other contexts, gene complementation can provide protection if deleterious mutations are found on only one allele of a gene. These problems are especially pronounced in immunological domains given the high levels of genetic diversity and recombination seen in regions like the Major Histocompatibility Complex. A new tool for analyzing Single Molecule, Real-Time (SMRT) Sequencing data – Long Amplicon Analysis (LAA) – can generate highly accurate, phased and full-length consensus sequences for multiple genes in a single sequencing run.
Fully phased allele-level sequencing of highly polymorphic HLA genes is greatly facilitated by SMRT Sequencing technology. In the present work, we have evaluated multiple DNA barcoding strategies for multiplexing several loci from multiple individuals, using three different tagging methods. Specifically, MHC class I genes HLA-A, -B, and -C were indexed with DNA barcodes by either tailed primers or barcoded SMRTbell adapters. Eight different 16-bp barcode sequences were used in symmetric and asymmetric pairing. Eight DNA-barcoded adapters in symmetric pairing were independently ligated to a pool of HLA-A, -B, and -C amplicons for eight different individuals, one at a time, and pooled for sequencing on a single SMRT Cell. Amplicons generated from barcoded primers were pooled upfront for library generation. Eight symmetric barcoded primers were generated for HLA class I genes. These primers facilitated multiplexing of 8 samples and also allowed generation of unique asymmetric pairings for simultaneous amplification from 28 reference genomic DNA samples. The data generated from all three methods were analyzed using the LAA protocol in SMRT Analysis v2.3. The resulting consensus sequences were typed using GenDx NGSengine HLA-typing software.
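The sample counts quoted above follow from simple combinatorics, sketched here with hypothetical barcode names. The sketch assumes that an asymmetric pair is unordered, that is, a molecule tagged (A, B) cannot be told apart from one tagged (B, A).

```python
from itertools import combinations

# Eight hypothetical barcode names standing in for the 16-bp sequences.
barcodes = [f"bc{i}" for i in range(1, 9)]

# Symmetric pairing: the same barcode on both ends of the molecule.
symmetric = [(b, b) for b in barcodes]

# Asymmetric pairing: two different barcodes, order ignored.
asymmetric = list(combinations(barcodes, 2))

print(len(symmetric), len(asymmetric))
```

Under that assumption, eight barcodes give 8 symmetric pairs and 8 × 7 / 2 = 28 asymmetric pairs, matching the 8-sample and 28-sample designs described in the abstract.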
Multiplexing human HLA class I & II genotyping with DNA barcode adapters for high throughput research.
Human MHC class I genes HLA-A, -B, -C, and class II genes HLA-DR, -DP and -DQ, play a critical role in the immune system as major factors responsible for organ transplant rejection. They have a direct or linkage-based association with several diseases, including cancer and autoimmune diseases, and are important targets for clinical and drug sensitivity research. HLA genes are also highly polymorphic, and their diversity originates from exonic combinations as well as recombination events. A large number of new alleles are expected to be encountered if these genes are sequenced through the UTRs. Thus allele-level resolution is strongly preferred when sequencing HLA genes. Pacific Biosciences has developed a method to sequence the HLA genes in their entirety within the span of a single read, taking advantage of the long read lengths (average >10 kb) facilitated by SMRT technology. A highly accurate consensus sequence (99.999%, or QV50, demonstrated) is generated for each allele in a de novo fashion by our SMRT Analysis software. In the present work, we have combined this imputation-free, fully phased, allele-specific consensus sequence generation workflow with a newly developed DNA-barcode-tagged SMRTbell sample preparation approach to multiplex 96 individual samples for sequencing all of the HLA class I and II genes. Full-length HLA class I genes and relevant exons of class II genes were amplified with commercially available NGS-go reagents for high-resolution HLA sequencing. The 96 samples included 72 that are part of the UCLA reference panel and had pre-typing information available for 2 fields, based on gold standard SBT methods. SMRTbell adapters with 16-bp barcode tags were ligated to long amplicons in symmetric pairing. PacBio sequencing was highly effective in generating accurate, phased sequences of full-length alleles of HLA genes.
In this work, we demonstrate the scalability of HLA sequencing using off-the-shelf assays for research applications to find biological significance in full-length sequencing.
We have developed barcoding reagents and workflows for multiplexing amplicons or fragmented native genomic DNA prior to Single Molecule, Real-Time (SMRT) Sequencing. The long reads of PacBio’s SMRT Sequencing enable detection of linked mutations across multiple kilobases (kb) of sequence. This feature is particularly useful in the context of mutational analysis or SNP confirmation, where a large number of samples are generated routinely. To validate this workflow, a set of 384 1.7-kb amplicons, each derived from variants of the Phi29 DNA polymerase gene, were barcoded during amplification, pooled, and sequenced on a single SMRT Cell. To demonstrate the applicability of the method to longer inserts, a library of 96 5-kb clones derived from the E. coli genome was sequenced.
Access full spectrum of polymorphisms in HLA class I & II genes, without imputation for disease association and evolutionary research.
MHC class I and II genes are critically monitored by high-resolution sequencing for organ transplant decisions due to their role in GVHD. Their direct or linkage-based causal associations have increased their prominence as targets for drug sensitivity, autoimmune, cancer, and infectious disease research. Monitoring HLA genes can, however, be difficult because of their highly polymorphic nature, so allele-level resolution is strongly preferred. However, most studies were historically focused on the peptide-binding domains of the HLA genes because of technological challenges. As a result, knowledge about the functional role of polymorphisms outside of exons 2 and 3 of HLA genes was rather limited. There are also relatively few full-length gene references currently available in the IMGT/HLA database, which made it difficult to quickly adopt high-throughput, reference-reliant methods for allele-level HLA sequencing. Nonetheless, increasing awareness of the role of regulatory-region polymorphisms of HLA genes in disease association [1] has brought about a revolution in full-length HLA gene sequencing. Researchers are now exploring ways to obtain complete information for HLA genes and integrate it with the current HLA database so that it can be interpreted and used by clinical researchers. We have explored the advantages of SMRT Sequencing for obtaining fully phased, allele-specific sequences of HLA class I and II genes for 96 samples, using a completely de novo consensus generation approach for imputation-free 4-field typing. With long read lengths (average >10 kb) and consensus accuracy exceeding 99.999% (Q50), a comprehensive snapshot of variants in exons, introns, and UTRs could be obtained, capturing the spectrum of polymorphisms in phase across SNP-poor regions. Such information can provide invaluable insights for future causality association and population diversity research.
Targeted sequencing employing PCR amplification is a fundamental approach to studying human genetic disease. PacBio’s Sequel System and supporting products provide an end-to-end solution for amplicon sequencing, offering better performance than Sanger technology in accuracy, read length, throughput, and breadth of informative data. Sample multiplexing is supported with three barcoding options, providing the flexibility to incorporate unique sample identifiers during target amplification or library preparation. Multiplexing is key to realizing the full capacity of the 1 million individual reactions per Sequel SMRT Cell. Two analysis workflows generate high-accuracy results across a wide span of amplicon sizes, covering two ranges: 250 bp to 3 kb and 3 kb to >10 kb. The Circular Consensus Sequencing workflow achieves high accuracy through intra-molecular consensus generation, while the Long Amplicon Analysis workflow achieves it by clustering individual long reads from multiple reactions. Here we present workflows and results for single-molecule sequencing of amplicons for human genetic analysis.
Mitochondrial DNA (mtDNA) is a compact, double-stranded circular genome of 16,569 bp with a cytosine-rich light (L) chain and a guanine-rich heavy (H) chain. mtDNA mutations have been increasingly recognized as important contributors to an array of human diseases such as Parkinson’s disease, Alzheimer’s disease, colorectal cancer and Kearns–Sayre syndrome. mtDNA mutations can affect all of the 1,000 to 10,000 copies of the mitochondrial genome present in a cell (homoplasmic mutation) or only a subset of copies (heteroplasmic mutation). The ratio of normal to mutant mtDNAs within cells is a significant factor in whether mutations will result in disease, as well as the clinical presentation, penetrance, and severity of the phenotype. Over time, heteroplasmic mutations can become homoplasmic due to differential replication and random assortment. Full characterization of the mitochondrial genome would involve detection of not only homoplasmic but also heteroplasmic mutations, as well as complete phasing. Previously, we sequenced human mtDNA on the PacBio RS II System with two partially overlapping amplicons. Here, we present amplification-free, full-length sequencing of linearized mtDNA using the Sequel System. Full-length sequencing allows variant phasing along the entire mitochondrial genome, identification of heteroplasmic variants, and detection of epigenetic modifications that are lost in amplicon-based methods.
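The distinction between homoplasmic and heteroplasmic mutations can be made concrete with a small sketch. It assumes, for illustration only, that each full-length read contributes one base call at the position of interest, and it uses a hypothetical `tolerance` parameter to absorb sequencing error; neither the function names nor the threshold come from the work described above.

```python
def variant_fraction(calls, reference_base):
    """Fraction of molecules whose base call differs from the reference."""
    if not calls:
        raise ValueError("need at least one base call")
    mutant = sum(1 for base in calls if base != reference_base)
    return mutant / len(calls)

def classify(fraction, tolerance=0.01):
    """Label a site from its variant fraction.

    `tolerance` is an assumed error allowance, not a validated cutoff.
    """
    if fraction <= tolerance:
        return "no variant"
    if fraction >= 1.0 - tolerance:
        return "homoplasmic"
    return "heteroplasmic"

# Toy example: 100 molecules, 25 of which carry G instead of the reference A.
calls = ["A"] * 75 + ["G"] * 25
fraction = variant_fraction(calls, reference_base="A")
print(fraction, classify(fraction))
```

Because each call comes from a single molecule, the same per-read bookkeeping could be extended to several positions at once to ask whether two variants sit on the same mtDNA copy, which is the phasing question raised in the abstract.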
Targeted sequencing with Sanger as well as short-read-based high-throughput sequencing methods is standard practice in clinical genetic testing. However, many applications beyond SNP detection have remained somewhat obstructed due to technological challenges. With the advent of long reads and high consensus accuracy, SMRT Sequencing overcomes many of the technical hurdles faced by Sanger and NGS approaches, opening a broad range of untapped clinical sequencing opportunities. Flexible multiplexing options, a highly adaptable sample preparation method, and two well-developed, newly improved analysis methods that generate highly accurate sequencing results make SMRT Sequencing an adept method for clinical-grade targeted sequencing. The Circular Consensus Sequencing (CCS) analysis pipeline produces QV 30 data from each single intra-molecular multi-pass polymerase read, making it a reliable solution for detecting minor variant alleles with frequencies as low as 1%. Long Amplicon Analysis (LAA) makes use of insert-spanning, full-length subreads originating from multiple individual copies of the target to generate highly accurate and phased consensus sequences (>QV50), offering a unique advantage for imputation-free allele segregation and haplotype phasing. Here we present workflows and results for a range of SMRT Sequencing clinical applications. Specifically, we illustrate how the flexible multiplexing options, simple sample preparation methods, and new developments in data analysis tools offered by PacBio in support of Sequel System 5.1 can come together in a variety of experimental designs to enable applications as diverse as high-throughput HLA typing, mitochondrial DNA sequencing, and viral vector integrity profiling of recombinant adeno-associated viral genomes (rAAV).
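The quality values quoted above can be related to per-base accuracy with a short sketch. It assumes the standard Phred convention, QV = -10 · log10(error probability); the function names are illustrative.

```python
import math

def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy."""
    return 1.0 - 10.0 ** (-qv / 10.0)

def accuracy_to_qv(accuracy):
    """Convert per-base accuracy (strictly below 1.0) to a Phred-scaled QV."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)

for qv in (30, 50):
    print(f"QV{qv}: accuracy {qv_to_accuracy(qv):.5%}")
```

On this scale, QV 30 corresponds to one expected error per 1,000 bases and QV50 to one per 100,000 bases, the 99.999% figure often cited for long-amplicon consensus sequences.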
AGBT Virtual Poster: Single-molecule sequencing reveals the presence of distinct JC polyomavirus populations in patients with progressive multifocal leukoencephalopathy
At AGBT 2017, Lars Paulin from the University of Helsinki presented this poster on whole genome sequencing of the virus responsible for progressive multifocal leukoencephalopathy, a rare and dangerous brain…
This tutorial provides an overview of the Long Amplicon Analysis (LAA) application. The LAA algorithm generates highly accurate, phased and full-length consensus sequences from long amplicons. Applications of LAA include…
Human MHC class I genes HLA-A, -B, -C, and class II genes HLA-DR, -DQ, and -DP play a critical role in the immune system as primary factors responsible for…
PacBio SMRT Sequencing is fast changing the genomics space with its long reads and high consensus sequence accuracy, providing the most comprehensive view of the genome and transcriptome. In this…
Explore human genetic variation and learn how SMRT Sequencing uncovers the full spectrum of structural variation to advance understanding of genetic disease and broaden our knowledge of human diversity.