Big things are happening at PacBio! The latest launches of the Vega system and SPRQ chemistry are creating quite a buzz and we are eager to continue exploring the power of these innovations in upcoming posts. To help frame the impact of these new technologies, this week’s blog takes a step back to examine the fundamentals of targeted sequencing, showcasing this critical application for these new tools.
Targeted sequencing is a powerful approach for focusing on the specific genomic regions you care about while reducing costs and labor time. Read ahead to learn about the different techniques of targeted sequencing, and how these methods are used to access regions of the genome that are difficult to profile with traditional sequencing technology.
What is targeted sequencing?
Targeted sequencing is an alternative to whole genome sequencing (WGS) that allows you to home in on your regions of interest from the start of an experiment. It’s a general term for library preparation approaches that focus specifically on a region or gene of interest. While this approach doesn’t provide as complete a view of the genome as WGS, it offers advantages such as greater scale, higher efficiency, and lower cost.
In targeting known regions of interest, including genes that may be drivers of disease, targeted sequencing is critically important in human health research. This approach is especially valuable in fields such as genetic disease screening, cancer, infectious disease, immunology, and pharmacogenomics, where focused insights can reveal genetic contributors to complex conditions. By narrowing in on specific genetic regions, researchers gain a deeper understanding of how variants affect disease risk or mechanism, immune responses, and drug interactions, paving the way for personalized medicine and more effective treatments.
Why use targeted sequencing?
Focusing on a subset of genomic targets allows researchers to utilize end-to-end workflows that can efficiently characterize regions of interest in large numbers of samples. For example, library prep workflows can be optimized to efficiently target a panel of genes, allowing the cost of sequencing to be shared across many samples per sequencing run and bioinformatic analysis to extract specific insights out of just the genes of interest with only modest compute resources.
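To make the cost-sharing idea concrete, here is a minimal sketch of the arithmetic behind pooling samples on one run. The function names and every number a caller passes in (run cost, read counts, on-target fraction, panel size) are hypothetical placeholders, not PacBio specifications, and the depth estimate assumes samples are pooled evenly.

```python
def per_sample_cost(run_cost: float, n_samples: int, prep_cost_per_sample: float) -> float:
    """Cost of one sample when a run is shared across a pool.

    The sequencing run is split evenly across the pool, while library
    prep is paid once per sample. All inputs are hypothetical.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return run_cost / n_samples + prep_cost_per_sample


def mean_target_depth(total_reads: int, n_samples: int, on_target_fraction: float,
                      mean_read_length: int, panel_size_bp: int) -> float:
    """Approximate mean depth per sample across the panel.

    Assumes even pooling and that on-target bases are spread uniformly
    over the panel, which real experiments only approximate.
    """
    if n_samples < 1 or panel_size_bp < 1:
        raise ValueError("n_samples and panel_size_bp must be at least 1")
    reads_per_sample = total_reads / n_samples
    on_target_bases = reads_per_sample * on_target_fraction * mean_read_length
    return on_target_bases / panel_size_bp
```

Doubling the number of pooled samples halves each sample's share of the run cost but also halves its expected depth, so pool size is usually set by the minimum depth the analysis needs.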
Aside from the workflow benefits, targeted sequencing with uniform coverage can be the pathway towards understanding the dark regions of our genome. These areas, such as repeat expansions, promoters, or introns, are often repetitive, GC-rich, or otherwise difficult to sequence, but still harbor important variants for human health. By specifically focusing on these critical areas of the genome with targeted sequencing, researchers can profile regions of interest at high depth or on a massive scale.
Though the regions of interest are by definition smaller than entire genomes, targeted sequencing benefits from highly accurate long-read sequencing for a number of reasons. Many important targets like immune genes (e.g., HLA) and pharmacogenes (e.g., CYP2D6) belong to complex gene families, contain repeat expansions (e.g., C9orf72), or have paralogs (e.g., SMN1/2) or pseudogenes (e.g., CYP21A2/CYP21A1P). These regions require unambiguous, phase-resolved, imputation-free typing that can span entire regions and resolve complex variation.
Types of targeted sequencing
There are several approaches to targeted sequencing that can help researchers answer these pressing questions. Selecting the best approach will depend on the targets of interest, the complexity of the regions, scale, and required sequencing depth.
Amplicon sequencing
Amplicon sequencing uses polymerase chain reaction (PCR) to selectively amplify genetic regions of interest. Primers are designed to bookend the region of interest for amplification so that these regions, or amplicons, can be sequenced specifically. Though primer optimization can take time and the method is best suited to small panels, amplicon sequencing is a good choice for many labs due to its simplicity, low cost, and high sample throughput.
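The way a primer pair bookends a target can be illustrated with a small in-silico PCR sketch. This is a simplified illustration, not a primer design tool: it assumes exact primer matches on an uppercase A/C/G/T template, and the function names are hypothetical.

```python
from typing import Optional

# Translation table for complementing uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_amplicon(template: str, forward_primer: str, reverse_primer: str) -> Optional[str]:
    """Return the amplicon (primers included), or None if a primer is not found.

    Only exact matches are considered, and only the first forward-primer
    site on the given strand is used.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the given
    # strand it appears as its reverse complement, downstream of the
    # forward primer.
    rc = reverse_complement(reverse_primer)
    rc_pos = template.find(rc, start + len(forward_primer))
    if rc_pos == -1:
        return None
    return template[start:rc_pos + len(rc)]
```

Real primer design also has to account for melting temperature, mismatches, and off-target binding, none of which this sketch models.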
Amplicons up to many kilobases long benefit from highly accurate long-read sequencing like HiFi technology because the read lengths can span entire amplicons in a single read. This technology grants unambiguous haplotype resolution through direct phasing, allowing for comprehensive, ancestry-agnostic variant detection. Where long-read amplicon sequencing is limited by PCR, a specialized long-range PCR can be used to amplify significantly longer sequences that present challenges under typical PCR conditions.
Amplicon sequencing can be used to profile clinically relevant genes for diseases, as with recent research investigating genetic contributors to Parkinson’s disease (PD). In that study, the authors used HiFi targeted sequencing to examine a complex, CT-rich region of the gene SNCA, which may influence PD risk. HiFi sequencing is critical for resolving large, complex structural variants in this region that are virtually undetectable with some legacy short-read sequencing methods. This research demonstrates the importance of high-quality targeted sequencing methods to avoid delays in important research progress in neurological diseases.
Hybrid capture sequencing
Hybrid capture sequencing is a probe-based method that also relies on PCR amplification and can be useful for capturing large regions. This target enrichment approach drills down on specific areas of interest using DNA-DNA hybridization between the targeted sample and a DNA probe. For hybrid capture HiFi sequencing, the bound target DNA is separated with magnetic beads before PCR, and the remaining DNA is washed away. One of the advantages of hybrid capture over amplicon sequencing is the ability to tile fragments across large regions. Unlike amplicon sequencing, which is limited in length to 10–20 kb, hybrid capture probes can tile across tens or hundreds of kilobases.
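Tiling is easy to picture as a set of overlapping intervals laid end to end across the target. The sketch below generates such intervals. The 120-base probe length and 60-base step are illustrative assumptions rather than the design of any specific panel, and the function name is hypothetical.

```python
from typing import List, Tuple


def tile_probes(region_start: int, region_end: int,
                probe_length: int = 120, step: int = 60) -> List[Tuple[int, int]]:
    """Return half-open (start, end) probe intervals covering the region.

    Probes are placed every `step` bases. If the last regular probe stops
    short of the region end, one extra probe is added flush with the end
    so the whole region is covered.
    """
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    if step > probe_length:
        raise ValueError("step larger than probe_length would leave gaps")
    if region_end - region_start < probe_length:
        raise ValueError("region is shorter than one probe")

    probes = []
    start = region_start
    while start + probe_length <= region_end:
        probes.append((start, start + probe_length))
        start += step
    # Cover any remainder at the end of the region.
    if probes[-1][1] < region_end:
        probes.append((region_end - probe_length, region_end))
    return probes
```

A smaller step gives denser, more redundant coverage at the cost of more probes. Real panel design also filters probes for repeats and GC content, which is outside the scope of this sketch.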
Amplicon-free sequencing
Amplicon-free targeted sequencing methods use CRISPR-Cas9 to isolate targeted regions without using PCR amplification. This approach generates targeted native DNA libraries without any potential replication errors or other artifacts and retains epigenetic signals like CpG methylation. With HiFi sequencing, the free ends of all DNA in the sample are first blocked via dephosphorylation. Next, guide RNAs bind to the targeted region of interest, and the Cas9 enzyme cuts around that region, generating unblocked DNA ends. The unblocked ends are ligated to sequencing adapters and then off-target molecules are digested away, leaving enriched libraries that can be deeply sequenced.
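The block, cut, ligate, and digest logic above can be sketched as a toy model that tracks only whether each DNA end is blocked. It is a deliberate simplification with hypothetical names: cut sites are given directly as coordinates rather than found from guide RNA sequences, and it assumes that only molecules with two unblocked ends receive adapters on both sides and survive digestion.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Fragment:
    start: int
    end: int
    left_blocked: bool
    right_blocked: bool


def block_ends(fragments: Iterable[Fragment]) -> List[Fragment]:
    """Model dephosphorylation: every existing end becomes blocked."""
    return [Fragment(f.start, f.end, True, True) for f in fragments]


def cas9_cut(fragments: Iterable[Fragment], cut_sites: Iterable[int]) -> List[Fragment]:
    """Split fragments at cut sites; ends created by a cut are unblocked."""
    sites = sorted(cut_sites)
    pieces = []
    for f in fragments:
        inside = [s for s in sites if f.start < s < f.end]
        bounds = [f.start, *inside, f.end]
        last = len(bounds) - 2
        for i in range(len(bounds) - 1):
            # Original outer ends keep their state; new ends are unblocked.
            left = f.left_blocked if i == 0 else False
            right = f.right_blocked if i == last else False
            pieces.append(Fragment(bounds[i], bounds[i + 1], left, right))
    return pieces


def ligate_and_digest(fragments: Iterable[Fragment]) -> List[Fragment]:
    """Keep molecules with two unblocked ends (assumed to survive digestion)."""
    return [f for f in fragments if not f.left_blocked and not f.right_blocked]
```

In this model, a fragment cut on both sides of the target yields one surviving molecule spanning the two cut sites, while flanking pieces and uncut fragments are removed because at least one of their ends stayed blocked.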
This approach is powerful for profiling genetic variation in repetitive and GC-rich regions that cannot be amplified by PCR. A recently published protocol demonstrates amplicon-free sequencing of these complex regions in the C9orf72 gene, the most frequent genetic cause of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). Despite this association, C9orf72 has been dramatically understudied due to the difficulties with PCR. The availability of an amplification-free targeted sequencing method that can cover large, complex regions of the genome lays the foundation for a new wave of previously impossible neurology research.
Targeted sequencing with PacBio
PacBio offers sequencing solutions for each of these targeted sequencing methods across its platforms, providing unambiguous haplotype resolution through direct phasing and the comprehensive, ancestry-agnostic variant detection that is critical for human health research.
The release of the PureTarget repeat expansion panel offers an amplification-free method for sequencing repeat expansions, repetitive DNA sequences linked to over 60 monogenic disorders and cancers. Once notoriously difficult to characterize, these regions can now be comprehensively genotyped at scale with the PureTarget gene panel for 20 of the most important repeat expansions for human health.
The Twist Alliance Dark Genes and Long Read PGx panels combine Twist Bioscience target enrichment with long and accurate HiFi reads to efficiently sequence priority genomic regions at scale. With these panels, enriched regions are sequenced with a protocol optimized for HiFi reads to get comprehensive detection of single nucleotide variants, structural variants, and indels with haplotype resolution. HiFi target enrichment delivers accurate alleles for complex gene families like HLA and CYP2D6 pharmacogenes. Target enrichment is also available for short-read SBB sequencing on the Onso system with Twist Exome 2.0 and the Agilent hybrid capture panel.
For amplicon sequencing, existing PCR-based workflows can be easily adapted to HiFi sequencing with the HiFi plex prep kit 96. This kit comes with barcoded adapters and low-cost library prep reagents that enable high sample multiplexing on any HiFi sequencer.
Want to learn more about the impact of highly accurate targeted sequencing?
Explore the PacBio targeted sequencing page for a full list of resources, and visit the targeted sequencing datasets page for a closer look at the targeted sequencing methods available with PacBio technology.