The long read lengths of PacBio’s SMRT Sequencing enable detection of linked mutations across multiple kilobases of sequence. This feature is particularly useful in the context of protein engineering, where large numbers of similar constructs are generated routinely to explore the effects of mutations on function and stability. We have developed a PCR-based barcoded sequencing method to generate high quality, full-length sequence data for batches of constructs generated in a common backbone. Individual barcodes are coupled to primers targeting a common region of the vector of interest. The amplified products are pooled into a single DNA library, and sequencing data…
Single Molecule, Real-Time (SMRT) Sequencing holds promise for addressing new frontiers to understand molecular mechanisms in evolution and gain insight into adaptive strategies. With read lengths exceeding 10 kb, we are able to sequence high-quality, closed microbial genomes with associated plasmids, and investigate large genome complexities, such as long, highly repetitive, low-complexity regions and multiple tandem-duplication events. Improved genome quality, observed at 99.9999% (QV60) consensus accuracy, and significant reduction of gap regions in reference genomes (up to and beyond 50%) allow researchers to better understand coding sequences with high confidence, investigate potential regulatory mechanisms in noncoding regions, and make inferences…
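The pairing of 99.9999% consensus accuracy with QV60 in the abstract above follows from the standard Phred relation, Q = -10 · log10(error probability). A minimal sketch of that conversion is shown here; the function names are illustrative and are not taken from any PacBio tool.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to the fraction of correct bases."""
    error_probability = 10 ** (-qv / 10.0)
    return 1.0 - error_probability


def accuracy_to_qv(accuracy: float) -> float:
    """Convert a fraction of correct bases (must be < 1) to a Phred-scaled quality value."""
    error_probability = 1.0 - accuracy
    return -10.0 * math.log10(error_probability)


if __name__ == "__main__":
    # QV60 corresponds to one expected error per million bases.
    print(f"QV60 accuracy: {qv_to_accuracy(60):.6%}")
    print(f"99.9999% as QV: {accuracy_to_qv(0.999999):.1f}")
```

Under this relation, each additional 10 QV units corresponds to a tenfold drop in the expected error rate.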
There are many sequencing-based approaches to understanding complex metagenomic communities spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR amplification. Whole-sample shotgun experiments generally use short-read, second-generation sequencing, which results in data processing difficulties. For example, reads less than 1 kb in length will likely not cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, it requires a high depth of coverage. As such, rare community members…
We have developed barcoding reagents and workflows for multiplexing amplicons or fragmented native genomic DNA prior to Single Molecule, Real-Time (SMRT) Sequencing. The long reads of PacBio’s SMRT Sequencing enable detection of linked mutations across multiple kilobases (kb) of sequence. This feature is particularly useful in the context of mutational analysis or SNP confirmation, where a large number of samples are generated routinely. To validate this workflow, a set of 384 1.7-kb amplicons, each derived from variants of the Phi29 DNA polymerase gene, were barcoded during amplification, pooled, and sequenced on a single SMRT Cell. To demonstrate the applicability of…
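The pooled, barcoded design described above implies a demultiplexing step that assigns each read back to its sample. The sketch below illustrates the idea only: it assumes exact barcode matches at the start of each read, and the barcode sequences and sample names are hypothetical. Production demultiplexing typically scores barcodes by alignment, tolerates sequencing errors, and checks both ends of the read, none of which is attempted here.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign reads to samples by exact barcode prefix and trim the barcode.

    reads    -- iterable of read sequences (strings)
    barcodes -- dict mapping sample name -> barcode sequence (hypothetical values)
    Returns a dict mapping sample name -> list of trimmed reads; reads with no
    matching barcode are collected under the key "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return dict(bins)


if __name__ == "__main__":
    example_barcodes = {"sample_1": "ACGT", "sample_2": "TTGA"}  # illustrative only
    example_reads = ["ACGTGGGG", "TTGACCCC", "GGGGGGGG"]
    print(demultiplex(example_reads, example_barcodes))
```

Once reads are binned per sample, a consensus sequence can be built for each bin independently.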
Aim: In contrast to exon-based HLA-typing approaches, whole gene genotyping crucially depends on full-length sequences submitted to the IMGT/HLA Database. Currently, full-length sequences are provided for only 7 out of 520 HLA-DPB1 alleles. Therefore, we developed a fully phased whole-gene sequencing approach for DPB1, to facilitate further exploration of the allelic structure at this locus. Methods: Primers were developed flanking the UTR regions of DPB1, resulting in a 12 kb amplicon. Using a 4-primer approach, secondary primers containing barcodes were combined with the gene-specific primers to obtain barcoded full-gene amplicons in a single amplification step. Amplicons were pooled, purified, and ligated…
Aim: The vast majority of donor typing relies on sequencing exons 2 and 3 of HLA class I genes (HLA-A, -B, -C). With such an approach certain allele combinations do not result in the anticipated “high resolution” (G-code) typing, due to the lack of exon-phasing information. To resolve ambiguous typing results for a haplotype frequency project, we established a whole gene sequencing approach for HLA class I, facilitating also an estimation of the degree of sequence variability outside the commonly sequenced exons. Methods: Primers were developed flanking the UTR regions resulting in similar amplicon lengths of 4.2-4.4 kb. Using a…
There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR amplification. Whole-sample shotgun experiments require a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT) Sequencing reads in the 1-2 kb range, with >99% consensus accuracy, can be efficiently generated for low amounts of input DNA, e.g. as little as 10 ng of input DNA sequenced in 4 SMRT Cells can generate…
There are many sequencing-based approaches to understanding complex metagenomic communities spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR. Whole-sample shotgun experiments generally use short-read sequencing, which results in data processing difficulties. For example, reads less than 500 bp in length will rarely cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, it requires a high depth of coverage. As such, rare community members may not be represented…
There are many sequencing-based approaches to understanding complex metagenomic communities spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR. Whole-sample shotgun experiments generally use short-read sequencing, which results in data processing difficulties. For example, reads less than 500 bp in length will rarely cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, it requires a high depth of coverage. As such, rare community members may not be…
Next-generation sequencing (NGS) has significantly improved the cost and turnaround time for diagnostic genetic tests. ACMG recommends variant confirmation by an orthogonal method, unless sufficiently high sensitivity and specificity can be demonstrated using NGS alone. Most NGS laboratories make extensive use of Sanger sequencing for secondary confirmation of single nucleotide variants (SNVs) and indels, representing a large fraction of the cost and time required to deliver high quality genetic testing data to clinicians and patients. Despite its established data quality, Sanger is not a high-throughput method by today’s standards from either an assay or analysis standpoint as it can involve…
Strain level microbiome profiling is needed for a full understanding of how microbial communities influence human health. Microbiome profiling of rRNA gene amplicons is a well-understood method that is rapid and inexpensive, but standard 16S rRNA gene methods generally cannot differentiate closely related strains. Whole genome/shotgun microbiome profiling is considered a higher-resolution alternative, but with decreased throughput and significantly increased sequencing costs and analysis burden. With both methods there are also challenges with microbial lysis, DNA preparation, and taxonomic analysis. Specialized microbiome-focused protocols were developed to achieve strain-level taxonomic differentiation using a rapid, high throughput rRNA gene assay. The protocol…
In this AGBT 2017 talk, PacBio CSO Jonas Korlach provided a technology roadmap for the Sequel System, including plans to continue performance and throughput increases through early 2019. Per-SMRT Cell throughput of the Sequel System is expected to double this year and again next year. Together with a new higher-capacity SMRT Cell expected to be released by the end of 2018, these improvements result in a ~30-fold increase, or ~150 Gb per SMRT Cell, allowing a real $1000 de novo human genome assembly. Also discussed: Additional application protocol improvements, new chemistry and software updates, and a look at…
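The roadmap figures above can be cross-checked with simple arithmetic. The sketch below derives values the summary does not state directly: the implied starting throughput per SMRT Cell, and the share of the ~30-fold gain left for the higher-capacity SMRT Cell after two doublings. Both are inferences from the approximate quoted numbers, not figures reported in the talk, and the function name is illustrative.

```python
def roadmap_breakdown(target_gb_per_cell, total_fold, doublings):
    """Split a projected fold increase into its doubling and remaining components.

    target_gb_per_cell -- projected throughput per SMRT Cell (Gb)
    total_fold         -- overall projected fold increase
    doublings          -- number of planned throughput doublings
    Returns (implied_baseline_gb, doubling_fold, remaining_fold).
    """
    implied_baseline_gb = target_gb_per_cell / total_fold
    doubling_fold = 2 ** doublings
    remaining_fold = total_fold / doubling_fold
    return implied_baseline_gb, doubling_fold, remaining_fold


if __name__ == "__main__":
    # Inputs are the approximate values quoted in the summary (~150 Gb, ~30-fold, two doublings).
    baseline, doubling_fold, remaining_fold = roadmap_breakdown(150.0, 30.0, 2)
    print(f"Implied baseline: ~{baseline:g} Gb per SMRT Cell")
    print(f"Gain from doublings: {doubling_fold}x")
    print(f"Gain left for the higher-capacity SMRT Cell: ~{remaining_fold:g}x")
```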
At AGBT 2017, Lars Paulin from the University of Helsinki presented this poster on whole genome sequencing of the virus responsible for progressive multifocal leukoencephalopathy, a rare and dangerous brain infection. His team used long amplicon analysis to resolve the whole virus genome from three patient samples, pooled them for SMRT Sequencing, and identified variants and rearrangements. This work represents the first time the viral genome was sequenced from patients.
SMRT Sequencing is a DNA sequencing technology characterized by long read lengths and high consensus accuracy, regardless of the sequence complexity or GC content of the DNA sample. These characteristics can be harnessed to address medically relevant genes, mRNA transcripts, and other genomic features that are otherwise difficult or impossible to resolve. I will describe examples of such new clinical research in diverse areas, including full-length gene sequencing with allelic haplotype phasing, gene/pseudogene discrimination, sequencing extreme DNA contexts, high-resolution pharmacogenomics, biomarker discovery, structural variant resolution, full-length mRNA isoform cataloging, and direct methylation detection.
Targeted sequencing experiments commonly rely on either PCR or hybrid capture to enrich for targets of interest. When using short-read sequencing platforms, these amplicons or fragments are frequently targeted to a few hundred base pairs to accommodate the read lengths of the platform. Given PacBio’s long read lengths, it is straightforward to sequence amplicons or captured fragments that are multiple kilobases in length. These long sequences are useful for easily visualizing variants that include SNPs, CNVs and other structural variants, often without assembly. We will review methods for the sequencing of long amplicons and provide examples using amplicons that range…