June 1, 2021  |  

Evaluating the potential of new sequencing technologies for genotyping and variation discovery in human data.

A first look at Pacific Biosciences RS data Pacific Biosciences technology provides a fundamentally new data type that provides the potential to overcome these limitations by providing significantly longer reads (now averaging >1kb), enabling more unique seeds for reference alignment. In addition, the lack of amplification in the library construction step avoids a common source of base composition bias. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical resequencing projects by assessing the quality of the raw sequencing data, as well as its use for SNP discovery and genotyping using the Genome Analysis Toolkit (GATK).


June 1, 2021  |  

SMRT Sequencing of whole mitochondrial genomes and its utility in association studies of metabolic disease.

In this study we demonstrate the utility of Single-Molecule Real Time SMRT sequencing to detect variants and to recapitulate whole mitochondrial genomes in an association study of Metabolic syndrome using samples from a well-studied cohort from Micronesia. The Micronesian island of Kosrae is a rare genetic isolate that offers significant advantages for genetic studies of human disease. Kosrae suffers from one of the highest rates of MetS (41%), obesity (52%), and diabetes (17%) globally and has a homogeneous environment making this an excellent population in which to study these significant health problems. We are conducting family-based association analyses aimed at identifying specific mitochondrial variants that contribute to obesity and other co-morbid conditions. We sequenced whole mitochondrial genomes from 10 Kosraen individuals who represent greater than 25 % of the mitochondrial genetic diversity for the entire Kosraen population. Using Pacific Biosciences C2 chemistry, SMRTbell libraries were constructed from pooled, full-length, unsheared 5 kb PCR amplicons, tiling the entire 16.6 kb mtDNA genome. Average read lengths for each sample were between 2500-3000 bp, with 5% of reads between 6,000-8,000 bases, depending on movie lengths. The data generated in this study serve as proof of principle that SMRT Sequencing data can be utilized for identification of high-quality variants and complete mitochondrial genome sequences. These data will be leveraged to identify causative variants for Metabolic syndrome and associated disorders.


June 1, 2021  |  

Single Molecule Real-Time (SMRT) Sequencing of genes implicated in autosomal recessive diseases.

In today’s clinical diagnostic laboratories, the detection of the disease causing mutations is either done through genotyping or Sanger sequencing. Whether done singly or in a multiplex assay, genotyping works only if the exact molecular change is known. Sanger sequencing is the gold standard method that captures both known and novel molecular changes in the disease gene of interest. Most clinical Sanger sequencing assays involve PCR-amplifying the coding sequences of the disease target gene followed by bi-directional sequencing of the amplified products. Therefore for every patient sample, one generates multiple amplicons singly and each amplicon leads to two separate sequencing reactions. Single Molecule, Real-Time (SMRT) sequencing offers several advantages to Sanger sequencing including long read lengths, first-in-first-out processing, fast time to result, high-levels of multiplexing and substantially reduced costs. For our first proof-of-concept experiment, we queried 3 known disease-associated mutations in de-identified clinical samples. We started off with 3 autosomal recessive diseases found at an increased frequency in the Ashkenazi Jewish population: Tay Sachs disease, Niemann-Pick disease and Canavan disease. The mutated gene in Tays Sachs is HEXA, Niemann-Pick is SMPD1 and Canavan is ASPA. Coding exons were amplified in multiple (6-13) amplicons for each gene from both non-carrier and carriers. Amplicons were purified, concentrations normalized, and combined prior to SMRTbell™ Library prep. A single SMRTbell library was sequenced for each gene from each patient using standard Pacific Biosciences C2 chemistry and protocols. Average read lengths of 4,000 bp across samples allowed for high-quality Circular Consensus Sequences (CCS) across all amplicons (all less than 1 kb). This high quality CCS data permitted the clean partitioning of reads from a patient in the presence of heterozygous events. Using non-carrier sequencing as a control, we were able to correctly identify the known events in carrier genes. This suggests the potential utility of SMRT sequencing in a clinical setting, enabling a cost-effective method of replacing targeted mutation detection with sequencing of the entire gene.


June 1, 2021  |  

Rapid sequencing of HIV-1 genomes as single molecules from simple and complex samples.

Background: To better understand the relationships among HIV-1 viruses in linked transmission pairs, we sequenced several samples representing HIV transmission pairs from the Zambia Emory HIV Research Project (Lusaka, Zambia) using Single Molecule, Real-Time (SMRT) Sequencing. Methods: Single molecules were sequenced as full-length (9.6 kb) amplicons directly from PCR products without shearing. This resulted in multiple, fully-phased, complete HIV-1 genomes for each patient. We examined Single Genome Amplification (SGA) prepped samples, as well as samples containing complex mixtures of genomes. We detail mathematical techniques used in viral variant subspecies identification, including clustering distance metrics and mutual information, which were used to derive multiple de novo full-length genome sequences for each patient. Whole genome consensus estimates for each sample were made. Genome reads were clustered using a simple distance metric on aligned reads. Appropriate thresholds were chosen to yield distinct clusters of HIV-1 genomes within samples. Mutual information between columns in the genome alignments was used to measure dependence. In silico mixtures of reads from the SGA samples were made to simulate samples containing exactly controlled complex mixtures of genomes and our clustering methods were applied to these complex mixtures. Results: SMRT Sequencing data contained multiple full-length (>9 kb) continuous reads for each sample. Simple whole-genome consensus estimates easily identified transmission pairs. Clustering of genome reads showed diversity differences between samples, allowing characterization of the quasi-species diversity comprising the patient viral populations across the full genome. Mutual information identified possible dependencies of different positions across the full HIV-1 genome. The SGA consensus genomes agreed with prior Sanger sequencing. Our clustering methods correctly segregated reads to their correct originating genome for the synthetic SGA mixtures. Conclusions: SMRT Sequencing yields long-read sequencing results from individual DNA molecules with a rapid time-to-result. These attributes make it a useful tool for continuous monitoring of viral populations. The single-molecule nature of the sequencing method allows us to estimate variant subspecies and relative abundances by counting methods. The results open up the potential for reference-agnostic and cost effective full genome sequencing of HIV-1.


June 1, 2021  |  

A comparison of 454 GS FLX Ti and PacBio RS in the context of characterizing HIV-1 intra-host diversity.

PacBio 2013 User Group Meeting Presentation Slides: Lance Hepler from UC San Diego’s Center for AIDS Research used the PacBio RS to study intra-host diversity in HIV-1. He compared PacBio’s performance to that of 454® sequencer, the platform he and his team previously used. Hepler noted that in general, there was strong agreement between the platforms; where results differed, he said that PacBio data had significantly better reproducibility and accuracy. “PacBio does not suffer from local coverage loss post-processing, whereas 454 has homopolymer problems,” he noted. Hepler said they are moving away from using 454 in favor of the PacBio system.


June 1, 2021  |  

Allele-level sequencing and phasing of full-length HLA class I and II genes using SMRT Sequencing technology

The three classes of genes that comprise the MHC gene family are actively involved in determining donor-recipient compatibility for organ transplant, as well as susceptibility to autoimmune diseases via cross-reacting immunization. Specifically, Class I genes HLA-A, -B, -C, and class II genes HLA-DR, -DQ and -DP are considered medically important for genetic analysis to determine histocompatibility. They are highly polymorphic and have thousands of alleles implicated in disease resistance and susceptibility. The importance of full-length HLA gene sequencing for genotyping, detection of null alleles, and phasing is now widely acknowledged. While DNA-sequencing-based HLA genotyping has become routine, only 7% of the HLA genes have been characterized by allele-level sequencing, while 93% are still defined by partial sequences. The gold-standard Sanger sequencing technology is being quickly replaced by second-generation, high- throughput sequencing methods due to its inability to generate unambiguous phased reads from heterozygous alleles. However, although these short, high-throughput, clonal sequencing methods are better at heterozygous allele detection, they are inadequate at generating full-length haploid gene sequences. Thus, full-length gene sequencing from an enhancer-promoter region to a 3’UTR that includes phasing information without the need for imputation still remains a technological challenge. The best way to overcome these challenges is to sequence these genes with a technology that is clonal in nature and has the longest possible read lengths. We have employed Single Molecule Real-Time (SMRT) sequencing technology from Pacific Biosciences for sequencing full-length HLA class I and II genes.


June 1, 2021  |  

Characterization of NNRTI mutations in HIV-1 RT using Single Molecule, Real-Time SMRT Sequencing.

Background: Genotypic testing of chronic viral infections is an important part of patient therapy and requires assays capable of detecting the entire spectrum of viral mutations. Single Molecule, Real-Time (SMRT) sequencing offers several advantages to other sequencing technologies, including superior resolution of mixed populations and long read lengths capable of spanning entire viral protein coding regions. We examined detection sensitivity of SMRT sequencing using a mixture of HIV-1 RT gene coding regions containing single NNRTI mutations. Methodology: SMRTbell templates were prepared from PCR products generated from a prospective reference material being developed by BC Center of Excellence for HIV/AIDS, and contained a mixture of fifteen infectious viruses containing single NNRTI resistance mutations (viz V90I, K101E, K103N, V108I, E138A/G/K/Q, V179D, Y181C, Y188C, G190A/S, M230L and P236L) built upon the HIV-1LAI molecular clone. Templates were sequenced on the PacBio RS II to obtain single molecule long reads using P4/C2 chemistry, using 180 minute movie collection without stage start. The relative abundances of the mutant viruses were then estimated using codon-aware analysis methods. Results: Sequencing of these templates produced average read lengths of 5.0 KB, comprising 40,000-fold coverage across the entire amplicon per SMRT Cell. All the expected mutations in the mixture of mutant viruses were accurately identified. Frequencies of NNRTI variants estimated ranged from 0.5% to 12.5%. Conclusions: Codon analysis revealed a number of variants across the amplicon with highly consistent results across SMRT Cells. From a single SMRT Cell, variants were accurately and reliably detected down to 0.5% with simple analyses. Long polymerase reads and high accuracy reads make it possible to call variants from just a few molecules. SMRT Sequencing can identify species comprising a mixed viral population, with granularity and low cost of consumables allowing for smaller multiplexing of samples and first-in-first-out processing.


June 1, 2021  |  

Next generation sequencing of full-length HIV-1 env during primary infection.

Background: The use of next generation sequencing (NGS) to examine circulating HIV env variants has been limited due to env’s length (2.6 kb), extensive indel polymorphism, GC deficiency, and long homopolymeric regions. We developed and standardized protocols for isolation, RT-PCR amplification, single molecule real-time (SMRT) sequencing, and haplotype analysis of circulating HIV-1 env variants to evaluate viral diversity in primary infection. Methodology: HIV RNA was extracted from 7 blood plasma samples (1 mL) collected from 5 subjects (one individual sampled and sequenced at 3 time points) in the San Diego Primary Infection Cohort between 3-33 months from their estimated date of infection (EDI). Median viral load per sample was 50,118 HIV RNA copies/mL (range: 22,387-446,683). Full-length (3.2 kb) env amplicons were constructed into SMRTbell templates without shearing, and sequenced on the PacBio RS II using P4/C2 chemistry and 180 minute movie collection without stage start. To examine viral diversity in each sample, we determined haplotypes by clustering circular consensus sequences (CCS), and reconstructing a cluster consensus sequence using a partial order alignment approach. We measured sample diversity both as the mean pairwise distance among reads, and the fraction of reads containing indel polymorphisms. Results: We collected a median of 8,775 CCS reads per SMRT Cell (range: 4243-12234). A median of 7 haplotypes per subject (range: 1-55) were inferred at baseline. For the one subject with longitudinal samples analyzed, we observed an increasing number of distinct haplotypes (8 to 55 haplotypes over the course of 30 months), and an increasing mean pairwise distance among reads (from 0.8% to 1.6%, Tamura-Nei 93). We also observed significant indel polymorphism, with 16% of reads from one sample later in infection (33 months post-EDI) exhibiting deletions of more than 10% of env with respect to the reference strain, HXB2. Conclusions: This study developed a standardized NGS procedure (PacBio SMRT) to deep sequence full-length HIV RNA env variants from the circulating viral population, achieving good coverage, confirming low env diversity during primary infection that increased over time, and revealing significant indel polymorphism that highlights structural variation as important to env evolution. The long, accurate reads greatly simplified downstream bioinformatics analyses, especially haplotype phasing, increasing our confidence in the results. The sequencing methodology and analysis tools developed here could be successfully applied to any area for which full-length HIV env analysis would be useful.


June 1, 2021  |  

High-throughput analysis of full-length proviral HIV-1 genomes from PBMCs.

Background: HIV-1 proviruses in peripheral blood mononuclear cells (PBMCs) are felt to be an important reservoir of HIV-1 infection. Given that this pool represents an archival library, it can be used to study virus evolution and CD4+ T cell survival. Accurate study of this pool is burdened by difficulties encountered in sequencing a full-length proviral genome, typically accomplished by assembling overlapping pieces and imputing the full genome. Methodology: Cryopreserved PBMCs collected from a total of 8 HIV+ patients from 1997-2001 were used for genomic DNA extraction. Patients had been receiving cART for 2-8 years at the time samples were obtained. 7 patients had pVL >50 copies/mL (mean: 312,282, range: 18,372-683,400) and 1 had pVL <50. Genomic DNA was subjected to limiting dilution prior to amplification of near-full-length genomes by a newly developed nested PCR. The predicted size of the PCR product was 9.0 kb, spanning from the 5’ LTR through the 3’ LTR. Single molecules were sequenced as near-full-length amplicons directly from PCR products without shearing using commercially available P4-C2 reagents and standard protocols on a PacBio RS II instrument. Quality of the genomes was validated by clonal positive controls and synthetic mixtures. Results: Near-full-length provirus genome sequences were successfully obtained from all 8 patients as continuous long reads from single molecules. PacBio sequencing required approximately 10% of the PCR product needed for Sanger sequencing and generated 325 MB per 3-hour run including 1,800 full-length intact genome reads on average. One patient’s sample was not at a limiting dilution and analysis revealed multiple subspecies. For 8 near-fulllength provirus genomes derived from the other 7 patients, large internal deletions were noted in 2 proviruses; APOBEC-mediated hypermutations were seen in 2 proviruses; and 4 proviruses appeared to be intact genomes. All of the defective proviruses showed a complete absence of resistance mutations in either RT or protease, even after 2-8 years of cART. On the contrary, all of the intact proviruses contained evidence of ART-resistance associated mutations suggesting that they represented relatively recent variants. Conclusions: Combining a novel protocol for full-length limiting dilution amplification of proviruses with PacBio SMRT sequencing allowed for the generation of near-full-length genomes with good quality and an ability to detect minor variants at the 1-10% level. Preliminary data analyses suggest that defective proviruses may represent archival variants that persist long-term in host cells, while intact proviruses within the PBMC pool showing evidence of active virus replication may represent more recent variants.


June 1, 2021  |  

Long-read, single-molecule applications for protein engineering.

The long read lengths of PacBio’s SMRT Sequencing enable detection of linked mutations across multiple kilobases of sequence. This feature is particularly useful in the context of protein engineering, where large numbers of similar constructs are generated routinely to explore the effects of mutations on function and stability. We have developed a PCR-based barcoded sequencing method to generate high quality, full-length sequence data for batches of constructs generated in a common backbone. Individual barcodes are coupled to primers targeting a common region of the vector of interest. The amplified products are pooled into a single DNA library, and sequencing data are clustered by barcode to generate multi-molecule consensus sequences for each construct present in the pool. As a proof-of-concept dataset, we have generated a library of 384 randomly mutated variants of the Phi29 DNA polymerase, a 575 amino acid protein encoded by a 1.7 kb gene. These variants were amplified with a set of barcoded primers, and the resulting library was sequenced on a single SMRT Cell. The data produced sequences that were completely concordant with independent Sanger sequencing, for a 100% accurate reconstruction of the set of clones.


June 1, 2021  |  

Accurately surveying uncultured microbial species with SMRT Sequencing

Background: Microbial ecology is reshaping our understanding of the natural world by revealing the large phylogenetic and functional diversity of microbial life. However the vast majority of these microorganisms remain poorly understood, as most cultivated representatives belong to just four phylogenetic groups and more than half of all identified phyla remain uncultivated. Characterization of this microbial ‘dark matter’ will thus greatly benefit from new metagenomic methods for in situ analysis. For example, sensitive high throughput methods for the characterization of community composition and structure from the sequencing of conserved marker genes. Methods: Here we utilize Single Molecule Real-Time (SMRT) sequencing of full-length 16S rRNA amplicons to phylogenetically profile microbial communities to below the genus-level. We test this method on a mock community of known composition, as well as a previously studied microbial community from a lake known to predominantly contain poorly characterized phyla. These results are compared to traditional 16S tag sequencing from short-read technologies and subsets of the full-length data corresponding to the same regions of the 16S gene. Results: We explore the benefits of using full-length amplicons for estimating community structure and diversity. In addition, we investigate the possible effects of context-specific and GC-content biases known to affect short-read sequencing technologies on the predicted community structure. We characterize the potential benefits of profiling metagenomic communities with full-length 16S rRNA genes from SMRT sequencing relative to standard methods.


June 1, 2021  |  

Developments in PacBio metagenome sequencing: Shotgun whole genomes and full-length 16S.

The assembly of metagenomes is dramatically improved by the long read lengths of SMRT Sequencing. This is demonstrated in an experimental design to sequence a mock community from the Human Microbiome Project, and assemble the data using the hierarchical genome assembly process (HGAP) at Pacific Biosciences. Results of this analysis are promising, and display much improved contiguity in the assembly of the mock community as compared to publicly available short-read data sets and assemblies. Additionally, the use of base modification information to make further associations between contigs provides additional data to improve assemblies, and to distinguish between members within a microbial community. The epigenetic approach is a novel validation method unique to SMRT Sequencing. In addition to whole-genome shotgun sequencing, SMRT Sequencing also offers improved classification resolution and reliability of metagenomic and microbiome samples by the full-length sequencing of 16S rRNA (~1500 bases long). Microbial communities can be detected at the species level in some cases, rather than being limited to the genus taxonomic classification as constrained by short-read technologies. The performance of SMRT Sequencing for these metagenomic samples achieved >99% predicted concordance to reference sequences in cecum, soil, water, and mock control investigations for bacterial 16S. Community samples are estimated to contain from 2.3 and up to 15 times as many species with abundance levels as low as 0.05% compared to the identification of phyla groups.


June 1, 2021  |  

Long Amplicon Analysis: Highly accurate, full-length, phased, allele-resolved gene sequences from multiplexed SMRT Sequencing data.

The correct phasing of genetic variations is a key challenge for many applications of DNA sequencing. Allele-level resolution is strongly preferred for histocompatibility sequencing where recombined genes can exhibit different compatibilities than their parents. In other contexts, gene complementation can provide protection if deleterious mutations are found on only one allele of a gene. These problems are especially pronounced in immunological domains given the high levels of genetic diversity and recombination seen in regions like the Major Histocompatibility Complex. A new tool for analyzing Single Molecule, Real-Time (SMRT) Sequencing data – Long Amplicon Analysis (LAA) – can generate highly accurate, phased and full-length consensus sequences for multiple genes in a single sequencing run.


June 1, 2021  |  

Highly sensitive, non-invasive detection of colorectal cancer mutations using single molecule, third generation sequencing.

Colorectal cancer (CRC) represents one of the most prevalent and lethal malignant neoplasms and every individual of age 50 and above should undergo regular CRC screening. Currently, the most effective procedure to detect adenomas, the precursors to CRC, is colonoscopy, which reduces CRC incidence by 80%. However, it is an invasive approach that is unpleasant for the patient, expensive, and poses some risk of complications such as colon perforation. A non-invasive screening approach with detection rates comparable to those of colonoscopy has not yet been established. The current study applies Pacific Biosciences third generation, single molecule sequencing to the inspection of CRC-driving mutations. Our approach combines the screening power and the extremely high accuracy of circular consensus (CCS) third generation sequencing with the non-invasiveness of using stool DNA to detect CRC-associated mutations present at extremely low frequencies and establishes a foundation for a non-invasive, highly sensitive assay to screen the population for CRC and early stage adenomas. We performed a series of experiments using a pool of fifteen amplicons covering the genes most frequently mutated in CRC (APC, Beta Catenin, KRAS, BRAF, and TP53), ensuring a theoretical screening coverage of over 97% for both CRC and adenomas. The assay was able to detect mutations in DNA isolated from stool samples from patients diagnosed with CRC at frequencies below 0.5 % with no false positives. The mutations were then confirmed by sequencing DNA isolated from the excised tumor samples. Our assay should be sensitive enough to allow the early identification of adenomatous polyps using stool DNA as analyte. In conclusion, we have developed an assay to detect mutations in the genes associated with CRC and adenomas using Pacific Biosciences RS Single Molecule, Real Time Circular Consensus Sequencing (SMRT-CCS). With no systematic bias and a much higher raw base-calling quality (CCS) compared to other sequencing methods, the assay was able to detect mutations in stool DNA at frequencies below 0.5 % with no false positives. This level of sensitivity should be sufficient to allow the detection of most adenomatous polyps using stool DNA as analyte, a feature that would make our approach the first non-invasive assay with a sensitivity comparable to that of colonoscopy and a strong candidate for the non-invasive preventive CRC screening of the general population.


June 1, 2021  |  

Evaluation of multiplexing strategies for HLA genotyping using PacBio Sequencing technology.

Fully phased allele-level sequencing of highly polymorphic HLA genes is greatly facilitated by SMRT Sequencing technology. In the present work, we have evaluated multiple DNA barcoding strategies for multiplexing several loci from multiple individuals, using three different tagging methods. Specifically MHC class I genes HLA-A, -B, and –C were indexed via DNA Barcodes by either tailed primers or barcoded SMRTbell adapters. Eight different 16-bp barcode sequences were used in symmetric & asymmetric pairing. Eight DNA barcoded adapters in symmetric pairing were independently ligated to a pool of HLA-A, -B and –C for eight different individuals, one at a time and pooled for sequencing on a single SMRT Cell. Amplicons generated from barcoded primers were pooled upfront for library generation. Eight symmetric barcoded primers were generated for HLA class I genes. These primers facilitated multiplexing of 8 samples and also allowed generation of unique asymmetric pairings for simultaneous amplification from 28 reference genomic DNA samples. The data generated from all 3 methods was analyzed using LAA protocol in SMRT analysis V2.3. Consensus sequences generated were typed using GenDx NGS engine HLA-typing software.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.