June 1, 2021

Complex alternative splicing patterns in hematopoietic cell subpopulations revealed by third-generation long reads.

Background: Alternative splicing expands the repertoire of gene functions and is a signature for different cell populations. Here we characterize the transcriptome of human bone marrow subpopulations, including progenitor cells, to understand their contribution to homeostasis and pathological conditions such as atherosclerosis and tumor metastasis. To obtain full-length transcript structures, we utilized long reads in addition to RNA-seq for estimating isoform diversity and abundance. Method: Freshly harvested, viable human bone marrow tissues were extracted from discarded harvesting equipment and separated into total bone marrow (total), lineage-negative (lin-) progenitor cells and differentiated cells (lin+) by magnetic bead sorting with antibodies to surface markers of hematopoietic cell lineages. Sequencing was done with SOLiD, Illumina HiSeq (100 bp paired-end reads), and PacBio RS II (full-length cDNA library protocol for 1–6 kb libraries). Short reads were assembled using both Trinity for de novo assembly and Cufflinks for genome-guided assembly. Full-length transcript consensus sequences were obtained for the PacBio data using the RS_IsoSeq protocol from PacBio's SMRT Analysis software. Quantitation for each sample was done independently for each sequencing platform using Sailfish, which estimates TPM (transcripts per million) by k-mer matching. Results: PacBio's long-read sequencing technology is capable of sequencing full-length transcripts up to 10 kb and reveals heretofore-unseen isoform diversity and complexity within the hematopoietic cell populations. A comparison of sequencing depth and de novo transcript assembly with short-read, second-generation sequencing reveals that, while short reads provide precision in determining portions of isoform structure and supporting larger 5′ and 3′ UTR regions, they fail to provide a complete structure, especially when multiple isoforms are present at the same locus. The increased breadth of isoform complexity revealed by long reads permits further elaboration of full isoform diversity and specific isoform abundance within each separate cell population. Sorting the distribution of major and minor isoforms reveals a cell population-specific balance focused on distinct genome loci and shows how tissue specificity and diversity are modulated by alternative splicing.
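For readers unfamiliar with the TPM unit used in the quantitation step, the following is a minimal Python sketch of how transcripts per million can be computed from per-isoform read counts. It is a simplification and not Sailfish's method: Sailfish works from k-mer counts and an iterative estimation procedure, whereas this sketch assumes read counts are already assigned to isoforms. The isoform names, counts, and lengths are hypothetical.

```python
def tpm(read_counts, lengths):
    """Convert per-transcript read counts to transcripts per million (TPM)."""
    # Length-normalise first: reads per base of each transcript.
    rates = {t: read_counts[t] / lengths[t] for t in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in rates}
    # Scale so that the values across all transcripts sum to one million.
    return {t: rate / total * 1e6 for t, rate in rates.items()}


# Hypothetical counts and lengths (bp) for three isoforms at one locus.
counts = {"iso_A": 300, "iso_B": 100, "iso_C": 0}
lengths = {"iso_A": 3000, "iso_B": 500, "iso_C": 1200}
values = tpm(counts, lengths)
```

Because of the length normalisation, the shorter isoform `iso_B` receives a higher TPM than `iso_A` despite having fewer reads, which is why TPM rather than raw counts is used to rank major and minor isoforms.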


June 1, 2021

Targeted SMRT Sequencing and phasing using Roche NimbleGen’s SeqCap EZ enrichment

As a cost-effective alternative to human whole-genome sequencing, targeted sequencing of specific regions, such as exomes or panels of relevant genes, has become increasingly common. These methods typically include direct PCR amplification of the genomic DNA of interest, or the capture of these targets via probe-based hybridization. Commonly, these approaches are designed to amplify or capture exonic regions and thereby result in amplicons or fragments that are a few hundred base pairs in length, a length that is well suited to short-read sequencing technologies. These approaches typically provide very good coverage and can identify SNPs in the targeted region, but are unable to haplotype these variants. Here we describe a targeted sequencing workflow that combines Roche NimbleGen’s SeqCap EZ enrichment technology with Pacific Biosciences’ SMRT Sequencing to provide a more comprehensive view of variants and haplotype information over multi-kilobase regions. While the SeqCap EZ technology is typically used to capture 200 bp fragments, we demonstrate that 6 kb fragments can also be utilized to enrich for long fragments that extend beyond the targeted capture site and well into (and often across) the flanking intronic regions. When combined with the long reads of SMRT Sequencing, multi-kilobase regions of the human genome can be phased and variants detected in exons, introns and intergenic regions.
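To illustrate why read length matters for haplotyping, here is a minimal Python sketch of read-backed phasing: a single read that spans two heterozygous sites shows directly which alleles sit on the same molecule. This is an illustration under simplifying assumptions (error-free allele calls, hypothetical positions and bases), not the workflow used in the study.

```python
from collections import Counter


def haplotype_counts(reads, positions):
    """Count allele combinations seen on single reads that span all positions."""
    counts = Counter()
    for read in reads:  # each read: dict mapping genomic position -> observed base
        if all(p in read for p in positions):
            counts[tuple(read[p] for p in positions)] += 1
    return counts


# Hypothetical reads covering two heterozygous SNPs about 3 kb apart.
reads = [
    {1000: "A", 4200: "G"},
    {1000: "A", 4200: "G"},
    {1000: "C", 4200: "T"},
    {1000: "C", 4200: "T"},
    {1000: "A"},  # covers only one site, so it carries no phase information
]
phased = haplotype_counts(reads, [1000, 4200])
```

In this toy input the two haplotypes are A-G and C-T. A fragment of a few hundred base pairs could cover only one of the two sites, like the last read, and would contribute nothing to phasing.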


June 1, 2021

Whole genome sequencing and epigenome characterization of cancer cells using the PacBio platform.

The comprehensive characterization of cancer genomes and epigenomes for understanding drug resistance remains an important challenge in the field of oncology. For example, PC-9, a non-small cell lung cancer (NSCLC) cell line, contains a deletion mutation in exon 19 (DelE746A750) of EGFR that renders it sensitive to erlotinib, an EGFR inhibitor. However, sustained treatment of these cells with erlotinib leads to drug-tolerant cell populations that grow in the presence of erlotinib. These resistant cells can be resensitized to erlotinib upon treatment with methyltransferase inhibitors, suggesting a role for epigenetic modification in the development of drug resistance. We have characterized, for the first time, the genomes of both drug-sensitive and drug-resistant PC-9 cells using long-read PacBio sequencing. The PacBio data allowed us to generate a high-quality, de novo assembly of this cancer genome, enabling the detection of genomic variation at all size scales, including SNPs, structural variations, copy number alterations, gene fusions, and translocations. The data simultaneously provide a global view of epigenetic DNA modifications such as methylation. We will present findings on large-scale changes in the methylation status across the cancer genome as a function of drug sensitivity.


June 1, 2021

Epigenome characterization of human genomes using the PacBio platform

In addition to the genome and transcriptome, epigenetic information is essential to understand biological processes and their regulation, as well as the misregulation that underlies disease. Traditionally, epigenetic DNA modifications are detected using upfront sample preparation steps such as bisulfite conversion, followed by sequencing. Bisulfite sequencing has provided a wealth of knowledge about human epigenetics; however, it does not access the entire genome due to limitations in read length and the GC bias of the sequencing technologies used. In contrast, Single Molecule, Real-Time (SMRT) DNA Sequencing is unique in that it can detect DNA base modifications as part of the sequencing process. It can thereby leverage the long read lengths and lack of GC bias for more comprehensive views of the human epigenome. I will highlight several examples of this capability towards the generation of new biological insights, including the resolution of methylation states in repetitive and GC-rich regions of the genome, and large-scale changes in the methylation status across a cancer genome as a function of drug sensitivity.


June 1, 2021

Genome in a Bottle: You’ve sequenced. How well did you do?

Purpose: Clinical laboratories, research laboratories and technology developers all need DNA samples with reliably known genotypes in order to help validate and improve their methods. The Genome in a Bottle Consortium (genomeinabottle.org) has been developing Reference Materials with high-accuracy whole genome sequences to support these efforts. Methodology: Our pilot reference material is based on Coriell sample NA12878 and was released in May 2015 as NIST RM 8398 (tinyurl.com/giabpilot). To minimize bias and improve accuracy, 11 whole-genome and 3 exome data sets produced using 5 different technologies were integrated using a systematic arbitration method [1]. The Genome in a Bottle Analysis Group is adapting these methods and developing new methods to characterize 2 families, one Asian and one Ashkenazi Jewish, from the Personal Genome Project, which are consented for public release of sequencing and phenotype data. We have generated a larger and even more diverse data set on these samples, including high-depth Illumina paired-end and mate-pair, Complete Genomics, and Ion Torrent short-read data, as well as Moleculo, 10X, Oxford Nanopore, PacBio, and BioNano Genomics long-read data. We are analyzing these data to provide an accurate assessment of not just small variants but also large structural variants (SVs) in both “easy” regions of the genome and in some “hard” repetitive regions. We have also made all of the input data sources publicly available for download, analysis, and publication. Results: Our arbitration method produced a reference data set of 2,787,291 single nucleotide variants (SNVs), 365,135 indels, 2,744 SVs, and 2.2 billion homozygous reference calls for our pilot genome. We found that our call set is highly sensitive and specific in comparison to independent reference data sets. We have also generated preliminary assemblies and structural variant calls for the next 2 trios from long-read data and are currently integrating and validating these. Discussion: We combined the strengths of each of our input datasets to develop a comprehensive and accurate benchmark call set. In the short time it has been available, over 20 published or submitted papers have used our data. Many challenges exist in comparing to our benchmark calls, and thus we have worked with the Global Alliance for Genomics and Health to develop standardized methods, performance metrics, and software to assist in its use. [1] Zook et al., Nat Biotech. 2014.
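As a rough illustration of what comparing a call set to a benchmark involves, the Python sketch below computes sensitivity and precision by exact matching of variant records. It is a simplification and not the consortium's software: it assumes both call sets are already restricted to the same confident regions and use identical variant representations, which is one of the challenges the abstract mentions. The choice of precision as the second metric and all variant records are assumptions for the example.

```python
def benchmark(query_calls, truth_calls):
    """Compare a query call set against a benchmark by exact record matching."""
    query, truth = set(query_calls), set(truth_calls)
    tp = len(query & truth)   # calls present in both sets
    fp = len(query - truth)   # calls made but absent from the benchmark
    fn = len(truth - query)   # benchmark variants that were missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


# Hypothetical variants as (chromosome, position, ref, alt) tuples.
truth = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 75, "G", "GA"), ("chr2", 900, "T", "C")]
query = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 75, "G", "GA"), ("chr3", 40, "A", "C")]
result = benchmark(query, truth)
```

With these inputs three of the four benchmark variants are recovered and one extra call is made, so both sensitivity and precision come out at 0.75.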


June 1, 2021

SMRT Sequencing for the detection of low-frequency somatic variants

The sensitivity, speed, and reduced cost associated with Next-Generation Sequencing (NGS) technologies have made them indispensable for the molecular profiling of cancer samples. For effective use, it is critical that the NGS methods used are not only robust but can also accurately detect low-frequency somatic mutations. Single Molecule, Real-Time (SMRT) Sequencing offers several advantages, including the ability to sequence single molecules with very high accuracy (>QV40) using the circular consensus sequencing (CCS) approach. The availability of genetically defined, human genomic reference standards provides an industry standard for the development and quality control of molecular assays. Here we characterize SMRT Sequencing for the detection of low-frequency somatic variants using the Quantitative Multiplex DNA Reference Standard from Horizon Diagnostics, combined with amplification of the variants using the Multiplicom Tumor Hotspot MASTR Plus assay. The Horizon Diagnostics reference sample contains precise allelic frequencies from 1% to 24.5% for major oncology targets verified using digital PCR. It recapitulates the complexity of tumor composition and serves as a well-characterized control. The control sample was amplified using the Multiplicom Tumor Hotspot MASTR Plus assay that targets 252 amplicons (121-254 bp) from 26 relevant cancer genes, which includes all 11 variants in the control sample. The amplicons were sequenced and analyzed using SMRT Sequencing to identify the variants and determine the observed frequency. The random error profile and high-accuracy CCS reads make it possible to accurately detect low-frequency somatic variants.
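The >QV40 accuracy figure is expressed on the Phred scale, where a quality value Q corresponds to an error probability of 10^(-Q/10). The short Python sketch below converts between the two and compares the resulting per-base error rate with the lowest allelic frequency in the reference sample (1%). The function and variable names are illustrative only.

```python
import math


def qv_to_error_rate(qv):
    """Phred-scaled quality value -> probability that a base call is wrong."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p):
    """Probability of a wrong base call -> Phred-scaled quality value."""
    return -10 * math.log10(p)


ccs_error = qv_to_error_rate(40)   # about 1 error per 10,000 bases
lowest_af = 0.01                   # lowest allelic frequency in the control sample
margin = lowest_af / ccs_error     # how far the signal sits above the error rate
```

At QV40 the per-base error rate is roughly a hundredfold below a 1% allelic frequency, which is the intuition behind using CCS reads for low-frequency variant detection.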


June 1, 2021

Highly sensitive and cost-effective detection of somatic cancer variants using single-molecule, real-time sequencing

Next-Generation Sequencing (NGS) technologies allow for molecular profiling of cancer samples with high sensitivity and speed at reduced cost. For efficient profiling of cancer samples, it is important that the NGS methods used are not only robust, but capable of accurately detecting low-frequency somatic mutations. Single Molecule, Real-Time (SMRT) Sequencing offers several advantages, including the ability to sequence single molecules with very high accuracy (>QV40) using the circular consensus sequencing (CCS) approach. The availability of genetically defined, human genomic reference standards provides an industry standard for the development and quality control of molecular assays for studying cancer variants. Here we characterize SMRT Sequencing for the detection of low-frequency somatic variants using the Quantitative Multiplex DNA Reference Standards from Horizon Discovery, combined with amplification of the variants using the Multiplicom Tumor Hotspot MASTR Plus assay. First, we sequenced a reference standard containing precise allelic frequencies from 1% to 24.5% for major oncology targets verified using digital PCR. This reference material recapitulates the complexity of tumor composition and serves as a well-characterized control. The control sample was amplified using the Multiplicom Tumor Hotspot MASTR Plus assay that targets 252 amplicons (121-254 bp) from 26 relevant cancer genes, which includes all 11 variants in the control sample. Next, we sequenced control samples prepared by SeraCare Life Sciences, which contained a defined mutation at allelic frequencies from 10% down to 0.1%. The wild-type and mutant amplicons were serially diluted, sequenced and analyzed using SMRT Sequencing to identify the variants and determine the observed frequency. The random error profile and high-accuracy CCS reads make it possible to accurately detect low-frequency somatic variants.
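One way to reason about detecting a variant at 0.1% allelic frequency is to ask how likely random sequencing errors alone are to produce as many variant-supporting reads as a true variant would. The Python sketch below does this with a binomial model. It is an illustration, not the analysis used in the study: it assumes independent errors, takes a per-base error rate of 1e-4 (the value corresponding to QV40), conservatively treats every error as producing the same alternate base, and uses a hypothetical read depth.

```python
def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    pmf = (1.0 - p) ** n  # P(X = 0)
    cdf = 0.0
    for i in range(min(k, n + 1)):
        cdf += pmf
        # Recurrence: P(X = i + 1) = P(X = i) * (n - i) / (i + 1) * p / (1 - p)
        pmf *= (n - i) / (i + 1) * p / (1.0 - p)
    return max(0.0, 1.0 - cdf)


depth = 10_000          # hypothetical number of CCS reads over one amplicon
error_rate = 1e-4       # assumed per-base error rate (QV40)
allele_fraction = 0.001 # lowest dilution mentioned in the abstract (0.1%)

expected_variant_reads = round(depth * allele_fraction)
noise_probability = binomial_tail(depth, error_rate, expected_variant_reads)
```

Under these assumptions, errors alone are very unlikely to reach the number of reads a true 0.1% variant is expected to produce at this depth. A lower depth or a higher error rate narrows that gap quickly, which is why both read accuracy and coverage matter at the lowest dilutions.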


June 1, 2021

Candidate gene screening using long-read sequencing

We have developed several candidate gene screening applications for both neuromuscular and neurological disorders. The power behind these applications comes from the use of long-read sequencing, which allows us to access previously unresolvable and even unsequenceable genomic regions. SMRT Sequencing offers uniform coverage, a lack of sequence context bias, and very high accuracy. It is also possible to directly detect epigenetic signatures and to characterize full-length gene transcripts through assembly-free isoform sequencing. Beyond calling the bases, SMRT Sequencing uses the kinetic information from each nucleotide to distinguish between modified and native bases.


June 1, 2021

Multiplex target enrichment using barcoded multi-kilobase fragments and probe-based capture technologies

Target enrichment capture methods allow scientists to rapidly interrogate important genomic regions of interest for variant discovery, including SNPs, gene isoforms, and structural variation. Custom targeted sequencing panels are important for characterizing heterogeneous, complex diseases and uncovering the genetic basis of inherited traits with more uniform coverage when compared to PCR-based strategies. With the increasing availability of high-quality reference genomes, customized gene panels are readily designed with high specificity to capture genomic regions of interest, thus enabling scientists to expand their research scope from a single individual to larger cohort studies or population-wide investigations. Coupled with PacBio® long-read sequencing, these technologies can capture 5 kb fragments of genomic DNA (gDNA), which are useful for interrogating intronic, exonic, and regulatory regions, characterizing complex structural variations, distinguishing between gene duplications and pseudogenes, and interpreting variant haplotypes. In addition, SMRT® Sequencing offers the lowest GC bias and can sequence through repetitive regions. We demonstrate the additional insights possible by using in-depth long-read capture sequencing for key immunology, drug-metabolizing, and disease-causing genes such as HLA, filaggrin, and cancer-associated genes.


June 1, 2021

SMRT Sequencing of full-length androgen receptor isoforms in prostate cancer reveals previously hidden drug resistant variants

Prostate cancer is the most frequently diagnosed male cancer. For prostate cancer that has progressed to an advanced or metastatic stage, androgen deprivation therapy (ADT) is the standard of care. ADT inhibits activity of the androgen receptor (AR), a master regulator transcription factor in normal and cancerous prostate cells. The major limitation of ADT is the development of castration-resistant prostate cancer (CRPC), which is almost invariably due to transcriptional re-activation of the AR. One mechanism of AR transcriptional re-activation is expression of AR-V7, a truncated, constitutively active AR variant (AR-V) arising from alternative AR pre-mRNA splicing. Notably, AR-V7 is being developed as a predictive biomarker of primary resistance to AR-targeted therapies in CRPC. Multiple additional AR-V species are expressed in clinical CRPC, but the extent to which these may be co-expressed with AR-V7 or predict resistance is not known.


June 1, 2021

The role of androgen receptor variant AR-V9 in prostate cancer

The expression of androgen receptor (AR) variants is a frequent, yet poorly understood mechanism of clinical resistance to AR-targeted therapy for castration-resistant prostate cancer (CRPC). Among the multiple AR variants expressed in CRPC, AR-V7 is considered the most clinically relevant due to broad expression in CRPC, correlations of AR-V7 expression with clinical resistance, and growth inhibition when AR-V7 is knocked down in CRPC models. Therefore, efforts are under way to develop strategies for monitoring and inhibiting AR-V7 in CRPC. The aim of this study was to understand whether other AR variants are co-expressed with AR-V7 and promote resistance to AR-targeted therapies. To test this, we utilized RNA-seq to characterize AR expression in CRPC models. RNA-seq revealed the frequent co-expression of AR-V9 and AR-V7 in multiple CRPC models and metastases. Furthermore, long-read single-molecule real-time (SMRT) sequencing of AR isoforms revealed that AR-V7 and AR-V9 shared a common 3′ terminal cryptic exon. To confirm this, we knocked down AR-V7 in prostate cancer cell lines and found that AR-V9 mRNA and protein expression were also affected. In reporter assays with AR-responsive promoters, AR-V9 functioned as a constitutive activator of androgen/AR signaling. Similarly, infection of LNCaP cells with an AR-V9 lentiviral construct induced androgen-independent cell proliferation. In conclusion, these data implicate co-expression of AR-V9 with AR-V7 as an important component of constitutive AR signaling and therapeutic resistance in CRPC.

