Pacific Biosciences

Data Release: Alzheimer Brain Isoform Sequencing (Iso-Seq) Dataset

Tuesday, September 20, 2016

Alzheimer’s disease (AD), the most common form of dementia, is a devastating neurodegenerative disease that affects ~44 million people worldwide. Pathologically, it is defined by severe neuronal loss, aggregation of amyloid β (Aβ) in extracellular senile plaques in the brain, and formation of intraneuronal neurofibrillary tangles consisting of hyperphosphorylated tau protein. Studies of the disease mechanism have shown that changes in gene expression due to alternative splicing likely contribute to the initiation and progression of AD. Hence, efforts have been made to better understand the gene expression changes in the AD brain by sequencing the transcriptome of affected brain regions.

Most transcriptome studies conducted to date have used short-read sequencing technologies, which provide the abundance of transcript reads needed for evaluating expression profiles. However, the ability to accurately identify alternative splicing and the associated expression patterns for different splice isoforms is limited by the short read lengths. Given that the average human gene transcript is several kilobases long, a 150 bp to 300 bp read cannot span the entire transcript, so the transcript must be assembled from multiple reads. In most cases this assembly is very difficult, if not impossible, given the high similarity between expressed gene isoforms.

Recent studies using the PacBio isoform sequencing (Iso-Seq) method demonstrated the advantages of obtaining high-quality, full-length transcript sequences for improving genome annotation [1], [2], identifying fusion cancer genes [3], and discovering novel alternative splicing patterns [4]. Here we apply the Iso-Seq method to an Alzheimer brain RNA sample. The purpose of releasing the dataset to the public is to provide researchers with a full-length transcriptome reference from which they can develop bioinformatics tools and validate their own findings.

In our final “confident” dataset, we obtained 21,742 high-quality, full-length isoforms covering 9,313 non-overlapping loci. Isoform lengths range from 352 bp to 9,457 bp, with an average of 3,400 bp (Fig. 1). Relative to the hg38 genome, the consensus bases disagree at rates of 0.036% substitutions, 0.08% insertions, and 0.08% deletions, for an overall concordance with hg38 of 99.8%. More than half of the transcribed loci have one observed isoform, while most of the rest have about two to five isoforms (Fig. 2). When compared with the reference transcript annotation Gencode v25, more than a third of the isoforms match a reference transcript completely, while the majority of isoforms are possible novel splice forms of a known gene. In addition to the stringent “confident” dataset, we are also releasing a larger, less stringent “promiscuous” dataset. Details on the difference between the two versions can be found in the download section.
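The concordance figure follows directly from the three disagreement rates quoted above. As a quick check, the arithmetic can be sketched in Python; the variable names are illustrative and not part of the released dataset or pipeline.

```python
# Disagreement rates against hg38 as reported in the text,
# expressed as a percentage of consensus bases.
error_rates_pct = {
    "substitutions": 0.036,
    "insertions": 0.08,
    "deletions": 0.08,
}

# Overall concordance is 100% minus the summed disagreement rates.
total_error_pct = sum(error_rates_pct.values())
concordance_pct = 100.0 - total_error_pct

print(f"total disagreement: {total_error_pct:.3f}%")
print(f"concordance with hg38: {concordance_pct:.1f}%")
```

The summed disagreement is about 0.196%, which rounds to the reported 99.8% concordance.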


Library Preparation and Sequencing

An Alzheimer’s disease brain total RNA sample was purchased from BioChain. A first-strand cDNA library was generated using the Clontech SMARTer cDNA synthesis kit, followed by size selection using the SageELF™ device by Sage Science. Lanes were combined to create five size libraries that roughly correspond to 1-2 kb, 2-3 kb, 3-5 kb, 5-7 kb, and > 7 kb. Sequencing was done using P6-C4 chemistry, with 3-hr movies for the 1-2 kb fraction and 4-hr movies for the remaining fractions. Sequencing was completed in 2015. Download details on the sample preparation procedure.


Bioinformatics Analysis

The standard Iso-Seq pipeline (ToFU version 2.2.3 or equivalent to SMRT Analysis 3.1; for detailed methods see [5]) was used to process the data. Iso-Seq classify generated 1,107,889 full-length non-chimeric (FLNC) reads and 1,929,319 non-full-length (nFL) reads. The reads were then used to generate high-quality, full-length isoforms using ICE (iterative clustering for error correction) followed by Quiver polishing, yielding HQ Quiver isoform consensus sequences. By definition, an HQ Quiver consensus sequence must have at least two supporting full-length reads and a predicted accuracy of ≥ 99%. The HQ Quiver consensus sequences were then aligned to the human reference genome hg38 to create a final “confident” dataset of unique isoforms. To create the larger “promiscuous” dataset, additional consensus results that contained only one supporting full-length read were added. For details on the bioinformatics analysis, please see the README file on the Download Page.
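The two selection rules described above can be sketched as a simple filter. This is a minimal illustration, not the ToFU implementation: the `Consensus` record, its field names, and the toy values are hypothetical, and the alignment and de-duplication steps are omitted. Because the text does not state an accuracy cutoff for single-read consensus sequences, the sketch applies none to them, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Consensus:
    """Hypothetical record for one polished consensus sequence."""
    name: str
    full_length_reads: int      # supporting full-length reads
    predicted_accuracy: float   # predicted accuracy as a fraction (0 to 1)


def is_hq(c: Consensus, min_fl: int = 2, min_acc: float = 0.99) -> bool:
    """HQ rule from the text: >= 2 full-length reads and accuracy >= 99%."""
    return c.full_length_reads >= min_fl and c.predicted_accuracy >= min_acc


def split_datasets(consensus):
    """Return (confident, promiscuous) lists of consensus sequences."""
    confident = [c for c in consensus if is_hq(c)]
    # Assumption: single-read consensus results are added without an
    # accuracy cutoff, since the text does not specify one.
    single_read = [c for c in consensus if c.full_length_reads == 1]
    return confident, confident + single_read


# Toy input (made-up values, for illustration only).
records = [
    Consensus("c1", 5, 0.999),  # passes the HQ rule
    Consensus("c2", 1, 0.995),  # one supporting read: promiscuous only
    Consensus("c3", 3, 0.970),  # enough reads, but accuracy below 99%
]
confident, promiscuous = split_datasets(records)
print([c.name for c in confident])
print([c.name for c in promiscuous])
```

In this sketch the promiscuous set is always a superset of the confident set, matching the description of it as the larger, less stringent release.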



Figure 1. Length distribution of final, unique, full-length isoforms.

Number of isoforms: 21,742

Min-max length: 352 bp – 9,457 bp

Average length: 3,400 bp




Figure 2. Number of isoforms per locus. The 21,742 isoforms were grouped into 9,313 non-overlapping, strand-specific loci. The average number of isoforms per locus was 2.3.
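One way to read "non-overlapping, strand-specific loci" is as groups of isoforms whose genomic intervals overlap on the same chromosome and strand. The sketch below illustrates that interpretation; it is an assumption about the grouping, not the method documented for this dataset, and the function name, tuple layout, and toy coordinates are hypothetical.

```python
from collections import defaultdict


def group_into_loci(isoforms):
    """Group (chrom, strand, start, end) tuples into overlapping loci.

    Coordinates are assumed half-open. Isoforms on different strands
    never share a locus, even if their intervals overlap.
    """
    by_key = defaultdict(list)
    for iso in isoforms:
        chrom, strand, _start, _end = iso
        by_key[(chrom, strand)].append(iso)

    loci = []
    for key in sorted(by_key):
        current, current_end = [], None
        # Sweep left to right, extending the locus while intervals overlap.
        for iso in sorted(by_key[key], key=lambda x: (x[2], x[3])):
            start, end = iso[2], iso[3]
            if current and start < current_end:
                current.append(iso)
                current_end = max(current_end, end)
            else:
                if current:
                    loci.append(current)
                current, current_end = [iso], end
        if current:
            loci.append(current)
    return loci


# Toy isoforms (made-up coordinates, for illustration only).
toy = [
    ("chr1", "+", 100, 500),
    ("chr1", "+", 400, 900),    # overlaps the first isoform, same strand
    ("chr1", "-", 450, 800),    # overlaps in position, opposite strand
    ("chr1", "+", 2000, 2500),  # separate locus
]
loci = group_into_loci(toy)
print(len(loci), "loci from", len(toy), "isoforms")

# The reported average follows from the totals in the caption.
print(round(21742 / 9313, 1))
```

For the toy input, the two overlapping plus-strand isoforms form one locus and the other two isoforms each form their own.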


Researchers are welcome to download and use the dataset for their research. To cite the dataset, please use:

The Alzheimer brain Iso-Seq dataset was generated by Pacific Biosciences, Menlo Park, California, and additional information about the sequencing and analysis is provided at https://downloads.pacbcloud.com/public/dataset/Alzheimer_IsoSeq_2016/. The data used in the present study was retrieved from PacBio’s online database at https://downloads.pacbcloud.com/public/dataset/Alzheimer_IsoSeq_2016/ (date of retrieval).



[1] B. Wang, E. Tseng, M. Regulski, T. A. Clark, T. Hon, Y. Jiao, Z. Lu, A. Olson, J. C. Stein, and D. Ware, “Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing,” Nat Commun, vol. 7, p. 11708, Jun. 2016.

[2] S. E. Abdel-Ghany, M. Hamilton, J. L. Jacobi, P. Ngam, N. Devitt, F. Schilkey, A. Ben-Hur, and A. S. N. Reddy, “A survey of the sorghum transcriptome using single-molecule long reads,” Nat Commun, vol. 7, p. 11706, Jun. 2016.

[3] J. L. Weirather, P. T. Afshar, T. A. Clark, E. Tseng, L. S. Powers, J. G. Underwood, J. Zabner, J. Korlach, W. H. Wong, and K. F. Au, “Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing,” Nucleic Acids Res, vol. 43, no. 18, p. e116, Oct. 2015.

[4] D. I. Pretto, J. S. Eid, C. M. Yrigollen, H.-T. Tang, E. W. Loomis, C. Raske, B. Durbin-Johnson, P. J. Hagerman, and F. Tassone, “Differential increases of specific FMR1 mRNA isoforms in premutation carriers,” J Med Genet, vol. 52, no. 1, pp. 42–52, Dec. 2014.

[5] S. P. Gordon, E. Tseng, A. Salamov, J. Zhang, X. Meng, Z. Zhao, D. Kang, J. Underwood, I. V. Grigoriev, M. Figueroa, J. S. Schilling, F. Chen, and Z. Wang, “Widespread Polycistronic Transcripts in Fungi Revealed by Single-Molecule mRNA Sequencing,” PLoS ONE, vol. 10, no. 7, p. e0132628, Jul. 2015.
