Introduction of the Iso-Seq Method: State of the Art for Full-Length Transcriptome Sequencing

Thursday, May 3, 2018

Iguana

Photo by John Cobb on Unsplash

In eukaryotic organisms, the majority of genes are alternatively spliced to produce multiple transcript isoforms. Gene regulation through alternative splicing can dramatically increase the protein-coding potential of a genome. Therefore, understanding the functional biology of a genome requires knowing the full complement of isoforms. Microarrays and high-throughput cDNA sequencing are useful tools for studying transcriptomes, yet these technologies capture only short fragments of each transcript, so accurately reconstructing complete transcripts to study gene isoforms has been challenging [1, 2].
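To make the combinatorics concrete, here is a minimal sketch of how optional exons multiply the number of possible isoforms. The gene, its exon names, and the `enumerate_isoforms` helper are hypothetical and purely illustrative; real genes are constrained by regulation, so not every combination is actually expressed.

```python
from itertools import product

def enumerate_isoforms(exons, optional):
    """Return every exon combination obtained by including or skipping
    each optional (cassette) exon; all other exons are always kept."""
    optional_list = [e for e in exons if e in optional]
    isoforms = []
    for choices in product([True, False], repeat=len(optional_list)):
        keep = dict(zip(optional_list, choices))
        # Exons not listed as optional default to "kept".
        isoforms.append(tuple(e for e in exons if keep.get(e, True)))
    return isoforms

# Hypothetical gene: five exons, three of which can be skipped.
exons = ["E1", "E2", "E3", "E4", "E5"]
isoforms = enumerate_isoforms(exons, optional={"E2", "E3", "E4"})
print(len(isoforms))  # 8, i.e. 2**3 combinations from one gene
```

With n independent cassette exons the upper bound is 2^n isoforms, which is why short reads that each cover only one or two junctions cannot say which combinations occur together on the same molecule.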

The Iso-Seq method produces full-length transcripts using Single Molecule, Real-Time (SMRT) Sequencing [3]. Long read lengths enable sequencing of full-length transcripts up to 10 kb or longer, eliminating the need for transcript assembly or inference. The Iso-Seq bioinformatics pipeline, which is freely available through SMRT Analysis, further processes the data into high-quality consensus transcript sequences that enable accurate isoform annotation and open reading frame prediction [4].

Because it does not require a reference genome or existing annotation, the Iso-Seq method has been widely adopted by the scientific community to analyze a variety of important agricultural crops and animals, including coffee, cotton, maize, rabbit, and chicken. In all cases, the researchers discovered a much more diverse and complex transcriptome than previously understood. For example, Kuo et al. expanded the chicken annotation to ~64,000 transcripts, of which ~21,000 were novel lncRNAs not annotated in Ensembl. In another case, Wang et al. were able to expand and correct the maize B73 genome annotation, including the discovery of 867 novel lncRNA transcripts.

The ability to unambiguously determine the full exonic structure of complex genes, with no assembly required, also makes the Iso-Seq method attractive for the study of human diseases. Kohli et al. characterized androgen receptor (AR) isoforms in castration-resistant prostate cancer and showed that one novel isoform, AR-V9, was co-expressed with AR-V7 and predictive of drug resistance. Tseng et al. discovered novel splice patterns in the FMR1 gene in premutation carriers for Fragile X-associated Tremor/Ataxia syndrome that were undetected in the control group.

Somewhat surprisingly, after the Iso-Seq dataset for the MCF-7 breast cancer cell line was released to the public [5], analyses revealed that this well-studied sample contained additional cancer fusion genes, two new mitochondrial lncRNAs, and novel sample-specific transcripts. In a recently published study, Anvar et al. used the same deep MCF-7 dataset to show widespread coupling of transcript features: more than 7,000 genes showed preferential coupling of 5’ start sites, exons, and polyadenylation sites. Such a study would not have been possible without the ability to precisely determine the starts and ends, as well as the splice junctions, of each transcript isoform.
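As a rough illustration of what "coupling" means, the sketch below asks whether start sites and polyadenylation sites co-occur on the same full-length reads more often than independence would predict. It uses a plain Pearson chi-square statistic on hypothetical read counts. This is a generic textbook test chosen for illustration and is not claimed to be the method used by Anvar et al.; the site labels and the `chi_square_coupling` helper are assumptions.

```python
from collections import Counter

def chi_square_coupling(reads):
    """reads: list of (start_site, polyA_site) labels, one per full-length read.
    Returns Pearson's chi-square statistic and its degrees of freedom
    for the null hypothesis that the two features are independent."""
    n = len(reads)
    pair_counts = Counter(reads)
    start_counts = Counter(s for s, _ in reads)
    end_counts = Counter(e for _, e in reads)
    chi2 = 0.0
    for s, s_n in start_counts.items():
        for e, e_n in end_counts.items():
            expected = s_n * e_n / n          # count expected under independence
            observed = pair_counts.get((s, e), 0)
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(start_counts) - 1) * (len(end_counts) - 1)
    return chi2, dof

# Hypothetical gene with two start sites and two polyA sites (100 reads).
coupled_reads = (
    [("TSS1", "pA1")] * 40 + [("TSS2", "pA2")] * 40
    + [("TSS1", "pA2")] * 10 + [("TSS2", "pA1")] * 10
)
chi2, dof = chi_square_coupling(coupled_reads)
print(chi2, dof)  # 36.0 1
```

A statistic of 36 with one degree of freedom is far above the 3.84 threshold for p < 0.05, so in this toy gene the choice of start site and polyA site is clearly linked. The test only works because each read reports both ends of the same molecule.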

The Iso-Seq method is not limited to eukaryotes. Recently, a new protocol called SMRT-Cappable-seq was developed to sequence the E. coli transcriptome, resulting in a dramatic increase in the number of annotated operons and readthrough events for the bacterium. Similarly, the Iso-Seq method was used to discover new coding and anti-sense transcripts in the previously poorly annotated human cytomegalovirus.

Since the launch of the Iso-Seq protocol in SMRT Analysis in 2014, the analysis pipeline has seen several improvements. The new Iso-Seq2 protocol, released in SMRT Analysis 5.1 last month, improves both speed and transcript recovery [6]. More importantly, over the past 5 years the bioinformatics community has embraced the technology, sparking the development of additional tools. IsoCon, IDP, and IDP-denovo are error-correction methods that work for targeted genes or hybrid data. Specialized long-read aligners such as minimap2 now support spliced alignment. Cupcake and TAMA are two lightweight tool suites for processing alignments. SQANTI categorizes Iso-Seq transcripts against an existing annotation and combines them with short-read expression data. A growing list of community tools is maintained at the Iso-Seq wiki.
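To give a flavor of what categorizing transcripts against an annotation involves, here is a simplified sketch that compares the splice-junction chain of a query transcript with those of annotated transcripts. The category names are loosely inspired by the kinds of classes SQANTI reports, but the logic, function names, and coordinates below are assumptions made for illustration and do not reproduce SQANTI's implementation.

```python
def junctions(exons):
    """Splice junctions for sorted (start, end) exons on one strand:
    each junction is (end of one exon, start of the next)."""
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))

def is_contiguous_sub(sub, full):
    """True if the junction chain `sub` appears as a consecutive run in `full`."""
    n = len(sub)
    return any(full[i:i + n] == sub for i in range(len(full) - n + 1))

def classify(query_exons, annotation):
    """Toy classification of a query transcript against annotated transcripts."""
    q = junctions(query_exons)
    if not q:
        return "mono-exonic"
    ref_chains = [junctions(t) for t in annotation.values()]
    if any(q == r for r in ref_chains):
        return "full-splice-match"
    if any(is_contiguous_sub(q, r) for r in ref_chains):
        return "incomplete-splice-match"
    known = {j for r in ref_chains for j in r}
    if all(j in known for j in q):
        return "novel-combination-of-known-junctions"
    return "novel-junction"

# Hypothetical annotation with exons A-E given as (start, end) coordinates.
A, B, C, D, E = (100, 200), (300, 400), (500, 600), (700, 800), (900, 1000)
annotation = {"T1": [A, B, C, D], "T2": [A, C, E]}

print(classify([A, B, C, D], annotation))  # full-splice-match
print(classify([A, B, C], annotation))     # incomplete-splice-match
print(classify([A, B, C, E], annotation))  # novel-combination-of-known-junctions
print(classify([A, D], annotation))        # novel-junction
```

Because a full-length read carries the entire junction chain of one molecule, this comparison can be made per read without first assembling the transcript from fragments.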

We encourage our users to continue finding new ways to utilize full-length transcript sequencing with PacBio and contribute to exciting biological discoveries!

Select Publications:

  1. Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts. GigaScience 1–13 (2017). doi:10.1093/gigascience/gix086
  2. Wang, M. et al. A global survey of alternative splicing in allopolyploid cotton: landscape, complexity and regulation. New Phytol 217, 163–178 (2017).
  3. Wang, B. et al. Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nat Comms 7, 11708 (2016).
  4. Chen, S.-Y., Deng, F., Jia, X., Li, C. & Lai, S.-J. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing. Sci. Rep. 7, 1–10 (2017).
  5. Kuo, R. I. et al. Normalized long read RNA sequencing in chicken reveals transcriptome complexity similar to human. BMC Genomics 18, 1–19 (2017).
  6. Kohli, M. et al. Androgen Receptor Variant AR-V9 Is Coexpressed with AR-V7 in Prostate Cancer Metastases and Predicts Abiraterone Resistance. Clin Cancer Res 23, 1–13 (2017).
  7. Tseng, E., Tang, H.-T., AlOlaby, R. R., Hickey, L. & Tassone, F. Altered expression of the FMR1 splicing variants landscape in premutation carriers. BBA – Gene Regulatory Mechanisms 1860, 1117–1126 (2017).
  8. Weirather, J. L. et al. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing. Nucleic Acids Research 43, e116–e116 (2015).
  9. Gao, S. et al. Two novel lncRNAs discovered in human mitochondrial DNA using PacBio full-length transcriptome data. Mitochondrion 38, 41–47 (2018).
  10. Chakraborty, S. MCF-7 breast cancer cell line PacBio generated transcriptome has ~300 novel transcribed regions, un-annotated in both RefSeq and GENCODE, and absent in the liver, heart and brain transcriptomes. 1–8 (2017). doi:10.1101/100974
  11. Anvar, S. Y. et al. Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing. Genome Biol. 19, 1–18 (2018).
  12. Yan, B., Boitano, M., Clark, T. & Ettwiller, L. SMRT-Cappable-seq reveals complex operon variants in bacteria. bioRxiv 1–34 (2018). doi:10.1101/262964
  13. Balazs, Z. et al. Long-Read Sequencing of Human Cytomegalovirus Transcriptome Reveals RNA Isoforms Carrying Distinct Coding Potentials. Sci. Rep. 1–9 (2017). doi:10.1038/s41598-017-16262-z

References and Resources:

[1] Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nat Meth 10, 1177–1184 (2013).

[2] Angelini, C., Canditiis, D. & Feis, I. Computational approaches for isoform detection and estimation: good and bad news. BMC Bioinformatics 15, 135–43 (2014).

[3] Iso-Seq Template Preparation for Sequel Systems

[4] Gordon, S. P. et al. Widespread Polycistronic Transcripts in Fungi Revealed by Single-Molecule mRNA Sequencing. PLoS ONE 10, e0132628 (2015).

[5] PacBio MCF-7 blogpost: https://www.pacb.com/blog/data-release-human-mcf-7-transcriptome/

[6] PacBio Iso-Seq GitHub: https://github.com/PacificBiosciences/IsoSeq_SA3nUP/