pbfusion is a new software tool for detecting gene fusions and other transcriptional abnormalities in PacBio Iso-Seq data.
The advancement of cancer research depends on the ability to accurately detect the molecular changes driving the initiation, progression, and evolution of tumors. In particular, gene fusions can lead to transcriptional dysregulation that drives the development of tumors. For example, one study found that up to 50% of prostate cancers contain recurrent gene fusions. Another study noted gene fusions in roughly one third of soft tissue tumors, including Ewing’s sarcoma, a form of cancer that, while rare, disproportionately affects children and adolescents.
So far, researchers have mostly relied on short-read sequencing technology to identify fusion genes. But the limited read length means that many fusions can be missed, especially in repeat-rich regions of the genome, as one study found. Additionally, even when short reads do detect a fusion, they reveal only gene-level information. Most human genes undergo alternative splicing to create different transcript isoforms that affect the functional protein, and short reads cannot reveal the full-length transcript structures of fusions.
The long-read sequencing solution
Accurate and sensitive detection of fusion genes and their transcript structures is needed to interpret functional consequences, understand tumor biology and evolution, and identify potential therapeutic targets. PacBio full-length RNA isoform sequencing (the Iso-Seq method) resolves complex fusions, providing more accurate breakpoints and a complete sequence readout of the associated fusion transcripts. And while several published studies show that long-read sequencing is more effective at discovering gene fusions accurately, researchers still need a fusion calling tool to quantify and annotate the fusions found in their Iso-Seq data.
A collaboration between the University of Calgary and PacBio has produced pbfusion, a new fusion gene detection tool. The pbfusion caller is specifically designed for long, accurate HiFi reads from PacBio Iso-Seq data. It applies to both bulk and single-cell Iso-Seq data, including data generated using the MAS-Seq for 10x Single Cell 3’ kit. Given mapped Iso-Seq reads and a reference annotation, pbfusion flags reads that span two or more genes as putative gene fusion breakpoints, which are then clustered into fusion calls. Other transcriptional oddities such as readthrough, strand swap, and novel exons are also annotated.
PacBio scientists applied pbfusion to twelve sarcoma samples, including synovial sarcoma, ASPS (primary and metastasis), angiosarcoma (biopsy, primary, recurrence), and UPS (untreated, primary/neoadjuvant treated). In total, pbfusion discovered 23 known and 99 novel fusions (Figure 1). One notable fusion discovered is ASPSCR1-TFE3, a known marker of sarcomas that appears in 0.05% of all cancers in the AACR genomic GENIE database.
The new software tool is free and available on Bioconda, with documentation on GitHub.
To learn more about HiFi sequencing and solutions for cancer research, connect with a PacBio scientist.