New Look at Breast Cancer Cell Line Sheds Light on Structural Complexity
Thursday, September 6, 2018
In an exciting paper that made the cover of Genome Research, scientists from Cold Spring Harbor Laboratory and collaborating institutions report the genome sequence and transcriptome of a commonly used breast cancer cell line. They determined that the cell line harbors far more structural variants than previously thought, with results that call into question cancer genome analysis based solely on short-read sequencing data.
In “Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line,” lead author Maria Nattestad, senior author Michael Schatz, and collaborators describe an in-depth investigation of SK-BR-3, an important model for HER2-positive breast cancer. “SK-BR-3 is known to be highly rearranged, although much of the variation is in complex and repetitive regions that may be underreported,” they write, explaining their choice of PacBio long-read sequencing to conduct a new genomic and transcriptomic analysis of the cell line.
Investigating genomic instability is essential to understanding cancer, but attempts to do so using short-read sequencing have seen limited success because structural variation is difficult to detect with short reads. Even large-scale cancer projects “have performed somewhat limited analysis of structural variations, as both the false positive rate and the false negative rate for detecting structural variants from short reads are reported to be 50% or more,” Nattestad et al. report. “Furthermore, the variations that are detected are rarely close enough to determine whether they occur in phase on the same molecule, limiting the analysis of how the overall chromosome structure has been altered.”
With the goal of creating a comprehensive map of structural variations in cancer, the scientists sequenced the SK-BR-3 genome using SMRT Sequencing. To enable comparison between sequencing technologies, they also sequenced it with Illumina short reads. The team found that the PacBio data were more mappable: more than 90% of PacBio reads aligned with a mapping quality of 60, compared with just 69% of short reads. “We also observed a smaller GC bias in the PacBio sequencing compared to the Illumina sequence data,” they note, “which enables more robust copy number analysis and generally better variant detection overall.”
An analysis of variants showed that long-read sequencing detected more than 17,000 structural variants at least 50 bp in length, while the short-read data yielded only about 4,100. Much of this difference is attributable to insertions, which were largely missing from the short-read calls. This closely mirrors the results of researchers working on population-specific reference genomes.
The scientists coupled their genomic variant discovery with the Iso-Seq method to capture full-length transcripts from SK-BR-3, noting that short-read data often cannot span or accurately reconstruct entire isoforms. “Long reads overcome such limitations by spanning multiple exon junctions and often covering complete transcripts,” they explain. Within the transcriptome analysis, the team closely examined several gene fusions. Some of the gene fusions were found to be the product of two or three rearrangement events occurring in sequence. For example, “CYTH1-EIF3H had been discovered previously with RNA-seq and been validated with RT-PCR, but it was not known to be a ‘2-hop’ gene fusion (taking place through a series of two variants) until now,” the scientists report. “This fusion was also captured in full by several individual SMRT-seq reads that contain both variants and have alignments in both genes.” The authors also report direct evidence that a gene fusion previously thought to be the result of a 2-hop path is actually a 3-hop fusion.
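To make the idea of a “hop” concrete, here is a minimal Python sketch under simplifying assumptions; it is not the authors' pipeline. It treats each rearrangement as an edge joining two reference segments and counts the fewest rearrangements a fusion transcript would have to cross to get from one gene's segment to the other's. The function `count_hops` and the segment names are hypothetical, and the toy model ignores strand orientation, distances between breakpoints, and whether reads actually support the path, all of which a real analysis would need to check.

```python
from collections import deque


def count_hops(variants, start, end):
    """Return the fewest variants (hops) linking segment `start` to segment
    `end`, or None if no path exists.

    `variants` is a list of (segment_a, segment_b) pairs, each representing one
    rearrangement that joins two reference segments. Hypothetical toy model:
    orientation and breakpoint distances are ignored.
    """
    # Build an undirected adjacency map: segment -> segments it is joined to.
    graph = {}
    for a, b in variants:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Breadth-first search finds the path that crosses the fewest variants.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == end:
            return hops
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


if __name__ == "__main__":
    # Hypothetical 2-hop fusion: GENE_A reaches GENE_B only via an intermediate segment.
    toy_variants = [("GENE_A", "LINKER"), ("LINKER", "GENE_B")]
    print(count_hops(toy_variants, "GENE_A", "GENE_B"))  # 2
```

Because breadth-first search returns the shortest route, it only gives a lower bound on the number of hops; this is why read-level evidence, such as single SMRT-seq reads that span every variant in the path, matters for confirming the route a fusion actually takes.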
One detailed illustration of the careful analysis performed for this project involved the ERBB2 oncogene, which is also called HER2. “We discover a complex sequence of nested duplications and translocations, suggesting a punctuated progression,” the team writes. They were able to “reconstruct the progression of rearrangements resulting in the amplification of the ERBB2 oncogene, including a previously unrecognized inverted duplication spanning a large portion of the region.”
“Long-read sequencing can expose complex variants with great certainty and context, suggesting that more multi-hop gene fusions, inverted duplications, and complex events may be found in other cancer genomes,” the scientists conclude. “There may be many other types of complex variations present in other cancer genomes that were not found in SK-BR-3, so it is essential to continue building a catalogue of these variant types using the best available technologies.”