CSHL Scientists Discuss Long-Read Sequencing for More Contiguous Assemblies and Complex Genomes
Thursday, March 24, 2016
Much as the “sharpen” tool in Photoshop brings a picture into tighter focus and enhances fine detail, long-read sequencing offers enhanced resolution of genomic information, according to Cold Spring Harbor Laboratory colleagues Mike Schatz and Maria Nattestad. The scientists spoke with Mendelspod’s Theral Timpson about how long-read sequencing is advancing their research; a brief recap of their conversation follows.
Schatz uses PacBio sequencing to build highly accurate assemblies of microbial, crop, animal, and human genomes. SMRT technology has significantly improved his work on the flatworm Macrostomum lignano, an organism with regenerative powers. Because only a few reference genomes and limited functional studies were available, the flatworm proved particularly challenging to sequence with short-read methods. “We were quite frustrated by the results that we were getting, where the assembly was of very poor quality,” Schatz says. “It was also missing something like half of the genome that we expected to be there; it just wasn’t present at all in the assembly that took place.” At that point, the team realized that long reads would help them achieve a much-improved reference genome. By collaborating with algorithm developers, PacBio, and the NIH, the team created an assembly that was about 100 times more contiguous than assemblies based on short-read data.
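For readers unfamiliar with how contiguity is compared, one widely used summary is N50: the contig length at which contigs of that size or longer account for at least half of the assembled bases. The article does not say which metric the team used, so the short Python sketch below is only an illustration, and the contig lengths in it are invented toy values rather than data from the flatworm project.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assemblies of the same 1 Mb of sequence.
fragmented = [10_000] * 100                # many short contigs
contiguous = [500_000, 300_000, 200_000]   # a few long contigs

ratio = n50(contiguous) / n50(fragmented)
print(f"N50 fragmented: {n50(fragmented)}")
print(f"N50 contiguous: {n50(contiguous)}")
print(f"Contiguity ratio: {ratio:.0f}x")
```

Both toy assemblies contain the same number of bases; only the way those bases are split into contigs differs, which is what a contiguity comparison captures.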
Long reads also appeal to Nattestad, who is using de novo assembly of the SK-BR-3 breast cancer cell line as a way to fully characterize not just SNPs, but also major structural variations. One of her interests in SK-BR-3 is to better understand Her2 oncogene amplification, and she has undertaken a historical, step-by-step reconstruction of its mutations using software she developed for that purpose. “Our focus here is not just to see how many copies of Her2 there are, or to see that it is Her2-amplified like you would in a diagnostic setting. Instead, we wanted to see how that amplification has happened over time in the genome, and try to reconstruct a history of steps that took place,” she says. Schatz notes that in SK-BR-3, the region around Her2 has undergone what they call ‘genome gymnastics,’ a very complicated series of amplifications, inverted duplications, and translocation events. He says that “trying to capture that level of complication and sophistication just from standard variant calling approaches is very challenging.” Nattestad plans to follow up with analyses of other oncogenes known to be amplified in this cell line.
This year, Schatz expects to see a number of reference-grade human genomes published using PacBio technology to create high-quality de novo assemblies. He says, “If you’re interested to do a de novo assembly of an entirely novel species, my strong recommendation — without any hesitation — is to do long-range PacBio sequencing, and I would advocate for 100x coverage of the longest reads you can possibly generate. … This will give you the most successful assembly.” Structural variation studies are similar, he says: “You really want to use the long-read technology in order to capture those structural variations as accurately as possible.”
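To put the 100x recommendation in concrete terms, average coverage is the total number of sequenced bases divided by the genome size. The Python sketch below works through that arithmetic; the genome size and mean read length are hypothetical placeholders chosen for round numbers, not figures from Schatz's projects, and the function names are illustrative.

```python
import math


def bases_needed(genome_size, target_coverage):
    """Total sequenced bases required to reach the target average coverage."""
    return genome_size * target_coverage


def reads_needed(genome_size, target_coverage, mean_read_length):
    """Approximate read count, assuming every read has the mean length."""
    return math.ceil(bases_needed(genome_size, target_coverage) / mean_read_length)


def mean_coverage(read_lengths, genome_size):
    """Average coverage achieved by a set of reads."""
    return sum(read_lengths) / genome_size


# Hypothetical inputs: a 3 Mb genome and 10 kb mean read length.
GENOME_SIZE = 3_000_000
MEAN_READ_LENGTH = 10_000
TARGET = 100  # the 100x figure quoted above

print(bases_needed(GENOME_SIZE, TARGET))                    # total bases
print(reads_needed(GENOME_SIZE, TARGET, MEAN_READ_LENGTH))  # read count
```

Because the requirement scales linearly with genome size, the same 100x target implies roughly a thousand times more sequence for a genome a thousand times larger.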