At the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai in New York City, technology development expert Robert Sebra, Ph.D., sees tremendous need for long-read, high-accuracy sequencing for use in microbial surveillance, detection of repeat expansions, and other research applications. To meet that demand, he relies on Single Molecule, Real-Time (SMRT®) Sequencing from Pacific Biosciences with BluePippin™ automated DNA size selection from Sage Science. Together, these tools offer a powerful solution and industry-leading read lengths that allow Sebra and other researchers to resolve repeat elements and structural variants, rapidly close microbial genomes, and measure epigenetic marks.
Sebra, an assistant professor of genetic and genomic sciences, is no stranger to the SMRT Sequencing platform: he spent five years working at PacBio helping to develop that technology. Ultimately, his belief in the system led him to join the Icahn Institute, where he would get to use the PacBio® sequencer as a customer. Sebra, who came to Mount Sinai in 2012, says, “I had experienced firsthand the value of long-read sequencing and wanted to apply it to human and infectious disease research.”
Indeed, the institute has churned through some 1,800 SMRT Cells in the past year and shows no signs of slowing down. “I can’t emphasize enough the tremendous potential that I see for long-read sequencing in tackling hard-to-sequence samples in the clinical arena. The technology has led to novel results creating a rapid growth of interest as data become more accessible,” Sebra says.
As research and clinical projects come his way, Sebra must first ascertain which sequencing platform is the best fit. The PacBio RS II is his go-to system for epigenetic profiling, finishing microbial genomes, and exploring DNA samples that are likely to contain repeats or large structural rearrangements, or that require allelic or accessory genome phasing.
Microbial genomes in particular are a sweet spot for SMRT technology, Sebra says. “Those are easy projects because we can sequence the epigenome and finish the entire genomic assembly in a few days while maintaining a low cost.” That genome-plus-epigenome capability is increasing the demand for PacBio sequencing even further, because no other platform offers the ability to look at genome-wide methylation and other base modifications.
As he applies long-read sequencing to projects where it will make the biggest impact, Sebra continually looks for ways to generate the longest possible reads. One technology that complements the PacBio workflow is the BluePippin automated DNA size-selection platform from Sage Science. Removing smaller fragments from the sequencing library ensures that the PacBio platform focuses on the longest fragments, so accurate sizing can improve average read length considerably. “You could do a traditional pulsed-field gel every time you’re trying to size select, but it takes too much time, doesn’t scale well, and the DNA input requirement is really high,” Sebra says. “The BluePippin solution is fast and cheap, and it’s the only option for size selecting in a high-throughput fashion. We purchased one as soon as it was available.”
Since bringing in the BluePippin system in 2012, Sebra’s team has run more than 100 libraries using the BluePippin+PacBio combo. In fact, he says, “For projects requiring near-finished genome assembly, I don’t think we’ve prepared a library without BluePippin size selection since owning the instrument.” He has been pleased with the amount of size-selected library the technology yields, noting that in virtually every experiment it produces more than enough to sequence a microbial genome to completion on the PacBio RS II. He generally excludes all fragments smaller than 10 kb to target the ultra-long fragments, but says that in cases where input DNA is especially low or the genome is quite large and requires more library, he lowers that threshold to 7 kb.
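The trade-off between the 10 kb and 7 kb thresholds can be illustrated with a minimal Python sketch. It is not part of Sebra's workflow or of any Sage Science or PacBio software: the `size_select` helper and the fragment lengths in `toy_library_bp` are hypothetical values chosen for illustration, and the sketch treats size selection as an idealized hard cutoff, which a physical gel-based separation only approximates.

```python
from statistics import mean


def size_select(fragment_lengths_bp, cutoff_bp):
    """Keep only fragments at or above the cutoff (lengths in base pairs).

    Hypothetical helper: models size selection as an ideal hard threshold.
    """
    return [length for length in fragment_lengths_bp if length >= cutoff_bp]


# Illustrative fragment lengths only; these are not measured data.
toy_library_bp = [2_000, 4_000, 6_000, 8_000, 12_000, 15_000, 20_000]

print(f"no selection: {len(toy_library_bp)} fragments, "
      f"mean length {mean(toy_library_bp):.0f} bp")

# Compare the two thresholds mentioned in the text: 10 kb and 7 kb.
for cutoff_bp in (10_000, 7_000):
    kept = size_select(toy_library_bp, cutoff_bp)
    print(f"cutoff {cutoff_bp} bp: kept {len(kept)} of "
          f"{len(toy_library_bp)} fragments, mean length {mean(kept):.0f} bp")
```

In this toy library the 10 kb cutoff keeps fewer fragments with a higher mean length, while the 7 kb cutoff retains more of the library at a somewhat lower mean length. That mirrors the reasoning for lowering the threshold when input DNA is scarce or more library is needed.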
The full case study describes how Sebra applied this pipeline in two specific projects.