Scientists at the Human Genome Sequencing Center at
Baylor College of Medicine recently published a paper demonstrating the utility
of PacBio’s long reads for upgrading and finishing draft genomes.
“Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology,” published in PLoS One late last year by lead author Adam
English and senior author Richard Gibbs, details a method for improving draft
genomes, many of which have been assembled from short-read sequence data. In
addition, the Baylor team has developed its own algorithm, PBJelly, which is
optimized for long-read sequence data and automates the finishing process.
According to the scientists, this approach replaces the traditional practice of
finishing genomes with Sanger sequencing, which is costly and time-intensive.
“Genome finishing has become a lost art due to the expense of oligonucleotide
directed Sanger sequencing relative to the low cost-per-base of second
generation sequencing technologies,” English et al. write.
The study was based on the premise that “extremely long and unbiased reads are
uniquely suited for upgrading genomes,” the authors report. PacBio’s long
reads, averaging 3,000 bases and often extending well past 7,000 bases,
provided a genome finishing opportunity not available with any other sequencing
technology.
In the four genome studies used for this paper, the Baylor team found that
adding the long reads produced by the PacBio® RS to the assembly significantly
improved the genomes they tackled. For example, 24x mapped
coverage of PacBio reads was added to a draft of Drosophila pseudoobscura in a process that addressed 99 percent of
existing gaps in the genome; 69 percent were closed and 12 percent were
improved. Similarly, the scientists built on a preliminary assembly of the sooty
mangabey with 6.8x mapped coverage from the PacBio RS. In that case, they were able to address 97 percent of gaps,
closing 66 percent and improving 19 percent.
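For readers less familiar with draft assemblies: gaps in scaffolds are conventionally recorded as runs of the placeholder base N, and those runs are the regions a finishing tool tries to close or shorten. The short Python sketch below shows one way to locate such gaps in a scaffold sequence. It is only an illustration and is not taken from PBJelly; the function name find_gaps and the toy scaffold are hypothetical.

```python
import re

def find_gaps(sequence, min_len=1):
    """Return half-open (start, end) coordinates for each run of N/n.

    Hypothetical helper for illustration; not part of PBJelly.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]

# Toy scaffold with two gaps (runs of N) between assembled stretches.
scaffold = "ACGTNNNNNACGTTTNNACGT"
for start, end in find_gaps(scaffold):
    print(f"gap at {start}-{end}, length {end - start}")
```

Comparing the gap list before and after an upgrade gives counts of the kind quoted above: a gap that no longer appears has been closed, and one that has become shorter has been improved.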
One of the key advantages of the PBJelly approach, the scientists note in their
paper, is the preservation of existing genome annotation. While some groups are
improving genome quality by re-assembling data — a step that loses existing
annotations — the Baylor team aimed to develop a tool that would keep that
information attached to the genome data. To that end, their PBJelly process of “genome
upgrading fills gaps and upgrades low quality regions, preserving most of the
assembled sequence and annotations,” according to the paper.
Worley, associate professor at the Baylor Human Genome Sequencing Center, will
speak in the PacBio workshop at PAG tomorrow at 1:30. Reserve a seat or sign up here to get a recording of the presentations.
The PBJelly tool is publicly available.