By Jonas Korlach, Chief Scientific Officer
2013 was an eventful and exciting year for PacBio. As I described in the 2013 roadmap post a year ago, we have made numerous improvements to SMRT® Sequencing, resulting in longer read lengths, greater sequencing throughput, new and improved data-analysis methods, and more efficient workflows. We are very pleased that these advances have led to so many publications, conference presentations, and social media contributions, with the number of peer-reviewed publications from the scientific community now exceeding 100. On behalf of all of us at Pacific Biosciences, I would like to express my heartfelt gratitude to the scientific community for the time and effort they have devoted to applying PacBio® sequencing to their research questions, and for their invaluable help in driving applications for SMRT Sequencing forward. We all very much look forward to working together with you in 2014!
As with any relatively new technology, significant potential for improvement and optimization remains after the initial introduction. SMRT Sequencing is no exception, and we intend to continue to leverage this potential for the benefit of the research community. Consistent with the technology improvements of previous years, we are targeting another ~4-fold increase in throughput per SMRT Cell, with average read lengths greater than 10-15 kb and overall sequence data output in excess of 1 Gb per SMRT Cell, while preserving SMRT Sequencing’s high consensus accuracy, lack of sequencing bias, and ability to detect many epigenetic base modifications. These improvements will be accomplished through a combination of sequencing chemistry upgrades based on polymerase and nucleotide engineering, improvements in polymerase loading efficiency, and software upgrades.
In addition to the sequencing process itself, we will continue to develop improvements for the other two aspects of the sequencing workflow: library preparation and data analysis. For library preparation, more streamlined protocols will become available, including automated library preparation methods on liquid handling robots. Further, we are developing improved protocols that better preserve the integrity of large inserts (10-20 kb) during the generation of high-quality, long-insert DNA libraries. Protocols with a further reduction in the amount of DNA input, as well as improved barcoding and multiplexing solutions, will also become available. With regard to data analysis, we will continue our work to support and accelerate the analysis of larger genomes, including the human genome, with improvements to the speed of components such as our mapping tool BLASR and our consensus caller Quiver. New methods for assembling and appropriately representing diploid genomes will become available, providing a significant advance in the genetic characterization of virtually all higher organisms, including their heterozygosity and structural genetic variation. Our Iso-Seq application for the analysis of full-length transcripts and splice isoforms will become more streamlined and will include a graphical interface for greater ease of use.
We are indebted to the community for helping with the development of new sample preparation methods and analysis tools for these and many other application areas, and we anticipate a continuation of these very important contributions. We will continue to release new data sets to the public, as we have done in the past with the Arabidopsis de novo assembly, the long-read human genome dataset for structural variation, the MCF7 Iso-Seq dataset, bacterial methylomes, and the recent Drosophila de novo assembly. These releases are intended to give the scientific community examples of the value PacBio data bring to the characterization of the genome, epigenome, and transcriptome of the organism under study, and to help researchers design their own studies.
I am very excited about the prospects for this coming year, and wish you the best of success in your research!