Pacific Biosciences

New MHAP Algorithm Delivers Fast, High-Quality Genome Assemblies

Tuesday, May 26, 2015

A new publication in Nature Biotechnology reports the development of a lightning-fast genome assembly pipeline optimized for long reads. Scientists from the University of Maryland and the National Biodefense Analysis and Countermeasures Center created the MinHash Alignment Process, known as MHAP, to dramatically reduce assembly time and improve assembly quality. Their results are worth celebrating: assembly was up to 600-fold faster than with existing methods. “Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes,” the authors write. In the best cases, entire chromosome arms assembled into single pieces from telomere to centromere!

MHAP takes a probabilistic approach to overlap-based assembly of long reads. MinHash represents a long text or other string as a compact set of fingerprints, so likely overlaps between reads can be found by comparing small fingerprint sets rather than full sequences, which is far less computationally intensive. The authors’ MHAP overlapping method has been integrated into Celera Assembler for the assembly of gigabase-sized genomes, and is reported in their new paper “Assembling Large Genomes with Single-Molecule Sequencing and Locality-Sensitive Hashing.”
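To make the fingerprint idea concrete, here is a minimal sketch of generic MinHash in Python. It is not the MHAP implementation: the function names, the k-mer size, the number of hash functions, and the use of salted BLAKE2b as the hash family are all assumptions chosen for illustration. Each sequence is reduced to a short list of minimum hash values over its k-mers, and the fraction of matching positions between two lists estimates how many k-mers the sequences share (their Jaccard similarity).

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer, seed):
    """Deterministic 64-bit hash of a k-mer; the seed selects the hash function.

    Salted BLAKE2b is an illustrative choice, not what MHAP uses.
    """
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(seq, k=4, num_hashes=64):
    """Fingerprint a sequence: keep the minimum hash value per hash function.

    Assumes len(seq) >= k so that at least one k-mer exists.
    """
    ks = kmers(seq, k)
    return [min(hash_kmer(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate k-mer Jaccard similarity as the fraction of matching minima."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(seq_a, seq_b, k=4):
    """Exact k-mer Jaccard similarity, for comparison with the estimate."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical reads: the end of read_a overlaps the start of read_b.
    read_a = "ACGTTGCATGCCGATAGCTAGGCTA"
    read_b = "GATAGCTAGGCTATTCGGATCCAAG"
    sketch_a = minhash_sketch(read_a)
    sketch_b = minhash_sketch(read_b)
    print("estimated:", estimate_jaccard(sketch_a, sketch_b))
    print("exact:    ", exact_jaccard(read_a, read_b))
```

The estimate gets closer to the exact value as `num_hashes` grows, while each sketch stays a fixed size no matter how long the read is. That fixed size is what makes comparing many long reads against each other cheap.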

While the technical approach to MHAP is very clever, what impressed us most in this publication were the results. Lead authors Konstantin Berlin and Sergey Koren, along with their collaborators, tested the algorithm on five different genomes to gauge its performance. The organisms included Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster, and a human cell line (CHM1). For E. coli, the assembly of 85x SMRT® Sequencing reads took 20 minutes on a 16-core desktop computer; for S. cerevisiae, assembly time was less than two hours and resulted in a four-fold improvement in N50 size compared to previous assemblies. “For E. coli, the total cost of PBcR-MHAP assembly and Quiver polishing is currently less than $2 [using cloud computing],” Berlin et al. write.
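For readers less familiar with N50, the contiguity metric cited in these results, the short Python sketch below shows one common way to compute it: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembly. The function name and the contig lengths are hypothetical, and this is not code from the paper.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Hypothetical contig lengths (arbitrary units), for illustration only.
    contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(n50(contigs))  # prints 8: 10 + 9 + 8 = 27, half of the total 54
```

A higher N50 means more of the genome sits in long, unbroken pieces, which is why a four-fold N50 improvement is a meaningful gain in assembly contiguity.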

As genomes got larger, the speedup became more pronounced: the D. melanogaster assembly was roughly 600 times faster than previous methods (from 629,000 CPU hours to just 1,086 CPU hours). It also produced a contig N50 longer than the scaffold N50 of the Sanger reference assembly and was “hundreds of fold more contiguous” than a synthetic long-read assembly. For yeast, the results were even more impressive. The authors report that “a majority of the 16 chromosomes were completely resolved from telomere to telomere” in their MHAP long-read assembly of S. cerevisiae W303.

The team also assembled a human genome, using the publicly available PacBio® long-read CHM1 haploid human genome dataset. The assembly’s contig N50 is an order of magnitude larger than the original Sanger-based human assembly, the authors report, and may resolve more than 50 of the 800-plus annotated gaps in the latest human reference genome. Zooming in on the highly complex MHC region of the genome, the scientists found that 97 percent of the locus is represented in just two contigs, compared to more than 60 contigs covering the same region in a recently published short-read assembly of the same cell line.

With MHAP, the authors anticipate that pent-up demand for rapid, affordable, and high-quality assembly can now be met. “These results demonstrate that [PacBio] single-molecule sequencing alone can produce near-complete eukaryotic genomes,” they write. We’re certainly excited to see more reference-grade eukaryotic genome assemblies generated using the new MHAP method.
