
Toward Platinum Genomes: PacBio Releases a New, Higher-Quality CHM1 Assembly to NCBI

Friday, October 2, 2015

As part of our effort to support the National Institutes of Health and the Genome Reference Consortium (GRC) in creating platinum genomes for the research community and improving the reference genome, in 2014 we generated 54X SMRT® Sequencing coverage of the CHM1 cell line, derived from a human haploid hydatidiform mole, using our P5-C3 chemistry, and made it publicly available through the SRA database at NCBI.

The CHM1 dataset was quickly taken up by researchers eager to use long, unbiased reads to identify regions of the genome prone to structural variation and to fill in sequence gaps in the GRC-maintained human genome reference. Mark Chaisson and Evan Eichler used PacBio® CHM1 data to resolve 26,079 euchromatic structural variants at the base-pair level, 85% of which were novel. Furthermore, they were able to close or extend 55% of the remaining gaps in GRCh37 [Chaisson et al. (2015) Resolving the complexity of the human genome using single-molecule sequencing. Nature. 517, 608-611]. At the Advances in Genome Biology and Technology (AGBT) 2015 GRC workshop, Karyn Meltz Steinberg and Tina Graves-Lindsay from the McDonnell Genome Institute at Washington University presented the use of PacBio CHM1 data as part of GRC efforts to fill in gaps in GRCh38. During her talk, Graves-Lindsay presented a high-level comparison of several assemblies of the PacBio CHM1 data using a number of newly developed long-read assembly tools, including MHAP by Adam Phillippy, Dazzler by Gene Myers, and Falcon by Jason Chin.

As PacBio CEO Mike Hunkapiller was listening to the talks, he realized that by upgrading the dataset, he could not only support the community’s effort to create a high-quality haploid human genome assembly and improve the reference, but also foster innovative genome assembly tools. Jason Chin notes, “Right now there are many approaches to whole genome assembly which are similar but have subtle differences. We need to evaluate what methods are the best for moving the field forward. Having a common dataset is useful to compare methods.” Because the developer community seemed to have converged on this haploid cell line as a useful lingua franca for comparing different assembly pipelines, CHM1 data with the improved read length and accuracy of the newest P6-C4 chemistry would give bioinformaticians a new benchmarking opportunity, while advancing the goals of a platinum haploid genome assembly and resolving gaps and errors in the reference assembly.

Following up on Hunkapiller’s promise at AGBT, PacBio released a second CHM1 dataset to NCBI in September with ~60x coverage using P6-C4 chemistry. The dataset was generated with the new 30 kb sample prep protocol, and has a read length N50 of 19 kb. In the intervening months, several bioinformatics groups have been working with the new data and Chin has now uploaded his assembly results to NCBI to share with the community. The new assembly has a contig N50 of 26.9 Mb, with half of the genome contained within 30 contigs (contig L50). Regarding summary statistics, however, Chin emphasizes, “Genome assembly is a complex process, and no single statistic can sufficiently describe the results. Many different aspects of an assembly need to be evaluated to ensure high-quality results, including overall contiguity, completeness, the prevalence of mis-assemblies, and base-level accuracy. Releasing the whole assembly will allow all the experts within the community to fully understand the strengths and weaknesses of different approaches and determine how to move the field forward.”
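For readers less familiar with the N50 and L50 statistics quoted above, here is a minimal sketch of how they are commonly computed from a list of contig (or read) lengths. The function name `n50_l50` and the toy lengths are illustrative assumptions; this is not the code used to produce the assembly statistics reported here.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or read lengths.

    N50 is the length of the contig at which the running total,
    taken from longest to shortest, first reaches half of the
    summed length. L50 is the number of contigs needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy contig lengths (arbitrary units), purely for illustration.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_contigs)
print(f"N50 = {n50}, L50 = {l50}")  # N50 = 70, L50 = 2
```

As Chin's comment suggests, these two numbers capture contiguity only; they say nothing about completeness, mis-assemblies, or base-level accuracy.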

Chin’s current assembly was created with Myers’ Daligner and the Falcon assembler developed at PacBio.

Figure 1. Jason Chin’s new CHM1 assembly resolves the q arms of chromosomes 2 and 6 into very few contigs, with the largest contigs 107 Mbp and 109 Mbp long, respectively.

A highlight of the CHM1 assembly Chin submitted to NCBI is the near-complete assembly of the q arm of chromosome 6 in a single contig 109 Mbp long. Another contig of 107 Mbp spans more than two-thirds of the chromosome 2 q arm. Using the same publicly available dataset, Phillippy and Sergey Koren, now at NHGRI, are planning to submit their own CHM1 assembly to NCBI in the next month. This assembly will be generated using different assembly tools, namely the MHAP method developed by Konstantin Berlin and co-authors [Berlin, K. et al. (2015) Assembling large genomes with single-molecule sequencing and locality sensitive hashing. Nature Biotech. 33, 623-630], paired with Celera® Assembler. We think it will be very useful for the community to have access to both assemblies to comment on the strengths and weaknesses of the different approaches, or to compare these assemblies to their own efforts. These two submissions can be seen as part of a communal work in progress toward finding the best and most general approaches to large genome assembly. In addition, we hope other researchers will be able to use this dataset to further their own assembler development work.

There are multiple ways to learn more about all the work being done with the updated CHM1 data during ASHG.

  • Register to hear our workshop on Wednesday, October 7, from 1:00-2:30 PM EDT either in person or streaming, where Rick Wilson will highlight work he has done at the McDonnell Genome Institute developing high-quality references using both the CHM1 and CHM13 cell lines in a talk entitled “Of Reference Genomes and Precious Metals” (Sheraton Inner Harbor Hotel, Chesapeake Ballroom I/II/III, 3rd Floor).
  • Attend the GRC workshop ahead of ASHG on Tuesday, October 6, from 1:00-4:00 PM (Convention Center, Room 349, Level 3).
  • Attend the DNAnexus workshop on Thursday, October 8, from 1:00-2:30 PM (Convention Center, Room 345, Level 3), where Tina Graves-Lindsay will share her work combining PacBio and BioNano CHM1 and CHM13 data to generate assemblies with extremely high scaffold N50s.
  • See Karyn Meltz Steinberg give a talk during the Platinum Genomes session on Friday, October 9, at 2:15 PM (Convention Center, Room 316, Level 3) entitled “Building a Platinum Assembly From Single Haplotype Human Genomes Generated From Long Molecule Sequencing,” in which she will present work resolving regions of the genome associated with large, repetitive sequences and exhibiting complex allelic diversity.

To download all the CHM1 P6C4 raw data in compressed, archived, hdf5 format, click here.

To review and download individual run data, click here.

For Chin’s most recent CHM1 assembly contigs using the above data, click here.
