September 27, 2016  |  General

Sequel System Data Release: Arabidopsis Dataset and Genome Assembly

Today we are pleased to release the first Arabidopsis thaliana (Ler-0) dataset and de novo genome assembly generated with the Sequel System, using two SMRT Cells and 12 hours of runtime. Only three years ago, we released our first genome assembly [1] for Arabidopsis produced on the PacBio RS II using P4-C2 chemistry, 85 SMRT Cells and 255 hours of runtime. Four months later, we released a second Arabidopsis dataset [1] using the improved P5-C3 chemistry, which reduced the number of SMRT Cells to 46 and runtime to 138 hours.

We produced this Sequel dataset using our latest chemistry enhancements, which significantly reduce the amount of DNA required. Prior to these chemistry improvements, the amount of DNA needed to run many large genome projects on the Sequel System was prohibitive. These modifications enable the use of loading concentrations equivalent to PacBio RS II levels.

Details of the Library Protocol, Data Generation, and Assembly Process
Purified Arabidopsis (Ler-0) genomic DNA was sheared to an average size of 32 kb and converted to SMRTbell templates, followed by a 20 kb size selection performed on a BluePippin system (Sage Science). Each SMRT Cell was loaded at an on-plate concentration of 144 pM of library and run for 6 hours on the Sequel System using the modified chemistry. Collectively, the two SMRT Cells produced 10.8 Gb of data, contained in 1.1 million reads, with half of the data in reads greater than 16,400 bp in length. The data were assembled with HGAP4 in SMRT Link.
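
The statement that half of the data is in reads greater than 16,400 bp describes the read length N50. As a minimal sketch of how that statistic can be computed from a list of read lengths (the function name `n50` and the toy read lengths are illustrative assumptions, not part of SMRT Link or the released dataset):

```python
def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until half the total is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy read lengths in bp, for illustration only.
toy_reads = [20_000, 18_000, 16_000, 9_000, 5_000, 2_000]
print(n50(toy_reads))  # prints 18000
```

The same calculation applied to contig lengths instead of read lengths gives the contig N50 reported for an assembly.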

Results of Sequel System Arabidopsis Genome Assembly
Sequencing metrics:

                        PacBio RS II       PacBio RS II       Sequel System
                        P4-C2 Chemistry    P5-C3 Chemistry
Release date            Sept 2013          Jan 2014           Sept 2016
Number of SMRT Cells    85                 46                 2
Run Time (hrs)          255                138                12
Number of Bases (Gb)    11.0               15.9               10.8
Number of Reads (M)     4.25               2.30               1.14
Read Length N50 (bp)    7,700              11,900             16,400
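
To make the comparison in the sequencing metrics concrete, the short sketch below derives per-cell yield, hourly yield, and the mean read length implied by dividing bases by reads. The input values are copied from the table; the dictionary layout and key names are illustrative assumptions, and the derived figures are simple arithmetic rather than officially reported numbers.

```python
# Values copied from the sequencing metrics table; key names are illustrative.
runs = {
    "RS II P4-C2": {"cells": 85, "hours": 255, "gb": 11.0, "reads_m": 4.25},
    "RS II P5-C3": {"cells": 46, "hours": 138, "gb": 15.9, "reads_m": 2.30},
    "Sequel System": {"cells": 2, "hours": 12, "gb": 10.8, "reads_m": 1.14},
}


def derive(run):
    """Compute simple throughput figures for one run configuration."""
    return {
        "gb_per_cell": run["gb"] / run["cells"],
        "gb_per_hour": run["gb"] / run["hours"],
        # Mean read length in bp: total bases divided by total reads.
        "mean_read_bp": run["gb"] * 1e9 / (run["reads_m"] * 1e6),
    }


metrics = {name: derive(run) for name, run in runs.items()}

for name, m in metrics.items():
    print(f"{name}: {m['gb_per_cell']:.2f} Gb/cell, "
          f"{m['gb_per_hour']:.2f} Gb/hour, "
          f"mean read {m['mean_read_bp']:.0f} bp")
```

On these numbers, the two Sequel SMRT Cells yield about 5.4 Gb each, compared with roughly 0.13 Gb and 0.35 Gb per cell for the two PacBio RS II datasets.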

 
Assembly statistics:

                          PacBio RS II       PacBio RS II       Sequel System
                          P4-C2 Chemistry    P5-C3 Chemistry
Release date              Sept 2013          Jan 2014           Sept 2016
Assembly Size (Mb)        121.7              124.5              122.9
Polished Contigs          331                239                238
Contig N50 (Mb)           6.2                6.7                10.4
Max Contig Length (Mb)    13.0               13.2               15.0
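
As a rough sanity check on sequencing depth, raw-read fold coverage can be approximated by dividing the number of bases sequenced by the genome size. The sketch below uses each assembly size as a stand-in for genome size, which is an assumption made here for illustration; the function name `fold_coverage` is also hypothetical.

```python
def fold_coverage(bases_gb, genome_mb):
    """Approximate fold coverage: total sequenced bases divided by genome size."""
    return bases_gb * 1000 / genome_mb  # convert Gb to Mb before dividing


# (Number of Bases in Gb, Assembly Size in Mb) taken from the two tables above.
datasets = {
    "RS II P4-C2": (11.0, 121.7),
    "RS II P5-C3": (15.9, 124.5),
    "Sequel System": (10.8, 122.9),
}

coverage = {name: fold_coverage(gb, mb) for name, (gb, mb) in datasets.items()}

for name, cov in coverage.items():
    print(f"{name}: ~{cov:.0f}x raw coverage")
```

Under this assumption, the Sequel dataset has raw coverage of roughly 88x, close to the roughly 90x of the original P4-C2 dataset, so the higher contig N50 is not explained by deeper sequencing alone.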

 
The raw and assembled data are publicly available for download.

De novo assembly of an Arabidopsis genome with SMRT Sequencing is not as groundbreaking as it was three years ago. However, this model organism data release demonstrates that, with these latest improvements, the Sequel System allows for the routine generation of high-quality assemblies of large, complex eukaryotic genomes. The modified chemistry is currently in testing and will be made available broadly once testing completes.

References:

  1. Kim, K. E. et al. (2014) Long-read, whole-genome shotgun sequence data for five model organisms. Scientific Data 1, 140045.
