Sanger Scientists Test Sequel II System for Tree-of-Life Projects
Thursday, July 11, 2019
Today we offer the final post in our blog miniseries about early access users’ experiences with the new Sequel II System. Shane McCarthy, a scientist at the University of Cambridge who was able to use the new sequencing system at the Wellcome Sanger Institute, gave a presentation on his experience generating data for tree-of-life sequencing projects.
McCarthy participates in several of these large-scale projects, such as the Vertebrate Genomes Project, the Sanger 25 Genomes Project, and the Darwin Tree of Life Project. For all of them, the goal is to produce high-quality, phased, chromosome-level assemblies with minimal gaps.
Through Sanger’s early access to the Sequel II System, McCarthy and his team were able to evaluate the new sequencing system’s performance on several animal genomes. These included fish (brown trout, sterlet, ploughfish, and milkfish), amphibians (Gaboon caecilian and common frog), and others. Most had been sequenced previously, so existing genomic resources were available for comparison.
The genomes were assembled with a mix of continuous long read (CLR) data and HiFi data, the latter of which is produced via circular consensus sequencing (CCS). For the CLR sequencing mode, the new SMRT Cell 8M yield was 80 Gb to 90 Gb. In CCS mode, the cells often produced more than 250 Gb of raw data. “We were quite happy” with the yields, McCarthy said, noting that the system performed consistently.
After giving an overview of his work, McCarthy took a detailed look at two of the fish samples to help webinar attendees understand the Sequel II System’s performance. The sterlet genome is especially challenging to assemble because an unresolved whole-genome duplication left some residual tetraploidy. For this fish, his team used two SMRT Cells of CLR data for the assembly. They compared the results to previous assemblies of its parents, using trio binning to assign haplotypes to their maternal or paternal origin. A BUSCO analysis found that more than 92% of genes were complete in each haplotype, a level that McCarthy considers very good at this stage of the assembly. He also presented data on milkfish, for which BUSCO analysis gave similarly strong results (at least 95% of genes were complete).
McCarthy noted that data from these projects are being made available through the VGP. As for the Sequel II System, he concluded, “it’s a huge leap in scaling and affordability for these tree-of-life genome assembly projects.”
For more details, watch McCarthy’s full presentation.