SMRT Data Analysis: Updates from our Developers Conference
Thursday, September 17, 2015
Last month we hosted a SMRT® Informatics Developers Conference, bringing together 150 developers with a passion for improving tools and resources. Our team came back brimming with enthusiasm for tools that will be released in the coming months, and humbled by the commitment we saw from the bioinformatics community to help scientists make SMRT Sequencing data increasingly useful. Thanks to the National Institute of Standards and Technology for hosting our meeting on their campus right before the Genome in a Bottle workshop.
The big news we shared with attendees is that the PacBio® System will now output industry-standard BAM files instead of our usual HDF5 format — check out the new specifications.
Our keynote presentation came from sequencing veteran Gene Myers of the Max Planck Institute. He talked about building efficient assemblers, the importance of random error distribution in sequencing data, and resolving tricky repeats with very long reads. He also encouraged developers to release assembly modules openly, and noted that data should be straightforward to parse since sharing data interfaces is easier than sharing software interfaces.
Much of the day-long event was allocated to networking time — providing opportunities for developers to catch up, brainstorm, and exchange ideas. Breakout sessions covering different analysis applications allowed developers to provide updates on a number of tools due out before the end of the year, including ones for Structural Variant Detection, Iso-Seq™ analysis and epigenetic analysis. It’s exciting to see a real expansion in the suite of community-driven tools available for SMRT data.
What we heard most throughout the event was that we should do this more often, and we’ve taken that to heart. We’re hoping to hold developer conferences semiannually, and will keep you posted as plans take shape for the next one.
In the meantime, check out these resources from the meeting:
• SMRT Informatics Developers Conference – Kevin Corcoran, Senior Vice President, Market Development, Pacific Biosciences
• Making the Most of Long Reads – Gene Myers, Ph.D., Founding Director, Systems Biology Center, Max Planck Institute
• PacBio SMRT Analysis 3.0 Preview – David Alexander, Ph.D., Pacific Biosciences
• MinHash for Overlapping and Assembly – Sergey Koren, Ph.D., National Biodefense Analysis and Countermeasures Center
• The “Art” of Shotgun Sequencing – Jason Chin, Ph.D., Pacific Biosciences
• PBHoney: Detecting SVs with Long-Read Sequencing – Adam English, Ph.D., Baylor College of Medicine
• Structural Variation with PacBio Data – Ali Bashir, Ph.D., Mount Sinai School of Medicine
• The Iso-Seq™ Method: Transcriptome Sequencing Using Long Reads – Elizabeth Tseng, Ph.D., Pacific Biosciences
• CONVEX: De novo Transcriptome Error Correction by Convexification – David Tse, Ph.D., Stanford University
• Transcriptome Analysis using Hybrid-Seq – Kin Fai Au, Ph.D., University of Iowa
• Understanding Methylome, Metagenome, Structural Variants using SMRT Sequencing – Shinichi Morishita, Ph.D., University of Tokyo
• Storify link to see all tweets from the event (#SMRTBFX)
Also, the Google Groups for the event are available to continue the conversation. These forums are open to anyone who wants to join:
• De Novo Assembly – SMRT_denovo
• Structural Variation – SMRT_sv
• Iso-Seq – SMRT_IsoSeq (note, this is a change)
• Kinetics – SMRT_kinetics