At Plant & Animal Genome Workshop, Users Showcase Projects Enabled by SMRT Sequencing
Wednesday, January 29, 2014
Earlier this month, we hosted a workshop at the International Plant & Animal Genome (PAG) conference in San Diego entitled “A SMRT® Sequencing Approach to Reference Genomes, Annotation, and Haplotyping.” PacBio users presented data on various projects that have benefited from long-read sequence data, including several that had previously been attempted with short-read technologies without success. We were delighted to see reports on newer features of SMRT Sequencing, including full-length isoforms, automated haplotyping, and more. Here’s a recap, as well as links to video recordings of the presentations:
Chongyuan Luo, a scientist from Joe Ecker’s lab at the Salk Institute for Biological Studies, offered a presentation on genomic and epigenetic variations across the model organism Arabidopsis thaliana. He used SMRT Sequencing to resolve three strains of the plant, sequencing each to more than 50x coverage. Compared to short-read sequence data, PacBio® data correctly identified more than 200,000 SNPs previously missed in each strain, most of which were enriched in the pericentromeric region. Because of that, Luo recommends using only PacBio data for a genome assembly. His team also achieved their goal of detecting structural variants that have been underrepresented by genome assemblies from short-read data.
Watch recording: Resolving the Complexity of Genomic and Epigenomic Variations in Arabidopsis
Shane Brubaker, bioinformatics director at Solazyme, Inc., talked about the need for a high-quality reference genome for a strain of algae that his company uses to produce renewable oil. The company first tried short-read sequence data, but couldn’t get through the GC-rich genome. Using PacBio sequencing, the team not only fully sequenced the genome, assembling it into just a few contigs per chromosome that even included centromere sequence, but also built a tool to perform automated haplotyping and later conducted allele-specific expression analysis. The final assembly accurately represented the diploid genome, Brubaker said, noting that CCS reads alone exceeded Sanger quality at far lower cost. “You can now get a reference assembly that is essentially finished quality without doing all those gap-closing steps,” he said.
Watch recording: Assembly, Haplotyping, and Annotation of a High GC Algal Genome
Allen Van Deynze, director of research at the University of California, Davis, Seed Biotechnology Center, spoke about a spinach genome sequencing project. The plant is important in its own right, but sequencing became more urgent in an effort to find genes that confer resistance to a downy mildew that is destructive to the crop. Van Deynze reported a draft genome sequence using SMRT Sequencing (Quiver polishing was still underway at the time of the workshop) that already showed a marked improvement in N50 contig length compared to a previous short-read assembly of the genome.
Watch recording: A De Novo Draft Assembly of Spinach Using Pacific Biosciences Technology
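For readers less familiar with the N50 metric mentioned in the spinach talk, here is a minimal sketch of how it is commonly computed: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name `n50` and the contig lengths below are made up for illustration; they are not figures from the spinach project or from any PacBio tool.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")

    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that pushes us to half the assembly.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in kb), NOT data from the talk.
fragmented_assembly = [10, 20, 30, 40, 50]
contiguous_assembly = [400, 300, 200, 100]

print(n50(fragmented_assembly))
print(n50(contiguous_assembly))
```

A higher N50 means that more of the assembly sits in long contigs, which is why the metric is often used to compare a long-read assembly against an earlier short-read one.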
From USDA’s Agricultural Research Service, molecular biologist Sean Gordon discussed the need for long-read sequencing to map an organism’s transcriptome. His team analyzed the wood-decaying fungus Plicaturopsis crispa first with short reads and found that they were missing exons and other important information. “There is no path from short reads to accurate isoforms,” he said. They switched to SMRT Sequencing so they could observe, rather than infer, full-length transcripts. Gordon showed one particular gene to illustrate the success of the approach: with short-read sequencing, this gene was predicted to have six isoforms; with PacBio, the team observed and confirmed 118 isoforms instead. He also noted that generating a transcriptome from PacBio data does not require a reference genome. His team did have a reference for P. crispa, however, and used it to double-check the PacBio results, which proved to be highly accurate. Gordon said that the long reads also enabled unexpected findings, such as abundant read-through transcription, in which a single transcript contained multiple ORFs. (The recording is not available at this time.)
Finally, our own Edwin Hauw spoke about the PacBio technology roadmap for the coming year. Sample prep improvements are expected to reduce input DNA requirements (down to 10-100 ng), improve preps for longer insert sizes, and streamline kits. A new C4 chemistry is expected to extend average read lengths to 10-15 Kb this year, with the long-term goal of generating about 1.6 Gb per SMRT Cell. PacBio is also planning to focus on data analysis improvements, including an easy-to-use GUI for isoform sequencing and tools for viral minor-variant detection and long-amplicon haplotype analysis. In addition, Hauw told users that PacBio is working to provide better assemblers for diploid de novo genomes or low-coverage genomes, as well as a faster version of Quiver and regional methylation detection, including 5mC without bisulfite conversion, with an expected release date later in the year.
A reminder to all PAG XXII attendees: don’t forget to submit your “Most Interesting Genome in the World” project for a chance to win SMRT Sequencing with Sage Science Blue Pippin™ size selection. The contest ends January 31. See the submission form, full details, and contest rules.