Allen Van Deynze from UC Davis presents the genome sequencing and assembly project for spinach, which has a genome of 980 Mb. Results indicate a high-accuracy assembly with a significantly higher contig N50 than a previous short-read assembly. The PacBio assembly has also made it possible to fill gaps in the prior assembly.
David Wheeler from Baylor’s Human Genome Sequencing Center presents data from matched tumor/normal pairs. His research uses SMRT Sequencing to identify structural rearrangements, such as tandem duplications, and finds that many of them were caused by repeat regions moving around the genome. Also: details of the new Honey-tails and Honey-spots algorithms.
Jason Chin, senior director of bioinformatics at PacBio, talks about using long-read sequence data and string graph assembly to assemble diploid genomes. A major challenge in diploid genome assembly is distinguishing homologous regions from repeats, so he discusses why long reads are essential for resolving repeat regions. In the presentation, Chin displays data from two inbred Arabidopsis strains used to create a synthetic diploid assembly.
Dick McCombie from Cold Spring Harbor Laboratory describes de novo sequencing of several organisms, including yeast, Arabidopsis, and rice. With SMRT Sequencing, structural differences are preserved and full chromosomes can assemble into single contigs. Longest read observed: 54 kb.
Evan Eichler, a Howard Hughes Medical Institute investigator at the University of Washington, discusses his use of the PacBio system to study difficult-to-sequence regions of the human and chimp genomes. Eichler has identified a number of rapidly evolving hot spots in the human genome that are associated with disease. These regions are quite long and contain extremely repetitive DNA sequence, making them difficult to resolve with short-read sequencing and very expensive to interrogate with Sanger sequencing. Eichler’s goal is to fill in the missing regions of the human genome reference, many of which contain segmental duplications.
PacBio CEO Mike Hunkapiller looks at the past, present, and future of human genome sequencing. Reflecting on the 15th anniversary of the announcements of the first human genomes, he notes that these projects required considerable effort and produced draft assemblies with contig N50s in the 20-24 kb range. He unveils the PacBio® diploid assembly of Craig Venter’s genome.
Michiel van Eijk of KeyGene shared a de novo PacBio assembly of tetraploid cotton. The genome assembly was further enhanced and annotated using Iso-Seq data collected from cotton root, leaf, and stem tissues. The data, full-length cDNA transcripts, captured alternative splicing diversity across these tissue types, allowing for isoform differentiation.
PacBio customers discuss their applications of PacBio SMRT Sequencing and long reads, including Lemuel Racacho (Children’s Hospital of Eastern Ontario Research Institute), Matthew Blow (JGI), Yuta Suzuki (U. of Tokyo), Daniel Geraghty (Fred Hutchinson Cancer Center), and Mike Schatz (CSHL).
Susan Strickler of the Boyce Thompson Institute presented strategies for assembling the genome of Arabica coffee, an allotetraploid with a genome size of approximately 1.3 Gb. A de novo PacBio assembly was constructed and presented. The new high-quality reference will be used to guide assemblies of the diploid ancestors of Arabica coffee and of re-sequencing data for a set of C. arabica accessions, in order to more fully characterize the genetic diversity of this crop species, which is highly susceptible to climate change.
Tim Smith of the USDA presents his work to establish a high-quality reference genome of the San Clemente goat. Built from 70-fold PacBio sequence data, the assembly proved to be far more complete than the existing draft reference genome, with contigs 100 times longer on average.
Robert VanBuren of the Danforth Plant Science Center, winner of the 2014 SMRT Grant Program, presents a de novo assembly of the Oro grass genome (Oropetium thomaeum). The reference genome will aid scientists studying drought tolerance in common crop species, especially cereals, through comparative genomics aimed at understanding the potential genetic underpinnings of this “resurrection” trait. Initial comparisons with Brachypodium and maize are presented, as well as secondary analysis to identify key metabolic traits.
Jeong-Sun Seo of Macrogen and Seoul National University College of Medicine reports on sequencing many Asian genomes to better understand genetic variation in that population. He shows that identifying certain structural variants may explain diseases that disproportionately affect Asian people.
At the PacBio AGBT workshop, Gene Myers from the Max-Planck Institute says it will soon be possible to generate a near-perfect human assembly. He presents a portfolio of analysis and quality-control tools designed to work with SMRT Sequencing data.
In his talk from the PacBio workshop at AGBT 2015, Dick McCombie from Cold Spring Harbor Laboratory describes the use of SMRT Sequencing to analyze a breast cancer cell line with complex genomic events. Still ongoing, the project has already uncovered structural variants missed by other sequencers.