Jason Chin, senior director of bioinformatics at PacBio, talks about using long-read sequence data and string graph assembly for assembling diploid genomes. A major challenge in diploid genome assembly is distinguishing homologous regions from repeats, and he discusses why long reads are essential for resolving repeat regions. In the presentation, Chin displays data from two inbred Arabidopsis strains used to create a synthetic diploid assembly.
Shane Brubaker from renewable oil manufacturer Solazyme reports using the PacBio system to sequence the genome of a GC-rich strain of algae that couldn’t be fully assembled with short-read sequence data. He notes that CCS reads exceed Sanger quality at significantly lower cost.
Susan Strickler of the Boyce Thompson Institute presented strategies for assembling the genome of Arabica coffee, an allotetraploid with a genome size of approximately 1.3 Gb, along with a de novo PacBio assembly. The new high-quality reference will be used to guide assemblies of the diploid ancestors of Arabica coffee and of re-sequencing data for a set of C. arabica accessions, to more fully characterize the genetic diversity of this crop species, which is highly susceptible to climate change.
PacBio CEO Mike Hunkapiller looks at the past, present, and future of human genome sequencing. Reflecting on the 15th anniversary of the announcements of the first human genomes, he notes that these projects required considerable effort and produced draft assemblies with contig N50s in the 20-24 kb range. He unveils the PacBio® diploid assembly of Craig Venter’s genome.
Jeong-Sun Seo of Macrogen and Seoul National University College of Medicine reports on sequencing many Asian genomes to better understand genetic variation in that population. He shows that certain structural variants may explain diseases that disproportionately affect Asian people.
In his talk from the AGBT 2015 PacBio workshop, Craig Venter detailed plans to sequence 1 million genomes and gather extensive phenotypic data to make sense of them. Topics included generating 30 reference genomes to represent ethnogeographic diversity, the need for long-range continuity in sequencing, and truly predictive genomics.
Yuta Suzuki from the University of Tokyo presents his AGBT poster on heterozygotic DNA methylation patterns. He used kinetic data from SMRT Sequencing to generate epigenetic information on samples ranging from human to medaka fish and was able to analyze haplotype-specific methylation data. He also shows that long reads are better able to capture data about CpG islands than short-read sequences.
Jonas Korlach, Chief Scientific Officer at PacBio, discussed the technology waves that have followed the initial human genome sequencing project, where we are today, and where we are going. Today, we are in what Korlach calls the 4th wave, where more comprehensive whole-genome re-sequencing is occurring, and we are nearing the 5th, when we will actually be able to free ourselves from reference genomes and sequence everything de novo.
During this presentation from ASHG 2015, Maria Nattestad of Cold Spring Harbor Laboratory described the study of a Her2-amplified breast cancer cell line using long-read sequencing from PacBio. With reads as long as 71 kb, she was able to characterize extensive and complex rearrangements and found more than 11,000 structural variants. She also used the Iso-Seq method to find gene fusions, including some novel ones.
Yunfei Guo, from the University of Southern California, presents his ASHG 2015 poster on a de novo assembly of a diploid Asian genome. The uniform coverage of long-read sequencing helped access regions previously unresolvable due to high GC bias or long repeats. The assembly allowed scientists to fill some 400 gaps in the latest human reference genome, including some as long as 50 kb.
Yunfei Guo, a grad student at the University of Southern California, discusses the benefits of SMRT Sequencing: very long reads that make it possible to resolve long repetitive regions and discover structural variants, and a random error mode that allows for extremely high accuracy.
Jason Chin, senior director of bioinformatics at PacBio, talks about using long-read sequence data to generate diploid genome assemblies that provide comprehensive haplotype sequence reconstructions. In the presentation, Chin describes the FALCON Unzip process, which combines SNP phasing with the assembly process to determine haplotype sequences and identify structural variants. He presents an example of diploid assembly from inbred Arabidopsis strains.