ASHG 2015: Highlights from the Platinum Genome Session and More
Thursday, October 15, 2015
During the final days of the ASHG meeting last week in Baltimore, a number of scientists offered great presentations based on data generated with SMRT Sequencing, including an entire session on building platinum genomes. We’ve rounded up the highlights here:
Karyn Meltz Steinberg from Washington University’s McDonnell Genome Institute spoke about building a platinum human assembly from single-haplotype genomes. Her team defines “platinum” as covering at least 98% of the sequence with every contig associated with a chromosome. They use long-read PacBio sequencing for de novo sequencing and assembly, followed by scaffolding with BioNano Genomics or Dovetail Genomics technology. When necessary, they then perform PacBio sequencing of BACs for targeted work such as gap-filling in specific regions. Using CHM13 as a case study, she walked through several genomic regions that pose assembly challenges for both short- and long-read data. By combining BioNano mapping with PacBio sequence data, they produced a hybrid assembly with 254 contigs, compared to 1,590 contigs for the initial PacBio assembly lacking BioNano mapping.
Bobby Sebra from the Icahn School of Medicine at Mount Sinai talked about an effort to resolve regions in the human genome — such as complex structural variants — that have not been addressed by NGS or Sanger sequencing. Working with the NA12878 genome, Sebra and his colleagues combined PacBio and Illumina sequence data with BioNano mapping. The resulting assembly filled 28 gaps in the latest human reference genome and featured a multi-megabase contig N50 length. The comparison to GRCh38 confirmed previous studies suggesting that tandem repeats and other structural variants are underrepresented in the reference genome; long-read sequencing can effectively characterize these regions. Sebra noted that many challenging regions in the human genome have implications for pharmacogenomics or disease associations, and that detailing these regions carefully will be important for clinical utility of genomics.
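For readers less familiar with the metric, contig N50 is a standard summary of assembly contiguity: it is the length of the contig at which, counting from the longest contig down, the running total first reaches half of the combined contig length. The short Python sketch below shows one common way to compute it. The function name and the contig lengths are made up for illustration and are not taken from the assembly Sebra described.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 for empty input)."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths in base pairs, purely for illustration.
example_contigs = [4_000_000, 3_000_000, 2_000_000, 1_000_000]
print(n50(example_contigs))
```

A "multi-megabase" contig N50 therefore means that at least half of the assembled sequence sits in contigs that are each several megabases long.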
In that same session, Justin Zook from the National Institute of Standards and Technology presented on progress at the Genome in a Bottle consortium, including some upcoming reference genomes from Han Chinese and Ashkenazi Jewish family trios. These new genomes have been generated with a number of sequencing technologies, including ones from PacBio, BioNano, 10X, Complete Genomics, Oxford Nanopore, and others. GIAB has already released some reference materials, which scientists are using to help validate variant calls for their own genome assemblies. Zook mentioned tools, some already produced by the CDC and others in development at the Global Alliance, that allow scientists to compare their sequencing data to what other projects have reported. Work is also under way on analysis tools that show confidence scores for structural variant calls.
In a separate session, Kiana Mohajeri from the University of Washington reported on a region of chromosome 8 that features the largest known inversion variant in the human genome; it spans several megabases and includes several segmental duplications. Seeking to determine the evolutionary history of this region and to better understand the variation found in human genomes, the team sequenced more than 70 BAC clones with SMRT Sequencing. They produced a gap-free 6.2 Mb tiling path with 99.999% accuracy — a far more complete and contiguous sequence than the human reference genome has for this region. The tiling path shows four inversion-associated repeats with 98% sequence identity flanking the internal inversions. By comparing the region to other primate genomes, they theorize that it was formed between 200,000 and 800,000 years ago, but note that the oldest of the repeats appears to be 19 million years old.