Day 3 of the AGBT conference was packed with interesting talks, and we’ve covered a few highlights below. Admittedly, it took a little more caffeine than usual to power through the day…
In the clinical session, Euan Ashley from Stanford told attendees that genomic medicine is no longer something we’re aiming for; it’s already here and being used routinely. He raised concerns about how accurately short-read sequence data can be mapped for clinical use, adding that the community needs to make progress in understanding complex genomic regions. Ashley noted that we still don’t have a gold-quality human genome with every single base known, and that achieving one remains an important goal for the field.
Jonathan Mudge from the Sanger Institute presented work by the GENCODE consortium to define human genes in the ENCODE project data, and said “the functional annotation of the transcriptome is in its infancy”. He described how the consortium is planning to embark on a large new project using long-read PacBio® data to improve annotation and capture the true end points of novel gene transcripts.
Sarah Tishkoff, from the University of Pennsylvania, gave a talk titled “Integrative Genomic Studies of Adaptive Traits in Africa”. She described her work studying novel phenotypes in sub-populations within Africa, and the challenges of linking those phenotypes to specific genotypes. One reason she cited was that African population-specific genomic regions and structural variants are poorly represented in the current human reference genome. Work planned by the Genome Reference Consortium should help resolve this disparity as additional population-specific alternate loci (alt loci) carrying polymorphic sequences are added to the reference.
During the evening technology session, Tim Smith from USDA’s Agricultural Research Service presented a goat assembly produced with PacBio sequence data. A previous goat assembly generated from short-read data had a contig N50 of about 18 kb spread across hundreds of thousands of contigs, whereas the PacBio assembly had a contig N50 of about 2.6 Mb and just 5,902 contigs. Long reads are helpful for getting the highest genome quality, he told attendees. The team is following up the goat effort with new projects to sequence pig, sheep, and cow using PacBio data.
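For readers less familiar with the metric, contig N50 is the contig length at which contigs of that length or longer together contain at least half of the total assembled bases, so a higher N50 means more of the assembly sits in long, contiguous pieces. The sketch below shows the standard calculation; it is a generic illustration with made-up contig lengths, not the pipeline the ARS team used.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    Contigs are sorted longest first and their lengths are summed until the
    running total reaches at least half of the total assembly length; the
    contig that crosses that threshold sets the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid floating-point halving.
        if running * 2 >= total:
            return length


# Hypothetical toy assembly: 9 contigs totalling 54 bases.
print(contig_n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 10 + 9 + 8 = 27 = half, so N50 is 8
```

Because N50 depends only on the length distribution, the same calculation applies whether an assembly has a few thousand long contigs or hundreds of thousands of short ones.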
Finally, Vince Magrini from Washington University in St. Louis spoke about using RNA-seq for viral monitoring. He showed data from PacBio sequencing, among other technologies, used to characterize clinical isolates of influenza. The long reads were important for filling gaps in a short-read assembly, he said.
With all the talk about precision medicine at the conference, we also really enjoyed this thoughtful blog post by Brian Krueger (@h2so4hurts) from Columbia University Medical Center titled ‘When Whole Genome Sequencing Doesn’t Give Us the Whole Genome’.