AGBT Days 1 & 2: Metagenomic Dark Matter and the GenomeAsia 100K
Thursday, February 18, 2016
This year’s AGBT presentations took our minds off how much we missed the Marco Island beach. Wednesday’s opening plenary talks ranged from the ocean virome to Ebola and beyond. David Haussler’s call for more open and effective sharing of human genomes clearly resonated with this community, and we hope it inspires people to find new ways of breaking down data silos.
On Thursday, the 800 or so attendees braced for a full day of scientific sessions. We can’t recap all of the talks here, but check out AGBT’s blog coverage for detailed accounts of the plenary sessions.
One particularly fascinating talk was given by Eddy Rubin of the Joint Genome Institute, who spoke about the institute’s forays into sequencing metagenomic dark matter: microbes from environmental samples that can’t be cultivated in the lab. Remarkably, they’ve found that among the approximately 4,000 metagenome samples they reanalyzed, there are many environments in which more than 10% of the assembled contigs do not use the canonical translation of codons into amino acids. In most cases, one of the stop codons instead encodes an amino acid; in other cases, sense codons appear to be translated differently. They also used a novel approach to mine their metagenomic dark matter for unknown phage. Phage can be challenging to find because they lack a universal marker gene equivalent to 16S rRNA; instead, the researchers looked for genes with homology to known phage-specific genes. With this method, the team was able to identify more than 133,000 novel phage genomes, including a subset with genomes larger than 600 kb. Our bioinformaticians left the talk eager to log into our cluster and start running alternative translation scripts on PacBio metagenome data.
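For readers curious what an "alternative translation script" might look like, here is a minimal Python sketch, not JGI's pipeline. It translates a sequence under the standard genetic code and under a variant in which one stop codon is reassigned, then picks whichever code leaves fewer premature in-frame stops. The function names (`codon_table`, `translate`, `internal_stops`, `best_code`) are hypothetical, the toy assumes each sequence is already in coding frame 0, and the TGA to tryptophan reassignment is only an illustrative choice (it is the variant used by NCBI translation table 4, e.g. for Mycoplasma).

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons enumerate as TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(reassign=None):
    """Build a codon -> amino acid dict, optionally overriding some codons."""
    table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
    if reassign:
        table.update(reassign)
    return table


def translate(seq, table):
    """Translate frame 0 of seq; '*' marks stops, 'X' marks ambiguous codons."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(table.get(c, "X") for c in codons)


def internal_stops(protein):
    """Count stop codons before the final codon (premature stops)."""
    return protein[:-1].count("*")


def best_code(contigs, candidates):
    """Return the candidate code name giving the fewest premature stops overall."""
    def score(table):
        return sum(internal_stops(translate(c, table)) for c in contigs)
    return min(candidates, key=lambda name: score(candidates[name]))


if __name__ == "__main__":
    toy_contig = "ATGGCTTGAAAATGGTAA"  # made-up sequence with an in-frame TGA
    codes = {"standard": codon_table(), "TGA->W": codon_table({"TGA": "W"})}
    for name, table in codes.items():
        print(name, translate(toy_contig, table))
    print("best fit:", best_code([toy_contig], codes))
```

On the made-up contig, the standard code yields `MA*KW*` (one premature stop), while the TGA to W variant yields `MAWKW*`, so the variant is reported as the better fit. A real analysis would scan all six reading frames and score many candidate reassignments across whole assemblies rather than a single toy sequence.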
The last plenary talk of the day really captured our attention: Stephan Schuster from Nanyang Technological University introduced GenomeAsia 100K, an effort to sequence 100,000 people of Asian descent to better catalog genetic variation across Asian populations. Schuster showed that the world’s largest populations have been centered in Asia for at least 2,000 years, so there will be tremendous value in cataloging their diversity. He noted that the groups behind the project already have 50,000 samples in hand, and sequencing is expected to take three years. We’ve been fans of population-specific reference genomes for years, and of course it’s no secret that we believe each person will ultimately be his or her own best reference as the quality and affordability of de novo sequencing continue to improve. We’re cheering this new project and wishing them all the best!