G10K & B10K Initiatives to Generate Thousands of High-Quality Vertebrate Genomes to Aid Conservation Efforts

The expanded G10K project, the Vertebrate Genomes Project, released its first batch of 15 high-quality assemblies.
We’re excited to announce that we’ll be working closely with two programs that are committing significant resources toward generating reference-quality genomes of thousands of vertebrate species. Both the Genome 10K (G10K) and Bird 10,000 Genomes (B10K) initiatives have invested in SMRT Sequencing to build high-quality de novo genome assemblies for the next phase of their programs. By sequencing large numbers of vertebrates, the groups hope to develop resources that will be useful for species conservation efforts in the future.
The G10K project was established in 2009 by a consortium of biologists and genome scientists, including Duke neurobiologist Erich Jarvis, Steve O’Brien of the Dobzhansky Center for Genome Bioinformatics, David Haussler and Beth Shapiro of the UC Santa Cruz Genome Institute, and Oliver Ryder of the San Diego Zoo Institute for Conservation Research. Together they set out to sequence the genomes of 10,000 vertebrate species by 2020. The B10K project, launched in 2015 and co-led by Jarvis along with Guojie Zhang of BGI and Thomas Gilbert of the University of Copenhagen, aims to generate representative draft genome sequences for all 10,500 bird species within the next five years.
These groups have already contributed genomic resources to the conservation biology community. They collaborated on the first phase of the projects, yielding outcomes such as the Avian Phylogenomics Project, which involved more than 200 scientists and sequenced the genomes of more than 45 new bird species. They initially used short-read technologies, but have since found that long-read SMRT Sequencing lets them produce much higher-quality de novo assemblies of complex genomes.
Jarvis recently sequenced two bird species with SMRT Sequencing, generating high-quality assemblies with long, gapless contigs, half of which were several megabases in length or larger. For Anna’s hummingbird (Calypte anna), for example, the SMRT Sequencing assembly significantly increased the number of complete genes and cut the number of contigs from 124,000 in a previous short-read assembly to 1,000. In a separate sequencing project for zebra finch, PacBio sequencing fully resolved gaps in the Sanger-based reference genome and detected errors in it. For additional details, check out our recap of Jarvis’s talk at this year’s East Coast user group meeting.
Now, the G10K and B10K initiatives will use Sequel Systems for the next phases of their work. They intend to sequence the genomes of several thousand vertebrate species with PacBio technology to produce diploid-resolved, high-quality de novo genome assemblies, and then perform chromosome-level scaffolding with complementary approaches, including BioNano Genomics’ optical genome mapping, Dovetail’s proximity in vitro genome mapping, and Phase Genomics’ Hi-C mapping. To that end, Jarvis, who is now at The Rockefeller University and affiliated with the New York Genome Center, has ordered two Sequel Systems and plans to bring on three additional units. Several other global leaders of the G10K and B10K consortia will also contribute the use of their recently acquired Sequel Systems toward the goal of creating de novo assembled vertebrate genomes, including Harris Lewin at UC Davis in the USA, Richard Durbin at the Sanger Institute in the UK, Gene Myers at the Max Planck Institute of Molecular Cell Biology & Genetics in Germany, and Guojie Zhang with affiliations at BGI in China and Denmark.
Jarvis and other members of the G10K and B10K consortia recently submitted a proposal to the MacArthur Foundation’s new 100andchange competition, hoping to secure $100 million to create a Digital Noah’s Ark Genome Library of all 8,000 endangered vertebrate species on Earth. In addition, the G10K and B10K consortia decided that their goals and the MacArthur proposal will be stages of a larger, longer-term effort to populate the Digital Noah’s Ark Genome Library with high-quality blueprint genomes of all ~66,000 vertebrate species in the world through an umbrella program called the Vertebrate Genomes Project. It’s an audacious goal, and we wish them luck in the competition!

