Building a Digital Genome Ark: Vertebrate Genomes Project Releases 15 New Reference Genomes
Thursday, September 13, 2018
When creating a global genomic ark of creatures great and small, scientists are turning to the comprehensive coverage and quality of PacBio sequencing.
The Vertebrate Genomes Project (VGP), an international consortium of more than 150 scientists from 50 academic, industry and government institutions in 12 countries, recently released the first 15 of an anticipated 66,000 high-quality reference genomes representing all vertebrate species on Earth.
The VGP consortium spent three years selecting technologies and workflows to produce higher-quality, “platinum-level” genomes, and chose SMRT Sequencing to generate the initial assemblies.
“Until recently, sequencing the complete genome of a single animal required millions of dollars and years of effort. New sequencing technologies have dramatically reduced the cost and made it possible to reconstruct near-perfect genomes for the first time,” said VGP member Adam Phillippy of the Genome Informatics Section at the National Human Genome Research Institute.
From the duck-billed platypus to the two-lined caecilian, a limbless, serpentine amphibian, the first data release represents species from all five vertebrate classes: mammals, birds, reptiles, amphibians, and fishes.
The first phase of the project will continue with the sequencing of at least one species representing each of the 260 orders of living vertebrates. Subsequent sequencing will cover all 1,045 families, then 9,478 genera, and ultimately all of the approximately 66,000 species of vertebrates.
“The last 20 years have proven the value of openly available high-quality reference genome sequences to scientific research, but until now these have mostly been available just for humans and other key organisms,” said Richard Durbin, of the University of Cambridge and the Wellcome Sanger Institute. “We are entering an era in which we will obtain reference genome sequences for all species across the Tree of Life.”
The VGP is one of many large-scale international projects that aim to sequence the DNA of thousands of plant, animal, fungal and bacterial species and that have chosen PacBio Single Molecule, Real-Time (SMRT) Sequencing to assemble some of the most complete genomes to date. These comprehensive catalogs of genetic code give researchers valuable resources for understanding the biology, physiology, development and evolution of a multitude of living organisms, and will aid in their conservation.
Another is the Bat1K initiative, an effort by Sonja Vernes of the Max Planck Institute and others to catalog the genetic diversity in 1,300 types of bats.
“The long-read sequencing technology from PacBio is allowing us to produce bat genomes of unprecedented quality and resolution as part of the Bat1K project,” said Vernes. “This is going to be a big step forward for understanding how the genes and also the non-coding DNA in these genomes influence the weird and wonderful features of bats.”
Other projects include:
- The Bird 10,000 Genomes (B10K) Project, which aims to generate representative draft genome sequences from all extant bird species; many of its members became founders of the Genome 10,000 consortium (G10K), which evolved into the Vertebrate Genomes Project;
- Efforts to sequence nationally significant species, such as the Sanger 25 Project by the Wellcome Trust Sanger Institute and the Canada 150 Sequencing Initiative (CanSeq150) by Canada’s Genomics Enterprise.
- The NCTC 3000 initiative by the UK’s National Collection of Type Cultures to sequence the genomes of 3,000 strains of bacteria;
- Whole Genome Assembly of the Maize NAM Founders, a multi-institutional effort to create a 26-line pangenome maize reference collection, one of many initiatives to sequence important agricultural crops to discover and utilize novel genes, traits and/or genomic regions for crop improvement and basic research;
- The Pan-Genome Analysis of Sorghum project at the Donald Danforth Plant Science Center, which includes 15 sorghum lines covering the diversity of this important bioenergy, food, and feed crop. The project is supported through the Community Science Program (CSP) of the DOE Joint Genome Institute with PacBio sequencing at HudsonAlpha Institute for Biotechnology.
- The Open Green Genomes Initiative, also supported by DOE Joint Genome Institute, which will generate high-quality genome assemblies and annotations for 35 species representing all major evolutionary lineages in the land plant tree of life.
- The Functional Annotation of Animal Genomes Project (FAANG), which is aiming to produce comprehensive maps of functional elements in the genomes of domesticated animal species;
- Marine and aquaculture efforts such as The Aqua-Hundred Genome Project;
- Insect initiatives, including the i5k Project to sequence 5,000 arthropod genomes and The Global Ant Genomics Alliance (GAGA) to sequence 200 ant species.
If you’re interested in supporting this important effort, the group is soliciting donations for ongoing project support.