From Seabass to Salmon:
Swimming in High-Quality Genomes
Thursday, May 12, 2016
A global collaboration of researchers has produced what is likely the most contiguous assembly of a fish genome to date. “Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding,” published in PLoS Genetics, comes from lead author Shubha Vij and senior author László Orbán with collaborators at nearly two dozen labs.
The team set out to sequence Lates calcarifer, the Asian seabass, which has a genome of about 670 Mb grouped into 24 A chromosomes and as many as 10 B chromosomes. They used SMRT Sequencing from PacBio to overcome the fragmented and incomplete assemblies associated with short-read data, and incorporated optical and genetic mapping to add further layers of information to the assembly. The result is a valuable resource that has been shared with the community. “The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics,” the authors write.
That quality largely came from 90x SMRT Sequencing coverage of the genome, which on its own produced an assembly with about 3,900 contigs and a contig N50 length greater than 1 Mb. Layering in optical mapping and genetic markers allowed the team to place contigs into larger scaffolds, ultimately achieving 24 individual chromosomal scaffolds. The scientists anticipate this resource will be particularly useful for comparative genomics and for development of assays such as GWAS, allele mining, and genomic selection.
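For readers less familiar with the metric, contig N50 is the length L such that contigs of length L or longer together contain at least half of all assembled bases. Here is a minimal Python sketch of that calculation. The function name `n50` and the toy contig lengths are illustrative assumptions, not the seabass data or the authors' code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases,
    # and stop once at least half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: six contigs totalling 5.5 Mb.
toy_contigs = [2_000_000, 1_500_000, 1_000_000, 500_000, 250_000, 250_000]
print(n50(toy_contigs))  # 1500000
```

In this toy case the two longest contigs already hold 3.5 Mb of the 5.5 Mb total, so the N50 is the length of the second one. A higher N50 for a given genome size means the same sequence sits in fewer, longer pieces.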
The scientists used the MHC region for a close-up inspection of the genome quality and contiguity. They compared the seabass MHC locus to that of stickleback, which had “the most complete published fish genome assembly available in the public repositories” at the time of the study, the team reports. In Asian seabass, they identified 14 MHC class I genes, spread across eight contigs, half of which were longer than 1 Mb. “By contrast, the MHC-class I genes from stickleback, were located on almost double the number of contigs, of which all except one were ≤ 113 kb in length,” they write.
The remarkable assembly quality gave researchers the opportunity to delve into repeat sequences, which represent nearly 20% of the genome and include several kinds of complex tandem repeats. In addition, the team studied duplicated genes and found they were enriched for functions that could help explain how the fish transforms from male to female after maturity.
The scientists note that while most eukaryotic genomes published so far have been produced with short-read data, a strategy using only PacBio data “seems ideal for assembling mid-to-large eukaryotic genomes since it ensures contiguity, less ambiguity and assembly metrics surpassing all of the fish genomes sequenced thus far.”
And if you just can’t get enough fish genomes, don’t miss this publication reporting a new salmon assembly. The team used PacBio sequencing as part of a multi-platform assembly process; the findings shed light on the rediploidization of salmon.