Efforts to produce a reference-grade goat genome assembly for improved breeding programs have paid off. A new Nature Genetics publication reports a high-quality, highly contiguous assembly that can be used to develop genotyping tools for quick, reliable analysis of traits such as milk and meat quality or adaptation to harsh environments. The study also offers a look at how different scaffolding approaches perform with SMRT Sequencing data.
“Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome” comes from lead authors Derek Bickhart, Benjamin Rosen, and Sergey Koren; senior author Tim Smith; and collaborators. The large team of scientists is affiliated with the USDA Agricultural Research Service, National Human Genome Research Institute, the University of Washington, and many other organizations.
The project was motivated by a clear need for methods that produce high-quality livestock genome assemblies to benefit breeding communities. Goats are especially important in developing countries, where they are a primary source of textile fiber, milk, and meat. “A finished, accurate reference genome is essential for advanced genomic selection of productive traits and gene editing in agriculturally relevant plant and animal species,” the scientists report. Previous efforts to sequence the goat genome with short reads resulted in a highly fragmented assembly that could not resolve repetitive and other challenging regions. For this work, the team analyzed the genome of a highly homozygous male San Clemente goat (Capra hircus) using a number of technologies.
They chose SMRT Sequencing because its long reads could characterize even the most difficult genomic regions. “Initial assembly of the PacBio data alone resulted in a contig NG50 … of 3.8 Mb,” the team reports. PacBio contigs were then connected with optical mapping and Hi-C data to create extremely long scaffolds in the final 2.92 Gb assembly. “These combined technologies produced what is, to our knowledge, the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps,” they write. The assembly is 400 times more continuous than the previous short-read assembly.
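For readers unfamiliar with the contig NG50 metric quoted above, the sketch below shows how it is conventionally computed: sort contigs from longest to shortest and report the length at which the running total first reaches half of the estimated genome size (NG50 uses the genome size, whereas N50 uses the total assembly length). This is a minimal illustration with made-up toy numbers; the function name `ng50` and the `genome_size` parameter are hypothetical and are not taken from the authors' pipeline.

```python
def ng50(contig_lengths, genome_size):
    """Hypothetical helper: return the NG50 of an assembly.

    NG50 is the length L such that contigs of length >= L together
    cover at least half of the (estimated) genome size. Returns 0 if
    the whole assembly covers less than half of the genome.
    """
    half = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


# Toy example (illustrative numbers only, not data from the study).
contigs = [40, 30, 20, 10]
print(ng50(contigs, genome_size=120))
```

With a genome size of 120, half is 60; the two longest contigs (40 + 30) pass that threshold, so the NG50 is 30. Using the genome size rather than the assembly size keeps the metric comparable across assemblies of the same species that differ in total length.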
To learn more about how these technologies complement each other, the scientists analyzed results from optical mapping and Hi-C data separately. They found that Hi-C data yielded one-tenth as many scaffolds as optical mapping did, but it also produced more misoriented contigs, and these misorientations were correlated with restriction site density. “Ultimately, we found that sequential scaffolding with optical mapping data followed by Hi-C data yielded an assembly with the highest continuity and best agreement with the [radiation hybrid] map,” the team reports, noting that this approach is significantly less expensive than generating a short-read draft genome assembly and manually finishing it to high quality.
The final assembly includes notoriously difficult regions, such as centromeric DNA and the Y chromosome. Two chromosomes appear to be completely assembled, and two others seem to include “the elusive p arm,” Bickhart et al. write.
Of course, since the scientists were focused on building a resource that would help breeding programs, they also assessed its potential impact in that space. “Chromosome-scale continuity of the ARS1 assembly was found to have appreciable positive impact on genetic marker order for the existing C. hircus 52K SNP chip³,” they report.
Going forward, the team hopes to generate a phased diploid assembly for C. hircus.
March 21, 2017 | General