Generating de novo reference genome assemblies for non-model organisms is a laborious task that often requires a large amount of data from several sequencing platforms and cytogenetic surveys. Using PacBio sequence data and new library creation techniques, we present a de novo, high-quality reference assembly for the goat (Capra hircus) that demonstrates a primarily sequencing-based approach to efficiently create new reference assemblies for eukaryotic species. This goat reference genome was assembled from 38 million PacBio P5-C3 reads, generated from a San Clemente goat, using the Celera Assembler PBcR pipeline with PacBio read self-correction. To generate the assembly, corrected and filtered reads were pre-assembled into a consensus model using PBDAGCON and subsequently assembled using Celera Assembler version 8.2. This method produced 5,902 contigs with a contig N50 size of 2.56 megabases. To generate chromosome-sized scaffolds, we used the LACHESIS scaffolding method, which identifies cis-chromosome Hi-C interactions, to link contigs together. We then compared our new assembly to the existing goat reference assembly to identify large-scale discrepancies. In our comparison, we identified 247 disagreements between the two assemblies, consisting of 123 inversions and 124 chromosome-contig relocations. The high quality of these data illustrates how this methodology can be used to efficiently generate new reference genome assemblies without the use of expensive fluorescent cytometry or large quantities of data from multiple sequencing platforms.
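The contig N50 statistic quoted above can be illustrated with a minimal sketch. It assumes the common definition of N50: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembled bases. The function name and the toy contig lengths are hypothetical and are not taken from the goat assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: sort contigs from longest to shortest and
    return the length of the contig at which the cumulative sum
    first reaches at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Under this definition, a higher N50 for a similar total assembly size indicates that more of the sequence sits in long contigs, which is why the statistic is used to compare assembly contiguity.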
The goat (Capra hircus) remains an important livestock species due to its ability to forage and provide milk, meat and wool in arid environments. The current goat reference assembly and annotation borrow heavily from other loosely related livestock species, such as cattle, and may not reflect the unique structural and functional characteristics of the species. We present preliminary data from a new de novo reference assembly for goat that primarily utilizes 38 million PacBio P5-C3 reads generated from an inbred San Clemente goat. This assembly consists of only 5,902 contigs with a contig N50 size of 2.56 megabases, which were grouped into scaffolds using cis-chromosome associations generated by the analysis of Hi-C sequence reads. To provide accurate functional genetic annotation, we utilized existing RNA-seq data and generated new data consisting of over 784 million reads from a combination of 27 different developmental timepoints/tissues. This dataset provides a tangible improvement over existing goat genomics resources by correcting over 247 misassemblies in the current goat reference genome and by annotating predicted gene models with actual expressed transcript data. Our goal is to provide a high-quality resource to researchers to enable future genomic selection and functional prediction within the field of goat genomics.
The goat is an important source of milk, meat, and fiber, especially in developing countries. An advantage of goats as livestock is their low maintenance requirements and high adaptability compared to other milk producers. The global population of domestic goats exceeds 800 million. In Africa, goat production is characterized by low productivity levels, and attempts to introduce more productive breeds have met with poor success due in part to nutritional constraints. It has been suggested that incorporating selective breeding within herds adapted for survival could represent one approach to improving food security across Africa. A recently produced genome assembly of a Chinese Yunnan breed goat, based on 192 Gb of short reads across a range of insert sizes from 180 bp to 20 kb, reported a contig N50 of 18.7 kb. The scaffold N50 was improved from 2.2 Mb to 3.1 Mb by the addition of fosmid end sequence, with an estimated 140 million Ns in gaps and 91% coverage. The assembly has proven somewhat problematic for pursuing genome-wide association analysis with SNP arrays, apparently due in part to errors in the ordering of markers using the draft genome. To provide a higher-quality assembly, we sequenced the genome of a highly inbred San Clemente breed goat using 458 SMRT cells on the Pacific Biosciences platform. These cells generated 193.5 Gb of sequence after processing into subreads, with a mean subread length of 5,110 bases and a maximum subread length of 40.5 kb. These sequence data were assembled using the recently reported MHAP error correction approach and Celera Assembler v8.2. The contig N50 was 2.5 Mb, with the largest contig spanning 19.5 Mb. Additional characteristics of the assembly will be presented.
Goats are specialized in dairy, meat and fiber production, are adapted to a wide range of environmental conditions, and have a large economic impact in developing countries. In recent years, there have been dramatic advances in the knowledge of the structure and diversity of the goat genome/transcriptome and in the development of genomic tools, rapidly narrowing the gap between goat and related species such as cattle and sheep. Major advances are: 1) publication of a de novo goat genome reference sequence; 2) development of whole-genome high-density RH maps; and 3) design of a commercial 50K SNP array. Moreover, there are currently several projects aiming at improving current genomic tools and resources. An improved assembly of the goat genome using PacBio reads is being produced, and the design of new SNP arrays is being studied to accommodate the specific needs of this species in the context of very large scale genotyping projects (i.e. breed characterization at an international scale and genomic selection) and parentage analysis. As in other species, the focus has now turned to the identification of causative mutations underlying the phenotypic variation of traits. In addition, since 2014, the ADAPTmap project (www.goatadaptmap.org) has gathered data to explore the diversity of caprine populations at a worldwide scale by using a wide variety of approaches and data.
The domestic goat (Capra hircus) is an important ruminant species both as a source of antibody-based reagents for research and biomedical applications and as an economically important animal for agriculture, particularly for developing nations that maintain most of the global goat population. Characterization of the loci encoding the goat immune repertoire would be highly beneficial for both vaccine and immune reagent development. However, in goat and other species whose reference genomes were generated using short-read sequencing technologies, the immune loci are poorly assembled as a result of their repetitive nature. Our recent construction of a long-read goat genome assembly (ARS1) has facilitated characterization of all three antibody loci with high confidence and comparative analysis to cattle. We observed broad similarity of goat and cattle antibody-encoding loci but with notable differences that likely influence formation of the functional antibody repertoire. The goat heavy-chain locus is restricted to only four functional and nearly identical IGHV genes, in contrast to the ten observed in cattle. Repertoire analysis indicates that light-chain usage is more balanced in goats, with greater representation of kappa light chains (~ 20-30%) compared to that in cattle (~ 5%). The present study represents the first characterization of the goat antibody loci and will help inform future investigations of their antibody responses to disease and vaccination.
Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome.
The decrease in sequencing cost and the increased sophistication of assembly algorithms for short-read platforms have resulted in a sharp increase in the number of species with genome assemblies. However, these assemblies are highly fragmented, with many gaps, ambiguities, and errors, impeding downstream applications. We demonstrate the current state of the art for de novo assembly using the domestic goat (Capra hircus), based on long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced what is, to our knowledge, the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps. Our assembly represents a ~400-fold improvement in continuity due to properly assembled gaps, compared to the previously published C. hircus assembly, and better resolves repetitive structures longer than 1 kb, representing the largest repeat family and immune gene complex yet produced for an individual of a ruminant species.
The newly described de novo goat genome sequence is the most contiguous diploid vertebrate assembly generated thus far using whole-genome assembly and scaffolding methods. The contiguity of this assembly is approaching that of the finished human and mouse genomes and suggests an affordable roadmap to high-quality references for thousands of species.
Three years ago, Erich Jarvis helped mastermind a massive DNA sequencing effort that netted genomes for more than 40 bird species and produced a better avian family tree. But when he tried to compare the avian genomes to those of other species to learn about the evolution and function of several key brain genes, he was stymied. His team found that gene sequences from most of the comparison species, even humans, were incomplete, missing, or misplaced in the larger genome. The group had to resequence sections of several genomes to get the needed data, delaying their project many months.
Mycoplasma yeatsii is a goat mycoplasma species that, although an obligate parasite, accommodates this lifestyle as an inapparent commensalist. High-frequency transformation has also been reported for this species. The complete 895,051-bp genome sequence of strain GM274B has been determined, enabling an analysis of the features of this potential cloning host. Copyright © 2015 Calcutt et al.
Genome sequences of Corynebacterium pseudotuberculosis strains 48252 (human, pneumonia), CS_10 (lab strain), Ft_2193/67 (goat, pus), and CCUG 27541.
Here we report the genome sequences of four Corynebacterium pseudotuberculosis strains. These include a strain isolated from a patient with C. pseudotuberculosis pneumonia (48252), a strain isolated from pus in a goat (Ft_2193/67), a laboratory strain originating from strain Ft_2193/67 (CS_10), and the draft genome of an equine reference strain, CCUG 27541. Copyright © 2014 Håvelsrud et al.
The investigation of genetic diversity at the molecular level has been proposed as a valuable complement, and sometimes proxy, to the phenotypic diversity of local breeds and is presently considered one of the FAO priorities for breed characterization. By recommending a set of selected molecular markers for each of the main livestock species, FAO has promoted the meta-analysis of local datasets to achieve a global view of molecular genetic diversity. Analysis within the EU Globaldiv project of two large goat microsatellite datasets, produced by the Econogene Consortium and the IAEA CRP–Asia Consortium, respectively, has generated a picture of goat diversity across continents. This indicates a gradient of decreasing diversity from the domestication centre towards Europe and Asia, a clear phylogeographic structure at the continental and regional levels, and in Asia a limited genetic differentiation among local breeds. The development of SNP panels that assay thousands of markers and the whole genome sequencing of livestock permit an affordable use of genomic technologies in all livestock species, goats included. Preliminary data from the Italian Goat Consortium indicate that the SNP panel developed for this species is highly informative. The existing panel can be improved by integrating additional SNPs identified from the whole genome sequence alignment of goats adapted to extreme climates. Part of this effort is being achieved by international projects (e.g. EU FP7 NextGen and 3SR projects), but a fair representation of the global diversity in goats requires a large panel of samples (i.e. as in the recently launched 1000 cattle genomes initiative). Genomic technologies offer new strategies to investigate complex traits that are difficult to measure. For example, the comparison of patterns of diversity among the genomes in selected groups of animals (e.g. adapted to different environments) and the integration of genome-wide diversity with new GIScience-based methods are able to identify molecular markers associated with genomic regions of putative importance in adaptation and thus pave the way for the identification of causative genes. Goat breeds adapted to different production systems in extreme and harsh environments will play an important role in this process. The new sequencing technologies also permit the analysis of the entire mitochondrial genome at maximum resolution. The complete mtDNA sequence is now the common standard format for the investigation of human maternal lineages. A preliminary analysis of the complete goat mtDNA genome supports a single Neolithic origin of domestic goats rather than multiple domestication events in different geographic areas.