Richard Kuo of the Roslin Institute gave this PAG 2017 talk on using PacBio Iso-Seq data to generate genome annotations that outperform current gold-standard annotations. Topics include findings from a chicken study, the Iso-Seq pipeline, and why short reads are so problematic for understanding gene content.
Generating de novo reference genome assemblies for non-model organisms is a laborious task that often requires large amounts of data from several sequencing platforms and cytogenetic surveys. Using PacBio sequence data and new library preparation techniques, we present a high-quality de novo reference assembly for the goat (Capra hircus) that demonstrates a primarily sequencing-based approach to efficiently creating new reference assemblies for eukaryotic species. This goat reference genome was assembled from 38 million PacBio P5-C3 reads, generated from a San Clemente goat, with the Celera Assembler PBcR pipeline and PacBio read self-correction. In order to generate the…
The goat (Capra hircus) remains an important livestock species because of its ability to forage and to provide milk, meat and wool in arid environments. The current goat reference assembly and annotation borrow heavily from other, loosely related livestock species such as cattle, and may not reflect the unique structural and functional characteristics of the goat. We present preliminary data from a new de novo reference assembly for goat that primarily uses 38 million PacBio P5-C3 reads generated from an inbred San Clemente goat. This assembly consists of only 5,902 contigs with a contig N50 size of 2.56 megabases which…
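For readers unfamiliar with the contig N50 statistic used to summarize assembly contiguity, here is a minimal sketch of its standard definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length. The function name and the toy contig lengths are illustrative assumptions, not values from the goat assembly.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # first contig that pushes coverage past 50%
            return length
    raise ValueError("cannot compute N50 of an empty contig set")


# Toy example with hypothetical contig lengths (total 15, half 7.5):
print(n50([5, 4, 3, 2, 1]))  # 5 + 4 = 9 >= 7.5, so N50 is 4
```

A higher N50 with fewer contigs, as reported above, indicates that more of the genome is contained in long, unbroken sequences.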
Many applications of high throughput sequencing rely on the availability of an accurate reference genome. Errors in the reference genome assembly increase the number of false-positives in downstream analyses. Recently, we have shown that over 33% of the current pig reference genome, Sscrofa10.2, is either misassembled or otherwise unreliable for genomic analyses. Additionally, ~10% of the bases in the assembly are Ns in gaps of an arbitrary size. Thousands of highly fragmented contigs remain unplaced and many genes are known to be missing from the assembly. Here we present a new assembly of the pig genome, Sscrofa11, assembled using 65X…
Determining the composition and functional capabilities of complex populations is often challenging, especially with sequencing technologies whose short reads do not uniquely identify organisms or genes. Long-read sequencing improves the resolution of these mixed communities, but its adoption for this application has been limited by concerns about throughput, cost and accuracy. The recently introduced PacBio Sequel System generates hundreds of thousands of long, highly accurate single-molecule reads per SMRT Cell. We investigated how the Sequel System might increase our understanding of metagenomic communities. Historically, the focus was largely on taxonomic classification with 16S rRNA sequencing. Recent expansion to WGS…
We present high-quality, phased genome assemblies representative of taurine and indicine cattle, subspecies that differ markedly in productivity-related traits and environmental adaptation. We report a new haplotype-aware scaffolding and polishing pipeline that uses contigs generated by the trio binning method to produce haplotype-resolved, chromosome-level genome assemblies of the Angus (taurine) and Brahman (indicine) cattle breeds. These assemblies were used to identify structural and copy number variants that differentiate the subspecies, and we found that variant detection was sensitive to the specific reference genome chosen. Six gene families with immune-related functions are expanded in the indicine lineage. Assembly of the genomes of…
Background: Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions…
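The classification step of trio binning can be illustrated with a simplified sketch: collect k-mers from each parent's short reads, keep only those found in one parent but not the other, and assign each offspring long read to the parent whose specific k-mers it contains more often. This is a toy illustration of the idea under stated assumptions, not the published implementation; the function names, the value of k and the tie rule are hypothetical, and the sketch ignores sequencing errors and reverse-complement (strand) handling that a real tool must address.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers in a sequence (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parental_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Union of k-mers across one parent's short reads."""
    result: Set[str] = set()
    for read in reads:
        result |= kmers(read, k)
    return result


def trio_bin(child_long_reads: Iterable[str], mother_reads: Iterable[str],
             father_reads: Iterable[str], k: int) -> List[str]:
    """Label each offspring long read 'maternal', 'paternal' or 'unassigned'."""
    mother = parental_kmers(mother_reads, k)
    father = parental_kmers(father_reads, k)
    # Haplotype-specific k-mers: seen in one parent only.
    maternal_only = mother - father
    paternal_only = father - mother

    labels = []
    for read in child_long_reads:
        read_kmers = kmers(read, k)
        mat_hits = len(read_kmers & maternal_only)
        pat_hits = len(read_kmers & paternal_only)
        if mat_hits > pat_hits:
            labels.append("maternal")
        elif pat_hits > mat_hits:
            labels.append("paternal")
        else:
            # No evidence, or equal evidence: leave the read unbinned.
            labels.append("unassigned")
    return labels


# Toy sequences (hypothetical): mother carries A-runs, father C-runs.
print(trio_bin(["AAAAAAAA", "CCCCCC", "GGGGG"], ["AAAAAA"], ["CCCCCC"], k=3))
# ['maternal', 'paternal', 'unassigned']
```

Because heterozygous sites are exactly what generate parent-specific k-mers, higher heterozygosity gives more evidence per read, which is why the approach is helped rather than impeded by it. Each binned set of reads can then be assembled separately into one haplotype.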
The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model with high anatomical and immunological similarity to humans. The draft reference genome (Sscrofa10.2) represented a purebred female pig from a commercial pork production breed (Duroc) and was established using older clone-based sequencing methods. The Sscrofa10.2 assembly was incomplete, and its utility was limited by unresolved redundancies, short-range order and orientation errors, and the associated misassembled genes. We present two highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy, one for the same Duroc female (Sscrofa11.1) and…
The pig is a well-studied model animal of biomedical and agricultural importance. Genes of this species, Sus scrofa, are known from experiments and predictions and are collected in the NCBI Reference Sequence (RefSeq) database. Gene reconstruction from the transcribed-gene evidence of RNA-seq can now accurately and completely reproduce the biological gene sets of animals and plants. Such a gene set for the pig is reported here, including human orthologs missing from the current NCBI and Ensembl reference pig gene sets, additional alternate transcripts, and other improvements. Methodology for accurate and complete gene set reconstruction from RNA is used: the automated SRA2Genes pipeline…
The antibody repertoire of Bos taurus is characterized by a subset of variable heavy (VH) chain regions with ultralong third complementarity-determining regions (CDR3) which, compared with those of other species, can provide a potent response to challenging antigens such as HIV env. These unusual CDR3 can exceed seventy amino acids in length, are highly diverse, and form unique β-ribbon ‘stalk’ and disulfide-bonded ‘knob’ structures far from the typical antigen-binding site. The genetic components and processes that form these unusual cattle antibody VH CDR3 are not well understood. Here we analyze sequences of Bos taurus antibody VH domains and find…
In cattle, the X chromosome accounts for approximately 3 and 6% of the genome in bulls and cows, respectively. In spite of the large size of this chromosome, very few studies report analysis of the X chromosome in genome-wide association studies and genomic selection. This lack of genetic interrogation is likely due to the complexities of undertaking these studies given the hemizygous state of some, but not all, of the X chromosome in males. The first step in facilitating analysis of this gene-rich chromosome is to accurately identify coordinates for the pseudoautosomal boundary (PAB) to split the chromosome into a…
The world demand for animal-based food products is anticipated to increase by 70% by 2050. Meeting this demand in a way that has a minimal impact on the environment will require the implementation of advanced technologies, and methods to improve the genetic quality of livestock are expected to play a large part. Over the past 10 years, genomic selection has been introduced in several major livestock species and has more than doubled genetic progress in some. However, additional improvements are required. Genomic information of increasing complexity (including genomic, epigenomic, transcriptomic and microbiome data), combined with technological advances for its cost-effective…
Bacillus subtilis is recognized as a safe and reliable human and animal probiotic and is associated with bioactivities such as vitamin production and immune stimulation. Additionally, it has great potential as an alternative to antimicrobial drugs, which is significant in the context of antibiotic misuse in food animal production. In this study, we isolated a B. subtilis strain, named WS-1, from apparently healthy pigs housed with sick cohorts on a commercial pig farm in Guangdong, China, where Escherichia coli is endemic. WS-1 strongly inhibits the growth of pathogenic E. coli in vitro. The B. subtilis strain…
Mithun (Bos frontalis), also called gayal, is an endangered bovine species of the tribe Bovini, with a 2n = 58, XX chromosome complement, reared in the tropical rainforest regions of India, China, Myanmar, Bhutan and Bangladesh. However, the origin of this species is still disputed, and information on its genomic architecture remains scarce. We believe that the availability of its whole-genome sequence data and assembly will greatly help resolve this question and generate much new information, including the phylogenetic status of mithun. Recently, the first genome assembly of gayal, a mithun of Chinese origin, was published. However, an improved reference genome…
Our understanding of the pig transcriptome is limited. RNA transcript diversity among nine tissues was assessed using poly(A) selected single-molecule long-read isoform sequencing (Iso-seq) and Illumina RNA sequencing (RNA-seq) from a single White cross-bred pig. Across tissues, a total of 67,746 unique transcripts were observed, including 60.5% predicted protein-coding, 36.2% long non-coding RNA and 3.3% nonsense-mediated decay transcripts. On average, 90% of the splice junctions were supported by RNA-seq within tissue. A large proportion (80%) represented novel transcripts, mostly produced by known protein-coding genes (70%), while 17% corresponded to novel genes. On average, four transcripts per known gene (tpg) were…