June 1, 2021  |  

An update on goat genomics

Goats are specialized in dairy, meat and fiber production, are adapted to a wide range of environmental conditions, and have a large economic impact in developing countries. In recent years, there have been dramatic advances in the knowledge of the structure and diversity of the goat genome and transcriptome and in the development of genomic tools, rapidly narrowing the gap between goats and related species such as cattle and sheep. The major advances are: 1) publication of a de novo goat genome reference sequence; 2) development of whole-genome high-density RH maps; and 3) design of a commercial 50K SNP array. Moreover, several projects currently aim to improve genomic tools and resources. An improved assembly of the goat genome using PacBio reads is being produced, and the design of new SNP arrays is being studied to accommodate the specific needs of this species in the context of very large scale genotyping projects (i.e. breed characterization at an international scale and genomic selection) and parentage analysis. As in other species, the focus has now turned to the identification of causative mutations underlying the phenotypic variation of traits. In addition, since 2014, the ADAPTmap project (www.goatadaptmap.org) has gathered data to explore the diversity of caprine populations at a worldwide scale by using a wide variety of approaches and data.

June 1, 2021  |  

A comprehensive lincRNA analysis: From conifers to trees

We have produced an updated annotation of the Norway spruce genome on the basis of an in silico normalised set of RNA-Seq data obtained from 1,529 samples and comprising 15.5 billion paired-end Illumina HiSeq reads, complemented by 18 Mbp of PacBio cDNA data (3.2M sequences). In addition to augmenting and refining the previous protein-coding gene annotation, here we focus on the addition of long intergenic non-coding RNA (lincRNA) and micro RNA (miRNA) genes. Beyond non-coding loci, our analyses also identified protein-coding genes that had been missed by the initial genome annotation and enabled us to update the annotation of existing gene models. In particular, splice variant information, as supported by PacBio sequencing reads, has been added to the current annotation, and previously fragmented gene models have been merged by scaffolding disjoint genomic scaffolds on the basis of transcript evidence. Using this refined annotation, a targeted analysis of the lincRNAs enabled their classification as i) deeply conserved, ii) conserved in seed plants, or iii) gymnosperm/conifer specific. Concurrently, complementary analyses were performed as part of the aspen genome project, and a comparative analysis of the lincRNAs conserved in both Norway spruce and Eurasian aspen enabled us to identify conserved and diverged expression profiles. At present, we are delving further into the expression results with the aim of functionally annotating the lincRNA genes by developing a GO annotation based on co-expression network analysis.

June 1, 2021  |  

Resolving KIR genotypes and haplotypes simultaneously using Single Molecule, Real-Time Sequencing

The killer immunoglobulin-like receptor (KIR) genes belong to the immunoglobulin superfamily and are widely studied due to the critical role they play in coordinating the innate immune response to infection and disease. Highly accurate, contiguous, long reads, like those generated by SMRT Sequencing, when combined with target-enrichment protocols, provide a straightforward strategy for generating complete de novo assembled KIR haplotypes. We have explored two different methods to capture the KIR region: one using fosmid clones and one using Nimblegen capture.

June 1, 2021  |  

Whole gene sequencing of KIR-3DL1 with SMRT Sequencing and the distribution of allelic variants in different ethnic groups

The killer-cell immunoglobulin-like receptor (KIR) gene family is involved in immune modulation during viral infection, autoimmune disease and allogeneic stem cell transplantation. Most studies of KIR gene diversity and its impact on transplant outcome are performed using gene presence/absence assays. However, it is well known that KIR gene allelic variations have biological significance. Allele-level typing of KIR genes has been very challenging until recently due to the homologous nature of those genes and their very long intronic sequences. SMRT (Single Molecule, Real-Time) Sequencing generates long reads averaging 10 to 15 kb and allows us to obtain in-phase long sequence reads. We have developed a PCR assay for SMRT Sequencing on the PacBio RS II platform in our lab for 3DL1 whole gene sequencing. This approach allows us to obtain allele-level typing for 3DL1 genes and could serve as a model for typing other KIR genes at the allelic level.

June 1, 2021  |  

The MHC Diversity in Africa Project (MDAP) pilot – 125 African high resolution HLA types from 5 populations

The major histocompatibility complex (MHC), or human leukocyte antigen (HLA) in humans, is a highly diverse gene family with a key role in immune response to disease, and has been implicated in auto-immune disease, cancer, infectious disease susceptibility, and vaccine response. It has clinical importance in the field of solid organ and bone marrow transplantation, where donor and recipient matching of HLA types is key to transplanted organ outcomes. The Sanger based typing (SBT) methods currently used in clinical practice do not capture the full diversity across this region, and require specific reference sequences to deconvolute ambiguity in HLA types. However, reference databases are based largely on European populations, and the full extent of diversity in Africa remains poorly understood. Here, we present the first systematic characterisation of HLA diversity within Africa in the pilot phase of the MHC Diversity in Africa Project, together with an evaluation of methods to carry out scalable, cost-effective and reliable typing of this region in African populations.
To sample a geographically representative panel of African populations we obtained 125 samples, 25 each from the Zulu (South Africa), Igbo (Nigeria), Kalenjin (Kenya), Moroccan and Ashanti (Ghana) groups. For methods validation we included two controls from the International Histocompatibility Working Group (IHWG) collection with known typing information. Sanger typing and Illumina HiSeq X sequencing of these samples indicated potentially novel Class I and Class II alleles; however, we found poor correlation between HiSeq X sequencing and SBT for both classes. Long Range PCR and high resolution PacBio RS-II typing of 4 of these samples identified 7 novel Class II alleles, highlighting the high levels of diversity in these populations, and the need for long read sequencing approaches to characterise this comprehensively.
We have now expanded this approach to the entire pilot set of 125 samples. We present these confirmed types and discuss a workflow for scaling this to 5000 individuals across Africa.
The large number of new alleles identified in our pilot points to the high level of African HLA diversity and the utility of high resolution methods. The MDAP project will provide a framework for accurate HLA typing, in addition to providing an invaluable resource for imputation in GWAS, boosting power to identify and resolve HLA disease associations.

June 1, 2021  |  

T-cell receptor profiling using PacBio sequencing of SMARTer libraries

T-cells play a central part in the immune response in humans and related species. T-cell receptors (TCRs), heterodimers located on the T-cell surface, specifically bind foreign antigens displayed on the MHC complex of antigen-presenting cells. The wide spectrum of potential antigens is addressed by the diversity of TCRs created by V(D)J recombination. Profiling this repertoire of TCRs could be useful for applications including, but not limited to, diagnosis, monitoring response to treatments, and examining T-cell development and diversification.

April 21, 2020  |  

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories, affecting all downstream analyses. As a case study, we present examples from the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise awareness within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
