September 22, 2019

Human and rhesus macaque KIR haplotypes defined by their transcriptomes.

The killer-cell Ig-like receptors (KIRs) play a central role in immune recognition in infection, pregnancy, and transplantation through their interactions with MHC class I molecules. KIR genes display abundant copy number variation as well as high levels of polymorphism. As a result, it is challenging to characterize this structurally dynamic region. KIR haplotypes have been analyzed in different species using conventional characterization methods, such as Sanger sequencing and Roche/454 pyrosequencing. However, these methods are time-consuming and often fail to define complete haplotypes or to reach allele-level resolution. In addition, most analyses were performed on genomic DNA, and thus lacked substantial information about transcription and its corresponding modifications. In this paper, we present a single-molecule real-time sequencing approach, using the Pacific Biosciences Sequel platform to characterize the KIR transcriptomes in human and rhesus macaque (Macaca mulatta) families. This high-resolution approach allowed the identification of novel Mamu-KIR alleles, the extension of reported allele sequences, and the determination of human and macaque KIR haplotypes. In addition, multiple recombinant KIR genes were discovered, all located on contracted haplotypes, which were likely the result of chromosomal rearrangements. The relatively high number of contracted haplotypes discovered might be indicative of selection on small KIR repertoires and/or novel fusion gene products. This next-generation method provides an improved high-resolution characterization of the KIR cluster in humans and macaques, which eventually may aid in a better understanding and interpretation of KIR allele-associated diseases, as well as the immune response in transplantation and reproduction. Copyright © 2018 by The American Association of Immunologists, Inc.


September 22, 2019

A carnivorous plant genetic map: pitcher/insect-capture QTL on a genetic linkage map of Sarracenia.

The study of carnivorous plants can afford insight into their unique evolutionary adaptations and their interactions with prokaryotic and eukaryotic species. For Sarracenia (pitcher plants), we identified 64 quantitative trait loci (QTL) for insect-capture traits of the pitchers, providing the genetic basis for differences between the pitfall and lobster-trap strategies of insect capture. The linkage map developed here is based upon the F2 of a cross between Sarracenia rosea and Sarracenia psittacina; we mapped 437 single nucleotide polymorphism and simple sequence repeat markers. We measured pitcher traits which differ between S. rosea and S. psittacina, mapping 64 QTL for 17 pitcher traits; there are hot-spot locations where multiple QTL map near each other. There are epistatic interactions in many cases where there are multiple loci for a trait. The QTL map uncovered the genetic basis for the differences between pitfall- and lobster-traps, and the changes that occurred during the divergence of these species. The longevity and clonability of Sarracenia plants make the F2 mapping population a resource for mapping more traits and for phenotype-to-genotype studies.


September 22, 2019

Emergence, retention and selection: A trilogy of origination for functional de novo proteins from ancestral lncRNAs in primates.

While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaque animals revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly originated de novo proteins is also not beyond anticipation under neutral expectation, as they generally have a longer theoretical lifespan than their current age, due to their GC-rich sequence property enabling stable ORFs with a lower chance of nonsense mutations. Interestingly, although the emergence and retention of these de novo genes are likely driven by neutral forces, a population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in the human population, indicating that a proportion of these newly originated proteins are already functional in humans. We thus propose a mechanism for the creation of functional de novo proteins from ancestral lncRNAs during primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existing genomic contexts.


September 22, 2019

De novo assembly and characterizing of the culm-derived meta-transcriptome from the polyploid sugarcane genome based on coding transcripts

Sugarcane biomass has been used for sugar, bioenergy and biomaterial production. The majority of the sugarcane biomass comes from the culm, which makes it important to understand the genetic control of biomass production in this part of the plant. A meta-transcriptome of the culm was obtained in an earlier study by using about one billion paired-end (150 bp) reads of deep RNA sequencing of samples from 20 diverse sugarcane genotypes and combining de novo assemblies from different assemblers and different settings. Although many genes could be recovered, this resulted in a large combined assembly which created the need for clustering to reduce transcript redundancy while maintaining gene content. Here, we present a comprehensive analysis of the effect of different assembly settings and clustering methods on de novo assembly, annotation and transcript profiling focusing especially on the coding transcripts from the highly polyploid sugarcane genome. The new coding sequence-based transcript clustering resulted in a better representation of transcripts compared to the earlier approach, having 121,987 contigs, which included 78,052 main and 43,935 alternative transcripts. About 73%, 67%, 61% and 10% of the transcriptome was annotated against the NCBI NR protein database, GO terms, orthologous groups and KEGG orthologies, respectively. Using this set for a differential gene expression analysis between the young and mature sugarcane culm tissues, a total of 822 transcripts were found to be differentially expressed, including key transcripts involved in sugar/fiber accumulation in sugarcane. In the context of the lack of a whole genome sequence for sugarcane, the availability of a well annotated culm-derived meta-transcriptome through deep sequencing provides useful information on coding genes specific to the sugarcane culm and will certainly contribute to understanding the process of carbon partitioning, and biomass accumulation in the sugarcane culm.


September 22, 2019

Nearly finished genomes produced using gel microdroplet culturing reveal substantial intraspecies genomic diversity within the human microbiome.

The majority of microbial genomic diversity remains unexplored. This is largely due to our inability to culture most microorganisms in isolation, which is a prerequisite for traditional genome sequencing. Single-cell sequencing has allowed researchers to circumvent this limitation. DNA is amplified directly from a single cell using the whole-genome amplification technique of multiple displacement amplification (MDA). However, MDA from a single chromosome copy suffers from amplification bias and a large loss of specificity from even very small amounts of DNA contamination, which makes assembling a genome difficult and completely finishing a genome impossible except in extraordinary circumstances. Gel microdrop cultivation allows culturing of a diverse microbial community and provides hundreds to thousands of genetically identical cells as input for an MDA reaction. We demonstrate the utility of this approach by comparing sequencing results of gel microdroplets and single cells following MDA. Bias is reduced in the MDA reaction and genome sequencing, and assembly is greatly improved when using gel microdroplets. We acquired multiple near-complete genomes for two bacterial species from human oral and stool microbiome samples. A significant amount of genome diversity, including single nucleotide polymorphisms and genome recombination, is discovered. Gel microdroplets offer a powerful and high-throughput technology for assembling whole genomes from complex samples and for probing the pan-genome of naturally occurring populations.


September 22, 2019

Long-read DNA metabarcoding of ribosomal RNA in the analysis of fungi from aquatic environments.

DNA metabarcoding is widely used to study prokaryotic and eukaryotic microbial diversity. Technological constraints limit most studies to marker lengths below 600 base pairs (bp). Longer sequencing reads of several thousand bp are now possible with third-generation sequencing. Increased marker lengths provide greater taxonomic resolution and allow for phylogenetic methods of classification, but longer reads may be subject to higher rates of sequencing error and chimera formation. In addition, most bioinformatics tools for DNA metabarcoding were designed for short reads and are therefore unsuitable. Here, we used Pacific Biosciences circular consensus sequencing (CCS) to DNA-metabarcode environmental samples using a ca. 4,500 bp marker that included most of the eukaryote SSU and LSU rRNA genes and the complete ITS region. We developed an analysis pipeline that reduced error rates to levels comparable to short-read platforms. Validation using a mock community indicated that our pipeline detected 98% of chimeras de novo. We recovered 947 OTUs from water and sediment samples from a natural lake, 848 of which could be classified to phylum, 397 to genus and 330 to species. By allowing for the simultaneous use of three databases (Unite, SILVA and RDP LSU), long-read DNA metabarcoding provided better taxonomic resolution than any single marker. We foresee the use of long reads enabling the cross-validation of reference sequences and the synthesis of ribosomal rRNA gene databases. The universal nature of the rRNA operon and our recovery of >100 nonfungal OTUs indicate that long-read DNA metabarcoding holds promise for studies of eukaryotic diversity more broadly. © 2018 John Wiley & Sons Ltd.


September 22, 2019

Single-molecule real-time transcript sequencing facilitates common wheat genome annotation and grain transcriptome research.

The large and complex hexaploid genome has greatly hindered genomics studies of common wheat (Triticum aestivum, AABBDD). Here, we investigated transcripts in common wheat developing caryopses using the emerging single-molecule real-time (SMRT) sequencing technology PacBio RSII, and assessed the resultant data for improving common wheat genome annotation and grain transcriptome research. We obtained 197,709 full-length non-chimeric (FLNC) reads, 74.6% of which were estimated to carry a complete open reading frame. A total of 91,881 high-quality FLNC reads were identified and mapped to 16,188 chromosomal loci, corresponding to 13,162 known genes and 3026 new genes not annotated previously. Although some FLNC reads could not be unambiguously mapped to the current draft genome sequence, many of them are likely useful for studying highly similar homoeologous or paralogous loci or for improving chromosomal contig assembly in further research. The 91,881 high-quality FLNC reads represented 22,768 unique transcripts, 9591 of which were newly discovered. We found 180 transcripts each spanning two or three previously annotated adjacent loci, suggesting that they should be merged to form correct gene models. Finally, our data facilitated the identification of 6030 genes differentially regulated during caryopsis development, and full-length transcripts for 72 transcribed gluten gene members that are important for the end-use quality control of common wheat. Our work demonstrated the value of PacBio transcript sequencing for improving common wheat genome annotation through uncovering the loci and full-length transcripts not discovered previously. The resource obtained may aid further structural genomics and grain transcriptome studies of common wheat.


September 22, 2019

The discovered chimeric protein plays the cohesive role to maintain scallop byssal root structural integrity.

Adhesion is essential for many marine sessile organisms. Unraveling the composition and assembly of marine bioadhesives is fundamental to understanding their physiological roles. Despite the remarkable diversity of animal bioadhesion, our understanding of this biological process remains limited to only a few animal lineages, leaving the majority of lineages enigmatic. Our previous study demonstrated that scallop byssus has a distinct protein composition and an unusual assembly mechanism that differ from those of mussels. Here a novel protein (Sbp9) was discovered in the key part of the byssus (byssal root); it contains two Calcium Binding Domains (CBDs) and 49 tandem Epidermal Growth Factor-Like (EGFL) domain repeats. The modular architecture of Sbp9 represents a novel chimeric gene family resulting from a gene fusion event through the acquisition of the CBD2 domain by a tenascin-like (TNL) gene from the Na+/Ca2+ exchanger 1 (NCX1) gene. Finally, free thiols are present in Sbp9, and the results of a rescue assay indicated that Sbp9 likely plays a cohesive role in byssal root integrity. This study not only aids our understanding of byssus assembly but will also inspire biomimetic material design.


September 22, 2019

Whole genome sequencing of “Faecalibaculum rodentium” ALO17, isolated from C57BL/6J laboratory mouse feces.

Intestinal microorganisms affect host physiology, including ageing. Given the difficulty of controlling human studies of the gut microbiome, mouse models provide an alternative avenue to study such relationships. In this study, we report on the complete genome of “Faecalibaculum rodentium” ALO17, a bacterium that was isolated from the faeces of a 9-month-old female C57BL/6J mouse. This strain will be utilized in future in vivo studies detailing the relationships between the gut microbiome and ageing. The whole genome sequence of “F. rodentium” ALO17 was obtained using the single-molecule, real-time (SMRT) technique on a PacBio instrument. The assembled genome consisted of 2,542,486 base pairs of double-stranded DNA with a GC content of 54.0% and no plasmids. The genome was predicted to contain 2794 open reading frames, 55 tRNA genes, and 38 rRNA genes. The 16S rRNA gene of ALO17 was 86.9% similar to that of Allobaculum stercoricanis DSM 13633(T), and the average overall nucleotide identity between strains ALO17 and DSM 13633(T) was 66.8%. After confirming the phylogenetic relationship between “F. rodentium” ALO17 and A. stercoricanis DSM 13633(T), their whole genome sequences were compared, revealing that “F. rodentium” ALO17 contains more fermentation-related genes than A. stercoricanis DSM 13633(T). Furthermore, “F. rodentium” ALO17 produces higher levels of lactic acid than A. stercoricanis DSM 13633(T) as determined by high-performance liquid chromatography. The availability of the “F. rodentium” ALO17 whole genome sequence will enhance studies concerning the gut microbiota and host physiology, especially when investigating the molecular relationships between gut microbiota and ageing.


September 22, 2019

Analysis of the gut microbial diversity of dairy cows during peak lactation by PacBio Single-Molecule Real-Time (SMRT) Sequencing.

The gut microbes of dairy cows are strongly associated with their health, but the relationship between milk production and the intestinal microbiota has seldom been studied. Thus, we explored the diversity of the intestinal microbiota during peak lactation of dairy cows. The intestinal microbiota of nine dairy cows at peak lactation was evaluated using the Pacific Biosciences single-molecule real-time (PacBio SMRT) sequencing approach. A total of 32,670 high-quality 16S rRNA gene sequences were obtained, belonging to 12 phyla, 59 families, 107 genera, and 162 species. Firmicutes (83%) were the dominant phylum, while Bacteroides (6.16%) was the dominant genus. All samples showed a high microbial diversity, with numerous genera of short chain fatty acid (SCFA)-producers. The proportion of SCFA producers was relatively high in relation to the identified core intestinal microbiota. Moreover, the predicted functional metagenome was heavily involved in energy metabolism. This study provided novel insights into the link between the dairy cow gut microbiota and milk production.


September 22, 2019

Ensembl 2018

The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.


September 22, 2019

Quantitative isoform-profiling of highly diversified recognition molecules.

Complex biological systems rely on cell surface cues that govern cellular self-recognition and selective interactions with appropriate partners. Molecular diversification of cell surface recognition molecules through DNA recombination and complex alternative splicing has emerged as an important principle for encoding such interactions. However, the lack of tools to specifically detect and quantify receptor protein isoforms is a major impediment to functional studies. We here developed a workflow for targeted mass spectrometry by selected reaction monitoring (SRM) that permits quantitative assessment of highly diversified protein families. We apply this workflow to dissecting the molecular diversity of the neuronal neurexin receptors and uncover an alternative splicing-dependent recognition code for synaptic ligands.


September 22, 2019

Mutational landscape of antibody variable domains reveals a switch modulating the interdomain conformational dynamics and antigen binding.

Somatic mutations within the antibody variable domains are critical to the immense capacity of the immune repertoire. Here, via a deep mutational scan, we dissect how mutations at all positions of the variable domains of a high-affinity anti-VEGF antibody G6.31 impact its antigen-binding function. The resulting mutational landscape demonstrates that large portions of antibody variable domain positions are open to mutation, and that beneficial mutations can be found throughout the variable domains. We determine the role of one antigen-distal light chain position 83, demonstrating that mutation at this site optimizes both antigen affinity and thermostability by modulating the interdomain conformational dynamics of the antigen-binding fragment. Furthermore, by analyzing a large number of human antibody sequences and structures, we demonstrate that somatic mutations occur frequently at position 83, with corresponding domain conformations observed for G6.31. Therefore, the modulation of interdomain dynamics represents an important mechanism during antibody maturation in vivo.


September 22, 2019

Novel full-length major histocompatibility complex class I allele discovery and haplotype definition in pig-tailed macaques.

Pig-tailed macaques (Macaca nemestrina, Mane) are important models for human immunodeficiency virus (HIV) studies. Their infectability with minimally modified HIV makes them a uniquely valuable animal model to mimic human infection with HIV and progression to acquired immunodeficiency syndrome (AIDS). However, variation in the pig-tailed macaque major histocompatibility complex (MHC) and the impact of individual transcripts on the pathogenesis of HIV and other infectious diseases is understudied compared to that of rhesus and cynomolgus macaques. In this study, we used Pacific Biosciences single-molecule real-time circular consensus sequencing to describe full-length MHC class I (MHC-I) transcripts for 194 pig-tailed macaques from three breeding centers. We then used the full-length sequences to infer Mane-A and Mane-B haplotypes containing groups of MHC-I transcripts that co-segregate due to physical linkage. In total, we characterized full-length open reading frames (ORFs) for 313 Mane-A, Mane-B, and Mane-I sequences that defined 86 Mane-A and 106 Mane-B MHC-I haplotypes. Pacific Biosciences technology allows us to resolve these Mane-A and Mane-B haplotypes to the level of synonymous allelic variants. The newly defined haplotypes and transcript sequences containing full-length ORFs provide an important resource for infectious disease researchers as certain MHC haplotypes have been shown to provide exceptional control of simian immunodeficiency virus (SIV) replication and prevention of AIDS-like disease in nonhuman primates. The increased allelic resolution provided by Pacific Biosciences sequencing also benefits transplant research by allowing researchers to more specifically match haplotypes between donors and recipients to the level of nonsynonymous allelic variation, thus reducing the risk of graft-versus-host disease.


September 22, 2019

Sequence motifs associated with paternal transmission of mitochondrial DNA in the horse mussel, Modiolus modiolus (Bivalvia: Mytilidae).

In the majority of metazoans paternal mitochondria represent evolutionary dead-ends. In many bivalves, however, this paradigm does not hold true; both maternal and paternal mitochondria are inherited. Herein, we characterize maternal and paternal mitochondrial control regions of the horse mussel, Modiolus modiolus (Bivalvia: Mytilidae). The maternal control region is 808bp long, while the paternal control region is longer at 2.3kb. We hypothesize that the size difference is due to a combination of repeated duplications within the control region of the paternal mtDNA genome, as well as an evolutionarily ancient recombination event between two sex-associated mtDNA genomes that led to the insertion of a second control region sequence in the genome that is now transmitted via males. In a comparison to other mytilid male control regions, we identified two evolutionarily Conserved Motifs, CMA and CMB, associated with paternal transmission of mitochondrial DNA. CMA is characterized by a conserved purine/pyrimidine pattern, while CMB exhibits a specific 13bp nucleotide string within a stem and loop structure. The identification of motifs CMA and CMB in M. modiolus extends our understanding of Sperm Transmission Elements (STEs) that have recently been identified as being associated with the paternal transmission of mitochondria in marine bivalves. Copyright © 2017 Elsevier B.V. All rights reserved.

