Duke University Archives - Page 2 of 8

September 22, 2019 |

Bayesian nonparametric discovery of isoforms and individual specific quantification.

Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur in different frequencies across tissues and samples. Here, we develop BIISQ, a Bayesian nonparametric model for isoform discovery and individual specific quantification from short-read RNA-seq data. BIISQ does not require isoform reference sequences but instead estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimates and demonstrate superior precision and recall for simulations compared to state-of-the-art isoform reconstruction methods. BIISQ shows the most gains for low abundance isoforms, with 36% more isoforms correctly inferred at low coverage versus a multi-sample method and 170% more versus single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios.

September 22, 2019 |

Candidatus Dactylopiibacterium carminicum, a nitrogen-fixing symbiont of Dactylopius cochineal insects (Hemiptera: Coccoidea: Dactylopiidae)

The domesticated carmine cochineal Dactylopius coccus (scale insect) has commercial value and has been used for more than 500 years for natural red pigment production. Besides the domesticated cochineal, other wild Dactylopius species such as Dactylopius opuntiae are found in the Americas, all feeding on nutrient poor sap from native cacti. To compensate for nutritional deficiencies, many insects harbor symbiotic bacteria which provide essential amino acids or vitamins to their hosts. Here, we characterized a symbiont from the carmine cochineal insects, Candidatus Dactylopiibacterium carminicum (betaproteobacterium, Rhodocyclaceae family) and found it in D. coccus and in D. opuntiae ovaries by fluorescent in situ hybridization, suggesting maternal inheritance. Bacterial genomes recovered from metagenomic data derived from whole insects or tissues both from D. coccus and from D. opuntiae were around 3.6 Mb in size. Phylogenomics showed that dactylopiibacteria constituted a closely related clade neighbor to nitrogen fixing bacteria from soil or from various plants including rice and other grass endophytes. Metabolic capabilities were inferred from genomic analyses, showing a complete operon for nitrogen fixation, biosynthesis of amino acids and vitamins and putative traits of anaerobic or microoxic metabolism as well as genes for plant interaction. Dactylopiibacterium nif gene expression and acetylene reduction activity detecting nitrogen fixation were evidenced in D. coccus hemolymph and ovaries, in congruence with the endosymbiont fluorescent in situ hybridization location. Dactylopiibacterium symbionts may compensate for the nitrogen deficiency in the cochineal diet. In addition, this symbiont may provide essential amino acids, recycle uric acid, and increase the cochineal life span.

September 22, 2019 |

Next-generation approaches to advancing eco-immunogenomic research in critically endangered primates.

High-throughput sequencing platforms are generating massive amounts of genomic data from nonmodel species, and these data sets are valuable resources that can be mined to advance a number of research areas. An example is the growing amount of transcriptome data that allow for examination of gene expression in nonmodel species. Here, we show how publicly available transcriptome data from nonmodel primates can be used to design novel research focused on immunogenomics. We mined transcriptome data from the world’s most endangered group of primates, the lemurs of Madagascar, for sequences corresponding to immunoglobulins. Our results confirmed homology between strepsirrhine and haplorrhine primate immunoglobulins and allowed for high-throughput sequencing of expressed antibodies (Ig-seq) in Coquerel’s sifaka (Propithecus coquereli). Using both Pacific Biosciences RS and Ion Torrent PGM sequencing, we performed Ig-seq on two individuals of Coquerel’s sifaka. We generated over 150 000 sequences of expressed antibodies, allowing for molecular characterization of the antigen-binding region. Our analyses suggest that similar VDJ expression patterns exist across all primates, with sequences closely related to the human VH 3 immunoglobulin family being heavily represented in sifaka antibodies. Moreover, the antigen-binding region of sifaka antibodies exhibited similar amino acid variation with respect to haplorrhine primates. Our study represents the first attempt to characterize sequence diversity of the expressed antibody repertoire in a species of lemur. We anticipate that methods similar to ours will provide the framework for investigating the adaptive immune response in wild populations of other nonmodel organisms and can be used to advance the burgeoning field of eco-immunology. © 2014 John Wiley & Sons Ltd.

September 22, 2019 |

Hybrid error correction and de novo assembly of single-molecule sequencing reads.

Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
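The idea of using short, high-fidelity reads to correct a noisy long read can be illustrated with a toy majority vote. This is only a sketch under strong assumptions, not the algorithm from the paper: the function name `correct_long_read`, the `min_votes` parameter, and the input format (short reads already aligned to the long read at known offsets, substitution errors only) are all hypothetical.

```python
from collections import Counter


def correct_long_read(long_read, aligned_short_reads, min_votes=2):
    """Toy correction: replace each long-read base with the majority base
    among the short reads that overlap it.

    aligned_short_reads is a list of (offset, sequence) pairs; the alignment
    step itself is assumed to have been done elsewhere.
    """
    # One vote counter per position of the long read.
    votes = [Counter() for _ in long_read]
    for offset, seq in aligned_short_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                votes[pos][base] += 1

    corrected = []
    for original, counter in zip(long_read, votes):
        if counter:
            base, count = counter.most_common(1)[0]
            # Only overwrite when enough short reads agree.
            corrected.append(base if count >= min_votes else original)
        else:
            # No short-read coverage: keep the long-read base.
            corrected.append(original)
    return "".join(corrected)


noisy = "ACGTTCGA"  # hypothetical long read with one substitution error
short_reads = [(0, "ACGAT"), (2, "GATCG"), (3, "ATCGA")]
fixed = correct_long_read(noisy, short_reads)
```

Positions without enough short-read support keep the original long-read base, so low short-read coverage limits how much can be corrected in this simplified setting.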

September 22, 2019 |

Next generation multilocus sequence typing (NGMLST) and the analytical software program MLSTEZ enable efficient, cost-effective, high-throughput, multilocus sequencing typing.

Multilocus sequence typing (MLST) has become the preferred method for genotyping many biological species, and it is especially useful for analyzing haploid eukaryotes. MLST is rigorous, reproducible, and informative, and MLST genotyping has been shown to identify major phylogenetic clades, molecular groups, or subpopulations of a species, as well as individual strains or clones. MLST molecular types often correlate with important phenotypes. Conventional MLST involves the extraction of genomic DNA and the amplification by PCR of several conserved, unlinked gene sequences from a sample of isolates of the taxon under investigation. In some cases, as few as three loci are sufficient to yield definitive results. The amplicons are sequenced, aligned, and compared by phylogenetic methods to distinguish statistically significant differences among individuals and clades. Although MLST is simpler, faster, and less expensive than whole genome sequencing, it is more costly and time-consuming than less reliable genotyping methods (e.g. amplified fragment length polymorphisms). Here, we describe a new MLST method that uses next-generation sequencing, a multiplexing protocol, and appropriate analytical software to provide accurate, rapid, and economical MLST genotyping of 96 or more isolates in a single assay. We demonstrate this methodology by genotyping isolates of the well-characterized, human pathogenic yeast Cryptococcus neoformans. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

September 22, 2019 |

High-resolution expression map of the Arabidopsis root reveals alternative splicing and lincRNA regulation.

The extent to which alternative splicing and long intergenic noncoding RNAs (lincRNAs) contribute to the specialized functions of cells within an organ is poorly understood. We generated a comprehensive dataset of gene expression from individual cell types of the Arabidopsis root. Comparisons across cell types revealed that alternative splicing tends to remove parts of coding regions from a longer, major isoform, providing evidence for a progressive mechanism of splicing. Cell-type-specific intron retention suggested a possible origin for this common form of alternative splicing. Coordinated alternative splicing across developmental stages pointed to a role in regulating differentiation. Consistent with this hypothesis, distinct isoforms of a transcription factor were shown to control developmental transitions. lincRNAs were generally lowly expressed at the level of individual cell types, but co-expression clusters provided clues as to their function. Our results highlight insights gained from analysis of expression at the level of individual cell types. Copyright © 2016 Elsevier Inc. All rights reserved.

September 22, 2019 |

Meeting report: processing, translation, decay – three ways to keep RNA sizzling.

This meeting report highlights key trends that emerged from a conference entitled Post-Transcriptional Gene Regulation in Plants, which was held 14-15 July 2016, as a satellite meeting of the annual meeting of the American Society of Plant Biologists in Austin, Texas. The molecular biology of RNA is emerging as an integral part of the framework for plants’ responses to environmental challenges such as drought and heat, hypoxia, nutrient deprivation, light and pathogens. Moreover, the conference illustrated how a multitude of customized and pioneering omics-related technologies are being applied, more and more often in combination, to describe and dissect the complexities of gene expression at the post-transcriptional level. © 2016 John Wiley & Sons Ltd.

September 22, 2019 |

Application of circular consensus sequencing and network analysis to characterize the bovine IgG repertoire.

Vertebrate immune systems generate diverse repertoires of antibodies capable of mediating response to a variety of antigens. Next generation sequencing methods provide unique approaches to a number of immuno-based research areas including antibody discovery and engineering, disease surveillance, and host immune response to vaccines. In particular, single-molecule circular consensus sequencing permits the sequencing of antibody repertoires at previously unattainable depths of coverage and accuracy. We approached the bovine immunoglobulin G (IgG) repertoire with the objective of characterizing diversity of expressed IgG transcripts. Here we present single-molecule real-time sequencing data of expressed IgG heavy-chain repertoires of four individual cattle. We describe the diversity observed within antigen binding regions and visualize this diversity using a network-based approach. We generated 49,945 high quality cDNA sequences, each spanning the entire IgG variable region from four Bos taurus calves. From these sequences we identified 49,521 antigen binding regions using the automated Paratome web server. Approximately 9% of all unique complementarity determining region 2 (CDR2) sequences were of variable lengths. A bimodal distribution of unique CDR3 sequence lengths was observed, with common lengths of 5-6 and 21-25 amino acids. The average number of cysteine residues in CDR3s increased with CDR3 length and we observed that cysteine residues were centrally located in CDR3s. We identified 19 extremely long CDR3 sequences (up to 62 amino acids in length) within IgG transcripts. Network analyses revealed distinct patterns among the expressed IgG antigen binding repertoires of the examined individuals. We utilized circular consensus sequencing technology to provide baseline data of the expressed bovine IgG repertoire that can be used for future studies important to livestock research.
Somatic mutation resulting in base insertions and deletions in CDR2 further diversifies the bovine antibody repertoire. In contrast to previous studies, our data indicate that unusually long CDR3 sequences are not unique to IgM antibodies in cattle. Centrally located cysteine residues in bovine CDR3s provide further evidence that disulfide bond formation is likely of structural importance. We hypothesize that network or cluster-based analyses of expressed antibody repertoires from controlled challenge experiments will help identify novel natural antigen binding solutions to specific pathogens of interest.
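The CDR3 summaries mentioned above (length distribution of unique sequences, and mean cysteine count by length) could be tabulated along the following lines. This is a minimal sketch, not the authors' pipeline: the function name `cdr3_length_summary` and the example sequences are invented for illustration, and the input is assumed to be a plain list of CDR3 amino acid strings already extracted by some upstream tool.

```python
from collections import Counter


def cdr3_length_summary(cdr3_sequences):
    """Return (length_counts, mean_cys_by_length) over unique CDR3 sequences.

    length_counts maps CDR3 length -> number of unique sequences of that length.
    mean_cys_by_length maps CDR3 length -> mean number of cysteine ('C') residues.
    """
    unique = set(cdr3_sequences)  # collapse duplicate sequences
    length_counts = Counter(len(seq) for seq in unique)

    cys_counts = {}
    for seq in unique:
        cys_counts.setdefault(len(seq), []).append(seq.count("C"))
    mean_cys_by_length = {
        length: sum(values) / len(values) for length, values in cys_counts.items()
    }
    return length_counts, mean_cys_by_length


# Hypothetical CDR3 strings, not data from the study.
example = ["ARDY", "ARDY", "ACCDY", "ARCCDYWCAA"]
lengths, mean_cys = cdr3_length_summary(example)
```

Plotting `lengths` as a histogram is one way a bimodal length distribution would become visible in real data.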

September 22, 2019 |

Draft genomes of two sordariomycete fungi that produce novel secondary metabolites.

The genomes of two fungi isolated from soil (MEA-2) and sediment (SUP5-1) were sequenced. Both were members of the order Hypocreales, closely related to Tolypocladium inflatum, and capable of producing novel secondary metabolites. The draft genomes enabled the characterization of key biosynthetic pathways. Copyright © 2015 Stamps et al.

September 22, 2019 |

The Florida manatee (Trichechus manatus latirostris) immunoglobulin heavy chain suggests the importance of clan III variable segments in repertoire diversity.

Manatees are a vulnerable, charismatic sentinel species from the evolutionarily divergent Afrotheria. Manatee health and resistance to infectious disease is of great concern to conservation groups, but little is known about their immune system. To develop manatee-specific tools for monitoring health, we first must have a general knowledge of how the immunoglobulin heavy (IgH) chain locus is organized and transcriptionally expressed. Using the genomic scaffolds of the Florida manatee (Trichechus manatus latirostris), we characterized the potential IgH segmental diversity and constant region isotypic diversity and performed the first Afrotherian repertoire analysis. The Florida manatee has low V(D)J combinatorial diversity (3744 potential combinations) and few constant region isotypes. They also lack clan III V segments, which may have caused reduced VH segment numbers. However, we found productive somatic hypermutation concentrated in the complementarity determining regions. In conclusion, manatees have limited IGHV clan and combinatorial diversity. This suggests that clan III V segments are essential for maintaining IgH locus diversity. Copyright © 2017 Elsevier Ltd. All rights reserved.

September 22, 2019 |

Microbiome and infectivity studies reveal complex polyspecies tree disease in Acute Oak Decline.

Decline-diseases are complex and becoming increasingly problematic to tree health globally. Acute Oak Decline (AOD) is characterized by necrotic stem lesions and galleries of the bark-boring beetle, Agrilus biguttatus, and represents a serious threat to oak. Although multiple novel bacterial species and Agrilus galleries are associated with AOD lesions, the causative agent(s) are unknown. The AOD pathosystem therefore provides an ideal model for a systems-based research approach to address our hypothesis that AOD lesions are caused by a polymicrobial complex. Here we show that three bacterial species, Brenneria goodwinii, Gibbsiella quercinecans and Rahnella victoriana, are consistently abundant in the lesion microbiome and possess virulence genes used by canonical phytopathogens that are expressed in AOD lesions. Individual and polyspecies inoculations on oak logs and trees demonstrated that B. goodwinii and G. quercinecans cause tissue necrosis and, in combination with A. biguttatus, produce the diagnostic symptoms of AOD. We have proved a polybacterial cause of AOD lesions, providing new insights into polymicrobial interactions and tree disease. This work presents a novel conceptual and methodological template for adapting Koch’s postulates to address the role of microbial communities in disease.

September 22, 2019 |

Blood CXCR3+CD4 T cells are enriched in inducible replication competent HIV in aviremic antiretroviral therapy-treated individuals.

We recently demonstrated that lymph nodes (LNs) PD-1+/T follicular helper (Tfh) cells from antiretroviral therapy (ART)-treated HIV-infected individuals were enriched in cells containing replication competent virus. However, the distribution of cells containing inducible replication competent virus has been only partially elucidated in blood memory CD4 T-cell populations including the Tfh cell counterpart circulating in blood (cTfh). In this context, we have investigated the distribution of (1) total HIV-infected cells and (2) cells containing replication competent and infectious virus within various blood and LN memory CD4 T-cell populations of conventional antiretroviral therapy (cART)-treated HIV-infected individuals. In the present study, we show that blood CXCR3-expressing memory CD4 T cells are enriched in cells containing inducible replication competent virus and contributed the most to the total pool of cells containing replication competent and infectious virus in blood. Interestingly, subsequent proviral sequence analysis did not indicate virus compartmentalization between blood and LN CD4 T-cell populations, suggesting dynamic interchanges between the two compartments. We then investigated whether the composition of blood HIV reservoir may reflect the polarization of LN CD4 T cells at the time of reservoir seeding and showed that LN PD-1+CD4 T cells of viremic untreated HIV-infected individuals expressed significantly higher levels of CXCR3 as compared to CCR4 and/or CCR6, suggesting that blood CXCR3-expressing CD4 T cells may originate from LN PD-1+CD4 T cells. Taken together, these results indicate that blood CXCR3-expressing CD4 T cells represent the major blood compartment containing inducible replication competent virus in treated aviremic HIV-infected individuals.

September 22, 2019 |

Complete genome sequence of N2-fixing model strain Klebsiella sp. nov. M5al, which produces plant cell wall-degrading enzymes and siderophores.

The bacterial strain M5al is a model strain for studying the molecular genetics of N2-fixation and molecular engineering of microbial production of platform chemicals 1,3-propanediol and 2,3-butanediol. Here, we present the complete genome sequence of the strain M5al, which belongs to a novel species closely related to Klebsiella michiganensis. M5al secretes plant cell wall-degrading enzymes and colonizes rice roots but does not cause soft rot disease. M5al also produces siderophores and contains the gene clusters for synthesis and transport of yersiniabactin, which is a critical virulence factor for Klebsiella pathogens in causing human disease. We propose that the model strain M5al can be genetically modified to study bacterial N2-fixation in association with non-legume plants and production of 1,3-propanediol and 2,3-butanediol through degradation of plant cell wall biomass.

September 22, 2019 |

Rapid allopolyploid radiation of moonwort ferns (Botrychium; Ophioglossaceae) revealed by PacBio sequencing of homologous and homeologous nuclear regions.

Polyploidy is a major speciation process in vascular plants, and is postulated to be particularly important in shaping the diversity of extant ferns. However, limitations in the availability of bi-parental markers for ferns have greatly limited phylogenetic investigation of polyploidy in this group. With a large number of allopolyploid species, the genus Botrychium is a classic example in ferns where recurrent polyploidy is postulated to have driven frequent speciation events. Here, we use PacBio sequencing and the PURC bioinformatics pipeline to capture all homeologous or allelic copies of four long (~1 kb) low-copy nuclear regions from a sample of 45 specimens (25 diploids and 20 polyploids) representing 37 Botrychium taxa, and three outgroups. This sample includes most currently recognized Botrychium species in Europe and North America, and the majority of our specimens were genotyped with co-dominant nuclear allozymes to ensure species identification. We analyzed the sequence data using maximum likelihood (ML) and Bayesian inference (BI) concatenated-data (“gene tree”) approaches to explore the relationships among Botrychium species. Finally, we estimated divergence times among Botrychium lineages and inferred the multi-labeled polyploid species tree showing the origins of the polyploid taxa, and their relationships to each other and to their diploid progenitors. We found strong support for the monophyly of the major lineages within Botrychium and identified most of the parental donors of the polyploids; these results largely corroborate earlier morphological and allozyme-based investigations. Each polyploid had at least two distinct homeologs, indicating that all sampled polyploids are likely allopolyploids (rather than autopolyploids).
Our divergence-time analyses revealed that these allopolyploid lineages originated recently, within the last two million years, and thus that the genus has undergone a recent radiation, correlated with multiple independent allopolyploidizations across the phylogeny. Also, we found strong parental biases in the formation of allopolyploids, with individual diploid species participating multiple times as either the maternal or paternal donor (but not both). Finally, we discuss the role of polyploidy in the evolutionary history of Botrychium and the interspecific reproductive barriers possibly involved in these parental biases. Copyright © 2017 Elsevier Inc. All rights reserved.

September 22, 2019 |

Loss of stomach, loss of appetite? Sequencing of the ballan wrasse (Labrus bergylta) genome and intestinal transcriptomic profiling illuminate the evolution of loss of stomach function in fish.

The ballan wrasse (Labrus bergylta) belongs to a large teleost family containing more than 600 species showing several unique evolutionary traits such as lack of stomach and hermaphroditism. Agastric fish are found throughout the teleost phylogeny, in quite diverse and unrelated lineages, indicating stomach loss has occurred independently multiple times in the course of evolution. By assembling the ballan wrasse genome and transcriptome we aimed to determine the genetic basis for its digestive system function and appetite regulation. Among other things, this knowledge will aid the formulation of aquaculture diets that meet the nutritional needs of agastric species. Long and short read sequencing technologies were combined to generate a ballan wrasse genome of 805 Mbp. Analysis of the genome and transcriptome assemblies confirmed the absence of genes that code for proteins involved in gastric function. The gene coding for the appetite stimulating protein ghrelin was also absent in wrasse. Gene synteny mapping identified several appetite-controlling genes and their paralogs previously undescribed in fish. Transcriptome profiling along the length of the intestine found a declining expression gradient from the anterior to the posterior, and a distinct expression profile in the hind gut. We showed gene loss has occurred for all known genes related to stomach function in the ballan wrasse, while the remaining functions of the digestive tract appear intact. The results also show appetite control in ballan wrasse has undergone substantial changes. The loss of ghrelin suggests that other genes, such as motilin, may play a ghrelin-like role. The wrasse genome offers novel insight into the evolutionary traits of this large family. As the stomach plays a major role in protein digestion, the lack of genes related to stomach digestion in wrasse suggests it requires formulated diets with higher levels of readily digestible protein than those for gastric species.
