Gut microbiota is a determining factor in human physiological functions and health. It is commonly accepted that diet has a major influence on the gut microbial community; however, the effects of diet are not fully understood. The typical Mongolian diet is characterized by high and frequent consumption of fermented dairy products and red meat, and a low level of carbohydrates. In this study, the gut microbiota profile of 26 Mongolians who consumed wheat, rice and oat as the sole carbohydrate staple food for a week each consecutively was determined. It was observed that changes in staple carbohydrate rapidly (within a week) altered the gut microbial community structure and metabolic pathways of the subjects. Wheat and oat favored bifidobacteria (Bifidobacterium catenulatum, Bifidobacterium bifidum, Bifidobacterium adolescentis), whereas rice suppressed bifidobacteria (Bifidobacterium longum, Bifidobacterium adolescentis) and wheat suppressed Lactobacillus, Ruminococcus and Bacteroides. The study exhibited two gut microbial clustering patterns, with the preference for fucosyllactose utilization linked to fucosidase genes (glycoside hydrolase family classifications: GH95 and GH29) encoded by Bifidobacterium, and xylan and arabinoxylan utilization linked to xylanase and arabinoxylanase genes encoded by Bacteroides. There was also a correlation between Lactobacillus ruminis and sialidase, as well as between Butyrivibrio crossotus and xylanase/xylosidase. Meanwhile, a strong concordance was found between the gastrointestinal bacterial microbiome and the intestinal virome. The present research will contribute to understanding the impacts of dietary carbohydrate on the human gut microbiome, which will ultimately help clarify the relationships between dietary factors, microbial populations, and human health worldwide.
Comprehensive genomic analysis of malignant pleural mesothelioma identifies recurrent mutations, gene fusions and splicing alterations.
We analyzed transcriptomes (n = 211), whole exomes (n = 99) and targeted exomes (n = 103) from 216 malignant pleural mesothelioma (MPM) tumors. Using RNA-seq data, we identified four distinct molecular subtypes: sarcomatoid, epithelioid, biphasic-epithelioid (biphasic-E) and biphasic-sarcomatoid (biphasic-S). Through exome analysis, we found BAP1, NF2, TP53, SETD2, DDX3X, ULK2, RYR2, CFAP45, SETDB1 and DDX51 to be significantly mutated (q-score ≥ 0.8) in MPMs. We identified recurrent mutations in several genes, including SF3B1 (~2%; 4/216) and TRAF7 (~2%; 5/216). SF3B1-mutant samples showed a splicing profile distinct from that of wild-type tumors. TRAF7 alterations occurred primarily in the WD40 domain and were, except in one case, mutually exclusive with NF2 alterations. We found recurrent gene fusions and splice alterations to be frequent mechanisms for inactivation of NF2, BAP1 and SETD2. Through integrated analyses, we identified alterations in Hippo, mTOR, histone methylation, RNA helicase and p53 signaling pathways in MPMs.
Alternative RNA splicing is a known phenomenon, but we still do not have a complete catalog of the isoforms that explain variability in the human transcriptome. We have made significant progress in developing methods to study variability of the transcriptome, but we are far from having a complete picture of it. The initial methods to study gene expression were based on cloning of cDNAs and Sanger sequencing. The strategy was labor-intensive and expensive. With the development of microarrays, different methods based on exon arrays and tiling arrays provided valuable information about RNA expression. However, microarrays presented significant limitations. Most of the limitations became apparent by 2005, but it was not until 2008 that an alternative method to study the transcriptome was developed. RNA sequencing using next-generation sequencing (RNA-Seq) quickly became the technology of choice for gene expression profiling. Recently, the precision and sensitivity of RNA-Seq have come into question, especially for transcriptome reconstruction. This chapter will describe a relatively new method, “Isoform Sequencing” (Iso-Seq). Iso-Seq was developed by Pacific Biosciences (PacBio), and it is capable of identifying new isoforms with extraordinary precision due to its long-read technology. The technique to create libraries is straightforward, and the PacBio RS II instrument generates the information in hours. The bioinformatics analysis is performed using the freely available SMRT® Portal software. The SMRT Portal is easy to use and capable of performing all the steps necessary to analyze the raw data and to generate high-quality full-length isoforms. For the universal acceptance of the Iso-Seq method, the capacity of the SMRT Cells needs to improve at least 10- to 100-fold to make the system affordable and attractive to users.
Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line.
The SK-BR-3 cell line is one of the most important models for HER2+ breast cancers, which affect one in five breast cancer patients. SK-BR-3 is known to be highly rearranged, although much of the variation is in complex and repetitive regions that may be underreported. Addressing this, we sequenced SK-BR-3 using long-read single molecule sequencing from Pacific Biosciences and developed one of the most detailed maps of structural variations (SVs) in a cancer genome available, with nearly 20,000 variants present, most of which were missed by short-read sequencing. Surrounding the important ERBB2 oncogene (also known as HER2), we discovered a complex sequence of nested duplications and translocations, suggesting a punctuated progression. Full-length transcriptome sequencing further revealed several novel gene fusions within the nested genomic variants. Combining long-read genome and transcriptome sequencing enables an in-depth analysis of how SVs disrupt the genome and sheds new light on the complex mechanisms involved in cancer genome evolution. © 2018 Nattestad et al.; Published by Cold Spring Harbor Laboratory Press.
Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.
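The contig and scaffold N50 values quoted above are standard contiguity statistics: the N50 is the length L such that sequences of length L or longer contain at least half of all assembled bases. The following is a minimal Python sketch of that calculation. The function name n50 and the toy lengths are illustrative assumptions and are not taken from the HX1 study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total bases.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("n50() requires at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical values, for illustration only).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

For a real assembly, the lengths would be read from the contig or scaffold FASTA records rather than typed in by hand.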
Improved full-length killer cell immunoglobulin-like receptor transcript discovery in Mauritian cynomolgus macaques.
Killer cell immunoglobulin-like receptors (KIRs) modulate disease progression of pathogens including HIV, malaria, and hepatitis C. Cynomolgus and rhesus macaques are widely used as nonhuman primate models to study human pathogens, and so considerable effort has been put into characterizing their KIR genetics. However, previous studies have relied on cDNA cloning and Sanger sequencing, which lack the throughput of current sequencing platforms. In this study, we present a high-throughput, full-length allele discovery method utilizing Pacific Biosciences circular consensus sequencing (CCS). We also describe a new approach to Macaque Exome Sequencing (MES) and the development of the Rhexome1.0, an adapted target capture reagent that includes macaque-specific capture probe sets. By using sequence reads generated by whole genome sequencing (WGS) and MES to inform primer design, we were able to increase the sensitivity of KIR allele discovery. We demonstrate this increased sensitivity by defining nine novel alleles within a cohort of Mauritian cynomolgus macaques (MCM), a geographically isolated population with restricted KIR genetics that was thought to be completely characterized. Finally, we describe an approach to genotyping KIRs directly from sequence reads generated using WGS/MES. The findings presented here expand our understanding of KIR genetics in MCM by associating new genes with all eight KIR haplotypes and demonstrating the existence of at least one KIR3DS gene associated with every haplotype.
In contrast to infections with human immunodeficiency virus (HIV) in humans and simian immunodeficiency virus (SIV) in macaques, SIV infection of a natural host, sooty mangabeys (Cercocebus atys), is non-pathogenic despite high viraemia. Here we sequenced and assembled the genome of a captive sooty mangabey. We conducted genome-wide comparative analyses of transcript assemblies from C. atys and AIDS-susceptible species, such as humans and macaques, to identify candidates for host genetic factors that influence susceptibility. We identified several immune-related genes in the genome of C. atys that show substantial sequence divergence from macaques or humans. One of these sequence divergences, a C-terminal frameshift in the toll-like receptor-4 (TLR4) gene of C. atys, is associated with a blunted in vitro response to TLR-4 ligands. In addition, we found a major structural change in exons 3-4 of the immune-regulatory protein intercellular adhesion molecule 2 (ICAM-2); expression of this variant leads to reduced cell surface expression of ICAM-2. These data provide a resource for comparative genomic studies of HIV and/or SIV pathogenesis and may help to elucidate the mechanisms by which SIV-infected sooty mangabeys avoid AIDS.
Thanks to a recent spate of sequencing projects, the Hemiptera are the first hemimetabolous insect order to achieve a critical mass of species with sequenced genomes, establishing the basis for comparative genomics of the bugs. However, as the most speciose hemimetabolous order, there is still a vast swathe of the hemipteran phylogeny that awaits genomic representation across subterranean, terrestrial, and aquatic habitats, and with lineage-specific and developmentally plastic cases of both wing polyphenisms and flightlessness. In this review, we highlight opportunities for taxonomic sampling beyond obvious pest species candidates, motivated by intriguing biological features of certain groups as well as the rich research tradition of ecological, physiological, developmental, and particularly cytogenetic investigation that spans the diversity of the Hemiptera.
Complete genome sequence of N2-fixing model strain Klebsiella sp. nov. M5al, which produces plant cell wall-degrading enzymes and siderophores.
The bacterial strain M5al is a model strain for studying the molecular genetics of N2-fixation and molecular engineering of microbial production of platform chemicals 1,3-propanediol and 2,3-butanediol. Here, we present the complete genome sequence of the strain M5al, which belongs to a novel species closely related to Klebsiella michiganensis. M5al secretes plant cell wall-degrading enzymes and colonizes rice roots but does not cause soft rot disease. M5al also produces siderophores and contains the gene clusters for synthesis and transport of yersiniabactin which is a critical virulence factor for Klebsiella pathogens in causing human disease. We propose that the model strain M5al can be genetically modified to study bacterial N2-fixation in association with non-legume plants and production of 1,3-propanediol and 2,3-butanediol through degradation of plant cell wall biomass.
Trypanosoma cruzi, a zoonotic kinetoplastid protozoan with a complex genome, is the causative agent of American trypanosomiasis (Chagas disease). The parasite uses a highly diverse repertoire of surface molecules, with roles in cell invasion, immune evasion and pathogenesis. Thus far, the genomic regions containing these genes have been impossible to resolve and it has been impossible to study the structure and function of the several thousand repetitive genes encoding the surface molecules of the parasite. We here present an improved genome assembly of a T. cruzi clade I (TcI) strain using high coverage PacBio single molecule sequencing, together with Illumina sequencing of 34 T. cruzi TcI isolates and clones from different geographic locations, sample sources and clinical outcomes. Resolution of the surface molecule gene structure reveals an unusual duality in the organisation of the parasite genome, a core genomic region syntenous with related protozoa flanked by unique and highly plastic subtelomeric regions encoding surface antigens. The presence of abundant interspersed retrotransposons in the subtelomeres suggests that these elements are involved in a recombination mechanism for the generation of antigenic variation and evasion of the host immune response. The comparative genomic analysis of the cohort of TcI strains revealed multiple cases of such recombination events involving surface molecule genes and has provided new insights into T. cruzi population structure.
The accumulation of sequenced Francisella strains has made it increasingly apparent that the 16S rRNA gene alone is not enough to stratify the Francisella genus into precise and clinically useful classifications. Continued whole-genome sequencing of isolates will provide a larger base of knowledge for targeted approaches with broad applicability. Additionally, examination of genomic information on a case-by-case basis will help resolve outstanding questions regarding strain stratification. We report the complete genome sequence of a clinical isolate, designated here as F. novicida-like strain TCH2015, acquired from the lymph node of a 6-year-old male. Two features were atypical for F. novicida: exhibition of functional oxidase activity and additional gene content, including proposed virulence determinants. These differences, which could potentially impact virulence and clinical diagnosis, emphasize the need for more comprehensive methods to profile Francisella isolates. This study highlights the value of whole-genome sequencing, which will lead to a more robust database of environmental and clinical genomes and inform strategies to improve detection and classification of Francisella strains. Copyright © 2017 Elsevier Inc. All rights reserved.
Bats harbor many viruses asymptomatically, including several notorious for causing extreme virulence in humans. To identify differences between antiviral mechanisms in humans and bats, we sequenced, assembled, and analyzed the genome of Rousettus aegyptiacus, a natural reservoir of Marburg virus and the only known reservoir for any filovirus. We found an expanded and diversified KLRC/KLRD family of natural killer cell receptors, MHC class I genes, and type I interferons, which dramatically differ from their functional counterparts in other mammals. Such concerted evolution of key components of bat immunity is strongly suggestive of novel modes of antiviral defense. An evaluation of the theoretical function of these genes suggests that an inhibitory immune state may exist in bats. Based on our findings, we hypothesize that tolerance of viral infection, rather than enhanced potency of antiviral defenses, may be a key mechanism by which bats asymptomatically host viruses that are pathogenic in humans. Copyright © 2018 Elsevier Inc. All rights reserved.
Lactic acid bacteria (LAB) are one of the microorganisms of choice for the development of protein delivery systems for therapeutic purposes. Although there are numerous tools to facilitate genome engineering of lactobacilli, transformation efficiency still limits the ability to engineer their genomes. While genetically manipulating Lactobacillus reuteri ATCC PTA 6475 (LR 6475), we noticed that after an initial transformation, several LR 6475 strains significantly improved their ability to take up plasmid DNA via electroporation. Our goal was to understand the molecular basis for how these strains acquired the ability to increase transformation efficiency. Strains generated after transformation of plasmids pJP067 and pJP042 increased their ability to transform plasmid DNA about one-million-fold for pJP067, 100-fold for pSIP411 and tenfold for pNZ8048. Upon sequencing of the whole genome from these strains, we identified several genomic mutations and rearrangements, with all strains containing mutations in the transformation related gene A (trgA). To evaluate the role of trgA in transformation of DNA, we generated a trgA null that improved the efficiency with which LR 6475 is transformed with pSIP411 and pJP067 by at least 100-fold, demonstrating that trgA significantly impairs the ability of LR 6475 to take up plasmid DNA. We also identified genomic rearrangements located in and around two prophages inserted in the LR 6475 genome that included deletions, insertions and an inversion of 336 Kb. A second group of rearrangements was observed in a Type I restriction modification system, in which the specificity subunits underwent several rearrangements in the target recognition domain.
Despite the magnitude of these rearrangements in the prophage genomes and restriction modification systems, none of these genomic changes impacted transformation efficiency to the level induced by trgA. Our findings demonstrate how genetic manipulation of LR 6475 with plasmid DNA leads to genomic changes that improve its ability to transform plasmid DNA, highlighting trgA as the primary driver of this phenotype. This study also underlines the importance of characterizing genetic changes that take place after genome engineering of strains for therapeutic purposes.
Here we present Parliament2: a structural variant caller which combines multiple best-in-class structural variant callers to create a highly accurate callset. This captures more events than the individual callers achieve independently. Parliament2 uses a call-overlap-genotype approach that is highly extensible to new methods and gives users the choice to run some or all of Breakdancer, Breakseq, CNVnator, Delly, Lumpy, and Manta. Parliament2 applies an additional parallelization framework to speed up certain callers and executes these in parallel, taking advantage of the different resource requirements to complete structural variant calling much faster than running the programs individually. Parliament2 is available as a Docker container, which pre-installs all required dependencies. This allows users to run any caller with easy installation and execution. This Docker container can easily be deployed in cloud or local environments and is available as an app on DNAnexus.
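The "overlap" step of a call-overlap-genotype approach can be illustrated with a small Python sketch. It is not Parliament2's actual implementation: the Call record, the 50% reciprocal-overlap threshold, the two-caller support rule, the greedy clustering and the toy coordinates are all assumptions chosen for illustration. Only the caller names come from the abstract.

```python
from collections import namedtuple

# Hypothetical record for one structural variant call (not a Parliament2 type).
Call = namedtuple("Call", "caller chrom start end svtype")


def reciprocal_overlap(a, b):
    """Fraction of the longer interval covered by the shared region (0.0 if disjoint)."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def consensus_calls(calls, min_overlap=0.5, min_callers=2):
    """Greedily cluster calls and keep clusters supported by enough distinct callers."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        for cluster in clusters:
            rep = cluster[0]  # first call in the cluster acts as its representative
            if (rep.chrom == call.chrom and rep.svtype == call.svtype
                    and reciprocal_overlap(rep, call) >= min_overlap):
                cluster.append(call)
                break
        else:
            clusters.append([call])

    merged = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        if len(callers) >= min_callers:
            # Report the union of the clustered intervals and the supporting callers.
            merged.append((cluster[0].chrom,
                           min(c.start for c in cluster),
                           max(c.end for c in cluster),
                           cluster[0].svtype,
                           callers))
    return merged


# Toy calls with made-up coordinates.
toy_calls = [
    Call("manta", "chr1", 1000, 2000, "DEL"),
    Call("lumpy", "chr1", 1050, 2100, "DEL"),
    Call("delly", "chr1", 5000, 6000, "DUP"),
    Call("manta", "chr2", 1000, 2000, "DEL"),
]
print(consensus_calls(toy_calls))
```

In this sketch only the chr1 deletion is retained, because it is the only event reported by two different callers. A production pipeline would additionally genotype each merged event against the reads.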
Recurrent loss of HMGCS2 shows that ketogenesis is not essential for the evolution of large mammalian brains.
Apart from glucose, fatty acid-derived ketone bodies provide metabolic energy for the brain during fasting and neonatal development. We investigated the evolution of HMGCS2, the key enzyme required for ketone body biosynthesis (ketogenesis). Unexpectedly, we found that three mammalian lineages, comprising cetaceans (dolphins and whales), elephants and mastodons, and Old World fruit bats have lost this gene. Remarkably, many of these species have exceptionally large brains and signs of intelligent behavior. While fruit bats are sensitive to starvation, cetaceans and elephants can still withstand periods of fasting. This suggests that alternative strategies to fuel large brains during fasting evolved repeatedly and reveals flexibility in mammalian energy metabolism. Furthermore, we show that HMGCS2 loss preceded brain size expansion in toothed whales and elephants. Thus, while ketogenesis was likely important for brain size expansion in modern humans, ketogenesis is not a universal precondition for the evolution of large mammalian brains. © 2018, Jebb et al.