Parallel sequencing of a single cell’s genome and transcriptome provides a powerful tool for dissecting genetic variation and its relationship with gene expression. Here we present a detailed protocol for G&T-seq, a method for separation and parallel sequencing of genomic DNA and full-length polyA(+) mRNA from single cells. We provide step-by-step instructions for the isolation and lysis of single cells; the physical separation of polyA(+) mRNA from genomic DNA using a modified oligo-dT bead capture and the respective whole-transcriptome and whole-genome amplifications; and library preparation and sequence analyses of these amplification products. The method allows the detection of thousands of transcripts in parallel with the genetic variants captured by the DNA-seq data from the same single cell. G&T-seq differs from other currently available methods for parallel DNA and RNA sequencing from single cells, as it involves physical separation of the DNA and RNA and does not require bespoke microfluidics platforms. The process can be implemented manually or through automation. When performed manually, paired genome and transcriptome sequencing libraries from eight single cells can be produced in ~3 d by researchers experienced in molecular laboratory work. For users with experience in the programming and operation of liquid-handling robots, paired DNA and RNA libraries from 96 single cells can be produced in the same time frame. Sequence analysis and integration of single-cell G&T-seq DNA and RNA data requires a high level of bioinformatics expertise and familiarity with a wide range of informatics tools.
In mammals, GPIHBP1 is absolutely essential for transporting lipoprotein lipase (LPL) to the lumen of capillaries, where it hydrolyzes the triglycerides in triglyceride-rich lipoproteins. In all lower vertebrate species (e.g., birds, amphibians, reptiles, fish), a gene for LPL can be found easily, but a gene for GPIHBP1 has never been found. The obvious question is whether the LPL in lower vertebrates is able to reach the capillary lumen. Using purified antibodies against chicken LPL, we showed that LPL is present on capillary endothelial cells of chicken heart and adipose tissue, colocalizing with von Willebrand factor. When the antibodies against chicken LPL were injected intravenously into chickens, they bound to LPL on the luminal surface of capillaries in heart and adipose tissue. LPL was released rapidly from chicken hearts with an infusion of heparin, consistent with LPL being located inside blood vessels. Remarkably, chicken LPL bound in a specific fashion to mammalian GPIHBP1. However, we could not identify a gene for GPIHBP1 in the chicken genome, nor could we identify a transcript for GPIHBP1 in a large chicken RNA-seq data set. We conclude that LPL reaches the capillary lumen in chickens – as it does in mammals – despite an apparent absence of GPIHBP1.
Despite the significance of chicken as a model organism, our understanding of the chicken transcriptome is limited compared to human. This issue is common to all non-human vertebrate annotations due to the difficulty in transcript identification from short read RNAseq data. While previous studies have used single molecule long read sequencing for transcript discovery, they did not perform RNA normalization and 5′-cap selection, which may have resulted in lower transcriptome coverage and truncated transcript sequences. We sequenced normalised chicken brain and embryo RNA libraries with Pacific Biosciences Iso-Seq. 5′ cap selection was performed on the embryo library to provide methodological comparison. From these Iso-Seq sequencing projects, we have identified 60 k transcripts and 29 k genes within the chicken transcriptome. Of these, more than 20 k are novel lncRNA transcripts with ~3 k classified as sense exonic overlapping lncRNA, which is a class that is underrepresented in many vertebrate annotations. The relative proportion of alternative transcription events revealed striking similarities between the chicken and human transcriptomes while also providing explanations for previously observed genomic differences. Our results indicate that the chicken transcriptome is similar in complexity to that of human, and provide insights into other vertebrate biology. Our methodology demonstrates the potential of Iso-Seq sequencing to rapidly expand our knowledge of transcriptomics.
Meeting report: 31st International Mammalian Genome Conference, Mammalian Genetics and Genomics: From Molecular Mechanisms to Translational Applications.
High on the Heidelberg hills, inside the Advanced Training Centre of the European Molecular Biology Laboratory (EMBL) campus with its unique double-helix staircase, scientists gathered for the EMBL conference “Mammalian Genetics and Genomics: From Molecular Mechanisms to Translational Applications,” organized in cooperation with the International Mammalian Genome Society (IMGS) and the Mouse Molecular Genetics (MMG) group. The conference attracted 205 participants from 30 countries, representing 6 of the 7 continents, all except Antarctica. It was a richly diverse group of geneticists, clinicians, and bioinformaticians, with presentations by established and junior investigators, including many trainees. From the 24th to the 27th of October 2017, they shared exciting advances in mammalian genetics and genomics research, from the introduction of cutting-edge technologies to descriptions of translational studies involving highly relevant models of human disease.
Single-cell sequencing provides information that is not confounded by genotypic or phenotypic heterogeneity of bulk samples. Sequencing of one molecular type (RNA, methylated DNA or open chromatin) in a single cell, furthermore, provides insights into the cell’s phenotype and links to its genotype. Nevertheless, only by taking measurements of these phenotypes and genotypes from the same single cells can such inferences be made unambiguously. In this review, we survey the first experimental approaches that assay, in parallel, multiple molecular types from the same single cell, before considering the challenges and opportunities afforded by these and future technologies. Copyright © 2016. Published by Elsevier Ltd.
Combination of novel and public RNA-seq datasets to generate an mRNA expression atlas for the domestic chicken.
The domestic chicken (Gallus gallus) is widely used as a model in developmental biology and is also an important livestock species. We describe a novel approach to data integration to generate an mRNA expression atlas for the chicken spanning major tissue types and developmental stages, using a diverse range of publicly-archived RNA-seq datasets and new data derived from immune cells and tissues. Randomly down-sampling RNA-seq datasets to a common depth and quantifying expression against a reference transcriptome using the mRNA quantitation tool Kallisto ensured that disparate datasets explored comparable transcriptomic space. The network analysis tool Graphia was used to extract clusters of co-expressed genes from the resulting expression atlas, many of which were tissue or cell-type restricted, contained transcription factors that have previously been implicated in their regulation, or were otherwise associated with biological processes, such as the cell cycle. The atlas provides a resource for the functional annotation of genes that currently have only a locus ID. We cross-referenced the RNA-seq atlas to a publicly available embryonic Cap Analysis of Gene Expression (CAGE) dataset to infer the developmental time course of organ systems, and to identify a signature of the expansion of tissue macrophage populations during development. Expression profiles obtained from public RNA-seq datasets – despite being generated by different laboratories using different methodologies – can be made comparable to each other. This meta-analytic approach to RNA-seq can be extended with new datasets from novel tissues, and is applicable to any species.
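The down-sampling step described above can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline: the abstract does not say how the common depth was chosen or which tool performed the sampling, so the choice of the smallest dataset as the target depth, the function name `downsample`, and the toy read identifiers are all assumptions. Reservoir sampling is used here because it draws a uniform random subset from a stream without holding every read in memory.

```python
import random
from typing import Dict, Iterable, List, TypeVar

T = TypeVar("T")


def downsample(records: Iterable[T], depth: int, seed: int = 0) -> List[T]:
    """Draw `depth` records uniformly at random from a stream (reservoir sampling).

    Hypothetical helper for illustration; if the stream holds fewer than
    `depth` records, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < depth:
            # Fill the reservoir with the first `depth` records.
            reservoir.append(record)
        else:
            # Replace an existing entry with probability depth / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < depth:
                reservoir[j] = record
    return reservoir


# Toy stand-ins for read identifiers from two laboratories (assumed data).
datasets: Dict[str, List[str]] = {
    "lab_A": [f"readA{i}" for i in range(1000)],
    "lab_B": [f"readB{i}" for i in range(250)],
}

# Assumption: the common depth is the size of the smallest dataset.
common_depth = min(len(reads) for reads in datasets.values())
sampled = {name: downsample(reads, common_depth) for name, reads in datasets.items()}
```

After this step every dataset contributes the same number of reads, so downstream expression estimates are not driven by differences in sequencing depth alone.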
Genomic comparison of highly virulent, moderately virulent, and avirulent strains from a genetically closely-related MRSA ST239 sub-lineage provides insights into pathogenesis.
The genomic comparison of virulent (TW20), moderately virulent (CMRSA6/CMRSA3), and avirulent (M92) strains from a genetically closely-related MRSA ST239 sub-lineage revealed striking similarities in their genomes and antibiotic resistance profiles, despite differences in virulence and pathogenicity. The main differences were in the spa gene (coding for staphylococcal protein A), lpl genes (coding for lipoprotein-like membrane proteins), cta genes (genes involved in heme synthesis), and the dfrG gene (coding for a trimethoprim-resistant dihydrofolate reductase), as well as variations in the presence or content of some prophages and plasmids, which could explain the virulence differences of these strains. TW20 was positive for all genetic traits tested, compared to CMRSA6, CMRSA3, and M92. The major components differing among these strains included spa and lpl, with TW20 carrying both, whereas CMRSA6/CMRSA3 carry spa identical to TW20 but have a disrupted lpl. M92 is devoid of both these traits. Considering the role played by these components in innate immunity and virulence, it is predicted that since TW20 has both the components intact and functional, these traits contribute to its pathogenesis. However, CMRSA6/CMRSA3 are missing one of these components, hence their intermediately virulent nature. In contrast, M92 is completely devoid of both the spa and lpl genes and is avirulent. Mobile genetic elements play a potential role in virulence. TW20 carries three prophages (φSa6, φSa3, and φSPβ-like), a pathogenicity island and two plasmids. CMRSA6, CMRSA3, and M92 contain variations in one or more of these components. The virulence-associated genes in these components include those for staphylokinase, enterotoxins, antibiotic/antiseptic/heavy metal resistance and bacterial persistence.
Additionally, there are many hypothetical proteins (present with variations among strains) with unknown function in these mobile elements, which could be making an important contribution to the virulence of these strains. The above-mentioned repertoire of virulence components in TW20 likely contributes to its increased virulence, while the absence and/or modification of one or more of these components in CMRSA6/CMRSA3 and M92 likely affects the virulence of the strains.
Genomes of ubiquitous marine and hypersaline Hydrogenovibrio, Thiomicrorhabdus and Thiomicrospira spp. encode a diversity of mechanisms to sustain chemolithoautotrophy in heterogeneous environments.
Chemolithoautotrophic bacteria from the genera Hydrogenovibrio, Thiomicrorhabdus and Thiomicrospira are common, sometimes dominant, isolates from sulfidic habitats including hydrothermal vents, soda and salt lakes and marine sediments. Their genome sequences confirm their membership in a deeply branching clade of the Gammaproteobacteria. Several adaptations to heterogeneous habitats are apparent. Their genomes include large numbers of genes for sensing and responding to their environment (EAL- and GGDEF-domain proteins and methyl-accepting chemotaxis proteins) despite their small sizes (2.1-3.1 Mbp). An array of sulfur-oxidizing complexes are encoded, likely to facilitate these organisms’ use of multiple forms of reduced sulfur as electron donors. Hydrogenase genes are present in some taxa, including group 1d and 2b hydrogenases in Hydrogenovibrio marinus and H. thermophilus MA2-6, acquired via horizontal gene transfer. In addition to high-affinity cbb3 cytochrome c oxidase, some also encode cytochrome bd-type quinol oxidase or ba3-type cytochrome c oxidase, which could facilitate growth under different oxygen tensions, or maintain redox balance. Carboxysome operons are present in most, with genes downstream encoding transporters from four evolutionarily distinct families, which may act with the carboxysomes to form CO2 concentrating mechanisms. These adaptations to habitat variability likely contribute to the cosmopolitan distribution of these organisms. © 2018 Society for Applied Microbiology and John Wiley & Sons Ltd.
The cane toad (Rhinella marina, formerly Bufo marinus) is a species native to Central and South America that has spread across many regions of the globe. Cane toads are known for their rapid adaptation and deleterious impacts on native fauna in invaded regions. However, despite an iconic status, there are major gaps in our understanding of cane toad genetics. The availability of a genome would help to close these gaps and accelerate cane toad research. We report a draft genome assembly for R. marina, the first of its kind for the Bufonidae family. We used a combination of long-read Pacific Biosciences RS II and short-read Illumina HiSeq X sequencing to generate 359.5 Gb of raw sequence data. The final hybrid assembly of 31,392 scaffolds was 2.55 Gb in length with a scaffold N50 of 168 kb. BUSCO analysis revealed that the assembly included full length or partial fragments of 90.6% of tetrapod universal single-copy orthologs (n = 3950), illustrating that the gene-containing regions have been well assembled. Annotation predicted 25,846 protein coding genes with similarity to known proteins in Swiss-Prot. Repeat sequences were estimated to account for 63.9% of the assembly. The R. marina draft genome assembly will be an invaluable resource that can be used to further probe the biology of this invasive species. Future analysis of the genome will provide insights into cane toad evolution and enrich our understanding of their interplay with the ecosystem at large.
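For readers unfamiliar with the scaffold N50 statistic quoted above, the following sketch shows the standard definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name `n50` and the toy scaffold lengths are illustrative assumptions and are unrelated to the actual R. marina assembly.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the scaffold that pushes coverage to at least half the total.
        if 2 * running >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Toy assembly (assumed values): total length 300, so half is 150.
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_scaffolds)  # 100 + 80 = 180 >= 150, so the N50 is 80
```

A higher N50 for the same total length indicates that more of the assembly sits in long scaffolds, which is why it is commonly reported alongside total assembly size.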
A model for the evolution of prokaryotic DNA restriction-modification systems based upon the structural malleability of Type I restriction-modification enzymes.
Restriction Modification (RM) systems prevent the invasion of foreign genetic material into bacterial cells by restriction and protect the host’s genetic material by methylation. They are therefore important in maintaining the integrity of the host genome. RM systems are currently classified into four types (I to IV) on the basis of differences in composition, target recognition, cofactors and the manner in which they cleave DNA. Comparing the structures of the different types, similarities can be observed suggesting an evolutionary link between these different types. This work describes the ‘deconstruction’ of a large Type I RM enzyme into forms structurally similar to smaller Type II RM enzymes in an effort to elucidate the pathway taken by Nature to form these different RM enzymes. Based upon the ability to engineer new enzymes from the Type I ‘scaffold’, an evolutionary pathway and the evolutionary pressures required to move along the pathway from Type I RM systems to Type II RM systems are proposed. Experiments to test the evolutionary model are discussed.
Evolutionary history of human Plasmodium vivax revealed by genome-wide analyses of related ape parasites.
Wild-living African apes are endemically infected with parasites that are closely related to human Plasmodium vivax, a leading cause of malaria outside Africa. This finding suggests that the origin of P. vivax was in Africa, even though the parasite is now rare in humans there. To elucidate the emergence of human P. vivax and its relationship to the ape parasites, we analyzed genome sequence data of P. vivax strains infecting six chimpanzees and one gorilla from Cameroon, Gabon, and Côte d’Ivoire. We found that ape and human parasites share nearly identical core genomes, differing by only 2% of coding sequences. However, compared with the ape parasites, human strains of P. vivax exhibit about 10-fold less diversity and have a relative excess of nonsynonymous nucleotide polymorphisms, with site-frequency spectra suggesting they are subject to greatly relaxed purifying selection. These data suggest that human P. vivax has undergone an extreme bottleneck, followed by rapid population expansion. Investigating potential host-specificity determinants, we found that ape P. vivax parasites encode intact orthologs of three reticulocyte-binding protein genes (rbp2d, rbp2e, and rbp3), which are pseudogenes in all human P. vivax strains. However, binding studies of recombinant RBP2e and RBP3 proteins to human, chimpanzee, and gorilla erythrocytes revealed no evidence of host-specific barriers to red blood cell invasion. These data suggest that, from an ancient stock of P. vivax parasites capable of infecting both humans and apes, a severely bottlenecked lineage emerged out of Africa and underwent rapid population growth as it spread globally. Copyright © 2018 the Author(s). Published by PNAS.
Many animal species comprise discrete phenotypic forms. A common example in natural populations of insects is the occurrence of different color patterns, which has motivated a rich body of ecological and genetic research [1-6]. The occurrence of dark, i.e., melanic, forms displaying discrete color patterns is found across multiple taxa, but the underlying genomic basis remains poorly characterized. In numerous ladybird species (Coccinellidae), the spatial arrangement of black and red patches on adult elytra varies wildly within species, forming strikingly different complex color patterns [7, 8]. In the harlequin ladybird, Harmonia axyridis, more than 200 distinct color forms have been described, which classic genetic studies suggest result from allelic variation at a single, unknown, locus [9, 10]. Here, we combined whole-genome sequencing, population-based genome-wide association studies, gene expression, and functional analyses to establish that the transcription factor Pannier controls melanic pattern polymorphism in H. axyridis. We show that pannier is necessary for the formation of melanic elements on the elytra. Allelic variation in pannier leads to protein expression in distinct domains on the elytra and thus determines the distinct color patterns in H. axyridis. Recombination between pannier alleles may be reduced by a highly divergent sequence of ~170 kb in the cis-regulatory regions of pannier, with a 50 kb inversion between color forms. This most likely helps maintain the distinct alleles found in natural populations. Thus, we propose that highly variable discrete color forms can arise in natural populations through cis-regulatory allelic variation of a single gene. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
Comparative genomics of Staphylococcus reveals determinants of speciation and diversification of antimicrobial defense.
The bacterial genus Staphylococcus comprises diverse species with most being described as colonizers of human and animal skin. A relational analysis of features that discriminate its species and contribute to niche adaptation and survival remains to be fully described. In this study, an interspecies, whole-genome comparative analysis of 21 Staphylococcus species was performed based on their orthologues. Three well-defined multi-species groups were identified: group A (including aureus/epidermidis); group B (including saprophyticus/xylosus) and group C (including pseudintermedius/delphini). The machine learning algorithm Random Forest was applied to prioritize orthologs that drive formation of the Staphylococcus species groups A-C. Orthologues driving staphylococcal intrageneric diversity comprised regulatory, metabolic and antimicrobial resistance proteins. Notably, the BraSR (NsaRS) two-component system (TCS) and its associated BraDE transporters that regulate antimicrobial resistance showed limited distribution in the genus and their presence was most closely associated with a subset of Staphylococcus species dominated by those that colonize human skin. Divergence of BraSR and GraSR antimicrobial peptide survival TCS and their associated transporters was observed across the staphylococci, likely reflecting niche specific evolution of these TCS/transporters and their specificities for AMPs. Experimental evolution, with selection for resistance to the lantibiotic nisin, revealed multiple routes to resistance and differences in the selection outcomes of the BraSR-positive species S. hominis and S. aureus. Selection supported a role for GraSR in nisin survival responses of the BraSR-negative species S. saprophyticus. Our study reveals diversification of antimicrobial-sensing TCS across the staphylococci and hints at differential relationships between GraSR and BraSR in those species positive for both TCS.
Genomic surveillance of Enterococcus faecium reveals limited sharing of strains and resistance genes between livestock and humans in the United Kingdom.
Vancomycin-resistant Enterococcus faecium (VREfm) is a major cause of nosocomial infection and is categorized as high priority by the World Health Organization global priority list of antibiotic-resistant bacteria. In the past, livestock have been proposed as a putative reservoir for drug-resistant E. faecium strains that infect humans, and isolates of the same lineage have been found in both reservoirs. We undertook cross-sectional surveys to isolate E. faecium (including VREfm) from livestock farms, retail meat, and wastewater treatment plants in the United Kingdom. More than 600 isolates from these sources were sequenced, and their relatedness and antibiotic resistance genes were compared with genomes of almost 800 E. faecium isolates from patients with bloodstream infection in the United Kingdom and Ireland. E. faecium was isolated from 28/29 farms; none of these isolates were VREfm, suggesting a decrease in VREfm prevalence since the last UK livestock survey in 2003. However, VREfm was isolated from 1% to 2% of retail meat products and was ubiquitous in wastewater treatment plants. Phylogenetic comparison demonstrated that the majority of human and livestock-related isolates were genetically distinct, although pig isolates from three farms were more genetically related to human isolates from 2001 to 2004 (minimum of 50 single-nucleotide polymorphisms [SNPs]). Analysis of accessory (variable) genes added further evidence for distinct niche adaptation. An analysis of acquired antibiotic resistance genes and their variants revealed limited sharing between humans and livestock. Our findings indicate that the majority of E. faecium strains infecting patients are largely distinct from those from livestock in this setting, with limited sharing of strains and resistance genes. IMPORTANCE The rise in rates of human infection caused by vancomycin-resistant Enterococcus faecium (VREfm) strains between 1988 and the 2000s in Europe was suggested to be associated with acquisition from livestock. As a result, the European Union banned the use of the glycopeptide drug avoparcin as a growth promoter in livestock feed. While some studies reported a decrease in VREfm in livestock, others reported no reduction. Here, we report the first livestock VREfm prevalence survey in the UK since 2003 and the first large-scale study using whole-genome sequencing to investigate the relationship between E. faecium strains in livestock and humans. We found a low prevalence of VREfm in retail meat and limited evidence for recent sharing of strains between livestock and humans with bloodstream infection. There was evidence for limited sharing of genes encoding antibiotic resistance between these reservoirs, a finding which requires further research. Copyright © 2018 Gouliouris et al.
Impact of index hopping and bias towards the reference allele on accuracy of genotype calls from low-coverage sequencing.
Inherent sources of error and bias that affect the quality of sequence data include index hopping and bias towards the reference allele. The impact of these artefacts is likely greater for low-coverage data than for high-coverage data because low-coverage data has scant information and many standard tools for processing sequence data were designed for high-coverage data. With the proliferation of cost-effective low-coverage sequencing, there is a need to understand the impact of these errors and bias on resulting genotype calls from low-coverage sequencing. We used a dataset of 26 pigs sequenced both at 2× with multiplexing and at 30× without multiplexing to show that index hopping and bias towards the reference allele due to alignment had little impact on genotype calls. However, pruning of alternative haplotypes supported by a number of reads below a predefined threshold, which is a default and desired step of some variant callers for removing potential sequencing errors in high-coverage data, introduced an unexpected bias towards the reference allele when applied to low-coverage sequence data. This bias reduced best-guess genotype concordance of low-coverage sequence data by 19.0 absolute percentage points. We propose a simple pipeline to correct the preferential bias towards the reference allele that can occur during variant discovery and we recommend that users of low-coverage sequence data be wary of unexpected biases that may be produced by bioinformatic tools that were designed for high-coverage sequence data.
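The mechanism behind the pruning bias can be illustrated with a toy example. The sketch below is not the authors' pipeline and does not model any real variant caller: `call_genotype` is a deliberately naive, hypothetical caller that assigns a best-guess genotype from raw read counts, `min_alt_reads` stands in for the pruning threshold, and the read counts and "true" genotypes are invented for illustration. It shows only why discarding alternative alleles supported by few reads pushes calls towards the reference when each site has about two reads.

```python
from typing import List, Optional, Tuple


def call_genotype(ref_reads: int, alt_reads: int, min_alt_reads: int = 0) -> Optional[int]:
    """Naive best-guess alternative-allele dosage (0, 1 or 2) from read counts.

    Alternative alleles supported by fewer than `min_alt_reads` reads are
    pruned, mimicking an error filter designed for high-coverage data.
    Returns None when no reads remain to support a call.
    """
    if alt_reads < min_alt_reads:
        alt_reads = 0  # pruning step: the alternative allele is discarded
    if ref_reads + alt_reads == 0:
        return None
    if alt_reads == 0:
        return 0
    if ref_reads == 0:
        return 2
    return 1


def concordance(calls: List[Optional[int]], truth: List[int]) -> float:
    """Fraction of called sites whose genotype matches the truth set."""
    pairs = [(c, t) for c, t in zip(calls, truth) if c is not None]
    if not pairs:
        return float("nan")
    return sum(c == t for c, t in pairs) / len(pairs)


# Invented (ref, alt) read counts at roughly 2x coverage, with invented
# "true" genotypes standing in for calls from high-coverage data.
sites: List[Tuple[int, int]] = [(2, 0), (1, 1), (0, 2), (1, 1), (0, 1), (2, 0)]
truth = [0, 1, 2, 1, 2, 0]

unpruned = [call_genotype(r, a) for r, a in sites]
pruned = [call_genotype(r, a, min_alt_reads=2) for r, a in sites]

unpruned_concordance = concordance(unpruned, truth)
pruned_concordance = concordance(pruned, truth)
```

In this toy data set, every heterozygous site carries a single alternative read, so a threshold of two reads converts those sites to homozygous reference calls and lowers concordance. The magnitude here is an artefact of the invented counts and should not be compared with the 19.0 percentage points reported in the abstract.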