The human disease lymphatic filariasis causes the debilitating effects of elephantiasis and hydrocele. Lymphatic filariasis currently affects the lives of 90 million people in 52 countries. There are three nematodes that cause lymphatic filariasis, Brugia malayi, Brugia timori, and Wuchereria bancrofti, but 90% of all cases of lymphatic filariasis are caused solely by W. bancrofti (Wb). Here we use population genomics to reconstruct the probable route and timing of migration of Wb strains that currently infect Africa, Haiti, and Papua New Guinea (PNG). We used selective whole genome amplification to sequence 42 whole genomes of single Wb worms from populations in Haiti, Mali, Kenya, and PNG. Our results are consistent with a hypothesis of an Island Southeast Asia or East Asian origin of Wb. Our demographic models support divergence times that correlate with the migration of human populations. We hypothesize that PNG was infected at two separate times, first by the Melanesians and later by the migrating Austronesians. The migrating Austronesians also likely introduced Wb to Madagascar where later migrations spread it to continental Africa. From Africa, Wb spread to the New World during the transatlantic slave trade. Genome scans identified 17 genes that were highly differentiated among Wb populations. Among these are genes associated with human immune suppression, insecticide sensitivity, and proposed drug targets. Identifying the distribution of genetic diversity in Wb populations and selection forces acting on the genome will build a foundation to test future hypotheses and help predict response to current eradication efforts. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: email@example.com.
Genome of the Komodo dragon reveals adaptations in the cardiovascular and chemosensory systems of monitor lizards.
Monitor lizards are unique among ectothermic reptiles in that they have high aerobic capacity and distinctive cardiovascular physiology resembling that of endothermic mammals. Here, we sequence the genome of the Komodo dragon Varanus komodoensis, the largest extant monitor lizard, and generate a high-resolution de novo chromosome-assigned genome assembly for V. komodoensis using a hybrid approach of long-range sequencing and single-molecule optical mapping. Comparing the genome of V. komodoensis with those of related species, we find evidence of positive selection in pathways related to energy metabolism, cardiovascular homoeostasis, and haemostasis. We also show species-specific expansions of a chemoreceptor gene family related to pheromone and kairomone sensing in V. komodoensis and other lizard lineages. Together, these evolutionary signatures of adaptation reveal the genetic underpinnings of the unique Komodo dragon sensory and cardiovascular systems, and suggest that selective pressure altered haemostasis genes to help Komodo dragons evade the anticoagulant effects of their own saliva. The Komodo dragon genome is an important resource for understanding the biology of monitor lizards and reptiles worldwide.
The robust detection of structural variants in mammalian genomes remains a challenge. It is particularly difficult in the case of genetically unstable Chinese hamster ovary (CHO) cell lines with only draft genome assemblies available. We explore the potential of the CRISPR/Cas9 system for the targeted capture of genomic loci containing integrated vectors in CHO-K1-based cell lines followed by next generation sequencing (NGS), and compare it to popular target-enrichment sequencing methods and to whole genome sequencing (WGS). Three different CRISPR/Cas9-based techniques were evaluated; all of them allow for amplification-free enrichment of target genomic regions in the range from 5 to 60 fold, and for recovery of ~15 kb-long sequences with no sequencing artifacts introduced. The utility of these protocols has been proven by the identification of transgene integration sites and flanking sequences in three CHO cell lines. The long enriched fragments helped to identify Escherichia coli genome sequences co-integrated with vectors, and were further characterized by Whole Genome Sequencing (WGS). Other advantages of CRISPR/Cas9-based methods are the ease of bioinformatics analysis, potential for multiplexing, and the production of long target templates for real-time sequencing.
The DNA base modification N6-methyladenine (m6A) is involved in many pathways related to the survival of bacteria and their interactions with hosts. Nanopore sequencing offers a new, portable method to detect base modifications. Here, we show that a neural network can improve m6A detection at trained sequence contexts compared to previously published methods using deviations between measured and expected current values as each adenine travels through a pore. The model, implemented as the mCaller software package, can be extended to detect known or confirm suspected methyltransferase target motifs based on predictions of methylation at untrained contexts. We use PacBio, Oxford Nanopore, methylated DNA immunoprecipitation sequencing (MeDIP-seq), and whole-genome bisulfite sequencing data to generate and orthogonally validate methylomes for eight microbial reference species. These well-characterized microbial references can serve as controls in the development and evaluation of future methods for the identification of base modifications from single-molecule sequencing data.
Vertebrate genomes contain a record of retroviruses that invaded the germlines of ancestral hosts and are passed to offspring as endogenous retroviruses (ERVs). ERVs can impact host function since they contain the necessary sequences for expression within the host. Dogs are an important system for the study of disease and evolution, yet no substantiated reports of infectious retroviruses in dogs exist. Here, we utilized Illumina whole genome sequence data to assess the origin and evolution of a recently active gammaretroviral lineage in domestic and wild canids.We identified numerous recently integrated loci of a canid-specific ERV-Fc sublineage within Canis, including 58 insertions that were absent from the reference assembly. Insertions were found throughout the dog genome including within and near gene models. By comparison of orthologous occupied sites, we characterized element prevalence across 332 genomes including all nine extant canid species, revealing evolutionary patterns of ERV-Fc segregation among species as well as subpopulations.Sequence analysis revealed common disruptive mutations, suggesting a predominant form of ERV-Fc spread by trans complementation of defective proviruses. ERV-Fc activity included multiple circulating variants that infected canid ancestors from the last 20 million to within 1.6 million years, with recent bursts of germline invasion in the sublineage leading to wolves and dogs.
Streamlined ex vivo and in vivo genome editing in mouse embryos using recombinant adeno-associated viruses.
Recent advances using CRISPR-Cas9 approaches have dramatically enhanced the ease for genetic manipulation in rodents. Notwithstanding, the methods to deliver nucleic acids into pre-implantation embryos have hardly changed since the original description of mouse transgenesis more than 30 years ago. Here we report a novel strategy to generate genetically modified mice by transduction of CRISPR-Cas9 components into pre-implantation mouse embryos via recombinant adeno-associated viruses (rAAVs). Using this approach, we efficiently generated a variety of targeted mutations in explanted embryos, including indel events produced by non-homologous end joining and tailored mutations using homology-directed repair. We also achieved gene modification in vivo by direct delivery of rAAV particles into the oviduct of pregnant females. Our approach greatly simplifies the generation of genetically modified mice and, more importantly, opens the door for streamlined gene editing in other mammalian species.
Cultivated bacteria such as actinomycetes are a highly useful source of biomedically important natural products. However, such ‘talented’ producers represent only a minute fraction of the entire, mostly uncultivated, prokaryotic diversity. The uncultured majority is generally perceived as a large, untapped resource of new drug candidates, but so far it is unknown whether taxa containing talented bacteria indeed exist. Here we report the single-cell- and metagenomics-based discovery of such producers. Two phylotypes of the candidate genus ‘Entotheonella’ with genomes of greater than 9 megabases and multiple, distinct biosynthetic gene clusters co-inhabit the chemically and microbially rich marine sponge Theonella swinhoei. Almost all bioactive polyketides and peptides known from this animal were attributed to a single phylotype. ‘Entotheonella’ spp. are widely distributed in sponges and belong to an environmental taxon proposed here as candidate phylum ‘Tectomicrobia’. The pronounced bioactivities and chemical uniqueness of ‘Entotheonella’ compounds provide significant opportunities for ecological studies and drug discovery.
Most current approaches to analyse metagenomic data rely on reference genomes. Novel microbial communities extend far beyond the coverage of reference databases and de novo metagenome assembly from complex microbial communities remains a great challenge. Here we present a novel experimental and bioinformatic framework, metaSort, for effective construction of bacterial genomes from metagenomic samples. MetaSort provides a sorted mini-metagenome approach based on flow cytometry and single-cell sequencing methodologies, and employs new computational algorithms to efficiently recover high-quality genomes from the sorted mini-metagenome by the complementary of the original metagenome. Through extensive evaluations, we demonstrated that metaSort has an excellent and unbiased performance on genome recovery and assembly. Furthermore, we applied metaSort to an unexplored microflora colonized on the surface of marine kelp and successfully recovered 75 high-quality genomes at one time. This approach will greatly improve access to microbial genomes from complex or novel communities.
Parallel sequencing of a single cell’s genome and transcriptome provides a powerful tool for dissecting genetic variation and its relationship with gene expression. Here we present a detailed protocol for G&T-seq, a method for separation and parallel sequencing of genomic DNA and full-length polyA(+) mRNA from single cells. We provide step-by-step instructions for the isolation and lysis of single cells; the physical separation of polyA(+) mRNA from genomic DNA using a modified oligo-dT bead capture and the respective whole-transcriptome and whole-genome amplifications; and library preparation and sequence analyses of these amplification products. The method allows the detection of thousands of transcripts in parallel with the genetic variants captured by the DNA-seq data from the same single cell. G&T-seq differs from other currently available methods for parallel DNA and RNA sequencing from single cells, as it involves physical separation of the DNA and RNA and does not require bespoke microfluidics platforms. The process can be implemented manually or through automation. When performed manually, paired genome and transcriptome sequencing libraries from eight single cells can be produced in ~3 d by researchers experienced in molecular laboratory work. For users with experience in the programming and operation of liquid-handling robots, paired DNA and RNA libraries from 96 single cells can be produced in the same time frame. Sequence analysis and integration of single-cell G&T-seq DNA and RNA data requires a high level of bioinformatics expertise and familiarity with a wide range of informatics tools.
In this study, we sequenced the first full-length insect transcriptome using the Erthesina fullo Thunberg based on the PacBio platform. We constructed the first quantitative transcription map of animal mitochondrial genomes and built a straightforward and concise methodology to investigate mitochondrial gene transcription, RNA processing, mRNA maturation and several other related topics. Most of the results were consistent with the previous studies, while to the best of our knowledge some findings were reported for the first time in this study. The new findings included the high levels of mitochondrial gene expression, the 3′ polyadenylation and possible 5′ m(7)G caps of rRNAs, the isoform diversity of 12S rRNA, the polycistronic transcripts and natural antisense transcripts of mitochondrial genes et al. These findings could challenge and enrich fundamental concepts of mitochondrial gene transcription and RNA processing, particularly of the rRNA primary (sequence) structure. The methodology constructed in this study can also be used to study gene expression or RNA processing of nuclear genomes.
The simultaneous sequencing of a single cell’s genome and transcriptome offers a powerful means to dissect genetic variation and its effect on gene expression. Here we describe G&T-seq, a method for separating and sequencing genomic DNA and full-length mRNA from single cells. By applying G&T-seq to over 220 single cells from mice and humans, we discovered cellular properties that could not be inferred from DNA or RNA sequencing alone.
Single cell genomic study of Dehalococcoidetes species from deep-sea sediments of the Peruvian Margin.
The phylum Chloroflexi is one of the most frequently detected phyla in the subseafloor of the Pacific Ocean margins. Dehalogenating Chloroflexi (Dehalococcoidetes) was originally discovered as the key microorganisms mediating reductive dehalogenation via their key enzymes reductive dehalogenases (Rdh) as sole mode of energy conservation in terrestrial environments. The frequent detection of Dehalococcoidetes-related 16S rRNA and rdh genes in the marine subsurface implies a role for dissimilatory dehalorespiration in this environment; however, the two genes have never been linked to each other. To provide fundamental insights into the metabolism, genomic population structure and evolution of marine subsurface Dehalococcoidetes sp., we analyzed a non-contaminated deep-sea sediment core sample from the Peruvian Margin Ocean Drilling Program (ODP) site 1230, collected 7.3?m below the seafloor by a single cell genomic approach. We present for the first time single cell genomic data on three deep-sea Chloroflexi (Dsc) single cells from a marine subsurface environment. Two of the single cells were considered to be part of a local Dehalococcoidetes population and assembled together into a 1.38-Mb genome, which appears to be at least 85% complete. Despite a high degree of sequence-level similarity between the shared proteins in the Dsc and terrestrial Dehalococcoidetes, no evidence for catabolic reductive dehalogenation was found in Dsc. The genome content is however consistent with a strictly anaerobic organotrophic or lithotrophic lifestyle.
Genomic and metabolic diversity of Marine Group I Thaumarchaeota in the mesopelagic of two subtropical gyres.
Marine Group I (MGI) Thaumarchaeota are one of the most abundant and cosmopolitan chemoautotrophs within the global dark ocean. To date, no representatives of this archaeal group retrieved from the dark ocean have been successfully cultured. We used single cell genomics to investigate the genomic and metabolic diversity of thaumarchaea within the mesopelagic of the subtropical North Pacific and South Atlantic Ocean. Phylogenetic and metagenomic recruitment analysis revealed that MGI single amplified genomes (SAGs) are genetically and biogeographically distinct from existing thaumarchaea cultures obtained from surface waters. Confirming prior studies, we found genes encoding proteins for aerobic ammonia oxidation and the hydrolysis of urea, which may be used for energy production, as well as genes involved in 3-hydroxypropionate/4-hydroxybutyrate and oxidative tricarboxylic acid pathways. A large proportion of protein sequences identified in MGI SAGs were absent in the marine cultures Cenarchaeum symbiosum and Nitrosopumilus maritimus, thus expanding the predicted protein space for this archaeal group. Identifiable genes located on genomic islands with low metagenome recruitment capacity were enriched in cellular defense functions, likely in response to viral infections or grazing. We show that MGI Thaumarchaeota in the dark ocean may have more flexibility in potential energy sources and adaptations to biotic interactions than the existing, surface-ocean cultures.
Cells are a fundamental unit of life, and the ability to study the phenotypes and behaviors of individual cells is crucial to understanding the workings of complex biological systems. Cell phenotypes (epigenomic, transcriptomic, proteomic, and metabolomic) exhibit dramatic heterogeneity between and within the different cell types and states underlying cellular functional diversity. Cell genotypes can also display heterogeneity throughout an organism, in the form of somatic genetic variation-most notably in the emergence and evolution of tumors. Recent technical advances in single-cell isolation and the development of omics approaches sensitive enough to reveal these aspects of cell identity have enabled a revolution in the study of multicellular systems. In this review, we discuss the technologies available to resolve the genomes, epigenomes, transcriptomes, proteomes, and metabolomes of single cells from a wide variety of living systems.© 2018 The Authors. Proteomics Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Caught in the middle with multiple displacement amplification: the myth of pooling for avoiding multiple displacement amplification bias in a metagenome.
Shotgun metagenomics has become an important tool for investigating the ecology of microorganisms. Underlying these investigations is the assumption that metagenome sequence data accurately estimates the census of microbial populations. Multiple displacement amplification (MDA) of microbial community DNA is often used in cases where it is difficult to obtain enough DNA for sequencing; however, MDA can result in amplification biases that may impact subsequent estimates of population census from metagenome data. Some have posited that pooling replicate MDA reactions negates these biases and restores the accuracy of population analyses. This assumption has not been empirically tested.Using mock viral communities, we examined the influence of pooling on population-scale analyses. In pooled and single reaction MDA treatments, sequence coverage of viral populations was highly variable and coverage patterns across viral genomes were nearly identical, indicating that initial priming biases were reproducible and that pooling did not alleviate biases. In contrast, control unamplified sequence libraries showed relatively even coverage across phage genomes.MDA should be avoided for metagenomic investigations that require quantitative estimates of microbial taxa and gene functional groups. While MDA is an indispensable technique in applications such as single-cell genomics, amplification biases cannot be overcome by combining replicate MDA reactions. Alternative library preparation techniques should be utilized for quantitative microbial ecology studies utilizing metagenomic sequencing approaches.