Complete genome sequence of Enterococcus durans Oregon-R-modENCODE strain BDGP3, a lactic acid bacterium found in the Drosophila melanogaster gut

Enterococcus durans Oregon-R-modENCODE strain BDGP3 was isolated from the Drosophila melanogaster gut for functional host-microbe interaction studies. The complete genome consists of a single circular chromosome of 2,983,334 bp, with a G+C content of 38%, and a single plasmid of 5,594 bp. Copyright © 2017 Wan et al.

Quantitative profiling of Drosophila melanogaster Dscam1 isoforms reveals no changes in splicing after bacterial exposure.

The hypervariable Dscam1 (Down syndrome cell adhesion molecule 1) gene can produce thousands of different ectodomain isoforms via mutually exclusive alternative splicing. Dscam1 appears to be involved in the immune response of some insects and crustaceans. It has been proposed that the diverse isoforms may be involved in the recognition of, or the defence against, diverse parasite epitopes, although evidence to support this is sparse. A prediction that follows from this hypothesis is that the expression of specific exons and/or isoforms is influenced by exposure to an immune elicitor. To test this hypothesis, we use, for the first time, a long-read RNA sequencing method to directly investigate the Dscam1 splicing pattern after exposing adult Drosophila melanogaster and an S2 cell line to live Escherichia coli. After bacterial exposure, both models showed increased expression of immune-related genes, indicating that the immune system had been activated. However, there were no changes in total Dscam1 mRNA expression. RNA sequencing further showed no significant changes in individual exon expression and no changes in isoform splicing patterns in response to bacterial exposure. Therefore, our studies do not support a change in D. melanogaster Dscam1 isoform diversity in response to live E. coli. Nevertheless, this approach could in future be used to identify potentially immune-related Dscam1 splicing regulation in other host species or in response to other pathogens.

The gut commensal microbiome of Drosophila melanogaster is modified by the endosymbiont Wolbachia.

Endosymbiotic Wolbachia bacteria and the gut microbiome have independently been shown to affect several aspects of insect biology, including reproduction, development, life span, stem cell activity, and the resistance of insect vectors to human pathogens. This work shows that Wolbachia bacteria, which reside mainly in the fly germline, affect the microbial species present in the gut of a lab-reared fly strain. Drosophila melanogaster hosts two main genera of commensal bacteria: Acetobacter and Lactobacillus. Wolbachia-infected flies have significantly reduced titers of Acetobacter. Sampling of the microbiome of axenic flies fed equal proportions of both bacteria shows that the presence of Wolbachia bacteria is a significant determinant of microbiome composition throughout fly development. However, this effect depends on host genotype. To investigate the mechanism of microbiome modulation, we measured the effect of Wolbachia bacteria on the Imd and reactive oxygen species pathways, the main regulators of the immune response in the fly gut. The presence of Wolbachia bacteria does not induce significant changes in the expression of the genes encoding the effector molecules of either pathway. Furthermore, microbiome modulation is not due to direct interaction between Wolbachia bacteria and gut microbes: confocal analysis shows that Wolbachia bacteria are absent from the gut lumen. These results indicate that the mechanistic basis of microbiome modulation by Wolbachia bacteria is more complex than a direct bacterial interaction or an effect of Wolbachia bacteria on fly immunity. The findings reported here highlight the importance of considering gut microbiome composition and host genetic background in studies of Wolbachia-induced phenotypes and when formulating microbe-based disease vector control strategies.
IMPORTANCE Wolbachia bacteria are intracellular bacteria present in the microbiome of a large fraction of insects and parasitic nematodes. They can block the ability of mosquitoes to transmit several disease-causing pathogens, including Zika, dengue, chikungunya, and West Nile viruses and malaria parasites. Certain extracellular bacteria in the gut lumen of these insects can also block pathogen transmission. However, our understanding of how Wolbachia and gut bacteria interact and influence each other is limited. Here we show that the presence of Wolbachia strain wMel changes the composition of gut commensal bacteria in the fruit fly. Our findings implicate interactions between bacterial species as a key factor in determining the overall composition of the microbiome and thus reveal new paradigms to consider in the development of disease control strategies.

Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.

Long-read, single-molecule real-time (SMRT) sequencing is routinely used to finish microbial genomes, but available assembly methods have not scaled well to larger genomes. We introduce the MinHash Alignment Process (MHAP) for overlapping noisy, long reads using probabilistic, locality-sensitive hashing. Integrating MHAP with the Celera Assembler enabled reference-grade de novo assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and a human hydatidiform mole cell line (CHM1) from SMRT sequencing. The resulting assemblies are highly continuous, include fully resolved chromosome arms and close persistent gaps in these reference genomes. Our assembly of D. melanogaster revealed previously unknown heterochromatic and telomeric transition sequences, and we assembled low-complexity sequences from CHM1 that fill gaps in the human GRCh38 reference. Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.
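The core idea behind MinHash-based overlapping can be illustrated in a few lines: the fraction of salted hash functions whose minimum k-mer hash agrees between two reads estimates the Jaccard similarity of their k-mer sets, which flags candidate overlaps without a full alignment. The sketch below is a generic MinHash, not MHAP's implementation; the k-mer size (16), number of hashes (128), and the use of salted `hashlib.blake2b` as the hash family are assumptions chosen for illustration.

```python
import hashlib
import random


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers in a sequence."""
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(seq: str, k: int = 16, num_hashes: int = 128, seed: int = 0) -> list[int]:
    """Keep, for each of num_hashes salted hash functions, the minimum hash over all k-mers.

    Reads must be sketched with the same seed so they share the same hash functions.
    """
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(num_hashes)]
    kmer_set = kmers(seq, k)

    def h(salt: int, kmer: str) -> int:
        # 64-bit hash of one k-mer under one salted hash function
        digest = hashlib.blake2b(f"{salt}:{kmer}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    return [min(h(salt, km) for km in kmer_set) for salt in salts]


def estimate_jaccard(sketch_a: list[int], sketch_b: list[int]) -> float:
    """The fraction of agreeing minima estimates the Jaccard similarity of the k-mer sets."""
    return sum(a == b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)


# Two simulated reads that overlap by 500 bp of a random 1,500-bp sequence
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(1500))
read_a, read_b = genome[:1000], genome[500:]
est = estimate_jaccard(minhash_sketch(read_a), minhash_sketch(read_b))
exact = len(kmers(read_a, 16) & kmers(read_b, 16)) / len(kmers(read_a, 16) | kmers(read_b, 16))
print(f"estimated {est:.2f}, exact {exact:.2f}")
```

A sketch of fixed size can be compared in constant time regardless of read length, which is why this kind of hashing scales to large genomes; noisy reads simply lower the estimate, so a real overlapper would pick a threshold suited to its error rate.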

Long-read single molecule sequencing to resolve tandem gene copies: The Mst77Y region on the Drosophila melanogaster Y chromosome.

The autosomal gene Mst77F of Drosophila melanogaster is essential for male fertility. In 2010, Krsticevic et al. (Genetics 184: 295-307) found 18 Y-linked copies of Mst77F (“Mst77Y”), which collectively account for 20% of the functional Mst77F-like mRNA. The Mst77Y genes were severely misassembled in the then-available genome assembly and were identified by cloning and sequencing polymerase chain reaction products. The genomic structure of the Mst77Y region and the possible existence of additional copies remained unknown. The recent publication of two long-read assemblies of D. melanogaster prompted us to reinvestigate this challenging region of the Y chromosome. We found that the Illumina Synthetic Long Reads assembly failed in the Mst77Y region, most likely because of its tandem duplication structure. The PacBio MHAP assembly of the Mst77Y region seems to be very accurate, as revealed by comparisons with the previously found Mst77Y genes, a bacterial artificial chromosome sequence, and Illumina reads of the same strain. We found that the Mst77Y region spans 96 kb and originated from a 3.4-kb transposition from chromosome 3L to the Y chromosome, followed by tandem duplications inside the Y chromosome and invasion of transposable elements, which account for 48% of its length. Twelve of the 18 Mst77Y genes found in 2010 were confirmed in the PacBio assembly, the remaining six being polymerase chain reaction-induced artifacts. There are several identical copies of some Mst77Y genes, coincidentally bringing the total copy number to 18. Besides providing a detailed picture of the Mst77Y region, our results highlight the utility of PacBio technology in assembling difficult genomic regions such as tandemly repeated genes. Copyright © 2015 Krsticevic et al.

An adenine code for DNA: A second life for N6-methyladenine.

DNA N6-methyladenine (6mA) protects against restriction enzymes in bacteria. However, isolated reports have suggested additional activities, as well as its presence in other organisms such as unicellular eukaryotes. New data now indicate that 6mA may have a gene regulatory function in a green alga, a worm, and a fly, suggesting that 6mA is a potential “epigenetic” mark. Copyright © 2015 Elsevier Inc. All rights reserved.

Variation and evolution in the glutamine-rich repeat region of Drosophila argonaute-2.

RNA interference pathways mediate biological processes through Argonaute-family proteins, which bind small RNAs as guides to silence complementary target nucleic acids. In insects and crustaceans, Argonaute-2 silences viral nucleic acids and therefore acts as a primary effector of innate antiviral immunity. Although the functions of the major Argonaute-2 domains, which are conserved across most Argonaute-family proteins, are known, many invertebrate Argonaute-2 homologs contain a glutamine-rich repeat (GRR) region of unknown function at the N-terminus. Here we combine long-read amplicon sequencing of Drosophila Genetic Reference Panel (DGRP) lines with publicly available sequence data from many insect species to show that this region evolves extremely rapidly and is hypervariable within species. We identify distinct GRR haplotype groups in Drosophila melanogaster and suggest that one of these haplotype groups has recently risen to high frequency in a North American population. Finally, we use published data from genome-wide association studies of viral resistance in D. melanogaster to test whether GRR haplotypes are associated with survival after virus challenge. We find a marginally significant association with survival after challenge with Drosophila C Virus in the DGRP, but we were unable to replicate this finding using lines from the Drosophila Synthetic Population Resource panel. Copyright © 2016 Palmer and Obbard.

Rapid functional and sequence differentiation of a tandemly repeated species-specific multigene family in Drosophila.

Gene clusters of recently duplicated genes are hotbeds for evolutionary change. However, our understanding of how mutational mechanisms and evolutionary forces shape the structural and functional evolution of these clusters is hindered by the high sequence identity among the copies, which typically results in their inaccurate representation in genome assemblies. The presumed testis-specific chimeric gene Sdic originated and tandemly expanded in Drosophila melanogaster, contributing to increased male-male competition. Using various types of massively parallel sequencing data, we studied the organization, sequence evolution, and functional attributes of the different Sdic copies. By leveraging long-read sequencing data, we uncovered differences in both copy number and copy order from the currently accepted annotation of the Sdic region. Despite evidence for pervasive gene conversion affecting the Sdic copies, we also detected signatures of two episodes of diversifying selection, which have contributed to the evolution of a variety of C-termini and miRNA binding site compositions. Expression analyses involving RNA-seq datasets from 59 different biological conditions revealed distinct expression breadths among the copies, with three copies being transcribed in females, raising the possibility of a sexually antagonistic effect. Phenotypic assays using Sdic knock-out strains indicated that, should this antagonistic effect exist, it does not compromise female fertility. Our results strongly suggest that the genome consolidation of the Sdic gene cluster results more from a quick exploration of different paths of molecular tinkering by different copies than from a mere dosage increase, which could be a recurrent evolutionary outcome in the presence of persistent sexual selection. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either PacBio or Oxford Nanopore technologies, and achieves a contig NG50 of greater than 21 Mbp on both human and Drosophila melanogaster PacBio datasets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes. Published by Cold Spring Harbor Laboratory Press.
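The tf-idf idea mentioned above can be shown on toy reads: a k-mer's weight in a read is its count in that read (term frequency) times the log of the number of reads divided by the number of reads containing it (inverse document frequency), so k-mers shared by many reads, such as those from repeats, are pushed toward zero. This is a plain textbook tf-idf sketch; how Canu and MHAP actually turn such weights into MinHash sampling is not reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every k-mer occurrence in one read (the term frequency)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def tfidf_weights(reads: list[str], k: int) -> list[dict[str, float]]:
    """Weight each k-mer in each read by tf * log(N / df).

    df is the number of reads containing the k-mer, so k-mers shared by
    many reads (for example, from a repeat) receive weights near zero.
    """
    counts = [kmer_counts(r, k) for r in reads]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each read contributes once per distinct k-mer
    n_reads = len(reads)
    return [
        {km: tf * math.log(n_reads / doc_freq[km]) for km, tf in c.items()}
        for c in counts
    ]


# "AAAA" stands in for a repeat present in every read
reads = ["AAAACCCC", "AAAAGGGG", "AAAATTTT"]
w = tfidf_weights(reads, k=4)
print(w[0]["AAAA"], round(w[0]["CCCC"], 3))
```

Down-weighting ubiquitous k-mers means that sketches built from high-weight k-mers are less likely to report spurious overlaps driven only by shared repeat content.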

Single-molecule sequencing resolves the detailed structure of complex satellite DNA loci in Drosophila melanogaster.

Highly repetitive satellite DNA (satDNA) repeats are found in most eukaryotic genomes. SatDNAs evolve rapidly and have roles in genome stability and chromosome segregation. Their repetitive nature poses a challenge for genome assembly and hampers detailed study of satDNA structure. Here, we use long reads from Pacific Biosciences (PacBio) single-molecule sequencing to determine the detailed structure of all major autosomal complex satDNA loci in Drosophila melanogaster, with a particular focus on the 260-bp and Responder satellites. We determine the optimal de novo assembly methods and parameter combinations required to produce a high-quality assembly of these previously unassembled satDNA loci and validate this assembly using molecular and computational approaches. We found that the computationally intensive PBcR-BLASR assembly pipeline yielded better assemblies than the faster and more efficient pipelines based on the MHAP hashing algorithm, and that validation is essential for assemblies of repetitive loci. The assemblies reveal that satDNA repeats are organized into large arrays interrupted by transposable elements. The repeats in the center of each array tend to be homogenized in sequence, suggesting that gene conversion and unequal crossovers lead to repeat homogenization through concerted evolution, although the degree of unequal crossing over may differ among complex satellite loci. We find evidence for higher-order structure within satDNA arrays that suggests recent structural rearrangements. These assemblies provide a platform for the evolutionary and functional genomics of satDNAs in pericentric heterochromatin. © 2017 Khost et al.; Published by Cold Spring Harbor Laboratory Press.

Genome sequence of the Drosophila melanogaster male-killing Spiroplasma strain MSRO endosymbiont.

Spiroplasmas are helical and motile members of a cell wall-less eubacterial group called Mollicutes. Although all spiroplasmas are associated with arthropods, they exhibit great diversity with respect to both their modes of transmission and their effects on their hosts, ranging from horizontally transmitted pathogens and commensals to endosymbionts that are transmitted transovarially (i.e., from mother to offspring). Here we provide the first genome sequence, along with proteomic validation, of an inherited endosymbiotic Spiroplasma bacterium, the Spiroplasma poulsonii MSRO strain harbored by Drosophila melanogaster. Comparison of the genome content of S. poulsonii with that of horizontally transmitted spiroplasmas indicates that S. poulsonii has lost many metabolic pathways and transporters, demonstrating a high level of interdependence with its insect host. Consistent with the genome analysis, experimental studies showed that S. poulsonii metabolizes glucose but not trehalose. Notably, trehalose is more abundant than glucose in Drosophila hemolymph, and the inability to metabolize trehalose may prevent S. poulsonii from overproliferating. Our study identifies putative virulence genes, notably those for a chitinase, the H2O2-producing glycerol-3-phosphate oxidase, and enzymes involved in the synthesis of the eukaryote-toxic lipid cardiolipin. S. poulsonii also expresses on its cell membrane one functional adhesion-related protein and two divergent spiralin proteins that have been implicated in insect cell invasion in other spiroplasmas. These lipoproteins may be involved in the colonization of the Drosophila germ line, ensuring vertical transmission of S. poulsonii. The S. poulsonii genome is a valuable resource for exploring the mechanisms of male killing and symbiont-mediated protection, two cardinal features of many facultative endosymbionts.
Most insect species, including important disease vectors and crop pests, harbor vertically transmitted endosymbiotic bacteria. These endosymbionts play key roles in their hosts’ fitness, including protecting them against natural enemies and manipulating their reproduction in ways that increase the frequency of symbiont infection. Little is known about the molecular mechanisms underlying these processes. Here, we provide the first draft genome of a vertically transmitted male-killing Spiroplasma bacterium, the S. poulsonii MSRO strain harbored by D. melanogaster. Analysis of the S. poulsonii genome was complemented by proteomics and ex vivo metabolic experiments. Our results indicate that S. poulsonii has reduced metabolic capabilities and expresses divergent membrane lipoproteins and potential virulence factors that likely participate in Spiroplasma-host interactions. This work fills a gap in our knowledge of insect endosymbionts and provides tools with which to decipher the interaction between Spiroplasma bacteria and their well-characterized host D. melanogaster, which is emerging as a model of endosymbiosis. Copyright © 2015 Paredes et al.

Unique transposon landscapes are pervasive across Drosophila melanogaster genomes.

To understand how transposon landscapes (TLs) vary across animal genomes, we describe a new method called the Transposon Insertion and Depletion AnaLyzer (TIDAL) and a database of >300 TLs in Drosophila melanogaster (TIDAL-Fly). Our analysis reveals pervasive TL diversity across cell lines and fly strains, even among identically named sub-strains from different laboratories, such as the ISO1 strain used for the reference genome sequence. Each lab strain, each inbred strain of the Drosophila Genetic Reference Panel (DGRP), and each fly isolate in the Drosophila Genome Nexus (DGN) carries, on average, >500 novel insertions. A minority (<25%) of transposon families accounts for the majority (>70%) of TL diversity across fly strains. A sharp contrast between insertion and depletion patterns indicates that many transposons are unique to the ISO1 reference genome sequence. Although TL diversity from fly strains reaches asymptotic limits with increasing sequencing depth, rampant TL diversity causes unsaturated detection of TLs in pools of flies. Finally, we show that novel transposon insertions correlate negatively with Piwi-interacting RNA (piRNA) levels for most transposon families, except for the highly abundant roo retrotransposon. Our study provides a useful resource for Drosophila geneticists to understand how transposons create extensive genomic diversity in fly cell lines and strains. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Sequence alignment tools: one parallel pattern to rule them all?

In this paper, we advocate a high-level programming methodology for next-generation sequencing (NGS) alignment tools, for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools with their ports to the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. With a high-level approach, programmers are freed from the complex aspects of parallel programming, such as synchronisation protocols and task scheduling, and gain more scope for seamless performance tuning. In this work, we present use cases showing that parallelising NGS tools with a high-level approach can achieve comparable or even better absolute performance on all the datasets used.
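The abstract does not name the single parallel paradigm, so the sketch below assumes a task farm, a common shape for read alignment: an emitter splits the reads into batches, independent workers align each batch against a shared reference, and a collector gathers results in input order. FastFlow itself is a C++ framework; this Python version uses `concurrent.futures` only to show the structure of the pattern, and `align_read` is a hypothetical exact-match stand-in for a real aligner. Threads in CPython do not run pure-Python code in parallel, so a process pool or native workers would be needed for real speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def align_read(reference: str, read: str) -> int:
    """Hypothetical stand-in for a real aligner: offset of the first exact match, or -1."""
    return reference.find(read)


def align_batch(reference: str, batch: list[str]) -> list[int]:
    """Worker body: align one task (a batch of reads) sequentially."""
    return [align_read(reference, read) for read in batch]


def farm_align(reference: str, reads: list[str], batch_size: int = 2, workers: int = 4) -> list[int]:
    """Task farm: split reads into batches, align batches on a worker pool,
    and reassemble per-read results in input order."""
    batches = [reads[i:i + batch_size] for i in range(0, len(reads), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, acting as an ordered collector
        per_batch = pool.map(partial(align_batch, reference), batches)
        return [pos for batch in per_batch for pos in batch]


print(farm_align("ACGTTGCAAGGCTT", ["ACGT", "GCAA", "CTT", "TTTT"]))
```

The batch size is the main tuning knob in this pattern: larger batches amortise scheduling overhead, smaller ones balance load better when reads differ in alignment cost, and changing it requires no change to the worker code.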

LoRTE: Detecting transposon-induced genomic variants using low coverage PacBio long read sequences.

Population genomic analysis of transposable elements (TEs) has greatly benefited from recent advances in sequencing technologies. However, the short size of the reads and the propensity of transposable elements to nest in highly repeated regions of genomes limit the efficiency of bioinformatic tools when Illumina or 454 technologies are used. Fortunately, long-read sequencing technologies generating reads that may span the entire length of full transposons are now available. However, existing TE population genomic software was not designed to handle long reads, and new dedicated tools are needed.
LoRTE is the first tool able to use PacBio long-read sequences to identify transposon deletions and insertions between a reference genome and genomes of different strains or populations. Tested against simulated and genuine Drosophila melanogaster PacBio datasets, LoRTE appears to be a reliable and broadly applicable tool for studying the dynamic and evolutionary impact of transposable elements using low-coverage, long-read sequences.
LoRTE is an efficient and accurate tool to identify structural genomic variants caused by TE insertion or deletion. LoRTE is available for download at

Convergent evolution of Y chromosome gene content in flies.

Sex chromosomes have formed repeatedly from ordinary autosomes across Diptera, and X chromosomes mostly conserve their ancestral genes. Y chromosomes are characterized by abundant gene loss and an accumulation of repetitive DNA, yet the gene repertoire of fly Y chromosomes is largely unknown. Here we trace the gene-content evolution of Y chromosomes across 22 Diptera species, using a subtraction pipeline that infers Y genes from male and female genome and transcriptome data. Few genes remain on old Y chromosomes, but the number of inferred Y genes varies substantially between species. Young Y chromosomes still show clear evidence of their autosomal origins, but most genes on old Y chromosomes are not simply remnants of genes originally present on the proto-sex chromosome that escaped degeneration; instead, they were recruited secondarily from autosomes. Despite almost no overlap in Y-linked gene content among species with independently formed sex chromosomes, we find that Y-linked genes have evolved convergent gene functions associated with testis expression. Thus, male-specific selection appears to be a dominant force shaping the gene-content evolution of Y chromosomes across fly species.
While X-chromosome gene content tends to be conserved, Y-chromosome evolution is dynamic and difficult to reconstruct. Here, Mahajan and Bachtrog use a subtraction pipeline to identify Y-linked genes in 22 Diptera species, revealing patterns of Y-chromosome gene-content evolution.
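The subtraction principle can be illustrated with k-mers: sequence present in male data but absent from female data is a candidate for Y linkage, because females carry no Y chromosome. The authors' actual pipeline is not described in this abstract; the sketch below is a generic k-mer subtraction under stated assumptions, with hypothetical function names, a toy k of 4, and a hypothetical 80% threshold for calling a transcript Y-linked. Real data would also need coverage cut-offs to tolerate sequencing errors.

```python
def kmer_set(seqs: list[str], k: int) -> set[str]:
    """All distinct k-mers across a collection of reads."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def male_specific_kmers(male_reads: list[str], female_reads: list[str], k: int) -> set[str]:
    """Subtract female k-mers from male k-mers; Y-linked sequence should remain."""
    return kmer_set(male_reads, k) - kmer_set(female_reads, k)


def candidate_y_transcripts(transcripts: dict[str, str], male_only: set[str], k: int,
                            min_fraction: float = 0.8) -> list[str]:
    """Flag transcripts whose k-mers are mostly male-specific (threshold is hypothetical)."""
    hits = []
    for name, seq in transcripts.items():
        kms = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        if kms and sum(km in male_only for km in kms) / len(kms) >= min_fraction:
            hits.append(name)
    return hits


# Toy data: both sexes share the autosomal sequence; only males carry "TTTTGATC"
female = ["AAAACCCCGGGG"]
male = ["AAAACCCCGGGG", "TTTTGATC"]
transcripts = {"autosomal_gene": "AAAACCCC", "y_gene": "TTTTGAT"}
male_only = male_specific_kmers(male, female, k=4)
print(candidate_y_transcripts(transcripts, male_only, k=4))
```

Genes recruited secondarily from autosomes, as described above, are the hard case for such a pipeline: a recent Y copy shares most k-mers with its autosomal parent, so only the diverged positions remain after subtraction.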

