Structural variant detection with low-coverage PacBio sequencing

Structural variants (SVs; genomic differences ≥50 base pairs) contribute to the evolution of organisms’ traits and to human disease. Most SVs are too small to detect with array comparative genomic hybridization but too large to discover reliably with short-read DNA sequencing. Recent studies in human genomes show that PacBio SMRT Sequencing sensitively detects structural variants.
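As a rough illustration of how long reads expose SVs that short reads miss, the sketch below scans read alignments for large insertions and deletions encoded in their CIGAR strings and keeps events seen in at least two reads, a threshold chosen with low coverage in mind. It is a minimal sketch assuming reads are already aligned (0-based reference start plus CIGAR). The function names, the 100 bp clustering window, and the two-read support threshold are hypothetical and do not describe PacBio's SV calling tools.

```python
import re
from statistics import median
from typing import List, Tuple

MIN_SV_LEN = 50  # SV size threshold from the text (>=50 bp)
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

Event = Tuple[str, int, int]  # (type, 0-based reference position, length)


def sv_signatures(ref_start: int, cigar: str, min_len: int = MIN_SV_LEN) -> List[Event]:
    """Report insertions and deletions of at least min_len bp in one alignment."""
    events: List[Event] = []
    pos = ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "D" and length >= min_len:
            events.append(("DEL", pos, length))
        elif op == "I" and length >= min_len:
            events.append(("INS", pos, length))
        if op in "MDN=X":  # only these operations consume reference bases
            pos += length
    return events


def _summarize(cluster: List[Event]) -> Tuple[str, int, int, int]:
    """Collapse a cluster into (type, median position, median length, support)."""
    return (
        cluster[0][0],
        int(median(e[1] for e in cluster)),
        int(median(e[2] for e in cluster)),
        len(cluster),
    )


def call_svs(alignments, min_support: int = 2, window: int = 100):
    """Cluster same-type events lying within `window` bp; keep well-supported clusters."""
    events = sorted(ev for start, cigar in alignments for ev in sv_signatures(start, cigar))
    calls, cluster = [], []
    for ev in events:
        if cluster and (ev[0] != cluster[-1][0] or ev[1] - cluster[-1][1] > window):
            if len(cluster) >= min_support:
                calls.append(_summarize(cluster))
            cluster = []
        cluster.append(ev)
    if len(cluster) >= min_support:
        calls.append(_summarize(cluster))
    return calls


reads = [(100, "500M60D300M"), (110, "490M58D310M"), (0, "1000M")]
print(call_svs(reads))  # [('DEL', 600, 59, 2)]
```

The two reads carrying a roughly 60 bp deletion near position 600 collapse into a single call with a support of 2, while the read without the event contributes nothing.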

Beyond Contiguity: Evaluating the accuracy of de novo genome assemblies

HiFi reads (>99% accurate, 15-20 kb) from the PacBio Sequel II System consistently provide complete and contiguous genome assemblies. Beyond completeness and contiguity, accuracy is critically important, because assembly errors complicate downstream analysis, particularly by disrupting gene reading frames. Metrics used to assess assembly accuracy include 1) in-frame gene count, 2) k-mer consistency, and 3) concordance with a benchmark, where discordances are interpreted as assembly errors. Genome in a Bottle (GIAB) provides a benchmark for the human genome with an estimated accuracy of 99.9999% (Q60). Concordance of human HiFi assemblies with this benchmark exceeds Q50, which yields excellent genomes for downstream analysis but poses a challenge: any new benchmark must be substantially more accurate than Q50, or the measured discordance will reflect the error rate of the benchmark rather than that of the assembly. To establish benchmarks for Oryza sativa and Drosophila melanogaster, we collected draft references, Illumina short reads, and PacBio HiFi reads. For each species, the benchmark was defined as regions of normal coverage that are not within 5 bp of a small variant or within 50 bp of a structural variant. For both species, the benchmark regions span around 60% of the genome, and HiFi assemblies achieve Q50 accuracy, which is notably higher than that of assemblies from other technologies and meets typical standards for a finished, reference-grade assembly. Here we present a protocol to generate benchmarks for any sample that rival the GIAB benchmark in accuracy. These benchmarks allow genome assemblies to be compared and improved, and they highlight the superior accuracy of assemblies generated with PacBio HiFi reads.
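The benchmark definition and the Q-value scale can be made concrete with a short sketch. Assuming benchmark inputs are half-open intervals on a single chromosome, the code below keeps normal-coverage regions, removes anything within 5 bp of a small variant or 50 bp of a structural variant (the padding values stated above), and converts a discordance count into a Phred-scaled QV. The function names and interval representation are hypothetical; this is not the published protocol.

```python
import math
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def subtract(regions: List[Interval], masks: List[Interval]) -> List[Interval]:
    """Remove every mask interval from the region intervals."""
    masks = sorted(masks)
    result: List[Interval] = []
    for start, end in sorted(regions):
        cur = start
        for m_start, m_end in masks:
            if m_start >= end:
                break  # masks are sorted by start, so none of the rest can overlap
            if m_end <= cur:
                continue
            if m_start > cur:
                result.append((cur, m_start))
            cur = max(cur, m_end)
        if cur < end:
            result.append((cur, end))
    return result


def benchmark_regions(normal_coverage, small_variants, structural_variants,
                      small_pad: int = 5, sv_pad: int = 50) -> List[Interval]:
    """Normal-coverage regions minus padded small variants and SVs."""
    masks = [(max(0, s - small_pad), e + small_pad) for s, e in small_variants]
    masks += [(max(0, s - sv_pad), e + sv_pad) for s, e in structural_variants]
    return subtract(normal_coverage, masks)


def qv(discordant_bases: int, compared_bases: int) -> float:
    """Phred-scaled concordance: QV = -10 * log10(discordant / compared)."""
    if discordant_bases == 0:
        return math.inf
    return -10 * math.log10(discordant_bases / compared_bases)


regions = benchmark_regions([(0, 1000)], [(100, 101)], [(500, 600)])
print(regions)                         # [(0, 95), (106, 450), (650, 1000)]
print(sum(e - s for s, e in regions))  # 789
print(round(qv(1, 1_000_000)))         # 60
```

On this scale, Q60 corresponds to one discordant base per million and Q50 to one per 100,000, so a 100 Mb benchmark region compared against a Q50 assembly would be expected to contain about 1,000 discordances.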

Hi-C guided assemblies reveal conserved regulatory topologies on X and autosomes despite extensive genome shuffling

Genome rearrangements that occur during evolution impose major challenges on regulatory mechanisms that rely on three-dimensional genome architecture. Here, we developed a scaffolding algorithm and generated chromosome-length assemblies from Hi-C data for studying genome topology in three distantly related Drosophila species. We observe extensive genome shuffling between these species, with approximately one synteny breakpoint every six genes. A/B compartments, a set of large gene-dense topologically associating domains (TADs), and spatial contacts between high-affinity sites (HAS) located on the X chromosome are maintained over 40 million years, indicating architectural conservation at various hierarchical levels. Evolutionarily conserved genes cluster in the vicinity of HAS, whereas HAS locations appear evolutionarily flexible, thus uncoupling the functional requirement of dosage compensation from individual positions on the linear X chromosome. Therefore, 3D architecture is preserved even across thousands of rearrangements, highlighting its relevance for essential processes such as dosage compensation of the X chromosome.
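To make "one synteny breakpoint every six genes" concrete, the sketch below counts how many neighboring gene pairs in one species are no longer neighbors in another. It assumes orthology is already resolved so that both genomes use the same gene IDs, and it ignores gene orientation. The function names are hypothetical, and this is not the scaffolding or synteny pipeline used in the study.

```python
from typing import FrozenSet, List, Set

Genome = List[List[str]]  # chromosomes, each a list of gene IDs in linear order


def adjacencies(genome: Genome, keep: Set[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of neighboring genes, considering only genes in `keep`."""
    pairs = set()
    for chrom in genome:
        genes = [g for g in chrom if g in keep]
        for left, right in zip(genes, genes[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def synteny_breakpoints(genome_a: Genome, genome_b: Genome) -> int:
    """Count adjacencies in genome_a that are broken in genome_b (orientation ignored)."""
    shared = {g for c in genome_a for g in c} & {g for c in genome_b for g in c}
    kept_in_b = adjacencies(genome_b, shared)
    return sum(1 for pair in adjacencies(genome_a, shared) if pair not in kept_in_b)


a = [["g1", "g2", "g3", "g4", "g5", "g6"]]
b = [["g1", "g2", "g3"], ["g6", "g5", "g4"]]
breaks = synteny_breakpoints(a, b)
print(breaks, 6 / breaks)  # 1 6.0
```

Because adjacencies are unordered pairs, the inverted block g4-g5-g6 keeps its internal adjacencies and only the g3/g4 junction counts as a breakpoint, giving one breakpoint per six genes in this toy case.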

Diversification and collapse of a telomere elongation mechanism.

In most eukaryotes, telomerase counteracts chromosome erosion by adding repetitive sequence to terminal ends. Drosophila melanogaster instead relies on specialized retrotransposons that insert exclusively at telomeres. This exchange of goods between host and mobile element, in which the mobile element provides an essential genome service and the host provides a hospitable niche for mobile element propagation, has been called a “genomic symbiosis.” However, these telomere-specialized, jockey family retrotransposons may actually evolve to “selfishly” overreplicate in the genomes that they ostensibly serve. Under this model, we expect rapid diversification of telomere-specialized retrotransposon lineages and, possibly, the breakdown of this ostensibly symbiotic relationship. Here we report data consistent with both predictions. Searching the raw reads of the 15-Myr-old melanogaster species group, we generated de novo jockey retrotransposon consensus sequences and used phylogenetic tree-building to delineate four distinct telomere-associated lineages. Recurrent gains, losses, and replacements account for this retrotransposon lineage diversity. In Drosophila biarmipes, telomere-specialized elements have disappeared completely. De novo assembly of long reads and cytogenetics confirmed this species-specific collapse of retrotransposon-dependent telomere elongation. Instead, telomere-restricted satellite DNA and DNA transposon fragments occupy the chromosome termini. We infer that D. biarmipes relies on a recombination-based mechanism conserved from yeast to flies to humans. Telomeric retrotransposon diversification and disappearance suggest that persistently “selfish” machinery, rather than completely domesticated, symbiotic mobile elements, shapes telomere elongation across Drosophila. © 2019 Saint-Leandre et al.; Published by Cold Spring Harbor Laboratory Press.
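Building a consensus sequence from many divergent element copies is the first analytical step described above. A minimal majority-rule version, assuming the copies have already been extracted from reads and placed in a multiple alignment, is sketched below. The alignment step, the 0.5 threshold, and the function name are assumptions; the study's actual de novo consensus procedure is not described here.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned: List[str], min_fraction: float = 0.5) -> str:
    """Majority-rule consensus of equal-length aligned sequences ('-' = gap).

    Columns whose most common symbol is a gap are dropped; columns where the
    most common base falls below min_fraction are written as 'N'.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("expected one or more aligned sequences of equal length")
    consensus = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        if symbol == "-":
            continue
        consensus.append(symbol if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


copies = ["ACGT-A", "ACGTTA", "ACCT-A", "ACGT-G"]
print(majority_consensus(copies))  # ACGTA
```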

Toxin and genome evolution in a Drosophila defensive symbiosis.

Defenses conferred by microbial symbionts play a vital role in the health and fitness of their animal hosts. An important outstanding question in the study of defensive symbiosis is what determines their long-term stability and effectiveness against diverse natural enemies. In this study, we combine genome and transcriptome sequencing, symbiont transfection and parasite protection experiments, and toxin activity assays to examine the evolution of the defensive symbiosis between Drosophila flies and their vertically transmitted Spiroplasma bacterial symbionts, focusing in particular on ribosome-inactivating proteins (RIPs), symbiont-encoded toxins that have been implicated in protection against both parasitic wasps and nematodes. Although many strains of Spiroplasma, including the male-killing symbiont (sMel) of Drosophila melanogaster, protect against parasitic wasps, only the strain (sNeo) that infects the mycophagous fly Drosophila neotestacea appears to protect against parasitic nematodes. We find that RIP repertoire is a major differentiating factor between strains that do and do not offer nematode protection, and that sMel RIPs do not show activity against nematode ribosomes in vivo. We also discovered a strain of Spiroplasma infecting a mycophagous phorid fly, Megaselia nigra. Although both the host and its Spiroplasma are distantly related to D. neotestacea and its symbiont, genome sequencing revealed that the M. nigra symbiont encodes abundant and diverse RIPs, including plasmid-encoded toxins that are closely related to the RIPs in sNeo. Our results suggest that distantly related Spiroplasma RIP toxins may perform specialized functions with regard to parasite specificity and suggest an important role for horizontal gene transfer in the emergence of novel defensive phenotypes.

The gut commensal microbiome of Drosophila melanogaster is modified by the endosymbiont Wolbachia.

Endosymbiotic Wolbachia bacteria and the gut microbiome have independently been shown to affect several aspects of insect biology, including reproduction, development, life span, stem cell activity, and resistance to human pathogens in insect vectors. This work shows that Wolbachia bacteria, which reside mainly in the fly germline, affect the microbial species present in the gut of a lab-reared fly strain. Drosophila melanogaster hosts two main genera of commensal bacteria: Acetobacter and Lactobacillus. Wolbachia-infected flies have significantly reduced titers of Acetobacter. Sampling of the microbiome of axenic flies fed with equal proportions of both bacteria shows that the presence of Wolbachia bacteria is a significant determinant of the composition of the microbiome throughout fly development. However, this effect depends on host genotype. To investigate the mechanism of microbiome modulation, we measured the effect of Wolbachia bacteria on the Imd and reactive oxygen species pathways, the main regulators of immune response in the fly gut. The presence of Wolbachia bacteria does not induce significant changes in the expression of the genes for the effector molecules in either pathway. Furthermore, microbiome modulation is not due to direct interaction between Wolbachia bacteria and gut microbes. Confocal analysis shows that Wolbachia bacteria are absent from the gut lumen. These results indicate that the mechanistic basis of the modulation of microbiome composition by Wolbachia bacteria is more complex than a direct bacterial interaction or an effect of Wolbachia bacteria on fly immunity. The findings reported here highlight the importance of considering the composition of the gut microbiome and host genetic background in studies of Wolbachia-induced phenotypes and when formulating microbe-based disease vector control strategies.
IMPORTANCE Wolbachia bacteria are intracellular bacteria present in the microbiome of a large fraction of insects and parasitic nematodes. They can block mosquitoes’ ability to transmit several infectious disease-causing pathogens, including Zika, dengue, chikungunya, and West Nile viruses and malaria parasites. Certain extracellular bacteria present in the gut lumen of these insects can also block pathogen transmission. However, our understanding of interactions between Wolbachia and gut bacteria and how they influence each other is limited. Here we show that the presence of Wolbachia strain wMel changes the composition of gut commensal bacteria in the fruit fly. Our findings implicate interactions between bacterial species as a key factor in determining the overall composition of the microbiome and thus reveal new paradigms to consider in the development of disease control strategies.

Extensive exchange of transposable elements in the Drosophila pseudoobscura group.

As species diverge, so does their transposable element (TE) content. Within a genome, TE families may eventually become dormant due to host-silencing mechanisms, natural selection, and the accumulation of inactive copies. Transmission of active copies of a TE family, both vertically and horizontally between species, can allow TEs to escape inactivation if it occurs often enough, because it may let them temporarily evade silencing in a new host. Thus, the contribution of horizontal exchange to TE persistence has been of increasing interest. Here, we annotated TEs in five species with sequenced genomes from the D. pseudoobscura species group and curated a set of TE families found in these species. We found that, compared to host genes, many TE families showed lower neutral divergence between species, consistent with recent transmission of TEs between species. Despite these transfers, TE content differs between species in the group. The TE content is highly dynamic in the D. pseudoobscura species group, with TEs frequently transferring between species, which keeps them active. This result highlights how frequently transposable elements are transmitted between sympatric species and, despite these transfers, how rapidly species’ TE content can diverge.
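The key comparison above, TE divergence versus host-gene divergence between the same pair of species, can be sketched as follows. This assumes pairwise-aligned sequences and uses a whole-sequence Jukes-Cantor (JC69) distance as a simple stand-in for neutral divergence; the study's actual estimator is not stated here. The 5% quantile cutoff and all function names are hypothetical.

```python
import math
from typing import Dict, List


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    compared = diffs = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(seq1: str, seq2: str) -> float:
    """JC69-corrected distance; infinite when the sequences are saturated."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def low_divergence_families(te_dist: Dict[str, float], gene_dists: List[float],
                            quantile: float = 0.05) -> List[str]:
    """TE families whose divergence falls below a low quantile of host-gene divergence."""
    ranked = sorted(gene_dists)
    cutoff = ranked[int(quantile * (len(ranked) - 1))]
    return sorted(name for name, d in te_dist.items() if d < cutoff)


genes = [0.10] * 20 + [0.30] * 20
print(low_divergence_families({"TE1": 0.01, "TE2": 0.20}, genes))  # ['TE1']
```

Host genes and vertically inherited TEs have been separating for the same length of time, so a TE family far less diverged than nearly all genes is hard to explain without a recent transfer between species.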

De novo assembly of a young Drosophila Y chromosome using single-molecule sequencing and chromatin conformation capture.

While short-read sequencing technology has resulted in a sharp increase in the number of species with genome assemblies, these assemblies are typically highly fragmented. Repeats pose the largest challenge for reference genome assembly, and pericentromeric regions and the repeat-rich Y chromosome are typically excluded from sequencing projects. Here, we assemble the genome of Drosophila miranda using long reads for contig formation, chromatin interaction maps for scaffolding, and short reads, optical mapping, and bacterial artificial chromosome (BAC) clone sequencing for consensus validation. Our assembly recovers entire chromosomes and contains large fractions of repetitive DNA, including about 41.5 Mb of pericentromeric and telomeric regions and >100 Mb of the recently formed, highly repetitive neo-Y chromosome. While Y chromosome evolution is typically characterized by global sequence loss and shrinkage, the neo-Y increased in size by almost 3-fold because of the accumulation of repetitive sequences. Our high-quality assembly allows us to reconstruct the chromosomal events that have led to the unusual sex chromosome karyotype in D. miranda, including the independent de novo formation of a pair of sex chromosomes at two distinct time points and the reversion of a former Y chromosome to an autosome.
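Assembly fragmentation is commonly summarized with N50, the contig length such that contigs of that length or longer contain at least half of the assembled bases, and L50, the number of such contigs. N50 is not mentioned in the abstract; the sketch below only illustrates how contiguity is often quantified, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a set of contig lengths; (0, 0) for an empty assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    return 0, 0


fragmented = [50_000] * 40                         # 2 Mb in many short contigs
chromosome_scale = [1_400_000, 500_000, 100_000]   # the same 2 Mb in three pieces
print(n50_l50(fragmented), n50_l50(chromosome_scale))  # (50000, 20) (1400000, 1)
```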

Genomic structural variations within five continental populations of Drosophila melanogaster.

Chromosomal structural variations (SVs), including insertions, deletions, inversions, and translocations, occur within the genome and can have a significant effect on organismal phenotype. Some of these effects are caused by structural variations that contain genes. Large structural variations represent a significant portion of the genetic diversity within a population. We used a global sampling of Drosophila melanogaster lines (from Ithaca, Zimbabwe, Beijing, Tasmania, and the Netherlands) to represent diverse populations within the species. We used long-read sequencing and optical mapping technologies to identify SVs in these genomes. Across the five lines examined, we found an average of 2,928 structural variants per genome. These structural variations varied greatly in size and location, included many exonic regions, and could impact adaptation and genomic evolution. Copyright © 2018 Long et al.
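One question raised above is how many SVs touch genes. Assuming SV calls and exon annotations are available as half-open intervals, a simple per-type tally of exon-overlapping SVs could look like the sketch below. The data layout, coordinates, and function name are hypothetical, and the linear scan is only for illustration; a genome-wide analysis would use an interval index.

```python
from collections import Counter
from typing import Dict, List, Tuple

SV = Tuple[str, str, int, int]   # (type, chromosome, start, end), half-open
Exon = Tuple[str, int, int]      # (chromosome, start, end), half-open


def summarize_svs(svs: List[SV], exons: List[Exon]) -> Dict[str, Tuple[int, int]]:
    """Per SV type, return (total count, count overlapping at least one exon)."""
    exons_by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in exons:
        exons_by_chrom.setdefault(chrom, []).append((start, end))

    def hits_exon(sv: SV) -> bool:
        _, chrom, start, end = sv
        # Half-open intervals overlap when each starts before the other ends.
        return any(s < end and start < e for s, e in exons_by_chrom.get(chrom, []))

    totals = Counter(sv[0] for sv in svs)
    exonic = Counter(sv[0] for sv in svs if hits_exon(sv))
    return {sv_type: (totals[sv_type], exonic[sv_type]) for sv_type in totals}


svs = [("DEL", "2L", 100, 600), ("DEL", "2L", 5000, 5200), ("INV", "3R", 0, 100000)]
exons = [("2L", 500, 700), ("3R", 50, 60)]
print(summarize_svs(svs, exons))  # {'DEL': (2, 1), 'INV': (1, 1)}
```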

Retrotransposons are the major contributors to the expansion of the Drosophila ananassae Muller F element.

The discordance between genome size and the complexity of eukaryotes can partly be attributed to differences in repeat density. The Muller F element (~5.2 Mb) is the smallest chromosome in Drosophila melanogaster, but it is substantially larger (>18.7 Mb) in D. ananassae. To identify the major contributors to the expansion of the F element and to assess their impact, we improved the genome sequence and annotated the genes in a 1.4-Mb region of the D. ananassae F element, and in a 1.7-Mb region from the D element for comparison. We find that transposons (particularly LTR and LINE retrotransposons) are major contributors to this expansion (78.6%), while Wolbachia sequences integrated into the D. ananassae genome are minor contributors (0.02%). Both D. melanogaster and D. ananassae F-element genes exhibit distinct characteristics compared to D-element genes (e.g., larger coding spans, larger introns, more coding exons, and lower codon bias), but these differences are exaggerated in D. ananassae. Compared to D. melanogaster, the codon bias observed in D. ananassae F-element genes can primarily be attributed to mutational biases rather than selection. The 5′ ends of F-element genes in both species are enriched in dimethylation of lysine 4 on histone 3 (H3K4me2), while the coding spans are enriched in H3K9me2. Despite differences in repeat density and gene characteristics, D. ananassae F-element genes show a range of expression levels similar to that of genes in euchromatic domains. This study improves our understanding of how transposons can affect genome size and how genes can function within highly repetitive domains. Copyright © 2017 Leung et al.
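Percentages like those above come from tallying how many bases of a region are covered by each repeat class. The sketch below assumes repeat annotations are given as (class, start, end) half-open intervals within one region, and it merges overlaps so no base is counted twice within a class or in the overall total. The function names and inputs are hypothetical, and the study's exact accounting of the F-element expansion may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def merged_length(intervals: List[Tuple[int, int]]) -> int:
    """Total number of bases covered by possibly overlapping half-open intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_composition(region_length: int,
                       repeats: List[Tuple[str, int, int]]) -> Tuple[float, Dict[str, float]]:
    """Fraction of the region covered by any repeat, and by each repeat class."""
    by_class: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for repeat_class, start, end in repeats:
        by_class[repeat_class].append((start, end))
    per_class = {c: merged_length(iv) / region_length for c, iv in by_class.items()}
    overall = merged_length([(s, e) for _, s, e in repeats]) / region_length
    return overall, per_class


annotations = [("LTR", 0, 300), ("LINE", 250, 400), ("LTR", 100, 200), ("Wolbachia", 900, 901)]
print(repeat_composition(1000, annotations))
```

Because the overlapping LTR annotations are merged before counting, the LTR class covers 300 bases rather than 400, and the overall repeat fraction (401 of 1,000 bases) is smaller than the sum of the per-class fractions.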

Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology.

Many genomes have been sequenced to high-quality draft status using Sanger capillary electrophoresis and/or newer short-read sequence data and whole genome assembly techniques. However, even the best draft genomes contain gaps and other imperfections due to limitations in the input data and the techniques used to build draft assemblies. Sequencing biases, repetitive genomic features, genomic polymorphism, and other complicating factors all come together to make some regions difficult or impossible to assemble. Traditionally, draft genomes were upgraded to “phase 3 finished” status using time-consuming and expensive Sanger-based manual finishing processes. For more facile assembly and automated finishing of draft genomes, we present here an automated approach to finishing using long reads from the Pacific Biosciences RS (PacBio) platform. Our algorithm and associated software tool, PBJelly (publicly available at https://sourceforge.net/projects/pb-jelly/), automates the finishing process using long sequence reads in a reference-guided assembly process. PBJelly also provides “lift-over” coordinate tables to easily port existing annotations to the upgraded assembly. Using PBJelly and long PacBio reads, we upgraded the draft genome sequences of a simulated Drosophila melanogaster, the version 2 draft Drosophila pseudoobscura, an assembly of the Assemblathon 2.0 budgerigar dataset, and a preliminary assembly of the sooty mangabey. With 24× mapped coverage of PacBio long reads, we addressed 99% of gaps and were able to close 69% and improve 12% of all gaps in D. pseudoobscura. With 4× mapped coverage of PacBio long reads, reads addressed 63% of gaps in our budgerigar assembly, of which 32% were closed and 63% improved. With 6.8× mapped coverage of mangabey PacBio long reads, we addressed 97% of gaps, closing 66% and improving 19% of the addressed gaps. The accuracy of gap closure was validated by comparison to Sanger sequencing on gaps from the original D. pseudoobscura draft assembly and was shown to depend on initial reference quality.
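The gap percentages above presuppose locating gaps in a draft scaffold and deciding whether each was closed or only narrowed. The sketch below assumes gaps are encoded as runs of N and that each gap's length before and after upgrading is already known (for example, via lift-over coordinates). The classification rule and function names are hypothetical simplifications, not PBJelly's actual criteria.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

GAP_RE = re.compile(r"[Nn]+")  # assumption: gaps are encoded as runs of N


def find_gaps(sequence: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) coordinates of every run of N in a scaffold."""
    return [(m.start(), m.end()) for m in GAP_RE.finditer(sequence)]


def gap_status(before_len: int, after_len: int) -> str:
    """Classify one gap by its length before and after upgrading (hypothetical rule)."""
    if after_len == 0:
        return "closed"
    if after_len < before_len:
        return "improved"
    return "unchanged"


def gap_summary(gaps: List[Tuple[int, int]]) -> Dict[str, float]:
    """Percentage of gaps in each class, given (before_len, after_len) pairs."""
    counts = Counter(gap_status(before, after) for before, after in gaps)
    return {status: 100 * n / len(gaps) for status, n in counts.items()}


print(find_gaps("ACGTNNNNACGTnnA"))  # [(4, 8), (12, 14)]
print(gap_summary([(100, 0), (100, 40), (100, 100), (50, 0)]))
# {'closed': 50.0, 'improved': 25.0, 'unchanged': 25.0}
```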

