In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics. © 2019 The Royal Entomological Society.
Long-read RNA sequencing (RNA-seq) is promising to transcriptomics studies, however, the alignment of the reads is still a fundamental but non-trivial task due to the sequencing errors and complicated gene structures. We propose deSALT, a tailored two-pass long RNA-seq read alignment approach, which constructs graph-based alignment skeletons to sensitively infer exons, and use them to generate spliced reference sequence to produce refined alignments. deSALT addresses several difficult issues, such as small exons, serious sequencing errors and consensus spliced alignment. Benchmarks demonstrate that this approach has a better ability to produce high-quality full-length alignments, which has enormous potentials to transcriptomics studies.
Satellite repeats are a structural component of centromeres and telomeres, and in some instances their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50?bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: (1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and (2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males vs. females; using Y chromosome assemblies or FIuorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59?kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Motivation: Third-generation sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies to asses these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used. Results: In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment which includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research.
Draft Genome Sequence of the Novonestmycin-Producing Strain Streptomyces sp. Z26, Isolated from Potato Rhizosphere in Morocco.
Streptomyces sp. strain Z26 exhibited antifungal activity and turned out to be a producer of the secondary metabolites novonestmycin A and B. The 6.5-Mb draft genome gives insight into the complete secondary metabolite production capacity and builds the basis to find and locate the biosynthetic gene cluster encoding the novonestmycins.
In insects, rapidly evolving primary sex-determining signals are transduced by a conserved regulatory module controlling sexual differentiation. In the agricultural pest Ceratitis capitata (Mediterranean fruit fly, or Medfly), we identified a Y-linked gene, Maleness-on-the-Y (MoY), encoding a small protein that is necessary and sufficient for male development. Silencing or disruption of MoY in XY embryos causes feminization, whereas overexpression of MoY in XX embryos induces masculinization. Crosses between transformed XY females and XX males give rise to males and females, indicating that a Y chromosome can be transmitted by XY females. MoY is Y-linked and functionally conserved in other species of the Tephritidae family, highlighting its potential to serve as a tool for developing more effective control strategies against these major agricultural insect pests.Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Symbiosis is a major force of evolutionary change, influencing virtually all aspects of biology, from population ecology and evolution to genomics and molecular/biochemical mechanisms of development and reproduction. A remarkable example is Wolbachia endobacteria, present in some parasitic nematodes and many arthropod species. Acquisition of genomic data from diverse Wolbachia clades will aid in the elucidation of the different symbiotic mechanisms(s). However, challenges of de novo assembly of Wolbachia genomes include the presence in the sample of host DNA: nematode/vertebrate or insect. We designed biotinylated probes to capture large fragments of Wolbachia DNA for sequencing using PacBio technology (LEFT-SEQ: Large Enriched Fragment Targeted Sequencing). LEFT-SEQ was used to capture and sequence four Wolbachia genomes: the filarial nematode Brugia malayi, wBm, (21-fold enrichment), Drosophila mauritiana flies (2 isolates), wMau (11-fold enrichment), and Aedes albopictus mosquitoes, wAlbB (200-fold enrichment). LEFT-SEQ resulted in complete genomes for wBm and for wMau. For wBm, 18 single-nucleotide polymorphisms (SNPs), relative to the wBm reference, were identified and confirmed by PCR. A limit of LEFT-SEQ is illustrated by the wAlbB genome, characterized by a very high level of insertion sequences elements (ISs) and DNA repeats, for which only a 20-contig draft assembly was achieved.
A chromosome-level genome assembly of Cydia pomonella provides insights into chemical ecology and insecticide resistance.
The codling moth Cydia pomonella, a major invasive pest of pome fruit, has spread around the globe in the last half century. We generated a chromosome-level scaffold assembly including the Z chromosome and a portion of the W chromosome. This assembly reveals the duplication of an olfactory receptor gene (OR3), which we demonstrate enhances the ability of C. pomonella to exploit kairomones and pheromones in locating both host plants and mates. Genome-wide association studies contrasting insecticide-resistant and susceptible strains identify hundreds of single nucleotide polymorphisms (SNPs) potentially associated with insecticide resistance, including three SNPs found in the promoter of CYP6B2. RNAi knockdown of CYP6B2 increases C. pomonella sensitivity to two insecticides, deltamethrin and azinphos methyl. The high-quality genome assembly of C. pomonella informs the genetic basis of its invasiveness, suggesting the codling moth has distinctive capabilities and adaptive potential that may explain its worldwide expansion.
Multispecies host-parasite evolution is common, but how parasites evolve after speciating remains poorly understood. Shared evolutionary history and physiology may propel species along similar evolutionary trajectories whereas pursuing different strategies can reduce competition. We test these scenarios in the economically important association between honey bees and ectoparasitic mites by sequencing the genomes of the sister mite species Varroa destructor and Varroa jacobsoni. These genomes were closely related, with 99.7% sequence identity. Among the 9,628 orthologous genes, 4.8% showed signs of positive selection in at least one species. Divergent selective trajectories were discovered in conserved chemosensory gene families (IGR, SNMP), and Halloween genes (CYP) involved in moulting and reproduction. However, there was little overlap in these gene sets and associated GO terms, indicating different selective regimes operating on each of the parasites. Based on our findings, we suggest that species-specific strategies may be needed to combat evolving parasite communities. © The Author(s) 2019.
Long-read sequencing is emerging as a promising sequencing technology because it can tackle the short length limitation of second-generation sequencing, which has dominated the sequencing market in past years. However, it has substantially higher error rates compared to short-read sequencing (e.g., 13% vs. 0.1%), and its sequencing cost per base is typically more expensive than that of short-read sequencing. To address these limitations, we present a distributed hybrid error correction framework, called ParLECH, that is scalable and cost-efficient for PacBio long reads. For correcting the errors in the long reads, ParLECH utilizes the Illumina short reads that have the low error rate with high coverage at low cost. To efficiently analyze the high-throughput Illumina short reads, ParLECH is equipped with Hadoop and a distributed NoSQL system. To further improve the accuracy, ParLECH utilizes the k-mer coverage information of the Illumina short reads. Specifically, we develop a distributed version of the widest path algorithm, which maximizes the minimum k-mer coverage in a path of the de Bruijn graph constructed from the Illumina short reads. We replace an error region in a long read with its corresponding widest path. Our experimental results show that ParLECH can handle large-scale real-world datasets in a scalable and accurate manner. Using ParLECH, we can process a 312 GB human genome PacBio dataset, with a 452 GB Illumina dataset, on 128 nodes in less than 29 hours.
Identification and expression analysis of chemosensory genes in the citrus fruit fly Bactrocera (Tetradacus) minax
The citrus fruit fly Bactrocera (Tetradacus) minax is a major and devastating agricultural pest in Asian subtropical countries. Previous studies have shown that B. minax interacts with hosts via an efficient chemosensory system. However, knowledge regarding the molecular components of the B. minax chemosensory system has not yet been well established. Herein, based on our newly generated whole-genome dataset for B. minax and by comparison with the characterized genomes of 6 other fruit fly species, we identified, for the first time, a total of 25 putative odorant-binding receptors (OBPs), 4 single-copy chemosensory proteins (CSPs) and 53 candidate odorant receptors (ORs). To further survey the expression of these candidate genes, the transcriptomes from three developmental stages (larvae, pupae and adults) of B. minax and Bactrocera dorsalis were analyzed. We found that 1) at the adult developmental stage, there were 14 highly expressed OBPs (FPKM>100) in B. dorsalis and 7 highly expressed OBPs in B. minax; 2) the expression of CSP3 and CSP4 in adult B. dorsalis was higher than that in B. minax; and 3) most of the OR genes exhibited low expression at the three developmental stages in both species. This study on the identification of the chemosensory system of B. minax not only enriches the existing research on insect olfactory receptors but also provides new targets for preventative control and ecological regulation of B. minax in the future.