October 23, 2019  |  

Identification and expression analysis of chemosensory genes in the citrus fruit fly Bactrocera (Tetradacus) minax

The citrus fruit fly Bactrocera (Tetradacus) minax is a major and devastating agricultural pest in Asian subtropical countries. Previous studies have shown that B. minax interacts with hosts via an efficient chemosensory system. However, knowledge regarding the molecular components of the B. minax chemosensory system has not yet been well established. Herein, based on our newly generated whole-genome dataset for B. minax and by comparison with the characterized genomes of 6 other fruit fly species, we identified, for the first time, a total of 25 putative odorant-binding proteins (OBPs), 4 single-copy chemosensory proteins (CSPs) and 53 candidate odorant receptors (ORs). To further survey the expression of these candidate genes, the transcriptomes from three developmental stages (larvae, pupae and adults) of B. minax and Bactrocera dorsalis were analyzed. We found that 1) at the adult developmental stage, there were 14 highly expressed OBPs (FPKM>100) in B. dorsalis and 7 highly expressed OBPs in B. minax; 2) the expression of CSP3 and CSP4 in adult B. dorsalis was higher than that in B. minax; and 3) most of the OR genes exhibited low expression at the three developmental stages in both species. This study on the identification of the chemosensory system of B. minax not only enriches the existing research on insect olfactory receptors but also provides new targets for preventative control and ecological regulation of B. minax in the future.
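The "highly expressed" cutoff above is stated in FPKM (fragments per kilobase of transcript per million mapped fragments). As a minimal sketch of how such a value and the FPKM>100 filter are conventionally computed: the function names, gene labels and counts below are illustrative assumptions, not data or code from the study.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def highly_expressed(fpkm_by_gene, threshold=100.0):
    """Return the sorted names of genes whose FPKM exceeds the threshold."""
    return sorted(name for name, value in fpkm_by_gene.items() if value > threshold)


# Hypothetical counts for three made-up genes in a library of 2 million fragments.
library_size = 2_000_000
toy_counts = {
    "OBP_a": (500, 1000),  # (fragments, gene length in bp)
    "OBP_b": (50, 1000),
    "OR_a": (10, 1500),
}
toy_fpkm = {
    name: fpkm(frags, length, library_size)
    for name, (frags, length) in toy_counts.items()
}
print(highly_expressed(toy_fpkm))
```

With these made-up numbers only `OBP_a` clears the threshold. The normalization by gene length and library size is what makes a single cutoff comparable across genes and samples.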


October 23, 2019  |  

An online bioinformatics tool predicts zinc finger and TALE nuclease off-target cleavage.

Although engineered nucleases can efficiently cleave intracellular DNA at desired target sites, major concerns remain on potential ‘off-target’ cleavage that may occur throughout the genome. We developed an online tool: predicted report of genome-wide nuclease off-target sites (PROGNOS) that effectively identifies off-target sites. The initial bioinformatics algorithms in PROGNOS were validated by predicting 44 of 65 previously confirmed off-target sites, and by uncovering a new off-target site for the extensively studied zinc finger nucleases (ZFNs) targeting C-C chemokine receptor type 5. Using PROGNOS, we rapidly interrogated 128 potential off-target sites for newly designed transcription activator-like effector nucleases containing either Asn-Asn (NN) or Asn-Lys (NK) repeat variable di-residues (RVDs) and 3- and 4-finger ZFNs, and validated 13 bona fide off-target sites for these nucleases by DNA sequencing. The PROGNOS algorithms were further refined by incorporating additional features of nuclease-DNA interactions and the newly confirmed off-target sites into the training set, which increased the percentage of bona fide off-target sites found within the top PROGNOS rankings. By identifying potential off-target sites in silico, PROGNOS allows the selection of more specific target sites and aids the identification of bona fide off-target sites, significantly facilitating the design of engineered nucleases for genome editing applications.


September 22, 2019  |  

Leveraging multiple transcriptome assembly methods for improved gene structure annotation.

The performance of RNA sequencing (RNA-seq) aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand. Here, we show that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics, while solving common artifacts such as erroneous transcript chimerisms. We have implemented this method in an open-source Python3 and Cython program, Mikado, available on GitHub.


September 22, 2019  |  

The genome of an underwater architect, the caddisfly Stenopsyche tienmushanensis Hwang (Insecta: Trichoptera).

Caddisflies (Insecta: Trichoptera) are a highly adapted freshwater group of insects split from a common ancestor with Lepidoptera. They are the most diverse (>16,000 species) of the strictly aquatic insect orders and are widely employed as bio-indicators in water quality assessment and monitoring. Among the numerous adaptations to aquatic habitats, caddisfly larvae use silk and materials from the environment (e.g., stones, sticks, leaf matter) to build composite structures such as fixed retreats and portable cases. Understanding how caddisflies have adapted to aquatic habitats will help explain the evolution and subsequent diversification of the group. We sequenced a retreat-builder caddisfly Stenopsyche tienmushanensis Hwang and assembled a high-quality genome from both Illumina and Pacific Biosciences (PacBio) sequencing. In total, 601.2 M Illumina reads (90.2 Gb) and 16.9 M PacBio subreads (89.0 Gb) were generated. The 451.5 Mb assembled genome has a contig N50 of 1.29 Mb, has a longest contig of 4.76 Mb, and covers 97.65% of the 1,658 insect single-copy genes as assessed by Benchmarking Universal Single-Copy Orthologs. The genome comprises 36.76% repetitive elements. A total of 14,672 predicted protein-coding genes were identified. The genome revealed gene expansions in specific groups of the cytochrome P450 family and olfactory binding proteins, suggesting potential genomic features associated with pollutant tolerance and mate finding. In addition, the complete gene complex of the highly repetitive H-fibroin, the major protein component of caddisfly larval silk, was assembled. We report the draft genome of Stenopsyche tienmushanensis, the highest-quality caddisfly genome so far. The genome information will be an important resource for the study of caddisflies and may shed light on the evolution of aquatic insects.


September 22, 2019  |  

Computational identification of novel genes: current and future perspectives.

While it has long been thought that all genomic novelties are derived from the existing material, many genes lacking homology to known genes were found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, i.e., out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, increasing evidence accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or against databases. Computational approaches are further prime methods that can be based on existing models or leverage biological evidence from experiments. Identification of novel genes remains, however, a challenging task. With the constant software and technology updates, no gold standard, and no available benchmark, evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented; the methodological strategies and their limits are discussed along with perspective approaches for further studies.


September 22, 2019  |  

wtf genes are prolific dual poison-antidote meiotic drivers.

Meiotic drivers are selfish genes that bias their transmission into gametes, defying Mendelian inheritance. Despite the significant impact of these genomic parasites on evolution and infertility, few meiotic drive loci have been identified or mechanistically characterized. Here, we demonstrate a complex landscape of meiotic drive genes on chromosome 3 of the fission yeasts Schizosaccharomyces kambucha and S. pombe. We identify S. kambucha wtf4 as one of these genes that acts to kill gametes (known as spores in yeast) that do not inherit the gene from heterozygotes. wtf4 utilizes dual, overlapping transcripts to encode both a gamete-killing poison and an antidote to the poison. To enact drive, all gametes are poisoned, whereas only those that inherit wtf4 are rescued by the antidote. Our work suggests that the wtf multigene family proliferated due to meiotic drive and highlights the power of selfish genes to shape genomes, even while imposing tremendous costs to fertility.


September 22, 2019  |  

CRISPR/Cas9 deletions in a conserved exon of Distal-less generates gains and losses in a recently acquired morphological novelty in flies.

Distal-less has been repeatedly co-opted for the development of many novel traits. Here, we document its curious role in the development of a novel abdominal appendage (“sternite brushes”) in sepsid flies. CRISPR/Cas9 deletions in the homeodomain result in losses of sternite brushes, demonstrating that Distal-less is necessary for their development. However, deletions in the upstream coding exon (Exon 2) produce losses or gains of brushes. A dissection of Exon 2 reveals that the likely mechanism for gains involves a deletion in an exon-splicing enhancer site that leads to exon skipping. Such contradictory phenotypes are also observed in butterflies, suggesting that mutations in the conserved upstream regions have the potential to generate phenotypic variability in insects that diverged 300 million years ago. Our results demonstrate the importance of Distal-less for the development of a novel abdominal appendage in insects and highlight how site-specific mutations in the same exon can produce contradictory phenotypes. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.


September 22, 2019  |  

Introduction to isoform sequencing using Pacific Biosciences technology (Iso-Seq)

Alternative RNA splicing is a known phenomenon, but we still do not have a complete catalog of isoforms that explain variability in the human transcriptome. We have made significant progress in developing methods to study variability of the transcriptome, but we are far from having a complete picture of the transcriptome. The initial methods to study gene expression were based on cloning of cDNAs and Sanger sequencing. The strategy was labor-intensive and expensive. With the development of microarrays, different methods based on exon arrays and tiling arrays provided valuable information about RNA expression. However, the microarray presented significant limitations. Most of the limitations became apparent by 2005, but it was not until 2008 that an alternative method to study the transcriptome was developed. RNA Sequencing using next-generation sequencing (RNA-Seq) quickly became the technology of choice for gene expression profiling. Recently, the precision and sensitivity of RNA-Seq have come into question, especially for transcriptome reconstruction. This chapter will describe a relatively new method, “Isoform Sequencing” (Iso-Seq). Iso-Seq was developed by Pacific Biosciences (PacBio), and it is capable of identifying new isoforms with extraordinary precision due to its long-read technology. The technique to create libraries is straightforward, and the PacBio RS II instrument generates the information in hours. The bioinformatics analysis is performed using the freely available SMRT® Portal software. The SMRT Portal is easy to use and capable of performing all the steps necessary to analyze the raw data and to generate high-quality full-length isoforms. For the universal acceptance of the Iso-Seq method, the capacity of the SMRT Cells needs to improve at least 10- to 100-fold to make the system affordable and attractive to users.


September 22, 2019  |  

The industrial melanism mutation in British peppered moths is a transposable element.

Discovering the mutational events that fuel adaptation to environmental change remains an important challenge for evolutionary biology. The classroom example of a visible evolutionary response is industrial melanism in the peppered moth (Biston betularia): the replacement, during the Industrial Revolution, of the common pale typica form by a previously unknown black (carbonaria) form, driven by the interaction between bird predation and coal pollution. The carbonaria locus has been coarsely localized to a 200-kilobase region, but the specific identity and nature of the sequence difference controlling the carbonaria-typica polymorphism, and the gene it influences, are unknown. Here we show that the mutation event giving rise to industrial melanism in Britain was the insertion of a large, tandemly repeated, transposable element into the first intron of the gene cortex. Statistical inference based on the distribution of recombined carbonaria haplotypes indicates that this transposition event occurred around 1819, consistent with the historical record. We have begun to dissect the mode of action of the carbonaria transposable element by showing that it increases the abundance of a cortex transcript, the protein product of which plays an important role in cell-cycle regulation, during early wing disc development. Our findings fill a substantial knowledge gap in the iconic example of microevolutionary change, adding a further layer of insight into the mechanism of adaptation in response to natural selection. The discovery that the mutation itself is a transposable element will stimulate further debate about the importance of ‘jumping genes’ as a source of major phenotypic novelty.


September 22, 2019  |  

Long-read based assembly and annotation of a Drosophila simulans genome

Long-read sequencing technologies enable high-quality, contiguous genome assemblies. Here we used SMRT sequencing to assemble the genome of a Drosophila simulans strain originating from Madagascar, the ancestral range of the species. We generated 8 Gb of raw data (~50x coverage) with a mean read length of 6,410 bp, an NR50 of 9,125 bp and the longest subread at 49 kb. We benchmarked six different assemblers and merged the best two assemblies from Canu and Falcon. Our final assembly was 127.41 Mb with an N50 of 5.38 Mb and 305 contigs. We anchored more than 4 Mb of novel sequence to the major chromosome arms, and significantly improved the assembly of peri-centromeric and telomeric regions. Finally, we performed full-length transcript sequencing and used these data in conjunction with short-read RNAseq data to annotate 13,422 genes in the genome, improving the annotation in regions with complex, nested gene structures.
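The assembly statistics quoted above rely on N50: the contig length at which contigs of that length or longer account for at least half of the total assembled bases. As a minimal sketch of that standard definition (the function name and toy lengths are illustrative assumptions, not values from the assembly):

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or read) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example: total is 20, so the 6 and 5 together first pass the halfway mark.
print(n50([2, 3, 4, 5, 6]))  # prints 5
```

The same calculation applied to read lengths instead of contig lengths gives a read-level statistic of the kind reported as NR50 in the abstract.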


September 22, 2019  |  

Evaluation of tools for long read RNA-seq splice-aware alignment.

High-throughput sequencing has transformed the study of gene expression levels through RNA-seq, a technique that is now routinely used by various fields, such as genetic research or diagnostics. The advent of third generation sequencing technologies providing significantly longer reads opens up new possibilities. However, the high error rates common to these technologies set new bioinformatics challenges for the gapped alignment of reads to their genomic origin. In this study, we have explored how currently available RNA-seq splice-aware alignment tools cope with increased read lengths and error rates. All tested tools were initially developed for short NGS reads, but some have claimed support for long Pacific Biosciences (PacBio) or even Oxford Nanopore Technologies (ONT) MinION reads. The tools were tested on synthetic and real datasets from two technologies (PacBio and ONT MinION). Alignment quality and resource usage were compared across different aligners. The effect of error correction of long reads was explored, both using self-correction and correction with an external short reads dataset. A tool was developed for evaluating RNA-seq alignment results. This tool can be used to compare the alignment of simulated reads to their genomic origin, or to compare the alignment of real reads to a set of annotated transcripts. Our tests show that while some RNA-seq aligners were unable to cope with long error-prone reads, others produced overall good results. We further show that alignment accuracy can be improved using error-corrected reads. Availability: https://github.com/kkrizanovic/RNAseqEval, https://figshare.com/projects/RNAseq_benchmark/24391. Contact: mile.sikic@fer.hr. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com


September 22, 2019  |  

Comprehensive transcriptome analysis of Sarcophaga peregrina, a forensically important fly species.

Sarcophaga peregrina (flesh fly) is a frequently found fly species in Palaearctic, Oriental, and Australasian regions that can be used to estimate minimal postmortem intervals important for forensic investigations. Despite its forensic importance, the genome information of S. peregrina has not been fully described. Therefore, we generated a comprehensive gene expression dataset using RNA sequencing and carried out de novo assembly to characterize the S. peregrina transcriptome. We obtained precise sequence information for RNA transcripts using two different methods. Based on primary sequence information, we identified sets of assembled unigenes and predicted coding sequences. Functional annotation of the aligned unigenes was performed using the UniProt, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes databases. As a result, 26,580,352 and 83,221 raw reads were obtained using the Illumina MiSeq and PacBio RS II Iso-Seq sequencing applications, respectively. From these reads, 55,730 contigs were successfully annotated. The present study provides the resulting genome information of S. peregrina, which is valuable for forensic applications.


September 22, 2019  |  

The state of play in higher eukaryote gene annotation.

A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe – or ‘annotate’ – genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists – from clinicians to evolutionary biologists – need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.


September 22, 2019  |  

Quantitative profiling of Drosophila melanogaster Dscam1 isoforms reveals no changes in splicing after bacterial exposure.

The hypervariable Dscam1 (Down syndrome cell adhesion molecule 1) gene can produce thousands of different ectodomain isoforms via mutually exclusive alternative splicing. Dscam1 appears to be involved in the immune response of some insects and crustaceans. It has been proposed that the diverse isoforms may be involved in the recognition of, or the defence against, diverse parasite epitopes, although evidence to support this is sparse. A prediction that can be generated from this hypothesis is that the gene expression of specific exons and/or isoforms is influenced by exposure to an immune elicitor. To test this hypothesis, we, for the first time, use a long read RNA sequencing method to directly investigate the Dscam1 splicing pattern after exposing adult Drosophila melanogaster and an S2 cell line to live Escherichia coli. After bacterial exposure, both models showed increased expression of immune-related genes, indicating that the immune system had been activated. However, there were no changes in total Dscam1 mRNA expression. RNA sequencing further showed that there were no significant changes in individual exon expression and no changes in isoform splicing patterns in response to bacterial exposure. Therefore, our studies do not support a change of D. melanogaster Dscam1 isoform diversity in response to live E. coli. Nevertheless, in future this approach could be used to identify potentially immune-related Dscam1 splicing regulation in other host species or in response to other pathogens.


September 22, 2019  |  

PacBio sequencing and its applications.

Single-molecule, real-time sequencing developed by Pacific Biosciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio’s sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable, especially for small-size laboratories, than using PacBio sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

