July 7, 2019

LoRDEC: accurate and efficient long read error correction.

PacBio single molecule real-time sequencing is a third-generation sequencing technique that produces long reads, with comparatively lower throughput and a higher error rate. The errors include numerous indels and complicate downstream analyses such as mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping short reads onto long reads provides sufficient coverage to eliminate up to 99% of errors, but at the expense of prohibitive running times and considerable amounts of disk and memory space. We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. Compared with available tools, LoRDEC is at least six times faster and requires at least 93% less memory or disk space, while achieving comparable accuracy. Availability and implementation: LoRDEC is written in C++, tested on Linux platforms and freely available at http://atgc.lirmm.fr/lordec. © The Author 2014. Published by Oxford University Press.
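
The graph-based correction idea can be illustrated with a small Python sketch. It is not LoRDEC's implementation: it stores short-read k-mers in a plain set rather than a succinct de Bruijn graph, treats long-read k-mers found in that set as "solid", and bridges each weak region between two solid k-mers with the shortest path found by breadth-first search. The abstract only says that paths are "chosen"; the shortest-path rule, the `slack` length bound and all function names here are assumptions made for the example.

```python
from collections import deque


def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(short_reads, k):
    """Node set of a plain (non-succinct) de Bruijn graph; edges are implicit (k-1 overlaps)."""
    nodes = set()
    for read in short_reads:
        nodes.update(kmers(read, k))
    return nodes


def successors(node, graph):
    """k-mers in the graph that extend node by one base."""
    for base in "ACGT":
        nxt = node[1:] + base
        if nxt in graph:
            yield nxt


def find_path(source, target, graph, max_len):
    """Breadth-first search from source to target; returns the spelled sequence or None."""
    queue = deque([(source, source)])
    seen = {source}
    while queue:
        node, spelled = queue.popleft()
        if node == target:
            return spelled
        if len(spelled) >= max_len:
            continue  # give up on paths much longer than the region being corrected
        for nxt in successors(node, graph):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, spelled + nxt[-1]))
    return None


def correct_read(long_read, graph, k, slack=5):
    """Replace each weak region between two solid k-mers by a bridging graph path."""
    solid = [i for i, km in enumerate(kmers(long_read, k)) if km in graph]
    if not solid:
        return long_read  # nothing to anchor on: leave the read unchanged
    out = long_read[:solid[0] + k]  # prefix up to the end of the first solid k-mer
    for a, b in zip(solid, solid[1:]):
        if b == a + 1:
            out += long_read[b + k - 1]  # adjacent solid k-mers: copy one base
            continue
        path = find_path(long_read[a:a + k], long_read[b:b + k], graph, b - a + k + slack)
        if path is None:
            out += long_read[a + k:b + k]  # no bridge found: keep the original bases
        else:
            out += path[k:]  # path starts with the solid k-mer already in out
    return out + long_read[solid[-1] + k:]


if __name__ == "__main__":
    reference = "ATGGCGTACCTTAGCAGGT"
    short_reads = ["ATGGCGTACC", "TACCTTAGCA", "AGCAGGT"]
    long_read = "ATGGCGTACCGTTAGCAGGT"  # spurious G inserted after position 9
    graph = build_graph(short_reads, k=5)
    print(correct_read(long_read, graph, k=5) == reference)  # prints True
```

On the toy data, the five erroneous k-mers spanning the inserted G are absent from the short-read graph, and the path found between the flanking solid k-mers restores the reference sequence.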


July 7, 2019

Sequence alignment tools: one parallel pattern to rule them all?

In this paper, we advocate a high-level programming methodology for next-generation sequencing (NGS) alignment tools, for the sake of both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools with their ports to the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. A high-level approach frees programmers from the complex aspects of parallel programming, such as synchronisation protocols and task scheduling, and leaves them more room for seamless performance tuning. We present use cases showing that, by parallelising NGS tools with a high-level approach, it is possible to obtain comparable or even better absolute performance on all datasets used.
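
The abstract does not name the shared paradigm. Assuming it is a task farm (an emitter splits the reads into independent tasks, a pool of workers aligns them, and a collector gathers the results in order), the structure can be sketched with Python's standard `concurrent.futures` module. This is not FastFlow's C++ API, and `align_read` is a hypothetical exact-match stand-in for a real aligner's per-read kernel.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def align_read(read, reference):
    """Hypothetical per-read kernel: 0-based exact-match position, or -1 if absent."""
    return reference.find(read)


def align_batch(batch, reference):
    """Worker: process one task (a batch of reads) independently of all other tasks."""
    return [align_read(read, reference) for read in batch]


def batches(reads, size):
    """Emitter: split the input stream of reads into fixed-size tasks."""
    for i in range(0, len(reads), size):
        yield reads[i:i + size]


def farm_align(reads, reference, workers=4, batch_size=2):
    """Task farm: emitter -> pool of workers -> ordered collector."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in task order, acting as an ordered collector.
        results = pool.map(partial(align_batch, reference=reference), batches(reads, batch_size))
        return [pos for batch in results for pos in batch]


if __name__ == "__main__":
    print(farm_align(["CGTA", "TTAG", "NNNN"], "ATGGCGTACCTTAGCAGGT"))  # prints [4, 10, -1]
```

Because CPython threads share the global interpreter lock, this sketch shows the pattern rather than a speedup; since the worker is a module-level function wrapped in `functools.partial`, swapping in `ProcessPoolExecutor` is the usual route to real parallelism in Python.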


July 7, 2019

Thirty-thousand-year-old distant relative of giant icosahedral DNA viruses with a pandoravirus morphology.

The largest known DNA viruses infect Acanthamoeba and belong to two markedly different families. The Megaviridae exhibit pseudo-icosahedral virions up to 0.7 µm in diameter and adenine-thymine (AT)-rich genomes of up to 1.25 Mb encoding a thousand proteins. Like their prototype Mimivirus, discovered 10 y ago, they replicate entirely within cytoplasmic virion factories. In contrast, the recently discovered Pandoraviruses exhibit larger amphora-shaped virions 1 µm in length and guanine-cytosine-rich genomes up to 2.8 Mb long encoding up to 2,500 proteins. Their replication involves the host nucleus. Whereas the Megaviridae share some general features with the previously described icosahedral large DNA viruses, the Pandoraviruses appear unrelated to them. Here we report the discovery of a third type of giant virus that combines an even larger pandoravirus-like particle, 1.5 µm in length, with a surprisingly small 600 kb AT-rich genome, a gene content more similar to that of Iridoviruses and Marseillevirus, and a fully cytoplasmic replication reminiscent of the Megaviridae. This suggests that pandoravirus-like particles may be associated with a variety of virus families more diverse than previously envisioned. This giant virus, named Pithovirus sibericum, was isolated from a >30,000-y-old radiocarbon-dated sample when we initiated a survey of the virome of Siberian permafrost. The revival of such an ancestral amoeba-infecting virus, used as a safe indicator of the possible presence of pathogenic DNA viruses, suggests that the thawing of permafrost, whether from global warming or from industrial exploitation of circumpolar regions, might not be exempt from future threats to human or animal health.


July 7, 2019

Signature gene expression reveals novel clues to the molecular mechanisms of dimorphic transition in Penicillium marneffei.

Systemic dimorphic fungi cause more than one million new infections each year, ranking them among the significant public health challenges currently encountered. Penicillium marneffei is a systemic dimorphic fungus endemic to Southeast Asia. The temperature-dependent dimorphic phase transition between mycelium and yeast is considered crucial for the pathogenicity and transmission of P. marneffei, but the underlying mechanisms are still poorly understood. Here, we re-sequenced P. marneffei strain PM1 using multiple sequencing platforms and assembled the genome using hybrid genome assembly. We determined gene expression levels using RNA sequencing at the mycelial and yeast phases of P. marneffei, as well as during phase transition. We classified 2,718 genes with variable expression across conditions into 14 distinct groups, each marked by a signature expression pattern implicated at a certain stage in the dimorphic life cycle. Genes with the same expression patterns tend to be clustered together on the genome, suggesting orchestrated regulation of the transcriptional activities of neighboring genes. Using qRT-PCR, we validated the expression levels of all genes in one of the clusters highly expressed during the yeast-to-mycelium transition. These included madsA, a gene encoding a MADS-box transcription factor whose gene family is exclusively expanded in P. marneffei. Over-expression of madsA drove P. marneffei to undergo mycelial growth at 37°C, a condition that restricts the wild type to the yeast phase. Furthermore, analyses of signature expression patterns suggested diverse roles of secreted proteins at different developmental stages and the potential importance of non-coding RNAs in the mycelium-to-yeast transition. We also showed that RNA structural transitions in response to temperature changes may be related to the control of thermal dimorphism. Together, our findings reveal multiple molecular mechanisms that may underlie the dimorphic transition in P. marneffei, providing a powerful foundation for identifying molecular targets for mechanism-based interventions.


July 7, 2019

Genome analysis of a major urban malaria vector mosquito, Anopheles stephensi.

Background: Anopheles stephensi is the key vector of malaria throughout the Indian subcontinent and Middle East and an emerging model for molecular and genetic studies of mosquito-parasite interactions. The type form of the species is responsible for the majority of urban malaria transmission across its range.

Results: Here, we report the genome sequence and annotation of the Indian strain of the type form of An. stephensi. The 221 Mb genome assembly represents more than 92% of the entire genome and was produced using a combination of 454, Illumina, and PacBio sequencing. Physical mapping assigned 62% of the genome onto chromosomes, enabling chromosome-based analysis. Comparisons between An. stephensi and An. gambiae reveal that the rate of gene order reshuffling on the X chromosome was three times higher than that on the autosomes. An. stephensi has more heterochromatin in pericentric regions but less repetitive DNA in chromosome arms than An. gambiae. We also identify a number of Y-chromosome contigs and BACs. Interspersed repeats constitute 7.1% of the assembled genome while LTR retrotransposons alone comprise more than 49% of the Y contigs. RNA-seq analyses provide new insights into mosquito innate immunity, development, and sexual dimorphism.

Conclusions: The genome analysis described in this manuscript provides a resource and platform for fundamental and translational research into a major urban malaria vector. Chromosome-based investigations provide unique perspectives on Anopheles chromosome evolution. RNA-seq analysis and studies of immunity genes offer new insights into mosquito biology and mosquito-parasite interactions.


July 7, 2019

The genomic landscape of the verrucomicrobial methanotroph Methylacidiphilum fumariolicum SolV.

Aerobic methanotrophs can grow in hostile volcanic environments and use methane as their sole source of energy. The discovery of three verrucomicrobial Methylacidiphilum strains has revealed diverse metabolic pathways used by these methanotrophs, including mechanisms through which methane is oxidized. A complete understanding of these processes, and of how these bacteria evolved and are able to thrive in such extreme environments, rests partly on the complete characterization of their genome and its architecture. In this study, we present the complete genome sequence of Methylacidiphilum fumariolicum SolV, obtained using Pacific Biosciences single-molecule real-time (SMRT) sequencing technology. The genome assembles into a single 2.5 Mbp chromosome with an average GC content of 41.5%. It contains 2,741 annotated genes and 314 functional subsystems, including all key metabolic pathways associated with Methylacidiphilum strains, such as the CBB pathway for CO2 fixation. However, it encodes neither the serine cycle nor the ribulose monophosphate pathway for carbon fixation. Phylogenetic analysis of the particulate methane mono-oxygenase operon separates the Methylacidiphilum strains from other verrucomicrobial methanotrophs. RNA-Seq analysis of cell cultures growing under three different conditions revealed the deregulation of two of the three pmoCAB operons. In addition, genes involved in nitrogen fixation were upregulated in cell cultures growing under nitrogen-fixing conditions, indicating the presence of an active nitrogenase. Characterization of the global methylation state of M. fumariolicum SolV revealed methylation of adenines and cytosines, mainly in the coding regions of the genome. Methylation of adenines was predominantly associated with the 5′-m6ACN4GT-3′ and 5′-CCm6AN5CTC-3′ methyltransferase recognition motifs, whereas methylated cytosines were not associated with any specific motif. Our findings provide novel insights into the global methylation state of the verrucomicrobial methanotroph M. fumariolicum SolV. However, the only partial conservation of methyltransferases between M. fumariolicum SolV and M. infernorum V4 indicates potential differences in the global methylation state of Methylacidiphilum strains. Unravelling the M. fumariolicum SolV genome and its epigenetic regulation allows for robust characterization of the biological processes involved in oxidizing methane. In turn, it offers a better understanding of the evolution and of the underlying physiological and ecological properties of SolV and other Methylacidiphilum strains.
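
The two adenine motifs are written in IUPAC notation, where N is any base and N4/N5 stand for four/five such bases: 5′-ACN4GT-3′ is the eight-base pattern ACNNNNGT (its own reverse complement), and 5′-CCAN5CTC-3′ is CCANNNNNCTC, with the m6A mark on the underlined-by-position adenine. A minimal Python sketch, assuming a plain uppercase sequence, reports where the target adenine of each motif occurrence would sit. It scans only the given strand (the reverse complement of CCAN5CTC, GAGN5TGG, is not searched), and the function name is illustrative.

```python
import re

# Motif name -> (regex for the scanned strand, 0-based offset of the methylated A)
MOTIFS = {
    "ACN4GT": ("AC[ACGT]{4}GT", 0),
    "CCAN5CTC": ("CCA[ACGT]{5}CTC", 2),
}


def find_m6a_sites(seq):
    """Return (motif, position of the target adenine) for every motif occurrence."""
    hits = []
    for name, (pattern, a_offset) in MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences be reported.
        for match in re.finditer(f"(?={pattern})", seq):
            hits.append((name, match.start() + a_offset))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    print(find_m6a_sites("GGACTTTTGTGGCCAAAAAACTCGG"))
    # prints [('ACN4GT', 2), ('CCAN5CTC', 14)]
```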


July 7, 2019

The genome sequence of the Antarctic bullhead notothen reveals evolutionary adaptations to a cold environment.

Background: Antarctic fish have adapted to the freezing waters of the Southern Ocean. Representative adaptations to this harsh environment include a constitutive heat shock response and the evolution of an antifreeze protein in the blood. Despite these adaptations to the cold, genome-wide studies have not yet been performed on these fish because no genome had been sequenced. Notothenia coriiceps, the Antarctic bullhead notothen, is an endemic teleost fish with a circumpolar distribution and makes a good model for understanding genomic adaptations to constant sub-zero temperatures.

Results: We provide the draft genome sequence and annotation for N. coriiceps. Comparative genome-wide analysis with other fish genomes shows that mitochondrial proteins and hemoglobin evolved rapidly. Transcriptome analysis of thermal stress responses identifies alternative response mechanisms as evolutionary strategies for a cold environment. Loss of the phosphorylation-dependent sumoylation motif in heat shock factor 1 suggests that the heat shock response evolved into a simple and rapid phosphorylation-independent regulatory mechanism. Rapidly evolved hemoglobin and the induction of a heat shock response in the blood may support the efficient supply of oxygen to cold-adapted mitochondria.

Conclusions: Our data and analysis suggest that evolutionary strategies for efficient aerobic cellular respiration are controlled by hemoglobin and mitochondrial proteins, which may be important for the adaptation of Antarctic fish to their environment. Genome data from this Antarctic endemic fish provide an invaluable resource, offering evidence of evolutionary adaptation, and can be applied to other studies of Antarctic fish.


July 7, 2019

Characterization of structural variants with single molecule and hybrid sequencing approaches.

Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent ‘third-generation’ sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates. We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single molecule sequencing data, paired read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly. © The Author 2014. Published by Oxford University Press. All rights reserved.


July 7, 2019

De novo genome assembly of the economically important weed horseweed using integrated data from multiple sequencing platforms.

Horseweed (Conyza canadensis), a member of the Compositae (Asteraceae) family, was the first broadleaf weed to evolve resistance to glyphosate. Horseweed, one of the most problematic weeds in the world, is a true diploid (2n = 2x = 18) with the smallest genome of any known agricultural weed (335 Mb). It is therefore an appropriate candidate for understanding the genetic and genomic bases of weediness. We undertook a draft de novo genome assembly of a Tennessee-accessed, glyphosate-resistant horseweed biotype by combining data from multiple sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS) and libraries with different insert sizes (approximately 350 bp, 600 bp, 3 kb, and 10 kb). From 116.3 Gb of data (approximately 350× coverage), the genome was assembled into 13,966 scaffolds with an N50 of 33,561 bp. The assembly covered 92.3% of the genome, including the complete chloroplast genome (approximately 153 kb) and a nearly complete mitochondrial genome (approximately 450 kb in 120 scaffolds). The nuclear genome contains 44,592 protein-coding genes. Seven additional horseweed biotypes were resequenced, and these sequence data were assembled and used to analyze genome variation. Simple sequence repeats and single-nucleotide polymorphisms were surveyed. Genomic patterns associated with glyphosate-resistant or -susceptible biotypes were detected. The draft genome will be useful for better understanding weediness and the evolution of herbicide resistance and for devising new management strategies. The genome will also serve as another reference genome in the Compositae. To our knowledge, this article represents the first published draft genome of an agricultural weed. © 2014 American Society of Plant Biologists. All Rights Reserved.
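
Two of the assembly figures can be checked with simple arithmetic. Average sequencing depth is total sequenced bases divided by genome size, so 116.3 Gb over a 335 Mb genome gives about 347×, consistent with the quoted ~350×. The scaffold N50 is the length at which scaffolds, taken longest first, first reach half of the total assembled bases. A minimal Python sketch of both, with illustrative function names; the N50 input in the example is toy data, not the horseweed scaffolds.

```python
def sequencing_depth(total_bases, genome_size):
    """Average coverage: sequenced bases per base of genome."""
    return total_bases / genome_size


def n50(lengths):
    """Length at which the cumulative length (longest first) reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    print(round(sequencing_depth(116.3e9, 335e6)))  # prints 347
    print(n50([2, 3, 4, 5, 6]))  # prints 5
```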


July 7, 2019

Get your high-quality low-cost genome sequence.

The study of whole-genome sequences has become essential for almost all branches of biological research. Next-generation sequencing (NGS) has revolutionized the scalability, speed, and resolution of sequencing and brought genomic science within reach of academic laboratories that study non-model organisms. Here, we show that a high-quality draft genome of a eukaryote can be obtained at relatively low cost by exploiting a hybrid combination of sequencing strategies. Copyright © 2014 Elsevier Ltd. All rights reserved.


July 7, 2019

Simultaneous sequencing of oxidized methylcytosines produced by TET/JBP dioxygenases in Coprinopsis cinerea.

TET/JBP enzymes oxidize 5-methylpyrimidines in DNA. In mammals, the oxidized methylcytosines (oxi-mCs) function as epigenetic marks and likely intermediates in DNA demethylation. Here we present a method based on diglucosylation of 5-hydroxymethylcytosine (5hmC) to simultaneously map 5hmC, 5-formylcytosine, and 5-carboxylcytosine at near-base-pair resolution. We have used the method to map the distribution of oxi-mC across the genome of Coprinopsis cinerea, a basidiomycete that encodes 47 TET/JBP paralogs in a previously unidentified class of DNA transposons. Like the 5-methylcytosine residues from which they are derived, oxi-mC modifications are enriched at centromeres, TET/JBP transposons, and multicopy paralogous genes that are not expressed, but they rarely mark genes whose expression changes between two developmental stages. Our study provides evidence for the emergence of an epigenetic regulatory system through recruitment of selfish elements in a eukaryotic lineage, and describes a method to map all three different species of oxi-mCs simultaneously.


July 7, 2019

The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes.

Whole-genome sequences are now available for many microbial species and clades; however, existing whole-genome alignment methods are limited in their ability to compare multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data, we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.


July 7, 2019

Potential impact on kidney infection: a whole-genome analysis of Leptospira santarosai serovar Shermani.

Leptospira santarosai serovar Shermani is the most frequently encountered serovar, and it causes leptospirosis and tubulointerstitial nephritis in Taiwan. This study aims to complete the genome sequence of L. santarosai serovar Shermani and to analyze its transcriptional responses to renal tubular cells. To assemble this highly repetitive genome, we combined reads generated from four next-generation sequencing platforms using hybrid assembly approaches, finishing two gap-free chromosome sequences, and validated the assembly with optical restriction maps and Sanger sequencing. Whole-genome comparison revealed a 28-kb region in L. santarosai serovar Shermani containing genes that encode transposases and hypothetical proteins; this region is absent in other pathogenic Leptospira spp. We found that lipoprotein gene expression in both L. santarosai serovar Shermani and L. interrogans serovar Copenhageni was upregulated upon interaction with renal tubular cells, and that LSS19962, an L. santarosai serovar Shermani-specific gene within the 28-kb region that encodes hypothetical proteins, was upregulated in L. santarosai serovar Shermani-infected renal tubular cells. Lipoprotein expression during leptospiral infection might facilitate the interactions of leptospires within kidneys. The whole-genome sequence of L. santarosai serovar Shermani is the first completed sequence of this species, and its comparison with those of other Leptospira spp. may provide invaluable information for further studies of leptospiral pathogenesis.


July 7, 2019

A Y-chromosome-encoded small RNA acts as a sex determinant in persimmons.

In plants, multiple lineages have independently evolved sex chromosomes, providing a powerful comparative framework, but few of the determinants that control the expression of a particular sex have been identified. We investigated sex determinants in the Caucasian persimmon, Diospyros lotus, a dioecious plant with heterogametic males (XY). Male-specific short nucleotide sequences were used to define a male-determining region. A combination of transcriptomics and evolutionary approaches detected a Y-specific sex-determinant candidate, OGI, that displays male-specific conservation among Diospyros species. OGI encodes a small RNA targeting the autosomal MeGI gene, a homeodomain transcription factor regulating anther fertility in a dosage-dependent fashion. This identification of a feminizing gene suppressed by a Y-chromosome-encoded small RNA contributes to our understanding of the evolution of sex chromosome systems in higher plants. Copyright © 2014, American Association for the Advancement of Science.


July 7, 2019

Comparative genome sequencing reveals genomic signature of extreme desiccation tolerance in the anhydrobiotic midge.

Anhydrobiosis represents an extreme example of tolerance adaptation to water loss, where an organism can survive in an ametabolic state until water returns. Here we report the first comparative analysis of the genomic background of extreme desiccation tolerance, which is found exclusively in larvae of the only anhydrobiotic insect, Polypedilum vanderplanki. We compare the genomes of P. vanderplanki and a congeneric desiccation-sensitive midge, P. nubifer. We find that the genome of the anhydrobiotic species specifically contains clusters of multi-copy genes whose products act as molecular shields. In addition, the genome possesses several groups of genes with high similarity to known protective proteins. However, these genes are located in distinct paralogous clusters in the genome, apart from the classical orthologues of the corresponding genes shared by both chironomids and other insects. The transcripts of these clustered paralogues account for a large majority of the mRNA pool in desiccating larvae and most likely define successful anhydrobiosis. Comparison of expression patterns of orthologues between the two chironomid species provides evidence for the existence of desiccation-specific gene expression systems in P. vanderplanki.

