July 7, 2019

Colib’read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads

With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, needing large amounts of computing resources, and potentially removing or modifying parts of the biological information contained in the data. Our approach proposes to focus directly on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools.


July 7, 2019

Improved assembly of noisy long reads by k-mer validation.

Genome assembly depends critically on read length. Two recent technologies, from Pacific Biosciences (PacBio) and Oxford Nanopore, produce read lengths >20 kb, which yield de novo genome assemblies with vastly greater contiguity than those based on Sanger, Illumina, or other technologies. However, the very high error rates of these two new technologies (~15% per base) make assembly imprecise at repeats longer than the read length and computationally expensive. Here we show that the contiguity and quality of the assembly of these noisy long reads can be significantly improved at a minimal cost, by leveraging the low error rate and low cost of Illumina short reads. Namely, k-mers from the PacBio raw reads that are not present in Illumina reads (which account for ~95% of the distinct k-mers) are deemed sequencing errors and ignored at the seed alignment step. By focusing on the ~5% of k-mers that are error free, read overlap sensitivity is dramatically increased. Of equal importance, the validation procedure can be extended to exclude repetitive k-mers, which prevents read miscorrection at repeats and further improves the resulting assemblies. We tested the k-mer validation procedure using one long-read technology (PacBio) and one assembler (MHAP/Celera Assembler), but it is very likely to yield analogous improvements with alternative long-read technologies and assemblers, such as Oxford Nanopore and BLASR/DALIGNER/Falcon, respectively. © 2016 Carvalho et al.; Published by Cold Spring Harbor Laboratory Press.
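The core of the validation step described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy reads, and the choice of k = 4 are all assumptions made for illustration, and reverse complements and the optional repeat filter are ignored for brevity.

```python
from typing import Iterable, List, Set, Tuple


def kmers(seq: str, k: int) -> Iterable[Tuple[int, str]]:
    """Yield (position, k-mer) for every k-length window of seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_trusted_set(short_reads: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer seen in the accurate short reads (hypothetical helper)."""
    trusted: Set[str] = set()
    for read in short_reads:
        for _, kmer in kmers(read, k):
            trusted.add(kmer)
    return trusted


def validated_seeds(long_read: str, trusted: Set[str], k: int) -> List[Tuple[int, str]]:
    """Keep only long-read k-mers that also occur in the short reads.

    K-mers absent from the trusted set are treated as sequencing errors
    and would be skipped when seeding read overlaps.
    """
    return [(pos, kmer) for pos, kmer in kmers(long_read, k) if kmer in trusted]


# Toy data (assumed): two short reads and one long read with an error near position 4.
short_reads = ["ACGTAC", "GTACGG"]
trusted = build_trusted_set(short_reads, k=4)
noisy_long_read = "ACGTTCGTACGG"
seeds = validated_seeds(noisy_long_read, trusted, k=4)
seed_positions = [pos for pos, _ in seeds]
```

In this toy case the k-mers overlapping the erroneous base are dropped and only the windows at positions 0 and 5 to 8 survive as seeds. A real pipeline would use a much larger k and a memory-efficient k-mer counter rather than a Python set.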


July 7, 2019

Identification of the fluvirucin B2 (Sch 38518) biosynthetic gene cluster from Actinomadura fulva subsp. indica ATCC 53714: substrate specificity of the β-amino acid selective adenylating enzyme FlvN.

Fluvirucins are 14-membered macrolactam polyketides that show antifungal and antiviral activities. Fluvirucins have the β-alanine starter unit at their polyketide skeletons. To understand the construction mechanism of the β-alanine moiety in fluvirucin biosyntheses, we have identified the biosynthetic cluster of fluvirucin B2 produced from Actinomadura fulva subsp. indica ATCC 53714. The identified gene cluster contains three polyketide synthases, four characteristic β-amino acid-carrying enzymes, one decarboxylase, and one amidohydrolase. We next investigated the activity of the adenylation enzyme FlvN, which is a key enzyme for the selective incorporation of a β-amino acid substrate. FlvN showed strong preference for l-aspartate over other amino acids such as β-alanine. Based on these results, we propose a biosynthetic pathway for fluvirucin B2.


July 7, 2019

TeloPCR-seq: a high-throughput sequencing approach for telomeres.

We have developed a high-throughput sequencing approach that enables us to determine terminal telomere sequences from tens of thousands of individual Schizosaccharomyces pombe telomeres. This method provides unprecedented coverage of telomeric sequence complexity in fission yeast. S. pombe telomeres are composed of modular degenerate repeats that can be explained by variation in usage of the TER1 RNA template during reverse transcription. Taking advantage of this deep sequencing approach, we find that ‘like’ repeat modules are highly correlated within individual telomeres. Moreover, repeat module preference varies with telomere length, suggesting that existing repeats promote the incorporation of like repeats and/or that specific conformations of the telomerase holoenzyme efficiently and/or processively add repeats of like nature. After the loss of telomerase activity, this sequencing and analysis pipeline defines a population of telomeres with altered sequence content. This approach will be adaptable to study telomeric repeats in other organisms and also to interrogate repetitive sequences throughout the genome that are inaccessible to other sequencing methods. © 2016 Federation of European Biochemical Societies.


July 7, 2019

CoLoRMap: Correcting Long Reads by Mapping short reads.

Second generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. The recent developments in long-read sequencing methods offer a promising way to address this issue. However, so far long reads are characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads. We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read, and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods. The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap. Contact: ehaghshe@sfu.ca or cedric.chauve@sfu.ca. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
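The first of the two ideas, a shortest path through overlapping short-read alignments, can be sketched roughly as follows. This is a simplified illustration, not CoLoRMap's actual code: the `Alignment` record, the `cheapest_chain` function, the overlap rule, and the toy coordinates are all assumptions, and the second idea (local assembly of unmapped mates) is not covered.

```python
from typing import List, NamedTuple, Optional


class Alignment(NamedTuple):
    start: int      # first long-read position covered (inclusive)
    end: int        # end of the covered region (exclusive)
    edit_cost: int  # edit distance between the short read and that region


def cheapest_chain(alignments: List[Alignment]) -> List[Alignment]:
    """Min-total-edit-cost chain of overlapping alignments.

    The chain runs from the leftmost alignment to the one reaching furthest
    right. Sorted alignments form a DAG, so a single left-to-right pass of
    dynamic programming gives the shortest path.
    """
    if not alignments:
        return []
    alns = sorted(alignments)
    n = len(alns)
    inf = float("inf")
    dist = [inf] * n
    pred: List[Optional[int]] = [None] * n
    dist[0] = alns[0].edit_cost
    for i in range(1, n):
        for j in range(i):
            # Edge j -> i: i starts inside j and extends past its end.
            overlaps = alns[i].start < alns[j].end and alns[i].end > alns[j].end
            if overlaps and dist[j] + alns[i].edit_cost < dist[i]:
                dist[i] = dist[j] + alns[i].edit_cost
                pred[i] = j
    target = max(range(n), key=lambda i: alns[i].end)
    if dist[target] == inf:
        return []  # no overlapping chain reaches the rightmost alignment
    chain: List[Alignment] = []
    node: Optional[int] = target
    while node is not None:
        chain.append(alns[node])
        node = pred[node]
    return chain[::-1]


# Toy alignments on one long read (assumed values).
mapped = [
    Alignment(0, 10, 1),
    Alignment(5, 15, 4),
    Alignment(8, 18, 1),
    Alignment(14, 25, 3),
    Alignment(16, 26, 2),
]
chain = cheapest_chain(mapped)
total_cost = sum(a.edit_cost for a in chain)
```

Here the chain skips the two costlier alignments and covers positions 0 to 26 with a total edit cost of 4. The quadratic inner loop is kept for clarity; an index over alignment end positions would be the natural optimization for real read depths.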


July 7, 2019

Epigenetic mechanisms in microbial members of the human microbiota: current knowledge and perspectives.

The human microbiota and epigenetic processes have both been shown to play a crucial role in health and disease. However, information on epigenetic modulation of microbiota members is extremely scarce, except for a few pathogens. DNA adenine methylation in particular has been described extensively as modulating the virulence of pathogenic bacteria. It would thus appear likely that such mechanisms are widespread among bacterial members of the microbiota. This review briefly presents the current knowledge on epigenetic processes in bacteria, gives examples of known methylation processes in microbial members of the human microbiota, and summarizes the knowledge on regulation of host epigenetic processes by the human microbiota.


July 7, 2019

Genomics-inspired discovery of three antibacterial active metabolites, aurantinins B, C, and D from compost-associated Bacillus subtilis fmb60.

Fmb60 is a wild-type Bacillus subtilis isolated from compost with significant broad-spectrum antimicrobial activities. Two novel PKS clusters were recognized in the genome sequence of fmb60, and then three polyene antibiotics, aurantinins B, C, and D, 1-3, were obtained by bioactivity-guided isolation from the fermentation of fmb60. The structures of aurantinins B-D were elucidated by LC-HRMS and NMR data analysis. Aurantinins C and D were identified as new antimicrobial compounds. The three aurantinins showed significant activity against multidrug-resistant Staphylococcus aureus and Clostridium sporogenes. However, aurantinins B-D did not exhibit any cytotoxicity (IC50 > 100 µg/mL) against LO2 and Caco2 cell lines by MTT assay. Furthermore, using S. aureus as a model bacterium to explore the antibacterial mechanism of aurantinins B-D, it was revealed that the bactericidal activity of aurantinins B-D was related to their ability to disrupt the cell membrane.


July 7, 2019

DNA extraction protocols for whole-genome sequencing in marine organisms.

The marine environment harbors a large proportion of the total biodiversity on this planet, including the majority of the earth’s different phyla and classes. Studying the genomes of marine organisms can bring interesting insights into genome evolution. Today, almost all marine organismal groups are understudied with respect to their genomes. One potential reason is that extraction of high-quality DNA in sufficient amounts is challenging for many marine species. This is due to high polysaccharide content, polyphenols and other secondary metabolites that will inhibit downstream DNA library preparations. Consequently, protocols developed for vertebrates and plants do not always perform well for invertebrates and algae. In addition, many marine species have large population sizes and, as a consequence, highly variable genomes. Thus, to facilitate the sequence read assembly process during genome sequencing, it is desirable to obtain enough DNA from a single individual, which is a challenge in many species of invertebrates and algae. Here, we present DNA extraction protocols for seven marine species (four invertebrates, two algae, and a marine yeast), optimized to provide sufficient DNA quality and yield for de novo genome sequencing projects.


July 7, 2019

Serinibacter

The genus Serinibacter belongs, based on the phylogenetic analysis of the nearly full-length 16S rRNA gene, to the Beutenbergiaceae together with the genera Beutenbergia, Salana, and Miniimonas. The two species of the genus Serinibacter shared 99.6% 16S rRNA gene sequence similarity but low DNA-DNA relatedness. Cells are irregular rods, Gram-stain positive, not acid-fast. Endospores are not formed. Nonmotile. Aerobic to anaerobic. Oxidase-negative, catalase-positive. The peptidoglycan type is A4α with an l-Ser residue at position 1 of the peptide subunit. The acyl type is acetyl. The major cell-wall sugar is galactose. The predominant menaquinone is MK-8(H4). The major polar lipids consist of phosphatidylglycerol, diphosphatidylglycerol, phosphatidylinositol, and unidentified phospholipids. Phosphatidylethanolamine is absent. The cellular fatty acid profile is dominated by the occurrence of iso- and anteiso-branched-chain acids. Mycolic acids are absent. The genomic G+C content is 70.7 to 72.8 mol%.


July 7, 2019

Complete genome sequence and transcriptome regulation of the pentose utilizing yeast Sugiyamaella lignohabitans.

Efficient conversion of hexoses and pentoses into value-added chemicals represents one core step for establishing economically feasible biorefineries from lignocellulosic material. While extensive research efforts have recently provided advances in the overall process performance, the quest for new microbial cell factories and novel enzyme sources is still open. As demonstrated recently, the yeast Sugiyamaella lignohabitans (formerly Candida lignohabitans) represents a promising microbial cell factory for the production of organic acids from lignocellulosic hydrolysates. We report here the de novo genome assembly of S. lignohabitans using the Single Molecule Real-Time platform, with gene prediction refined by using RNA-seq. The sequencing revealed a 15.98 Mb genome, subdivided into four chromosomes. By phylogenetic analysis, Blastobotrys (Arxula) adeninivorans and Yarrowia lipolytica were found to be close relatives of S. lignohabitans. Differential gene expression was evaluated in typical growth conditions on glucose and xylose and allowed a first insight into the transcriptional response of S. lignohabitans to different carbon sources and different oxygenation conditions. Novel sequences for enzymes and transporters involved in the central carbon metabolism, and therefore of potential biotechnological interest, were identified. These data open the way for a better understanding of the metabolism of S. lignohabitans and provide resources for further metabolic engineering. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

Efficient, cost-effective, high-throughput, Multilocus Sequencing Typing (MLST) method, NGMLST, and the analytical software program MLSTEZ.

Multilocus sequence typing (MLST) has become the preferred method for genotyping many biological species. It can be used to identify major phylogenetic clades, molecular groups, or subpopulations of a species, as well as individual strains or clones. However, conventional MLST is costly and time consuming, which limits its power for genotyping large numbers of samples. Here, we describe a new MLST method that uses next-generation sequencing, a multiplexing protocol, and appropriate analytical software to provide accurate, rapid, and economical MLST genotyping of 96 or more isolates in a single assay.


July 7, 2019

Transfer of the potato plant isolates of Pectobacterium wasabiae to Pectobacterium parmentieri sp. nov.

Pectobacterium wasabiae was originally isolated from Japanese horseradish (Eutrema wasabi), but recently some Pectobacterium isolates collected from potato plants and tubers displaying blackleg and soft rot symptoms were also assigned to P. wasabiae. Here, combining genomic and phenotypical data, we re-evaluated their taxonomic position. PacBio and Illumina technologies were used to complete the genome sequences of P. wasabiae CFBP 3304T and RNS 08-42-1A. Multi-locus sequence analysis showed that the P. wasabiae strains RNS 08-42-1A, SCC3193, CFIA1002 and WPP163, which were collected from the potato plant environment, constituted a separate clade from the original Japanese horseradish P. wasabiae. The taxonomic position of these strains was also supported by calculation of the in-silico DNA-DNA hybridization, genome average nucleotide identity, alignment fraction and average nucleotide identity values. In addition, they were phenotypically distinguished from P. wasabiae strains by producing acids from (+)-raffinose, α-d(+)-α-lactose, d(+)-galactose and (+)-melibiose but not from methyl α-d-glycopyranoside, (+)-maltose or malonic acid. The name Pectobacterium parmentieri sp. nov. is proposed for this taxon; the type strain is RNS 08-42-1AT (=CFBP 8475T=LMG 29774T).


July 7, 2019

Susan Celniker: Foundational resources to study a dynamic genome.

The Genetics Society of America’s George W. Beadle Award honors individuals who have made outstanding contributions to the community of genetics researchers and who exemplify the qualities of its namesake. The 2016 recipient, Susan E. Celniker, played a key role in the sequencing, annotation, and characterization of the Drosophila genome. She participated in early sequencing efforts at the Lawrence Berkeley National Laboratory and led the modENCODE Fly Transcriptome Consortium. Her efforts were critical to ensuring that the Drosophila genome was well-annotated, making it one of the best curated animal genomes available. As the Principal Investigator for the BDGP, Celniker has enabled the study of proteomes by creating a collection of over 13,000 clones that match annotated genes for protein expression in cells or transgenic flies, and she has established the most comprehensive spatial gene expression atlas in any organism, with in situ imaging of more than 80% of the Drosophila protein-coding transcriptome through embryogenesis. In addition to providing the research community with these invaluable resources and reagents, she continues to develop new tools and datasets for genetics researchers to explore the spatial and temporal control of gene expression.


July 7, 2019

Comparative genomic analysis of Lactobacillus plantarum GB-LP4 and identification of evolutionarily divergent genes in high-osmolarity environment.

Lactobacillus plantarum is a widely used probiotic, and a large body of research has examined the effectiveness of this species. However, the differences between previously reported L. plantarum strains, and the sources of genomic variation among the strains, have not been clearly specified. To better understand the molecular basis of L. plantarum in Korean traditional fermentation, we isolated L. plantarum GB-LP4 from a Korean fermented vegetable and conducted whole-genome assembly. Using a comparative genomics approach, we identified candidate genes that are expected to have undergone evolutionary acceleration. These genes have been reported to be associated with maintaining homeostasis, which is generally known to help overcome instability in the external environment, including low pH or high osmotic pressure. Our results describe the evolutionary relationships among L. plantarum strains and identify candidate genes that play a pivotal role in the evolutionary acceleration of GB-LP4 in a high-osmolarity environment. This study may provide guidance for further studies on L. plantarum.


July 7, 2019

Complete genomic and transcriptional landscape analysis using third-generation sequencing: a case study of Saccharomyces cerevisiae CEN.PK113-7D.

Completion of eukaryal genomes can be a difficult task given the highly repetitive sequences along the chromosomes and the short read lengths of second-generation sequencing. Saccharomyces cerevisiae strain CEN.PK113-7D, widely used as a model organism and a cell factory, was selected for this study to demonstrate the superior capability of very long sequence reads for de novo genome assembly. We generated long reads using two common third-generation sequencing technologies (Oxford Nanopore Technology (ONT) and Pacific Biosciences (PacBio)) and used short reads obtained using Illumina sequencing for error correction. Assembly of the reads derived from all three technologies resulted in complete sequences for all 16 yeast chromosomes, as well as the mitochondrial chromosome, in one step. Further, we identified three types of DNA methylation (5mC, 4mC and 6mA). Comparison between the reference strain S288C and strain CEN.PK113-7D identified chromosomal rearrangements against a background of similar gene content between the two strains. We identified full-length transcripts through ONT direct RNA sequencing technology. This allows for the identification of transcriptional landscapes, including untranslated regions (UTRs) (5′ UTR and 3′ UTR) as well as differential gene expression quantification. About 91% of the predicted transcripts could be consistently detected across biological replicates grown either on glucose or ethanol. Direct RNA sequencing identified many polyadenylated non-coding RNAs, rRNAs, telomere-RNA, long non-coding RNA and antisense RNA. This work demonstrates a strategy to obtain complete genome sequences and transcriptional landscapes that can be applied to other eukaryal organisms.

