multi Archives

July 7, 2019

Comparative genomic analysis of Lactobacillus plantarum GB-LP4 and identification of evolutionarily divergent genes in high-osmolarity environment.

Lactobacillus plantarum is a widely used probiotic, and a large body of research has examined the effectiveness of this species. However, the differences between previously reported L. plantarum strains, and the sources of genomic variation among them, have not been clearly specified. To better understand the molecular basis of L. plantarum in Korean traditional fermentation, we isolated L. plantarum GB-LP4 from a Korean fermented vegetable and conducted whole-genome assembly. Using a comparative genomics approach, we identified candidate genes that are expected to have undergone evolutionary acceleration. These genes have been reported to be associated with maintaining homeostasis, which generally helps cells overcome instability in the external environment, including low pH or high osmotic pressure. Our results describe the evolutionary relationships among L. plantarum strains and identify candidate genes that may play a pivotal role in the evolutionary acceleration of GB-LP4 in a high-osmolarity environment. This study may provide guidance for further studies on L. plantarum.

July 7, 2019

Complete genomic and transcriptional landscape analysis using third-generation sequencing: a case study of Saccharomyces cerevisiae CEN.PK113-7D.

Completion of eukaryal genomes can be a difficult task because of the highly repetitive sequences along the chromosomes and the short read lengths of second-generation sequencing. Saccharomyces cerevisiae strain CEN.PK113-7D, widely used as a model organism and a cell factory, was selected for this study to demonstrate the superior capability of very long sequence reads for de novo genome assembly. We generated long reads using two common third-generation sequencing technologies (Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio)) and used short reads obtained using Illumina sequencing for error correction. Assembly of the reads derived from all three technologies resulted in complete sequences for all 16 yeast chromosomes, as well as the mitochondrial chromosome, in one step. Further, we identified three types of DNA methylation (5mC, 4mC and 6mA). Comparison between the reference strain S288C and strain CEN.PK113-7D identified chromosomal rearrangements against a background of similar gene content between the two strains. We identified full-length transcripts through ONT direct RNA sequencing technology. This allows for the identification of transcriptional landscapes, including untranslated regions (UTRs) (5′ UTR and 3′ UTR), as well as differential gene expression quantification. About 91% of the predicted transcripts could be consistently detected across biological replicates grown either on glucose or ethanol. Direct RNA sequencing identified many polyadenylated non-coding RNAs, rRNAs, telomere-RNA, long non-coding RNA and antisense RNA. This work demonstrates a strategy to obtain complete genome sequences and transcriptional landscapes that can be applied to other eukaryal organisms.

July 7, 2019

Complete genome sequence of the highly Mn(II) tolerant Staphylococcus sp. AntiMn-1 isolated from deep-sea sediment in the Clarion-Clipperton Zone.

Staphylococcus sp. AntiMn-1 is a deep-sea bacterium inhabiting seafloor sediment in the Clarion-Clipperton Zone (CCZ) that is highly tolerant to Mn(II) and displays efficient Mn(II) oxidation. Herein, we present the assembly and annotation of its genome. Copyright © 2017 Elsevier B.V. All rights reserved.

July 7, 2019

Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations.

Mutations, the fuel of evolution, are first manifested as rare DNA changes within a population of cells. Although next-generation sequencing (NGS) technologies have revolutionized the study of genomic variation between species and individual organisms, most have limited ability to accurately detect and quantify rare variants among the different genome copies in heterogeneous mixtures of cells or molecules. We describe the technical challenges in characterizing subclonal variants using conventional NGS protocols and the recent development of error correction strategies, both computational and experimental, including consensus sequencing of single DNA molecules. We also highlight major applications for low-frequency mutation detection in science and medicine, describe emerging methodologies and provide our vision for the future of DNA sequencing.
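
The consensus-sequencing idea mentioned above can be illustrated with a minimal Python sketch. It assumes that each read carries a molecular tag identifying its original DNA molecule and that reads within a family are already aligned and of equal length; the function name `consensus_by_tag` and both thresholds are hypothetical and are not taken from the review.

```python
from collections import Counter, defaultdict


def consensus_by_tag(tagged_reads, min_family_size=3, min_agreement=0.9):
    """Collapse reads that share a molecular tag into one consensus sequence.

    tagged_reads: iterable of (tag, sequence) pairs.
    Assumption: reads in a family are pre-aligned and have equal length.
    """
    families = defaultdict(list)
    for tag, seq in tagged_reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        # Small families cannot out-vote a sequencing error, so skip them.
        if len(seqs) < min_family_size:
            continue
        # Simplification: ignore families with reads of differing length.
        if len({len(s) for s in seqs}) != 1:
            continue
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Emit N when no base reaches the required agreement.
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[tag] = "".join(bases)
    return consensus


reads = [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"), ("GGC", "ACGT")]
print(consensus_by_tag(reads, min_family_size=3, min_agreement=0.6))
```

An error that appears in only one read of a family is outvoted by the other copies, whereas a true variant present in the original molecule appears in every copy. Raising `min_agreement` trades sensitivity for fewer false calls.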

July 7, 2019

Recent progress and prospects for advancing arachnid genomics

Arachnids exhibit tremendous species richness and adaptations of biomedical, industrial, and agricultural importance. Yet genomic resources for arachnids are limited, with the first few spider and scorpion genomes becoming accessible in the last four years. We review key insights from these genome projects, and recommend additional genomes for sequencing, emphasizing taxa of greatest value to the scientific community. We suggest greater sampling of spiders whose genomes are understudied but hold important protein recipes for silk and venom production. We further recommend arachnid genomes to address significant evolutionary topics, including the phenotypic impact of genome duplications. A barrier to high-quality arachnid genomes is the use of assemblies based solely on short-read data, a limitation that may be overcome by long-range sequencing and other emerging methods.

July 7, 2019

Inferring synteny between genome assemblies: a systematic evaluation.

Genome assemblies across all domains of life are being produced routinely. Initial analysis of a new genome usually includes annotation and comparative genomics. Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of different species. The availability of human and mouse genomes paved the way for algorithm development in large-scale synteny mapping, which eventually became an integral part of comparative genomics. Synteny analysis is regularly performed on assembled sequences that are fragmented, neglecting the fact that most methods were developed using complete genomes. It is unknown to what extent draft assemblies lead to errors in such analysis. We fragmented genome assemblies of model nematodes to various extents and conducted synteny identification and downstream analysis. We first show that synteny between species can be underestimated by up to 40% and find disagreements between popular tools that infer synteny blocks. This inconsistency and further demonstration of erroneous gene ontology enrichment tests raise questions about the robustness of previous synteny analysis when gold standard genome sequences remain limited. In addition, assembly scaffolding using a reference-guided approach with a closely related species may result in chimeric scaffolds with inflated assembly metrics if a true evolutionary relationship was overlooked. Annotation quality, however, has minimal effect on synteny if the assembled genome is highly contiguous. Our results show that a minimum N50 of 1 Mb is required for robust downstream synteny analysis, which emphasizes the importance of gold standard genomes to the science community, and should be achieved given the current progress in sequencing technology.
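
For readers unfamiliar with the N50 metric cited above, the following Python sketch computes it from a list of contig or scaffold lengths and checks it against the 1 Mb threshold. The function names and the example lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def meets_synteny_threshold(lengths, minimum_n50=1_000_000):
    """Check an assembly against a minimum N50 (default 1 Mb)."""
    return n50(lengths) >= minimum_n50


# Hypothetical assembly: five scaffolds totalling 5 Mb.
scaffolds = [2_000_000, 1_500_000, 800_000, 400_000, 300_000]
print(n50(scaffolds), meets_synteny_threshold(scaffolds))
```

Because N50 depends only on the length distribution, it says nothing about correctness; as the abstract notes, chimeric scaffolds can inflate it.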

July 7, 2019

Complete genome sequences of four toxigenic Clostridium difficile clinical isolates from patients of the lower Hudson Valley, New York, USA.

Complete genome sequences of four toxigenic Clostridium difficile isolates from patients in the lower Hudson Valley, New York, USA, were achieved. These isolates represent four common sequence types (ST1, ST2, ST8, and ST42) belonging to two distinct phylogenetic clades. All isolates have a 4.0- to 4.2-Mb circular chromosome, and one carries a phage. Copyright © 2018 Yin et al.

July 7, 2019

Complete genome sequence of Elizabethkingia miricola strain EM798-26 isolated from the blood of a cancer patient.

Elizabethkingia miricola EM798-26 was isolated from the blood of a patient with diffuse large B-cell lymphoma in Taiwan. We report here the complete genome sequence of EM798-26, which has a G+C content of 35.7% and contains 3,877 candidate protein-coding genes. Copyright © 2018 Lin et al.

July 7, 2019

An empirical evaluation of error correction methods and tools for next generation sequencing data

…research. However, data produced by NGS are affected by different errors such as substitutions, deletions or insertions. It is essential to differentiate between true biological variants and alterations that occur due to errors for accurate downstream analysis. Many types of methods and tools have been developed for NGS error correction. Some of these methods correct only substitution errors, whereas others correct multiple types of errors. In this article, a comprehensive evaluation of three types of methods (k-spectrum based, multiple sequence alignment based and hybrid based), as implemented in different tools, is presented. Experiments have been conducted to compare performance based on runtime and error correction rate. Two different computing platforms have been used for the experiments to evaluate their effect on runtime and error correction rate. The aim of this comparative evaluation is to provide recommendations for the selection of suitable tools to meet the specific needs of users and practitioners. The k-mer spectrum based methodology generated superior results compared to the other methods. Among all the tools evaluated, Racer showed outstanding performance in terms of error correction rate and execution time for both small and large data sets. Among the multiple sequence alignment based tools, Karect shows an excellent error correction rate whereas Coral shows better execution time for all data sets. Among the hybrid based tools, Jabba shows a better error correction rate and execution time than Brownie. Computing platforms mostly affect execution time but have no general effect on error correction rate.
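
The intuition behind k-mer spectrum methods can be sketched in a few lines of Python: k-mers that occur rarely across the read set are more likely to contain a sequencing error than k-mers seen many times. This is only an illustration of the general idea under that assumption; the function names and the cutoff are hypothetical, and the code does not reproduce how Racer or any other evaluated tool works.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, min_count):
    """Return k-mers seen fewer than min_count times (candidate errors)."""
    return {kmer for kmer, count in counts.items() if count < min_count}


# Three identical reads and one read with a single substitution (T -> A).
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
spectrum = kmer_spectrum(reads, k=3)
print(sorted(weak_kmers(spectrum, min_count=2)))
```

A corrector built on this idea would then look for the smallest edit that turns the weak k-mers of a read into well-supported ones. Choosing `k` and `min_count` depends on coverage and error rate, which this sketch leaves to the caller.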

July 7, 2019

FMLRC: Hybrid long read error correction using an FM-index.

Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
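
To show how an FM-index can answer "how often does this substring occur?" queries, here is a deliberately naive Python sketch of a Burrows-Wheeler Transform and backward search over a single string. It is an assumption-laden teaching example: the function names are hypothetical, the construction sorts all rotations (impractical for real read sets), and it does not reproduce the multi-string data structure used by FMLRC.

```python
def bwt(text):
    """Burrows-Wheeler Transform via sorted rotations, with '$' as terminator."""
    text = text + "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_string):
    """Build the C table and occurrence table for backward search."""
    alphabet = sorted(set(bwt_string))
    # c_table[ch]: number of characters in the text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += bwt_string.count(ch)
    # occ[ch][i]: occurrences of ch in bwt_string[:i].
    occ = {ch: [0] for ch in alphabet}
    for symbol in bwt_string:
        for ch in alphabet:
            occ[ch].append(occ[ch][-1] + (symbol == ch))
    return c_table, occ


def count_occurrences(pattern, c_table, occ, n):
    """Count occurrences of pattern in the indexed text by backward search."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("ACGTACGA")
c_table, occ = build_fm_index(transformed)
print(count_occurrences("ACG", c_table, occ, len(transformed)))
```

Each query takes time proportional to the pattern length rather than the text length, which is what makes k-mer frequency lookups against a large short-read collection feasible.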

July 7, 2019

Oryza glaberrima Steud.

Oryza glaberrima is the African cultivated rice species, domesticated from its wild ancestor by farmers living in the Inland Delta of the Niger River. Several studies indicate that it has extremely narrow genetic diversity compared to both its wild progenitor, Oryza barthii, and the Asian rice, Oryza sativa, which can mainly be attributed to a severe domestication bottleneck. Despite its scarcity in farmers' fields due to its low yield potential, high shattering and lodging susceptibility, O. glaberrima is of great value not only to Africa but also globally. Perhaps its greatest contribution to regional and global food security is as a source of genes, as it possesses resistance or tolerance to various biotic and abiotic stresses. It also has unique starch-related traits which give it good cooking and eating properties. Advances in DNA sequencing have provided useful genomic resources for African rice, key among them being whole genome sequences. Genomic tools are enabling greater understanding of the useful functional diversity found in this species. These advances have the potential to address some of the undesirable attributes of this species which have led to its continued replacement by Asian rice. Development of a new generation of rice varieties for African farmers will therefore require the adoption of advanced molecular breeding tools, as these will allow efficient utilization of the wealth and resilience found in African rice in rice improvement.

July 7, 2019

Paucibacter aquatile sp. nov. isolated from freshwater of the Nakdong River, Republic of Korea.

A Gram-negative, aerobic, motile, and rod-shaped bacterial strain designated CR182T was isolated from freshwater of the Nakdong River, Republic of Korea. Optimal growth conditions for this novel strain were found to be: 25-30 °C, pH 6.5-8.5, and 3% (w/v) NaCl. Phylogenetic analysis based on the 16S rRNA gene sequence indicates that strain CR182T belongs to the genus Paucibacter. Strain CR182T showed 98.0% 16S rRNA gene sequence similarity with Paucibacter oligotrophus CHU3T and formed a robust phylogenetic clade with this species. The average nucleotide identity value between strain CR182T and P. oligotrophus CHU3T was 78.4% and the genome-to-genome distance was 22.2% on average. The genomic DNA G+C content calculated from the genome sequence was 66.3 mol%. Predominant cellular fatty acids of strain CR182T were summed feature 3 (C16:1 ω7c and/or C16:1 ω6c) (31.2%) and C16:0 (16.0%). Its major respiratory quinone was ubiquinone Q-8. Its polar lipids consisted of diphosphatidylglycerol, phosphatidylethanolamine, and two unidentified phospholipids. Based on data obtained from this polyphasic taxonomic study, strain CR182T represents a novel species belonging to the genus Paucibacter, for which the name P. aquatile sp. nov. is proposed. The type strain is CR182T (= KCCM 90284T = NBRC 113032T).

July 7, 2019

The ‘gifted’ actinomycete Streptomyces leeuwenhoekii.

Streptomyces leeuwenhoekii strains C34T, C38, C58 and C79 were isolated from a soil sample collected from the Chaxa Lagoon, located in the Salar de Atacama in northern Chile. These streptomycetes produce a variety of new specialised metabolites with antibiotic, anti-cancer and anti-inflammatory activities. Moreover, genome mining performed on two of these strains has revealed the presence of biosynthetic gene clusters with the potential to produce new specialised metabolites. This review focusses on this new clade of Streptomyces strains, summarises the literature and presents new information on strain C34T.

July 7, 2019

Hercules: a profile HMM-based hybrid error correction algorithm for long reads.

Choosing whether to use second or third generation sequencing platforms can lead to trade-offs between accuracy and read length. Several types of studies require long and accurate reads. In such cases researchers often combine both technologies and the erroneous long reads are corrected using the short reads. Current approaches rely on various graph or alignment based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to achieve more accurate integration of these two technologies. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile Hidden Markov Model with respect to the underlying platform’s error profile. The algorithm learns a posterior transition/emission probability distribution for each long read to correct errors in these reads. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few scalable algorithms; and among those, it achieves the highest accuracy.

July 7, 2019

Smooth q-Gram, and its applications to detection of overlaps among long, error-prone sequencing reads

We propose smooth q-gram, the first variant of q-gram that captures q-gram pairs within a small edit distance. We apply smooth q-gram to the problem of detecting overlapping pairs of error-prone reads produced by single molecule real time sequencing (SMRT), which is the first and most critical step of the de novo fragment assembly of SMRT reads. We have implemented and tested our algorithm on a set of real world benchmarks. Our empirical results demonstrated the significant superiority of our algorithm over the existing q-gram based algorithms in accuracy.
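
As background for the approach described above, the Python sketch below shows the plain exact-match q-gram baseline for overlap detection: reads that share enough identical q-grams are reported as candidate overlapping pairs. It is not the smooth q-gram algorithm, which additionally matches q-grams within a small edit distance; the function names and the `min_shared` threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def qgrams(read, q):
    """Return the set of distinct substrings of length q in a read."""
    return {read[i:i + q] for i in range(len(read) - q + 1)}


def candidate_overlaps(reads, q, min_shared):
    """Report read pairs sharing at least min_shared exact q-grams."""
    # Inverted index: q-gram -> ids of the reads that contain it.
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        for gram in qgrams(read, q):
            index[gram].add(read_id)

    shared = defaultdict(int)
    for read_ids in index.values():
        for a, b in combinations(sorted(read_ids), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


reads = ["ACGTACGGA", "TACGGATTC", "GGGGGGGGG"]
print(candidate_overlaps(reads, q=4, min_shared=2))
```

With error-prone reads, a single substitution or indel destroys every exact q-gram that spans it, so true overlaps can fall below `min_shared`. That loss of sensitivity is the limitation a variant tolerant of small edit distances is meant to address.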
