Genome assembly Archives - Page 151 of 196

July 7, 2019

Copy number variation and expression analysis reveals a nonorthologous pinta gene family member involved in butterfly vision.

Vertebrate (cellular retinaldehyde-binding protein) and Drosophila (prolonged depolarization afterpotential is not apparent [PINTA]) proteins with a CRAL-TRIO domain transport retinal-based chromophores that bind to opsin proteins and are necessary for phototransduction. The CRAL-TRIO domain gene family is composed of genes that encode proteins with a common N-terminal structural domain. Although there is an expansion of this gene family in Lepidoptera, there is no lepidopteran ortholog of pinta. Further, the function of these genes in lepidopterans has not yet been established. Here, we explored the molecular evolution and expression of CRAL-TRIO domain genes in the butterfly Heliconius melpomene in order to identify a member of this gene family as a candidate chromophore transporter. We generated and searched a four-tissue transcriptome and searched a reference genome for CRAL-TRIO domain genes. We expanded an insect CRAL-TRIO domain gene phylogeny to include H. melpomene and used 18 genomes from 4 subspecies to assess copy number variation. A transcriptome-wide differential expression analysis comparing four tissue types identified a CRAL-TRIO domain gene, Hme CTD31, upregulated in heads, suggesting a potential role in vision for this gene. RT-PCR and immunohistochemistry confirmed that Hme CTD31 and its protein product are expressed in the retina, specifically in primary and secondary pigment cells and in tracheal cells. Sequencing of eye protein extracts that fluoresce in the ultraviolet identified Hme CTD31 as a possible chromophore binding protein. Although we found several recent duplications and numerous copy number variants in CRAL-TRIO domain genes, we identified a single-copy pinta paralog that likely binds the chromophore in butterflies. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

July 7, 2019

Genome-wide epigenetic studies in chicken: A review

Over the years, farmed birds have been selected on various performance traits mainly through genetic selection. However, many studies have shown that genetics may not be the sole contributor to phenotypic plasticity. Gene expression programs can be influenced by environmentally induced epigenetic changes that may alter the phenotypes of the developing animals. Recently, thanks to technological advances, high-throughput sequencing techniques have become sufficiently affordable to study whole epigenetic landscapes in model plants and animals. In birds, a growing number of studies have recently taken advantage of these techniques to gain insights into the epigenetic mechanisms of gene regulation in processes such as immunity or environmental adaptation. Here, we review the current knowledge of the chicken epigenome made possible by recent advances in high-throughput sequencing techniques, focusing on the two most studied epigenetic modifications: DNA methylation and histone post-translational modifications. We discuss and provide insights about designing and performing analyses to further explore avian epigenomes. A better understanding of the molecular mechanisms underlying the epigenetic regulation of gene expression in relation to bird phenotypes may provide new knowledge and markers that should contribute to sustainable poultry production.

July 7, 2019

Evolutionary context of non-sorbitol-fermenting Shiga toxin-producing Escherichia coli O55:H7.

In July 2014, an outbreak of Shiga toxin-producing Escherichia coli (STEC) O55:H7 in England involved 31 patients, 13 (42%) of whom had hemolytic uremic syndrome. Isolates were sequenced, and the sequences were compared with publicly available sequences of E. coli O55:H7 and O157:H7. A core-genome phylogeny of the evolutionary history of the STEC O55:H7 outbreak strain revealed that the most parsimonious model was a progenitor enteropathogenic O55:H7 sorbitol-fermenting strain, lysogenized by a Shiga toxin (Stx) 2a-encoding phage, followed by loss of the ability to ferment sorbitol because of a nonsense mutation in srlA. The parallel, convergent evolutionary histories of STEC O157:H7 and STEC O55:H7 may indicate a common driver in the evolutionary process. Because emergence of STEC O157:H7 as a clinically significant pathogen was associated with acquisition of the Stx2a-encoding phage, the emergence of STEC O55:H7 harboring the stx2a gene is of public health concern.

July 7, 2019

An update on bioinformatics resources for plant genomics research

Next-generation sequencing and traditional Sanger sequencing methods are of great significance in unraveling the complexity of plant genomes. These methods constantly generate large volumes of sequence data to be analyzed, annotated and stored. This has created a strong demand for bioinformatics tools and software that can perform these functions. A large number of potentially useful bioinformatics tools and plant genome databases have been created, and these have greatly simplified the analysis and storage of vast amounts of sequence data. The information garnered using the available bioinformatics methods has greatly helped in understanding plant genome structure. Despite the availability of a good number of such tools, the information pouring in from single-gene sequencing and various whole-genome sequencing projects is overwhelming; thus, further innovations and improved methods are needed to sift through this sequence data and assemble genomes. The current review focuses on diverse bioinformatics approaches and methods developed to systematically analyze and store plant sequence data. Finally, it outlines the bottlenecks in plant genome analysis and some possible solutions that could be used to overcome them.

July 7, 2019

Genome sequencing brought Gossypium biology research into a new era.

The first sequenced diploid cotton genome was published in 2012 by the group led by the Institute of Cotton Research, Chinese Academy of Agricultural Sciences. Cotton genomics research subsequently entered a period of rapid development. The accumulating data have provided new insights into the evolution and domestication of cotton, the development of important agronomic traits, and strategies for improving cotton quality and production.

July 7, 2019

Bi-level error correction for PacBio long reads.

The latest sequencing technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, can generate long reads thousands of bases in length, much longer than the reads of hundreds of bases generated by Illumina machines. However, these long reads are prone to much higher error rates, for example 15%, making downstream analysis and applications very difficult. Error correction is a process to improve the quality of sequencing data. Hybrid correction strategies have recently been proposed that use Illumina reads with low error rates to fix sequencing errors in the noisy long reads, with good performance. In this paper, we propose a new method named Bicolor, a bi-level framework of hybrid error correction for further improving the quality of PacBio long reads. At the first level, our method uses a de Bruijn graph-based error correction idea to search paths in pairs of solid k-mers iteratively with an increasing length of k-mer. At the second level, we combine the processed results under different parameters from the first level. In particular, a multiple sequence alignment algorithm is used to align those similar long reads, followed by a voting algorithm which determines the final base at each position of the reads. We compare the performance of Bicolor with three state-of-the-art methods on three real data sets. Results demonstrate that Bicolor always achieves the highest identity ratio. Bicolor also achieves a higher alignment ratio and a higher number of aligned reads than the current methods on two data sets. On the third data set, our method is closely competitive with the current methods in terms of number of aligned reads and genome coverage. The C++ source code of our algorithm is freely available at https://github.com/yuansliu/Bicolor.
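The voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the Bicolor implementation (which is in C++ at the URL above); it only assumes that the similar reads have already been aligned to equal length with `-` as the gap symbol, and the function name `majority_vote_consensus` is hypothetical.

```python
from collections import Counter


def majority_vote_consensus(aligned_reads, gap="-"):
    """Return a consensus by taking the most common symbol in each column.

    Assumption: every read has already been padded to the same length by a
    multiple sequence alignment; columns won by the gap symbol are dropped.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # most_common(1) returns the top symbol; ties go to the first one seen
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    reads = ["ACGT-A", "ACCT-A", "ACGTTA"]
    print(majority_vote_consensus(reads))
```

A real pipeline would likely weight votes by base quality or coverage; the unweighted vote here is only the simplest instance of the idea.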

July 7, 2019

On the (im)possibility of reconstructing plasmids from whole-genome short-read sequencing data.

To benchmark algorithms for automated plasmid sequence reconstruction from short-read sequencing data, we selected 42 publicly available complete bacterial genome sequences spanning 12 genera, containing 148 plasmids. We predicted plasmids from short-read data with four programs (PlasmidSPAdes, Recycler, cBar and PlasmidFinder) and compared the outcome to the reference sequences. PlasmidSPAdes reconstructs plasmids based on coverage differences in the assembly graph. It reconstructed most of the reference plasmids (recall=0.82), but approximately a quarter of the predicted plasmid contigs were false positives (precision=0.75). PlasmidSPAdes merged 84% of the predictions from genomes with multiple plasmids into a single bin. Recycler searches the assembly graph for sub-graphs corresponding to circular sequences and correctly predicted small plasmids, but failed with long plasmids (recall=0.12, precision=0.30). cBar, which applies pentamer frequency analysis to detect plasmid-derived contigs, showed a recall and precision of 0.76 and 0.62, respectively. However, cBar categorizes contigs as plasmid-derived and does not bin the different plasmids. PlasmidFinder, which searches for replicons, had the highest precision (1.0), but was restricted by the contents of its database and the contig length obtained from de novo assembly (recall=0.36). PlasmidSPAdes and Recycler detected putative small plasmids (<10 kbp), which were also predicted as plasmids by cBar, but were absent in the original assembly. This study shows that it is possible to automatically predict small plasmids. Prediction of large plasmids (>50 kbp) containing repeated sequences remains challenging and limits the high-throughput analysis of plasmids from short-read whole-genome sequencing data.
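For readers unfamiliar with the two metrics quoted above, the following sketch shows how precision and recall are computed from counts of true positives, false positives and false negatives. The function name and the counts in the example are hypothetical and are not taken from the study; how the authors counted matches at the contig or base level is not stated in the abstract.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from raw prediction counts.

    precision = TP / (TP + FP): fraction of predictions that are correct.
    recall    = TP / (TP + FN): fraction of reference items that were found.
    A metric is reported as 0.0 when its denominator is zero.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts: 3 correct predictions, 1 spurious, 1 missed.
    p, r = precision_recall(true_pos=3, false_pos=1, false_neg=1)
    print(f"precision={p:.2f} recall={r:.2f}")
```

A precision of 0.75 means that one in four predictions is a false positive, which matches the abstract's description of "approximately a quarter" for PlasmidSPAdes.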

July 7, 2019

Genomic characterization of a local epidemic Pseudomonas aeruginosa reveals specific features of the widespread clone ST395.

Pseudomonas aeruginosa is a ubiquitous opportunistic pathogen with several clones being frequently associated with outbreaks in hospital settings. ST395 is among these so-called ‘international’ clones. We aimed here to define the biological features that could have helped the implantation and spread of the clone ST395 in hospital settings. The complete genome of a multidrug resistant index isolate (DHS01) of a large hospital outbreak was analysed. We identified DHS01-specific genetic elements, among which were identified those shared with a panel of six independent ST395 isolates responsible for outbreaks in other hospitals. DHS01 has the fifth largest chromosome of the species (7.1 Mbp), with most of its 1555 accessory genes borne by either genomic islands (GIs, n=48) or integrative and conjugative elements (ICEs, n=5). DHS01 is multidrug resistant mostly due to chromosomal mutations. It displayed signatures of adaptation to chronic infection in part due to the loss of a 131 kbp chromosomal fragment. Four GIs were specific to the clone ST395 and contained genes involved in metabolism (GI-4), in virulence (GI-6) and in resistance to copper (GI-7). GI-7 harboured an array of six copper transporters and was shared with non-pathogenic Pseudomonas sp. retrieved from copper-contaminated environments. Copper resistance was confirmed phenotypically in all other ST395 isolates and possibly accounted for the spreading capability of the clone in hospital outbreaks, where water networks have been incriminated. This suggests that genes transferred from copper-polluted environments may have favoured the implantation and spread of the international clone P. aeruginosa ST395 in hospital settings.

July 7, 2019

A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0).

The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly; however, as with most mammalian genomes, the current iteration of the chimpanzee reference genome assembly is highly fragmented. In the current iteration of the chimpanzee reference genome assembly (Pan_tro_2.1.4), the sequence is scattered across more than 183 000 contigs, incorporating more than 159 000 gaps, with a genome-wide contig N50 of 51 Kbp. In this work, we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in bases represented and organized in large scaffolds. To this end, we show substantial improvements over the current release of the chimpanzee genome (Pan_tro_2.1.4) by several metrics, such as increased contiguity by >750% and 300% on contigs and scaffolds, respectively, and closure of 77% of the gaps in the Pan_tro_2.1.4 assembly, spanning >850 Kbp of novel coding sequence based on RNASeq data. We further report more than 2700 genes that had putatively erroneous frame-shift predictions to human in Pan_tro_2.1.4 and show a substantial increase in the annotation of repetitive elements. We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource for the study of human origins. Furthermore, we produce extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset. © The Author 2017. Published by Oxford University Press.
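The contig N50 quoted above is a standard contiguity metric: the length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the example lengths are illustrative and unrelated to the chimpanzee assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half the assembly.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in bp
    print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it rises when gaps are closed and contigs merge, which is why it is commonly reported alongside gap counts.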

July 7, 2019

Complete genome sequence of the new urolithin-producing bacterium Gordonibacter urolithinfaciens DSM 27213T.

Gordonibacter urolithinfaciens DSM 27213T was isolated from human feces and is able to metabolize ellagic acid (a dietary phenolic compound present in various fruits) to urolithins. Here, we report the finished and annotated genome sequence of this organism.

July 7, 2019

The completed PacBio single-molecule real-time sequence of Methylosinus trichosporium strain OB3b reveals the presence of a third large plasmid.

Presented here is the complete genome sequence of the well-studied Rhizobiales methanotroph Methylosinus trichosporium strain OB3b. The assembly contains 5,183,433 bp, corresponding to a chromosome of 4,508,832 bp and three circular plasmids of 285,280 bp, 209,102 bp, and 180,219 bp. Copyright © 2017 Heil et al.

July 7, 2019

COSINE: non-seeding method for mapping long noisy sequences.

Third-generation sequencing (TGS) technologies are highly promising, but their long and noisy reads are difficult to align using existing algorithms. Here, we present COSINE, a conceptually new method designed specifically for aligning long reads contaminated by a high level of errors. COSINE computes the context similarity of two stretches of nucleobases given the similarity over distributions of their short k-mers (k = 3-4) along the sequences. The results on simulated and real data show that COSINE achieves high sensitivity and specificity under a wide range of read accuracies. When the error rate is high, COSINE can offer substantial advantages over existing alignment methods. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
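To make the idea of comparing short k-mer distributions concrete, here is a generic sketch that scores two sequences by the cosine similarity of their k-mer count vectors. It is an assumption that this resembles what COSINE does; the abstract does not specify the similarity measure or how it is applied along the sequences, and the function names below are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in seq (empty if seq is shorter than k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def profile_similarity(seq_a, seq_b, k=3):
    """Cosine similarity between the k-mer count vectors of two sequences.

    Returns a value in [0, 1]; 0.0 is returned if either profile is empty.
    """
    prof_a = kmer_profile(seq_a, k)
    prof_b = kmer_profile(seq_b, k)
    # Only k-mers present in both profiles contribute to the dot product
    dot = sum(prof_a[kmer] * prof_b[kmer] for kmer in prof_a.keys() & prof_b.keys())
    norm_a = sqrt(sum(count * count for count in prof_a.values()))
    norm_b = sqrt(sum(count * count for count in prof_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(profile_similarity("ACGTACGTAC", "ACGTTCGTAC", k=3))
```

A profile-based score of this kind does not require exact seed matches, so an isolated substitution or indel changes only the few k-mers that overlap it.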

July 7, 2019

Comparative whole-genomic analysis of an ancient L2 lineage Mycobacterium novel phylogenetic clade and common genetic determinants of hypervirulent strains.

Background: Development of improved therapeutics against tuberculosis (TB) is hindered by an inadequate understanding of the relationship between disease severity and genetic diversity of its causative agent, Mycobacterium tuberculosis. We previously isolated a hypervirulent M. tuberculosis strain H112 from an HIV-negative patient with an aggressive disease progression from pulmonary TB to tuberculous meningitis, the most severe manifestation of tuberculosis. A human macrophage challenge experiment demonstrated that the strain H112 exhibited significantly better intracellular survivability and induced a lower level of TNF-α than the reference virulent strain H37Rv and 123 other clinical isolates. Aim: The present study aimed to identify the potential genetic determinants of mycobacterial virulence that were common to strain H112 and hypervirulent M. tuberculosis strains of the same phylogenetic clade isolated in other global regions. Methods: A low-virulent M. tuberculosis strain H54, which belonged to the same phylogenetic lineage (L2) as strain H112, was selected from a collection of 115 clinical isolates. Both H112 and H54 were whole-genome-sequenced using PacBio sequencing technology. A comparative genomics approach was adopted to identify mutations present in strain H112 but absent in strain H54. Subsequently, an extensive phylogenetic analysis was conducted by including all publicly available M. tuberculosis genomes. Single-nucleotide polymorphisms (SNPs) and structural variations (SVs) common to hypervirulent strains in the global collection of genomes were considered as potential genetic determinants of hypervirulence. Results: Sequencing data revealed that both H112 and H54 were members of the same sub-lineage L2.2.1. After excluding the lineage-related mutations shared between H112 and H54, we analyzed the phylogenetic relatedness of H112 with a global collection of M. tuberculosis genomes (n = 4,338), and identified a novel phylogenetic clade in which four hypervirulent strains isolated from geographically diverse regions were clustered together. All hypervirulent strains in the clade shared 12 SNPs and 5 SVs with H112, including those affecting key virulence-associated loci, notably, a deleterious SNP (rv0178 p. D150E) within mce1 operon and an intergenic deletion (854259_ 854261delCC) in close proximity to phoP. Conclusion: The present study identified common genetic factors in a novel phylogenetic clade of hypervirulent M. tuberculosis. The causative role of these mutations in mycobacterial virulence should be validated in future studies.

July 7, 2019

A recurrence-based approach for validating structural variation using long-read sequencing technology.

Although numerous algorithms have been developed to identify structural variations (SVs) in genomic sequences, there is a dearth of approaches that can be used to evaluate their results. This is significant as the accurate identification of structural variation is still an outstanding but important problem in genomics. The emergence of new sequencing technologies that generate longer sequence reads can, in theory, provide direct evidence for all types of SVs regardless of the length of the region they span. However, current efforts to use these data in this manner require the use of large computational resources to assemble these sequences as well as visual inspection of each region. Here we present VaPoR, a highly efficient algorithm that autonomously validates large SV sets using long-read sequencing data. We assessed the performance of VaPoR on SVs in both simulated and real genomes and report a high-fidelity rate for overall accuracy across different levels of sequence depths. We show that VaPoR can interrogate a much larger range of SVs while still matching existing methods in terms of false positive validations and providing additional features considering breakpoint precision and predicted genotype. We further show that VaPoR can run quickly and efficiently without requiring a large processing or assembly pipeline. VaPoR provides a long read-based validation approach for genomic SVs that requires relatively low read depth and computing resources and thus will provide utility with targeted or low-pass sequencing coverage for accurate SV assessment. The VaPoR software is available at: https://github.com/mills-lab/vapor. © The Authors 2017. Published by Oxford University Press.

July 7, 2019

Nitrogen fixation genes and nitrogenase activity of the non-heterocystous cyanobacterium Thermoleptolyngbya sp. O-77.

Cyanobacteria are widely distributed in marine, aquatic, and terrestrial ecosystems, and play an important role in the global nitrogen cycle. In the present study, we examined the genome sequence of the thermophilic non-heterocystous N2-fixing cyanobacterium, Thermoleptolyngbya sp. O-77 (formerly known as Leptolyngbya sp. O-77) and characterized its nitrogenase activity. The genome of this cyanobacterial strain O-77 consists of a single chromosome containing a nitrogen fixation gene cluster. A phylogenetic analysis indicated that the NifH amino acid sequence from strain O-77 was clustered with those from a group of mesophilic species: the highest identity was found in Leptolyngbya sp. KIOST-1 (97.9% sequence identity). The nitrogenase activity of O-77 cells was dependent on illumination, whereas a high intensity of light of 40 µmol m⁻² s⁻¹ suppressed the effects of illumination.
