Bioinformatics Archives - Page 230 of 267

July 7, 2019

Assembly of long error-prone reads using de Bruijn graphs.

The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.
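For readers unfamiliar with the graph structure the abstract refers to, here is a minimal sketch of a classic de Bruijn graph built from error-free reads. It is not the ABruijn algorithm, which generalizes this idea to error-prone reads; the function name `de_bruijn_graph`, the toy reads, and the choice of k are illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its (k-1)-prefix to its (k-1)-suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy, error-free reads (hypothetical data for illustration only).
reads = ["ACGTAC", "GTACGG"]
graph = de_bruijn_graph(reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

With error-prone reads, most k-mers of practical length contain at least one error, which is why this plain construction is usually considered limited to short, accurate reads.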

July 7, 2019

Selecting reads for haplotype assembly

Haplotype assembly or read-based phasing is the problem of reconstructing both haplotypes of a diploid genome from next-generation sequencing data. This problem is formalized as the Minimum Error Correction (MEC) problem and can be solved using algorithms such as WhatsHap. The runtime of WhatsHap is exponential in the maximum coverage, which is hence controlled in a pre-processing step that selects reads to be used for phasing. Here, we report on a heuristic algorithm designed to choose beneficial reads for phasing, in particular to increase the connectivity of the phased blocks and the number of correctly phased variants compared to the random selection previously employed by WhatsHap. The algorithm we describe has been integrated into the WhatsHap software, which is available under the MIT licence from https://bitbucket.org/whatshap/whatshap.
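To make the MEC objective concrete, the sketch below scores one fixed bipartition of reads: within each group, every variant position costs the number of minority alleles that would have to be flipped to reach a consensus. This only evaluates a given partition; it is neither WhatsHap's algorithm nor the read-selection heuristic. The read representation (a dict from variant position to allele 0/1) and the name `mec_cost` are assumptions made for illustration.

```python
from collections import Counter


def mec_cost(reads, assignment):
    """Return the MEC cost of a fixed bipartition of reads.

    reads: list of dicts mapping variant position -> allele (0 or 1).
    assignment: list of 0/1 labels, one per read, naming its haplotype group.
    """
    # counts[(group, position)] tallies the alleles seen in that column.
    counts = {}
    for read, group in zip(reads, assignment):
        for pos, allele in read.items():
            counts.setdefault((group, pos), Counter())[allele] += 1
    # Each column costs its minority allele count (the flips needed).
    return sum(min(c[0], c[1]) for c in counts.values())


# Toy reads over three variant positions (hypothetical data).
reads = [
    {0: 0, 1: 0, 2: 1},
    {0: 0, 1: 0},
    {1: 1, 2: 0},
    {0: 1, 1: 1, 2: 1},
]
good = mec_cost(reads, [0, 0, 1, 1])
bad = mec_cost(reads, [0, 1, 0, 1])
print(good, bad)
```

Solving MEC means searching over bipartitions for the lowest such cost, and the number of candidate bipartitions at a position grows exponentially with the reads covering it, which is the motivation for capping coverage before phasing.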

July 7, 2019

Genome sequence of the multiantibiotic-resistant Enterococcus faecium strain C68 and insights on the pLRM23 colonization plasmid.

Enterococcus faecium infections are a rising concern in hospital settings. Vancomycin-resistant enterococci colonize the gastrointestinal tract and replace nonresistant strains, complicating the treatment of debilitated patients. Here, we present a polished genome of the multiantibiotic-resistant strain C68, which was obtained as a clinical isolate and is a useful experimental strain. Copyright © 2016 García-Solache and Rice.

July 7, 2019

Complete genome sequence of Mycoplasma mycoides subsp. mycoides T1/44, a vaccine strain against contagious bovine pleuropneumonia.

Mycoplasma mycoides subsp. mycoides is the etiologic agent of contagious bovine pleuropneumonia. We report here the complete genome sequence of the strain T1/44, which is widely used as a live vaccine in Africa. Copyright © 2016 Gourgues et al.

July 7, 2019

Complete genome sequence of Enterococcus faecium ATCC 700221.

We report the complete genome sequence of a vancomycin-resistant isolate of Enterococcus faecium derived from human feces. The genome comprises one chromosome of 2.9 Mb and three plasmids. The strain harbors a plasmid-borne vanA-type vancomycin resistance locus and is a member of multilocus sequencing type (MLST) cluster ST-17. Copyright © 2016 McKenney et al.

July 7, 2019

High-quality draft genomes from Thermus caliditerrae YIM 77777 and T. tengchongensis YIM 77401, isolates from Tengchong, China.

The draft genomes of Thermus tengchongensis YIM 77401 and T. caliditerrae YIM 77777 are 2,562,314 and 2,218,114 bp and encode 2,726 and 2,305 predicted genes, respectively. Gene content and growth experiments demonstrate broad metabolic capacity, including starch hydrolysis, thiosulfate oxidation, arsenite oxidation, incomplete denitrification, and polysulfide reduction. Copyright © 2016 Mefferd et al.

July 7, 2019

Genome sequence of Madurella mycetomatis mm55, isolated from a human mycetoma case in Sudan.

We present the first genome sequence for a strain of the main mycetoma causative agent, Madurella mycetomatis. This 36.7-Mb genome sequence will offer new insights into the pathogenesis of mycetoma, and it will contribute to the development of better therapies for this neglected tropical disease. Copyright © 2016 Smit et al.

July 7, 2019

Microevolution of monophasic Salmonella Typhimurium during epidemic, United Kingdom, 2005-2010.

Microevolution associated with emergence and expansion of new epidemic clones of bacterial pathogens holds the key to epidemiologic success. To determine microevolution associated with monophasic Salmonella Typhimurium during an epidemic, we performed comparative whole-genome sequencing and phylogenomic analysis of isolates from the United Kingdom and Italy during 2005-2012. These isolates formed a single clade distinct from recent monophasic epidemic clones previously described from North America and Spain. The UK monophasic epidemic clones showed a novel genomic island encoding resistance to heavy metals and a composite transposon encoding antimicrobial drug resistance genes not present in other Salmonella Typhimurium isolates, which may have contributed to epidemiologic success. A remarkable amount of genotypic variation accumulated during clonal expansion that occurred during the epidemic, including multiple independent acquisitions of a novel prophage carrying the sopE gene and multiple deletion events affecting the phase II flagellin locus. This high level of microevolution may affect antigenicity, pathogenicity, and transmission.

July 7, 2019

Complete genome sequence of the Mycobacterium immunogenum type strain CCUG 47286.

Here, we report the complete genome sequence of Mycobacterium immunogenum type strain CCUG 47286, a nontuberculous mycobacterium. The genome is 5,573,781 bp long and contains 5,484 predicted genes. This genome helps close the remaining gap in available genomes of rapidly growing mycobacterial type strains. Copyright © 2016 Jaén-Luchoro et al.

July 7, 2019

The channel catfish genome sequence provides insights into the evolution of scale formation in teleosts.

Catfish represent 12% of teleost or 6.3% of all vertebrate species, and are of enormous economic value. Here we report a high-quality reference genome sequence of channel catfish (Ictalurus punctatus), the major aquaculture species in the US. The reference genome sequence was validated by genetic mapping of 54,000 SNPs, and annotated with 26,661 predicted protein-coding genes. Through comparative analysis of genomes and transcriptomes of scaled and scaleless fish and scale regeneration experiments, we address the genomic basis for the most striking physical characteristic of catfish, the evolutionary loss of scales, and provide evidence that lack of secretory calcium-binding phosphoproteins accounts for the evolutionary loss of scales in catfish. The channel catfish reference genome sequence, along with two additional genome sequences and transcriptomes of scaled catfishes, provides crucial resources for evolutionary and biological studies. This work also demonstrates the power of comparative subtraction of candidate genes for traits of structural significance.

July 7, 2019

Direct repeat-mediated DNA deletion of the mating type MAT1-2 genes results in unidirectional mating type switching in Sclerotinia trifoliorum.

The necrotrophic fungal pathogen Sclerotinia trifoliorum exhibits ascospore dimorphism and unidirectional mating type switching – self-fertile strains derived from large ascospores produce both self-fertile (large-spore) and self-sterile (small-spore) offspring in a 4:4 ratio. The present study, comparing DNA sequences at the MAT locus of both self-fertile and self-sterile strains, found four mating type genes (MAT1-1-1, MAT1-1-5, MAT1-2-1 and MAT1-2-4) in the self-fertile strain. However, a 2891-bp region including the entire MAT1-2-1 and MAT1-2-4 genes had been completely deleted from the MAT locus in the self-sterile strain. Meanwhile, two copies of a 146-bp direct repeat motif flanking the deleted region were found in the self-fertile strain, but only one copy of this 146-bp motif (a part of the MAT1-1-1 gene) was present in the self-sterile strain. The two direct repeats were believed to be responsible for the deletion through homologous intra-molecular recombination in meiosis. Tetrad analyses showed that all small ascospore-derived strains lacked the DNA between the two direct repeats that was found in all large ascospore-derived strains. In addition, heterokaryons at the MAT locus were observed in field isolates as well as in laboratory-derived isolates.

July 7, 2019

ABO allele-level frequency estimation based on population-scale genotyping by next generation sequencing.

The characterization of the ABO blood group status is vital for blood transfusion and solid organ transplantation. Several methods for the molecular characterization of the ABO gene, which encodes the alleles that give rise to the different ABO blood groups, have been described. However, the application of those methods has so far been restricted to selected samples and has not been extended to population-scale analysis. We describe a cost-effective method for high-throughput genotyping of the ABO system by next generation sequencing. Sample-specific barcodes and sequencing adaptors are introduced during PCR, rendering the products suitable for direct sequencing on Illumina MiSeq or HiSeq instruments. Complete sequence coverage of exons 6 and 7 enables molecular discrimination of the ABO subgroups and many alleles. The workflow was applied to genotype the ABO locus in more than a million samples. We report the allele group frequencies calculated on a subset of more than 110,000 sampled individuals of German origin. Further, we discuss the potential of the workflow for high-resolution genotyping, taking the observed allele group frequencies into account. Finally, sequence analysis revealed 287 distinct, previously undescribed alleles, of which the most abundant one was identified in 174 samples. The described workflow delivers high-resolution ABO genotyping at low cost, enabling population-scale molecular ABO characterization.
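As a small worked illustration of allele frequency estimation from genotypes, the sketch below uses plain gene counting: each diploid sample contributes two alleles, and a frequency is an allele's share of all alleles observed. The abstract does not describe the authors' actual estimation procedure, so this is an assumption, and the genotypes and the simplified labels "A", "B" and "O" are toy placeholders rather than data or allele groups from the study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by counting alleles across diploid genotypes."""
    # Flatten the genotype pairs into one stream of alleles and tally them.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Four hypothetical samples, i.e. eight alleles in total.
genotypes = [("A", "O"), ("O", "O"), ("B", "O"), ("A", "A")]
freqs = allele_frequencies(genotypes)
print(freqs)
```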

July 7, 2019

SimLoRD: Simulation of Long Read Data.

Third generation sequencing methods provide longer reads than second generation methods and have distinct error characteristics. While there exist many read simulators for second generation data, there is a very limited choice for third generation data. We analyzed public data from Pacific Biosciences (PacBio) SMRT sequencing, developed an error model and implemented it in a new read simulator called SimLoRD. It offers options to choose the read length distribution and to model error probabilities depending on the number of passes through the sequencer. The new error model makes SimLoRD the most realistic SMRT read simulator available. SimLoRD is available open source at http://bitbucket.org/genomeinformatics/simlord/ and installable via Bioconda (http://bioconda.github.io). Contact: Bianca.Stoecker@uni-due.de or Sven.Rahmann@uni-due.de. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019

Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.

Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore Technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads. We present a new mapper, minimap, and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between ours and current tools. Availability: https://github.com/lh3/minimap and https://github.com/lh3/miniasm. Contact: hengli@broadinstitute.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019

A hot L1 retrotransposon evades somatic repression and initiates human colorectal cancer.

Although human LINE-1 (L1) elements are actively mobilized in many cancers, a role for somatic L1 retrotransposition in tumor initiation has not been conclusively demonstrated. Here, we identify a novel somatic L1 insertion in the APC tumor suppressor gene that provided us with a unique opportunity to determine whether such insertions can actually initiate colorectal cancer (CRC), and if so, how this might occur. Our data support a model whereby a hot L1 source element on Chromosome 17 of the patient’s genome evaded somatic repression in normal colon tissues and thereby initiated CRC by mutating the APC gene. This insertion worked together with a point mutation in the second APC allele to initiate tumorigenesis through the classic two-hit CRC pathway. We also show that L1 source profiles vary considerably depending on the ancestry of an individual, and that population-specific hot L1 elements represent a novel form of cancer risk. © 2016 Scott et al.; Published by Cold Spring Harbor Laboratory Press.

