De novo assembly Archives

July 7, 2019

The genome of the anaerobic fungus Orpinomyces sp. strain C1A reveals the unique evolutionary history of a remarkable plant biomass degrader.

Anaerobic gut fungi represent a distinct early-branching fungal phylum (Neocallimastigomycota) and reside in the rumen, hindgut, and feces of ruminant and nonruminant herbivores. The genome of an anaerobic fungal isolate, Orpinomyces sp. strain C1A, was sequenced using a combination of Illumina and PacBio single-molecule real-time (SMRT) technologies. The large genome (100.95 Mb, 16,347 genes) displayed extremely low G+C content (17.0%), large noncoding intergenic regions (73.1%), proliferation of microsatellite repeats (4.9%), and multiple gene duplications. Comparative genomic analysis identified multiple genes and pathways that are absent in Dikarya genomes but present in early-branching fungal lineages and/or nonfungal Opisthokonta. These included genes for posttranslational fucosylation, the production of specific intramembrane proteases and extracellular protease inhibitors, the formation of a complete axoneme and intraflagellar trafficking machinery, and a near-complete focal adhesion machinery. Analysis of the lignocellulolytic machinery in the C1A genome revealed an extremely rich repertoire, with evidence of horizontal gene acquisition from multiple bacterial lineages. Experimental analysis indicated that strain C1A is a remarkable biomass degrader, capable of simultaneous saccharification and fermentation of the cellulosic and hemicellulosic fractions in multiple untreated grasses and crop residues examined, with the process significantly enhanced by mild pretreatments. This capability, acquired during its separate evolutionary trajectory in the rumen, along with its resilience and invasiveness compared to prokaryotic anaerobes, renders anaerobic fungi promising agents for consolidated bioprocessing schemes in biofuels production.
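For readers unfamiliar with the G+C statistic quoted above, it is the percentage of unambiguous bases in a sequence that are G or C. The following is a minimal sketch of that calculation; the function name `gc_content` and the toy sequence are illustrative assumptions and are not taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Only A, C, G and T are counted; ambiguous symbols such as N are ignored.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


# Hypothetical AT-rich toy sequence, invented for illustration only.
toy = "ATTATAAATGTTATACATAT"
print(f"G+C content: {gc_content(toy):.1f}%")
```

On a real assembly the same ratio would be computed over all contigs combined rather than over a single short string.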

July 7, 2019

Epigenetics: Reading methylated genomes

Single-molecule sequencing allows simultaneous reading of DNA bases and their methylation state in bacterial genomes.

July 7, 2019

Hammondia hammondi, an avirulent relative of Toxoplasma gondii, has functional orthologs of known T. gondii virulence genes.

Toxoplasma gondii is a ubiquitous protozoan parasite capable of infecting all warm-blooded animals, including humans. Its closest extant relative, Hammondia hammondi, has never been found to infect humans and, in contrast to T. gondii, is highly attenuated in mice. To better understand the genetic bases for these phenotypic differences, we sequenced the genome of a H. hammondi isolate (HhCatGer041) and found the genomic synteny between H. hammondi and T. gondii to be >95%. We used this genome to determine the H. hammondi primary sequence of two major T. gondii mouse virulence genes, TgROP5 and TgROP18. When we expressed these genes in T. gondii, we found that H. hammondi orthologs of TgROP5 and TgROP18 were functional. Similar to T. gondii, the HhROP5 locus is expanded, and two distinct HhROP5 paralogs increased the virulence of a T. gondii TgROP5 knockout strain. We also identified a 107 base pair promoter region, absent only in type III TgROP18, which is necessary for TgROP18 expression. This result indicates that the ROP18 promoter was active in the most recent common ancestor of these two species and that it was subsequently inactivated in progenitors of the type III lineage. Overall, these data suggest that the virulence differences between these species are not solely due to the functionality of these key virulence factors. This study provides evidence that other mechanisms, such as differences in gene expression or the lack of currently uncharacterized virulence factors, may underlie the phenotypic differences between these species.

July 7, 2019

The coming revolution: microbes and multiscale biology

New technologies are providing a more complete view of the functional dynamics of microorganisms such as E. coli and V. cholerae.

July 7, 2019

Cerulean: A hybrid assembly using high throughput short and long reads

Genome assembly using high throughput data with short reads arguably remains an unresolvable task in repetitive genomes: when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third generation sequencing (Pacific Biosciences) with long reads offers the opportunity to resolve complicated repeats that could not be resolved by the short read data. However, these long reads have a high error rate, and it is an uphill task to assemble the genome without using additional high quality short reads. Recently, Koren et al. (2012) proposed an approach that uses high quality short read data to correct these long reads and, thus, make assembly from long reads possible. However, due to the large size of both datasets (short and long reads), error correction of these long reads requires excessively high computational resources, even on small bacterial genomes. In this work, instead of error-correcting the long reads, we first assemble the short reads and later map the long reads onto the assembly graph to resolve repeats.
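The repeat problem described above can be shown on a toy example. The sketch below is not Cerulean's algorithm. It assumes a simple de Bruijn-style graph in which the k-mer length `k` stands in for read length, and it uses an invented 25-base sequence containing a 7-base repeat (`TTGCATC`) with different flanks on each side. The function name `branching_nodes` is hypothetical.

```python
from collections import defaultdict


def branching_nodes(genome: str, k: int) -> dict[str, set[str]]:
    """Return graph nodes ((k-1)-mers) that have more than one successor.

    Each k-mer in the sequence contributes an edge from its (k-1)-base
    prefix to its (k-1)-base suffix. A node with several successors is a
    point where the path through the graph is ambiguous.
    """
    successors = defaultdict(set)
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        successors[kmer[:-1]].add(kmer[1:])
    return {node: nxt for node, nxt in successors.items() if len(nxt) > 1}


# Invented toy sequence: flank + repeat + flank + repeat + flank.
toy_genome = "ACG" + "TTGCATC" + "AGGA" + "TTGCATC" + "CTAG"

# Nodes of length 7 do not span the 7-base repeat: one ambiguous branch.
print(branching_nodes(toy_genome, 8))
# Nodes of length 8 extend past the repeat: no ambiguity remains.
print(branching_nodes(toy_genome, 9))
```

Once the unit of evidence is longer than the repeat, each copy is anchored by its unique flank and the branch disappears, which is the role the abstract assigns to long reads.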

July 7, 2019

Complete genome sequence of Staphylococcus aureus Tager 104, a sequence type 49 ancestor.

We report here the complete genome sequence of Staphylococcus aureus Tager 104, originally isolated from a cutaneous abscess in 1947 by Morris Tager. Sequence typing of the strain revealed its membership in sequence type 49 (ST49), a previously unknown multilocus sequence type (MLST) in clinical samples.

July 7, 2019

Finished bacterial genomes from shotgun sequence data.

Exceptionally accurate genome reference sequences have proven to be of great value to microbial researchers. Thus, to date, about 1800 bacterial genome assemblies have been “finished” at great expense with the aid of manual laboratory and computational processes that typically iterate over a period of months or even years. By applying a new laboratory design and new assembly algorithm to 16 samples, we demonstrate that assemblies exceeding finished quality can be obtained from whole-genome shotgun data and automated computation. Cost and time requirements are thus dramatically reduced.

July 7, 2019

Structure of the type IV secretion system in different strains of Anaplasma phagocytophilum.

Anaplasma phagocytophilum is an intracellular organism in the Order Rickettsiales that infects diverse animal species and is causing an emerging disease in humans, dogs and horses. Different strains have very different cell tropisms and virulence. For example, in the U.S., strains have been described that infect ruminants but not dogs or rodents. An intriguing question is how the strains of A. phagocytophilum differ and what different genome loci are involved in cell tropisms and/or virulence. Type IV secretion systems (T4SS) are responsible for translocation of substrates across the cell membrane by mechanisms that require contact with the recipient cell. They are especially important in organisms such as the Rickettsiales which require T4SS to aid colonization and survival within both mammalian and tick vector cells. We determined the structure of the T4SS in 7 strains from the U.S. and Europe and revised the sequence of the repetitive virB6 locus of the human HZ strain. Although in all strains the T4SS conforms to the previously described split loci for vir genes, there is great diversity within these loci among strains. This is particularly evident in virB2 and virB6, which are postulated to encode the secretion channel and proteins exposed on the bacterial surface. VirB6-4 has an unusual highly repetitive structure and can have a molecular weight greater than 500,000. For many of the virs, phylogenetic trees position A. phagocytophilum strains infecting ruminants in the U.S. and Europe distant from strains infecting humans and dogs in the U.S. Our study reveals evidence of gene duplication and considerable diversity of T4SS components in strains infecting different animals. The diversity in virB2 is in both the total number of copies, which varied from 8 to 15 in the herein characterized strains, and in the sequence of each copy. The diversity in virB6 is in the sequence of each of the 4 copies in the single locus and the presence of varying numbers of repetitive units in virB6-3 and virB6-4. These data suggest that the T4SS should be investigated further for a potential role in strain virulence of A. phagocytophilum.

July 7, 2019

Genome sequence of “Candidatus Microthrix parvicella” Bio17-1, a long-chain-fatty-acid-accumulating filamentous actinobacterium from a biological wastewater treatment plant.

Candidatus Microthrix bacteria are deeply branching filamentous actinobacteria which occur at the water-air interface of biological wastewater treatment plants, where they are often responsible for foaming and bulking. Here, we report the first draft genome sequence of a strain from this genus: “Candidatus Microthrix parvicella” strain Bio17-1.

July 7, 2019

A hybrid approach for the automated finishing of bacterial genomes.

Advances in DNA sequencing technology have improved our ability to characterize most genomic diversity. However, accurate resolution of large structural events is challenging because of the short read lengths of second-generation technologies. Third-generation sequencing technologies, which can yield longer multikilobase reads, have the potential to address limitations associated with genome assembly. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy. Complex regions with clinically relevant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 cholera reference strain, we obtained 14 scaffolds of greater than 1 kb for the experimental data and 8 scaffolds of greater than 1 kb for the simulated data, which allowed us to correct several errors in contigs assembled from the short-read data alone. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly.

July 7, 2019

Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011.

The degree to which molecular epidemiology reveals information about the sources and transmission patterns of an outbreak depends on the resolution of the technology used and the samples studied. Isolates of Escherichia coli O104:H4 from the outbreak centered in Germany in May-July 2011, and the much smaller outbreak in southwest France in June 2011, were indistinguishable by standard tests. We report a molecular epidemiological analysis using multiplatform whole-genome sequencing and analysis of multiple isolates from the German and French outbreaks. Isolates from the German outbreak showed remarkably little diversity, with only two single nucleotide polymorphisms (SNPs) found in isolates from four individuals. Surprisingly, we found much greater diversity (19 SNPs) in isolates from seven individuals infected in the French outbreak. The German isolates form a clade within the more diverse French outbreak strains. Moreover, five isolates derived from a single infected individual from the French outbreak had extremely limited diversity. The striking difference in diversity between the German and French outbreak samples is consistent with several hypotheses, including a bottleneck that purged diversity in the German isolates, variation in mutation rates in the two E. coli outbreak populations, or uneven distribution of diversity in the seed populations that led to each outbreak.
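As a rough illustration of how diversity among isolates can be expressed as a SNP count, the sketch below counts the alignment columns in which the isolates disagree. It assumes the sequences are already aligned to equal length. The function name `count_snp_sites` and the toy sequences are invented for illustration and do not reproduce the study's analysis pipeline or data.

```python
def count_snp_sites(aligned: list[str]) -> int:
    """Count alignment columns where at least two different bases occur.

    Symbols other than A, C, G and T (gaps, N) are ignored, so a column
    only counts as a SNP site if two or more unambiguous bases differ.
    """
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    count = 0
    for column in zip(*aligned):
        bases = {base for base in column if base in "ACGT"}
        if len(bases) > 1:
            count += 1
    return count


# Four hypothetical isolates differing at the first and last positions.
isolates = ["ACGT", "ACGA", "ACGT", "TCGT"]
print(count_snp_sites(isolates))
```

Counting variable sites across the whole set, rather than pairwise differences, matches the way the abstract reports a single SNP total per outbreak.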

July 7, 2019

Complete genome sequence of Liberibacter crescens BT-1.

Liberibacter crescens BT-1, a Gram-negative, rod-shaped bacterial isolate, was previously recovered from mountain papaya to gain insight into Huanglongbing (HLB) and Zebra Chip (ZC) diseases. The genome of BT-1 was sequenced at the Interdisciplinary Center for Biotechnology Research (ICBR) at the University of Florida. A finished assembly and annotation yielded one chromosome with a length of 1,504,659 bp and a G+C content of 35.4%. Compared to other species in the Liberibacter genus, L. crescens has many more genes in thiamine and essential amino acid biosynthesis. This likely explains why L. crescens BT-1 is culturable while the known Liberibacter strains have not yet been cultured. Similar to Candidatus L. asiaticus psy62, the L. crescens BT-1 genome contains two prophage regions.

July 7, 2019

Analysis of the genome of Mycobacterium abscessus strain M94 reveals an uncommon cluster of tRNAs.

Mycobacterium abscessus is a species of rapidly growing nontuberculous mycobacteria that is frequently associated with opportunistic infections in humans. Here, we report the annotated genome sequence of M. abscessus strain M94, which showed an unusual cluster of tRNAs.

July 7, 2019

Improving genome assemblies by sequencing PCR products with PacBio.

Advances in sequencing technologies have dramatically reduced costs in producing high-quality draft genomes. However, there are still many contigs and possible misassembled regions in those draft genomes. Improving the quality of these genomes requires an efficient and economical means to close gaps and resequence some regions. Sequencing pooled gap region PCR products with Pacific Biosciences (PacBio) provides a significantly less expensive means for this need. We have developed a genome improvement pipeline with this strategy after decreasing a loading bias against larger PCR products in the PacBio process. Compared with Sanger technology, this approach is not only cost-effective but also can close gaps greater than 2.5 kb in a single round of reactions, and sequence through high GC regions as well as difficult secondary structures such as small hairpin loops.

July 7, 2019

Draft genome sequence of Salimicrobium sp. strain MJ3, isolated from Myulchi-Jeot, Korean fermented seafood.

Salimicrobium sp. strain MJ3 was isolated from myulchi-jeot, a traditional fermented seafood made from anchovy in South Korea. Here we announce the draft genome sequence of Salimicrobium sp. MJ3 with 2,717,782 bp, which consists of 45 contigs (>500 bp in size), and provide a description of their annotation.
