MEGAN Archives

April 21, 2020 |

A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system

Background: A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternative technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. Results: The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ~36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence-level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. Conclusions: We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
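
The coverage and contiguity figures quoted in this abstract follow from simple arithmetic. As a minimal sketch (the function names are illustrative, and the toy contig lengths are made up rather than taken from the assembly), fold coverage is the number of sequenced bases divided by the genome size, and contig N50 is the length of the contig at which the running total of contigs, sorted from longest to shortest, first reaches half of the assembly:

```python
def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    return sequenced_bases / genome_size


def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Figures from the abstract: 82 Gb of unique-molecule reads, 2.3 Gb assembly.
coverage = fold_coverage(82e9, 2.3e9)
print(f"{coverage:.1f}x")  # rounds to the ~36x quoted in the abstract

# Toy contig lengths, made up for illustration only.
print(n50([5, 4, 3, 2, 1]))
```

With the abstract's numbers, 82 Gb over 2.3 Gb gives a depth just under 36×, consistent with the reported figure.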

April 21, 2020 |

A hybrid de novo assembly of the sea pansy (Renilla muelleri) genome.

More than 3,000 species of octocorals (Cnidaria, Anthozoa) inhabit an expansive range of environments, from shallow tropical seas to the deep-ocean floor. They are important foundation species that create coral “forests,” which provide unique niches and 3-dimensional living space for other organisms. The octocoral genus Renilla inhabits sandy, continental shelves in the subtropical and tropical Atlantic and eastern Pacific Oceans. Renilla is especially interesting because it produces secondary metabolites for defense, exhibits bioluminescence, and produces a luciferase that is widely used in dual-reporter assays in molecular biology. Although several anthozoan genomes are currently available, the majority of these are hexacorals. Here, we present a de novo assembly of an azooxanthellate shallow-water octocoral, Renilla muelleri. We generated a hybrid de novo assembly using MaSuRCA v.3.2.6. The final assembly included 4,825 scaffolds and a haploid genome size of 172 megabases (Mb). A BUSCO assessment found 88% of metazoan orthologs present in the genome. An Augustus ab initio gene prediction found 23,660 genes, of which 66% (15,635) had detectable similarity to annotated genes from the starlet sea anemone, Nematostella vectensis, or to the UniProt database. Although the R. muelleri genome may be smaller (172 Mb minimum size) than other publicly available coral genomes (256-448 Mb), the R. muelleri genome is similar to other coral genomes in terms of the number of complete metazoan BUSCOs and predicted gene models. The R. muelleri hybrid genome provides a novel resource for researchers to investigate the evolution of genes and gene families within Octocorallia and more widely across Anthozoa. It will be a key resource for future comparative genomics with other corals and for understanding the genomic basis of coral diversity. © The Author(s) 2019. Published by Oxford University Press.

April 21, 2020 |

MSC: a metagenomic sequence classification algorithm.

Metagenomics is the study of genetic material directly sampled from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life, largely due to the existence of highly parallel and low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads onto known reference genomes to identify microbes in the sample. Since such a collection of reference genomes is very large, the approach often needs high-end computing machines with large memory, which are often unavailable to researchers. Alternative approaches follow an alignment-free methodology where the presence of a microbe is predicted using information about the unique k-mers present in the microbial genomes. However, such approaches suffer from high false-positive rates because the value of k is traded off against the available computational resources. In this article, we propose a highly efficient metagenomic sequence classification (MSC) algorithm that is a hybrid of both approaches. Instead of aligning reads to the full genomes, MSC aligns reads onto a set of carefully chosen, shorter and highly discriminating model sequences built from the unique k-mers of each of the reference sequences. Microbiome researchers are generally interested in two objectives of a taxonomic classifier: (i) to detect prevalence, i.e. the taxa present in a sample, and (ii) to estimate their relative abundances. MSC is primarily designed to detect prevalence, and experimental results show that MSC is indeed a more effective and efficient algorithm compared to the other state-of-the-art algorithms in terms of accuracy, memory and runtime. Moreover, MSC outputs an approximate estimate of the abundances. The implementations are freely available for non-commercial purposes. They can be downloaded from https://drive.google.com/open?id=1XirkAamkQ3ltWvI1W1igYQFusp9DHtVl. © The Author(s) 2019. Published by Oxford University Press. All rights reserved.
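
The notion of "unique k-mers" in the MSC abstract can be illustrated with a toy example. The following is only a sketch of the general idea, not the MSC implementation: it finds the k-mers that occur in exactly one reference and counts how many of them a read contains, whereas MSC itself aligns reads to model sequences built from such k-mers. The function names and toy sequences are assumptions made for illustration, and reverse complements are ignored.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """For each reference, the k-mers that appear in no other reference."""
    owners = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    unique = {name: set() for name in references}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(kmer)
    return unique


def unique_kmer_hits(read: str, unique: dict[str, set[str]], k: int) -> dict[str, int]:
    """Number of reference-specific k-mers that the read shares with each reference."""
    read_kmers = kmers(read, k)
    return {name: len(read_kmers & ks) for name, ks in unique.items()}


# Toy references, made up for illustration only.
refs = {"A": "ACGTAC", "B": "ACGGTT"}
uniq = unique_kmers(refs, k=3)
print(unique_kmer_hits("CGTAC", uniq, k=3))
```

In this toy case the shared k-mer ACG is discarded, so the read is supported only by reference A. A larger k makes k-mers more discriminating but increases the memory needed to index them, which is the trade-off the abstract refers to.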

April 21, 2020 |

Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps.

Metagenomic sequence classification should be fast, accurate and information-rich. Emerging long-read sequencing technologies promise to improve the balance between these factors, but most existing methods were designed for short reads. MetaMaps is a new method, specifically developed for long reads, capable of mapping a long-read metagenome to a comprehensive RefSeq database with >12,000 genomes in <16 GB of RAM on a laptop computer. Integrating approximate mapping with probabilistic scoring and EM-based estimation of sample composition, MetaMaps achieves >94% accuracy for species-level read assignment and r² > 0.97 for the estimation of sample composition on both simulated and real data when the sample genomes or close relatives are present in the classification database. To address novel species and genera, which are comparatively harder to predict, MetaMaps outputs mapping locations and qualities for all classified reads, enabling functional studies (e.g. gene presence/absence) and detection of incongruities between sample and reference genomes.
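
The "EM-based estimation of sample composition" mentioned above can be sketched generically. This is a textbook expectation-maximization loop for mixture proportions, not the MetaMaps code: the input matrix of per-read likelihoods, the function name, and the toy values are all assumptions made for illustration.

```python
def em_composition(likelihoods: list[list[float]], n_iter: int = 100) -> list[float]:
    """Estimate genome proportions from likelihoods[r][g] = P(read r | genome g)."""
    n_genomes = len(likelihoods[0])
    props = [1.0 / n_genomes] * n_genomes  # start from a uniform composition
    for _ in range(n_iter):
        counts = [0.0] * n_genomes
        # E-step: split each read across genomes in proportion to prior * likelihood.
        for row in likelihoods:
            weights = [p * lik for p, lik in zip(props, row)]
            total = sum(weights)
            if total == 0:
                continue  # read not explained by any genome
            for g, w in enumerate(weights):
                counts[g] += w / total
        # M-step: new proportions are the normalized expected read counts.
        assigned = sum(counts)
        if assigned == 0:
            break
        props = [c / assigned for c in counts]
    return props


# Toy input: two reads fit only genome 0, one fits only genome 1, one is ambiguous.
toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(em_composition(toy))
```

The ambiguous read is shared between the two genomes according to the current composition estimate, so unambiguous reads end up steering how ambiguous ones are apportioned.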

April 21, 2020 |

Complete Assembly of the Genome of an Acidovorax citrulli Strain Reveals a Naturally Occurring Plasmid in This Species.

Acidovorax citrulli is the causal agent of bacterial fruit blotch (BFB), a serious threat to cucurbit crop production worldwide. Based on genetic and phenotypic properties, A. citrulli strains are divided into two major groups: group I strains have been generally isolated from melon and other non-watermelon cucurbits, while group II strains are closely associated with watermelon. In a previous study, we reported the genome of the group I model strain, M6. At that time, the M6 genome was sequenced by Illumina MiSeq technology, with reads assembled into 139 contigs. Here, we report the assembly of the M6 genome following sequencing with PacBio technology. This approach not only allowed full assembly of the M6 genome, but it also revealed the occurrence of a ~53 kb plasmid. The M6 plasmid, named pACM6, was further confirmed by plasmid extraction, Southern blot analysis of restricted fragments and generation of M6-derived cured strains. pACM6 occurs at low copy numbers (average of ~4.1 ± 1.3 chromosome equivalents) in A. citrulli M6 and contains 63 open reading frames (ORFs), most of which (55.6%) encode hypothetical proteins. The plasmid contains several genes encoding type IV secretion components, and typical plasmid-borne genes involved in plasmid maintenance, replication and transfer. The plasmid also carries an operon encoding homologs of a Fic-VbhA toxin-antitoxin (TA) module. Transcriptome data from A. citrulli M6 revealed that, under the tested conditions, the genes encoding the components of this TA system are among the most highly expressed genes in pACM6. Whether this TA module plays a role in pACM6 maintenance is still to be determined. Leaf infiltration and seed transmission assays revealed that, under the tested conditions, the loss of pACM6 did not affect the virulence of A. citrulli M6. We also show that pACM6 or similar plasmids are present in several group I strains, but absent in all tested group II strains of A. citrulli.

September 22, 2019 |

Detecting epigenetic motifs in low coverage and metagenomics settings.

It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third generation sequencing data. Monitoring the speed at which a single polymerase inserts a base in the read strand enables one to infer whether a modification is present at that specific site on the template strand. These sites can be challenging to detect in the absence of high coverage and reliable reference genomes. Here we provide a new method for detecting epigenetic motifs in bacteria on datasets with low coverage, with incomplete references, and with mixed samples (i.e. metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, native genome-wide distributions of interpulse durations (IPDs) for kmers are compared with corresponding whole genome amplified (WGA, modification free) IPD distributions using log likelihood ratios. Finally, kmers are ranked and greedily selected by iteratively correcting for sequences within a particular kmer’s neighborhood. Our method can detect multiple types of modifications, even at very low coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with “neighbor” modified motifs exist within the sample. Lastly, we show that these motifs can provide an alternative source of information by which to cluster metagenomics contigs and that iterative refinement on these clustered contigs can further improve both sensitivity and specificity of motif detection. The software is available at https://github.com/alibashir/EMMCKmer.
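
The comparison step in this abstract, native versus WGA IPD distributions per kmer scored by a log likelihood ratio, can be sketched as follows. The abstract does not state which distribution is fitted, so this sketch assumes a normal model on log-transformed IPDs; that assumption, the function names, and the toy IPD values are all illustrative and not taken from the EMMCKmer code. The greedy neighborhood correction is not shown.

```python
import math
from statistics import fmean, stdev


def fit_log_normal(ipds: list[float], min_sigma: float = 1e-3) -> tuple[float, float]:
    """Mean and standard deviation of log-IPDs (assumed model, needs >= 2 values)."""
    logs = [math.log(x) for x in ipds]
    return fmean(logs), max(stdev(logs), min_sigma)


def log_pdf(x: float, mu: float, sigma: float) -> float:
    """Log density of a normal distribution, computed directly to avoid underflow."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def kmer_llr(native_ipds: list[float], wga_ipds: list[float]) -> float:
    """Log likelihood ratio of the native IPDs under the native fit vs the WGA fit."""
    mu_n, sd_n = fit_log_normal(native_ipds)
    mu_w, sd_w = fit_log_normal(wga_ipds)
    return sum(
        log_pdf(math.log(x), mu_n, sd_n) - log_pdf(math.log(x), mu_w, sd_w)
        for x in native_ipds
    )


def rank_kmers(native: dict[str, list[float]], wga: dict[str, list[float]]):
    """Kmers present in both datasets, sorted from highest to lowest ratio."""
    scores = [(k, kmer_llr(native[k], wga[k])) for k in native if k in wga]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


# Toy IPDs, made up for illustration: GATC is slowed in the native sample.
native = {"GATC": [2.1, 2.4, 1.9, 2.6], "ACGT": [0.9, 1.1, 1.0, 1.2]}
wga = {"GATC": [1.0, 0.9, 1.1, 1.2], "ACGT": [1.0, 0.9, 1.1, 1.2]}
for kmer, score in rank_kmers(native, wga):
    print(kmer, round(score, 2))
```

A kmer whose native IPDs look like the modification-free control scores near zero, while a kmer with shifted native IPDs scores high and rises to the top of the ranking.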

September 22, 2019 |

Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data.

The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens. Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology in concordance with read trimming and ambiguous read reassignment. Second, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity. Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available at: http://sourceforge.net/projects/pathoscope/.

September 22, 2019 |

MetaSort untangles metagenome assembly by reducing microbial community complexity.

Most current approaches to analyse metagenomic data rely on reference genomes. Novel microbial communities extend far beyond the coverage of reference databases, and de novo metagenome assembly from complex microbial communities remains a great challenge. Here we present a novel experimental and bioinformatic framework, metaSort, for effective construction of bacterial genomes from metagenomic samples. MetaSort provides a sorted mini-metagenome approach based on flow cytometry and single-cell sequencing methodologies, and employs new computational algorithms to efficiently recover high-quality genomes from the sorted mini-metagenome, using the original metagenome as a complement. Through extensive evaluations, we demonstrated that metaSort has an excellent and unbiased performance on genome recovery and assembly. Furthermore, we applied metaSort to an unexplored microflora colonizing the surface of marine kelp and successfully recovered 75 high-quality genomes at once. This approach will greatly improve access to microbial genomes from complex or novel communities.

September 22, 2019 |

Analysis of gut microbiota – An ever changing landscape.

In the last two decades, the field of metagenomics has greatly expanded due to improvements in sequencing technologies, allowing for a more comprehensive characterization of microbial communities. The use of these technologies has led to an unprecedented understanding of human, animal, and environmental microbiomes and has shown that the gut microbiota are comparable to an organ that is intrinsically linked with a variety of diseases. Characterization of microbial communities using next-generation sequencing-by-synthesis approaches has revealed important shifts in microbiota associated with debilitating diseases such as Clostridium difficile infection. But due to limitations in sequence read length, primer biases, and the quality of databases, genus- and species-level classification has been difficult. Third-generation technologies, such as Pacific Biosciences’ single molecule, real-time (SMRT) approach, allow for unbiased, more specific identification of species that are likely clinically relevant. Comparison of Illumina next-generation characterization and SMRT sequencing of samples from patients treated for C. difficile infection revealed similarities in community composition at the phylum and family levels, but SMRT sequencing further allowed for species-level characterization, permitting a better understanding of the microbial ecology of this disease. Thus, as sequencing technologies continue to advance, new species-level insights can be gained in the study of complex and clinically relevant microbial communities.

September 22, 2019 |

Long-term microbiota and virome in a Zürich patient after fecal transplantation against Clostridium difficile infection.

Fecal microbiota transplantation (FMT) is an emerging therapeutic option for Clostridium difficile infections that are refractory to conventional treatment. FMT introduces fecal microbes into the patient’s intestine that prevent the recurrence of C. difficile, leading to rapid expansion of bacteria characteristic of healthy microbiota. However, the long-term effects of FMT remain largely unknown. The C. difficile patient described in this paper revealed protracted microbiota adaptation processes from 6 to 42 months post-FMT. Ultimately, bacterial communities were donor-similar, suggesting sustainable stool engraftment. Since little is known about the consequences of transmitted viruses during C. difficile infection, we also interrogated virome changes. Our approach allowed identification of about 10 phage types per sample that represented larger viral communities, and phages were found to be equally abundant in the cured patient and donor. The healthy microbiota appears to be characterized by low phage abundance. Although viruses were likely transferred, the patient established a virome distinct from the donor. Surprisingly, the patient had sequences of algal giant viruses (chloroviruses) that have not previously been reported for the human gut. Chloroviruses have not been associated with intestinal disease, but their presence in the oropharynx may influence cognitive abilities. The findings suggest that the virome is an important indicator of health or disease. A better understanding of the role of viruses in the gut ecosystem may uncover novel microbiota-modulating therapeutic strategies. © 2016 New York Academy of Sciences.

September 22, 2019 |

Biogas production from hydrothermal liquefaction wastewater (HTLWW): Focusing on the microbial communities as revealed by high-throughput sequencing of full-length 16S rRNA genes.

Hydrothermal liquefaction (HTL) is an emerging and promising technology for the conversion of wet biomass into bio-crude; however, little attention has been paid to the utilization of hydrothermal liquefaction wastewater (HTLWW), which has a high concentration of organics. The present study investigated biogas production from wastewater obtained from HTL of straw for bio-crude production, with a focus on the analysis of the microbial communities and characterization of the organics. Batch experiments showed that the methane yield of HTLWW (R-HTLWW) was 184 mL/g COD, while HTLWW after petroleum ether extraction (PE-HTLWW), performed to extract additional bio-crude, had a higher methane yield (235 mL/g COD) due to the extraction of recalcitrant organic compounds. Sequential batch experiments further demonstrated the higher methane yield of PE-HTLWW. LC-TOF-MS, HPLC and gel filtration chromatography showed that organics with molecular weight (MW) < 1000 were well degraded. Results from the high-throughput sequencing of full-length 16S rRNA genes showed that similar microbial community compositions were obtained for the reactors fed with either R-HTLWW or PE-HTLWW. Species-level identification related the degradation of fatty acids to Mesotoga infera, Syntrophomonas wolfei and others. However, the species related to the degradation of other compounds (e.g. phenols) were not found, which could be due to the presence of uncharacterized microorganisms. It was also found that previously proposed criteria (97% and 98.65% similarity) for species identification of 16S rRNA genes were not suitable for a fraction of 16S rRNA genes. Copyright © 2016 Elsevier Ltd. All rights reserved.

September 22, 2019 |

Effects of antibiotic on microflora in ileum and cecum for broilers by 16S rRNA sequence analysis.

An experiment was conducted to analyze and compare the microbial composition, abundance, dynamic distribution, and functions of the gut microflora in broilers fed with or without an antibiotic. A 16S rRNA-sequencing approach was used to evaluate the bacterial composition of the gut of male broilers in the different groups. A total of 240 1-day-old AA male broilers were randomly assigned to two groups, with 120 broilers per group. The treatment group was administered an antibiotic with their feed, while the control group received none. A total of 10 replicates were assessed per treatment. The control group was fed a basal diet containing corn, soybean meal, and cottonseed meal that met the nutritional requirement. The antibiotic group was fed 100 mg/kg aureomycin (based on the basal diet). The trial lasted 42 days. Operational taxonomic unit partition and classification, alpha diversity, taxonomic composition, beta diversity, and microflora comparative analyses along with key species screening were performed for all of the treatment groups. Our data indicate that aureomycin treatment in broilers is directly correlated with variations of the gut content of specific bacterial taxa, and provide insights into the impact of antibiotic on microbial communities in the cecum and ileum of broiler chickens. © 2018 Japanese Society of Animal Science.

September 22, 2019 |

Long-term changes of bacterial and viral compositions in the intestine of a recovered Clostridium difficile patient after fecal microbiota transplantation

Fecal microbiota transplantation (FMT) is an effective treatment for recurrent Clostridium difficile infections (RCDIs). However, long-term effects on the patients’ gut microbiota and the role of viruses remain to be elucidated. Here, we characterized bacterial and viral microbiota in the feces of a cured RCDI patient at various time points until 4.5 yr post-FMT compared with the stool donor. Feces were subjected to DNA sequencing to characterize bacteria and double-stranded DNA (dsDNA) viruses including phages. The patient’s microbial communities varied over time and showed little overall similarity to the donor until 7 mo post-FMT, indicating ongoing gut microbiota adaptation in this time period. After 4.5 yr, the patient’s bacteria attained donor-like compositions at phylum, class, and order levels with similar bacterial diversity. Differences in the bacterial communities between donor and patient after 4.5 yr were seen at lower taxonomic levels. C. difficile remained undetectable throughout the entire timespan. This demonstrated sustainable donor feces engraftment and verified long-term therapeutic success of FMT on the molecular level. Full engraftment apparently required longer than previously acknowledged, suggesting the implementation of year-long patient follow-up periods into clinical practice. The identified dsDNA viruses were mainly Caudovirales phages. Unexpectedly, sequences related to giant algae-infecting Chlorella viruses were also detected. Our findings indicate that intestinal viruses may be implicated in the establishment of gut microbiota. Therefore, virome analyses should be included in gut microbiota studies to determine the roles of phages and other viruses, such as Chlorella viruses, in human health and disease, particularly during RCDI.

September 22, 2019 |

MEGAN-LR: new algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs.

There are numerous computational tools for taxonomic or functional analysis of microbiome samples, optimized to run on hundreds of millions of short, high quality sequencing reads. Programs such as MEGAN allow the user to interactively navigate these large datasets. Long read sequencing technologies continue to improve and produce increasing numbers of longer reads (of varying lengths in the range of 10k-1M bps, say), but of low quality. There is an increasing interest in using long reads in microbiome sequencing, and there is a need to adapt short read tools to long read datasets. We describe a new LCA-based algorithm for taxonomic binning, and an interval-tree based algorithm for functional binning, that are explicitly designed for long reads and assembled contigs. We provide a new interactive tool for investigating the alignment of long reads against reference sequences. For taxonomic and functional binning, we propose to use LAST to compare long reads against the NCBI-nr protein reference database so as to obtain frame-shift aware alignments, and then to process the results using our new methods. All presented methods are implemented in the open source edition of MEGAN, and we refer to this new extension as MEGAN-LR (MEGAN long read). We evaluate the LAST+MEGAN-LR approach in a simulation study, and on a number of mock community datasets consisting of Nanopore reads, PacBio reads and assembled PacBio reads. We also illustrate the practical application on a Nanopore dataset that we sequenced from an anammox bioreactor community. This article was reviewed by Nicola Segata together with Moreno Zolfo, Pete James Lockhart and Serghei Mangul. This work extends the applicability of the widely-used metagenomic analysis software MEGAN to long reads. Our study suggests that the presented LAST+MEGAN-LR pipeline is sufficiently fast and accurate.
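
For readers unfamiliar with LCA-based binning, the basic lowest-common-ancestor idea can be sketched as below. This is the plain LCA that such methods build on, not the long-read algorithm described in the abstract: a read that aligns to several taxa is placed on the deepest taxonomy node shared by all of them. The toy taxonomy, the child-to-parent dictionary, and the function names are assumptions made for illustration.

```python
def lineage(taxon: str, parent: dict[str, str]) -> list[str]:
    """Path from the root of the taxonomy down to the given taxon."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]  # root first


def lca(taxa: list[str], parent: dict[str, str]):
    """Deepest node shared by the lineages of all taxa, or None if there is none."""
    lineages = [lineage(t, parent) for t in taxa]
    common = None
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        common = nodes[0]
    return common


# Toy taxonomy (child -> parent), heavily simplified for illustration.
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
}

# A read whose alignments hit both species is binned at their shared family.
print(lca(["Escherichia coli", "Salmonella enterica"], parent))
```

Placing ambiguous reads higher in the taxonomy trades resolution for reliability: the read is assigned conservatively rather than forced onto one of several equally plausible species.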

September 22, 2019 |

The microbiome of the leaf surface of Arabidopsis protects against a fungal pathogen.

We have explored the importance of the phyllosphere microbiome in plant resistance in the cuticle mutants bdg (BODYGUARD) or lacs2.3 (LONG CHAIN FATTY ACID SYNTHASE 2) that are strongly resistant to the fungal pathogen Botrytis cinerea. The study includes infection of plants under sterile conditions, 16S ribosomal DNA sequencing of the phyllosphere microbiome, and isolation and high coverage sequencing of bacteria from the phyllosphere. When inoculated under sterile conditions, bdg became as susceptible as wild-type (WT) plants, whereas lacs2.3 mutants retained the resistance. Adding washes of its phyllosphere microbiome could restore the resistance of bdg mutants, whereas the resistance of lacs2.3 results from endogenous mechanisms. The phyllosphere microbiome showed distinct populations in WT plants compared to cuticle mutants. One species, identified as Pseudomonas sp. and isolated from the microbiome of bdg, provided resistance to B. cinerea on Arabidopsis thaliana as well as on apple fruits. No direct activity was observed against B. cinerea, and the action of the bacterium required the plant. Thus, microbes present on the plant surface contribute to the resistance to B. cinerea. These results open new perspectives on the function of the leaf microbiome in the protection of plants. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
