Since the advent of Next-Generation Sequencing (NGS), the cost of de novo genome sequencing and assembly has dropped precipitously, which has spurred interest in genome sequencing overall. Unfortunately, both the contiguity and the accuracy of NGS assemblies have suffered. Additionally, most NGS de novo assemblies leave large portions of genomes unresolved, and repetitive regions are often collapsed. When compared to the reference-quality genome sequences produced before the NGS era, the new sequences are highly fragmented and often prove difficult to annotate properly. In some cases the contiguous portions are smaller than the average gene size, making the sequence far less useful to biologists than the earlier reference-quality genomes, including those of human, mouse, C. elegans, and Drosophila. Recently, new third-generation sequencing technologies, long-range molecular techniques, and new informatics tools have facilitated a return to high-quality assembly. We will discuss the capabilities of these technologies and assess their impact on assembly projects across the tree of life, from small microbial and fungal genomes through large plant and animal genomes. Beyond improvements to contiguity, we will focus on the additional biological insights that can be gained from better assemblies, including more complete analysis of genes in their flanking regulatory context, in-depth studies of transposable elements and other complex gene families, and long-range synteny analysis of entire chromosomes. We will also discuss the need for new algorithms for representing and analyzing collections of many complete genomes at once.
The free-living flatworm Macrostomum lignano, much like its better-known planarian relative Schmidtea mediterranea, has an impressive regenerative capacity. Following injury, this species can regenerate an almost entirely new organism. This is attributable to the presence of an abundant somatic stem cell population, the neoblasts. These cells are also essential for the ongoing maintenance of most tissues, as their loss leads to irreversible degeneration of the animal. This set of unique properties makes a subset of flatworms attractive organisms for studying the evolution of pathways involved in tissue self-renewal, cell fate specification, and regeneration. The use of these organisms as models, however, is hampered by the lack of well-assembled and annotated genome sequences, which are fundamental to modern genetic and molecular studies. Here we report the genomic sequence of Macrostomum lignano and an accompanying characterization of its transcriptome. The genome structure of M. lignano is remarkably complex, with ~75% of its sequence composed of simple repeats and transposon sequences. This has made high-quality assembly from Illumina reads alone impossible (N50 = 222 bp). We therefore generated 130X coverage with long sequencing reads from the PacBio platform to create a substantially improved assembly with an N50 of 64 Kbp. We complemented the reference genome with an assembled and annotated transcriptome, and used both datasets in combination to probe gene expression patterns during regeneration, examining pathways important to stem cell function. As a whole, our data will provide a crucial resource for the community for the study not only of invertebrate evolution and phylogeny but also of regeneration and somatic pluripotency.
PacBio Sequencing is characterized by very long sequence reads (averaging > 10,000 bases), lack of GC-bias, and high consensus accuracy. These features have allowed the method to provide a new…
In this PacBio User Group Meeting presentation, Nic Wheeler of the University of Wisconsin-Madison speaks about RNA sequencing for filarial nematodes associated with understudied tropical diseases. His team used Iso-Seq analysis…
In this LabRoots webinar, Jonas Korlach, CSO of PacBio, provides an introduction to PacBio HiFi sequence reads, which are both long (up to 25 kb currently) and accurate (>99%)…
Webinar: No Organism Too Small – Build high-quality genome assemblies of small organisms with HiFi sequencing
In this webinar you will hear how several researchers have overcome the challenges of sequencing organisms with small body size using the new low and ultra-low DNA input methods from…
Existing long-read assemblers require tens of thousands of CPU hours to assemble a human genome and are being outpaced by sequencing technologies in terms of both throughput and cost. We developed a novel long-read assembler wtdbg2 that, for human data, is tens of times faster than published tools while achieving comparable contiguity and accuracy. It represents a significant algorithmic advance and paves the way for population-scale long-read assembly in future.
Using bacteria to transform reactive corrosion products into stable compounds represents an alternative to traditional methods employed in iron conservation. Two environmental Aeromonas strains (CA23 and CU5) were used to transform ferric iron corrosion products (goethite and lepidocrocite) into stable ferrous iron-bearing minerals (vivianite and siderite). A genomic and transcriptomic approach was used to analyze the metabolic traits of these strains and to evaluate their pathogenic potential. Although genes involved in solid-phase iron reduction were identified, key genes present in other environmental iron-reducing species are missing from the genome of CU5. Several pathogenicity factors were identified in the genomes of both strains, but none of these was expressed under iron reduction conditions. Additional in vivo tests showed hemolytic and cytotoxic activities for strain CA23 but not for strain CU5. Both strains were easily inactivated using ethanol and heat. Nonetheless, given its lesser potential for a pathogenic lifestyle, CU5 is the more promising candidate for the development of a bio-based iron conservation method stabilizing iron corrosion. Based on all the results, a prototype treatment was established using archaeological items. On those, the conversion of reactive corrosion products and the formation of a homogeneous layer of biogenic iron minerals were achieved. This study shows how naturally occurring microorganisms and their metabolic capabilities can be used to develop bio-inspired solutions to the problem of metal corrosion. IMPORTANCE: Microbiology can greatly help in the quest for a sustainable solution to the problem of iron corrosion, which causes important economic losses in a wide range of fields, including the protection of cultural heritage and building materials. Using bacteria to transform reactive and unstable corrosion products into more-stable compounds represents a promising approach.
The overall aim of this study was to develop a method for the conservation and restoration of corroded iron items, starting from the isolation of iron-reducing bacteria from natural environments. This resulted in the identification of a suitable candidate (Aeromonas sp. strain CU5) that mediates the formation of desirable minerals at the surfaces of the objects. This led to the proof of concept of an application method on real objects. Copyright © 2019 Kooli et al.
Supernumerary B chromosomes (Bs) are extra karyotype units in addition to A chromosomes, and are found in some fungi and in thousands of animal and plant species. Bs are distinguished by their non-Mendelian inheritance, and represent one of the best examples of genomic conflict. Over the last decades, their genetic composition, function and evolution have remained unresolved questions, although a few successful attempts have been made to address them. A classical concept based on cytogenetics and genetics is that Bs are selfish and rich in DNA repeats and transposons, and in most cases, they do not carry any function. Recently, however, rapid advances in high-throughput multi-omics techniques have shifted B research towards a newly emerging field that we call “B-omics”. We review the recent literature and add novel perspectives to B research, discussing the role of new technologies in understanding the mechanistic aspects of the molecular evolution and function of Bs. The modern view states that B chromosomes are enriched with genes for many significant biological functions, including but not limited to an interesting set of genes related to cell cycle and chromosome structure. Furthermore, the presence of B chromosomes could favor genomic rearrangements and influence the nuclear environment, affecting the function of other chromatin regions. We hypothesize that B chromosomes might play a key function in driving their own transmission and maintenance inside the cell, as well as offer an extra genomic compartment for evolution.
Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assemblies. This tool utilizes long reads generated from TGS platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed a higher number of gaps faster and with a lower error rate and much lower memory usage than two existing, state-of-the-art tools. The tool filled more gaps when using raw reads than when using error-corrected reads. It is applicable to gaps in assemblies produced by different approaches and from large and complex genomes. After performing gap closure using this tool, the contig N50 size of the human CHM1 genome was improved from 143 kb to 19 Mb, a 132-fold increase. We also closed the gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size was increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. A proposed and optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome. LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. A proposed hybrid assembly including this tool promises reference-grade assemblies.
The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.
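Contig N50, the contiguity metric quoted in several of the abstracts here, can be illustrated with a minimal sketch. The code below is not part of LR_Gapcloser or any other tool mentioned; the function names `n50` and `contig_lengths` and the toy sequences are hypothetical, chosen only to show why filling a gap (a run of N characters in a scaffold) raises contig N50.

```python
import re


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def contig_lengths(scaffold):
    """Split a scaffold at runs of N (gaps) and return the contig lengths."""
    return [len(piece) for piece in re.split(r"[Nn]+", scaffold) if piece]


# Toy scaffold (illustrative data only): three contigs separated by two gaps.
gapped = "A" * 40 + "N" * 10 + "C" * 30 + "N" * 5 + "G" * 30
# The same scaffold after the first gap has been filled with 10 real bases.
closed = "A" * 40 + "T" * 10 + "C" * 30 + "N" * 5 + "G" * 30

n50_before = n50(contig_lengths(gapped))  # contigs of 40, 30, 30 bases
n50_after = n50(contig_lengths(closed))   # contigs of 80, 30 bases

print(n50_before, n50_after)
```

In this toy case, closing a single gap merges two contigs into one, so the contig N50 rises from 30 to 80 bases even though only 10 bases were added.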
Intercellular communication is required for trap formation in the nematode-trapping fungus Duddingtonia flagrans.
Nematode-trapping fungi (NTF) are a large and diverse group of fungi, which may switch from a saprotrophic to a predatory lifestyle if nematodes are present. Different fungi have developed different trapping devices, ranging from adhesive cells to constricting rings. After trapping, fungal hyphae penetrate the worm, secrete lytic enzymes and form a hyphal network inside the body. We sequenced the genome of Duddingtonia flagrans, a biotechnologically important NTF used to control nematode populations in fields. The 36.64 Mb genome encodes 9,927 putative proteins, among which are more than 638 predicted secreted proteins. Most secreted proteins are lytic enzymes, but more than 200 were classified as small secreted proteins (< 300 amino acids). 117 putative effector proteins were predicted, suggesting interkingdom communication during the colonization. As a first step to analyze the function of such proteins or other phenomena at the molecular level, we developed a transformation system, established the fluorescent proteins GFP and mCherry, adapted an assay to monitor protein secretion, and established gene-deletion protocols using homologous recombination or CRISPR/Cas9. One putative virulence effector protein, PefB, was transcriptionally induced during the interaction. We show that the mature protein is able to be imported into nuclei in Caenorhabditis elegans cells. In addition, we studied trap formation and show that cell-to-cell communication is required for ring closure. The availability of the genome sequence and the establishment of many molecular tools will open new avenues to studying this biotechnologically relevant nematode-trapping fungus.
Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data.
Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms. © The Author 2017. Published by Oxford University Press.
The commercial release of third-generation sequencing technologies (TGSTs), giving long and ultra-long sequencing reads, has stimulated the development of new tools for assembling highly contiguous genome sequences with unprecedented accuracy across complex repeat regions. We survey here a wide range of emerging sequencing platforms and analytical tools for de novo assembly, provide background information for each of their steps, and discuss the spectrum of available options. Our decision tree recommends workflows for the generation of a high-quality genome assembly when used in combination with the specific needs and resources of a project. Copyright © 2019 Elsevier Ltd. All rights reserved.
Extracellular RNA has been proposed to mediate communication between cells and organisms; however, relatively little is understood regarding how specific sequences are selected for export. Here, we describe a specific Argonaute protein (exWAGO) that is secreted in extracellular vesicles (EVs) released by the gastrointestinal nematode Heligmosomoides bakeri, at multiple copies per EV. Phylogenetic and gene expression analyses demonstrate that exWAGO orthologues are highly conserved and abundantly expressed in related parasites but highly diverged in the free-living genus Caenorhabditis. We show that the most abundant small RNAs released from the nematode parasite are not microRNAs as previously thought, but rather secondary small interfering RNAs (siRNAs) that are produced by RNA-dependent RNA polymerases. The siRNAs that are released in EVs have distinct evolutionary properties compared to those resident in free-living or parasitic nematodes. Immunoprecipitation of exWAGO demonstrates that it specifically associates with siRNAs from transposons and newly evolved repetitive elements that are packaged in EVs and released into the host environment. Together, this work demonstrates molecular and evolutionary selectivity in the small RNA sequences that are released in EVs into the host environment and identifies a novel Argonaute protein as the mediator of this selectivity. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
A new study “recompletes” the C. elegans genome sequence, revealing hitherto unseen genes.