Menu
July 19, 2019

Sequence data for Clostridium autoethanogenum using three generations of sequencing technologies.

During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms which are characterized by low cost, high throughput and shorter read lengths for example, Illumina. The emergence and development of so called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20?kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data.


July 19, 2019

Single-molecule sequencing reveals the molecular basis of multidrug-resistance in ST772 methicillin-resistant Staphylococcus aureus.

Methicillin-resistant Staphylococcus aureus (MRSA) is a major cause of hospital-associated infection, but there is growing awareness of the emergence of multidrug-resistant lineages in community settings around the world. One such lineage is ST772-MRSA-V, which has disseminated globally and is increasingly prevalent in India. Here, we present the complete genome sequence of DAR4145, a strain of the ST772-MRSA-V lineage from India, and investigate its genomic characteristics in regards to antibiotic resistance and virulence factors.Sequencing using single-molecule real-time technology resulted in the assembly of a single continuous chromosomal sequence, which was error-corrected, annotated and compared to nine draft genome assemblies of ST772-MRSA-V from Australia, Malaysia and India. We discovered numerous and redundant resistance genes associated with mobile genetic elements (MGEs) and known core genome mutations that explain the highly antibiotic resistant phenotype of DAR4145. Staphylococcal toxins and superantigens, including the leukotoxin Panton-Valentinin Leukocidin, were predominantly associated with genomic islands and the phage f-IND772PVL. Some of these mobile resistance and virulence factors were variably present in other strains of the ST772-MRSA-V lineage.The genomic characteristics presented here emphasize the contribution of MGEs to the emergence of multidrug-resistant and highly virulent strains of community-associated MRSA. Antibiotic resistance was further augmented by chromosomal mutations and redundancy of resistance genes. The complete genome of DAR4145 provides a valuable resource for future investigations into the global dissemination and phylogeography of ST772-MRSA-V.


July 19, 2019

Mind the gap; seven reasons to close fragmented genome assemblies.

Like other domains of life, research into the biology of filamentous microbes has greatly benefited from the advent of whole-genome sequencing. Next-generation sequencing (NGS) technologies have revolutionized sequencing, making genomic sciences accessible to many academic laboratories including those that study non-model organisms. Thus, hundreds of fungal genomes have been sequenced and are publically available today, although these initiatives have typically yielded considerably fragmented genome assemblies that often lack large contiguous genomic regions. Many important genomic features are contained in intergenic DNA that is often missing in current genome assemblies, and recent studies underscore the significance of non-coding regions and repetitive elements for the life style, adaptability and evolution of many organisms. The study of particular types of genetic elements, such as telomeres, centromeres, repetitive elements, effectors, and clusters of co-regulated genes, but also of phenomena such as structural rearrangements, genome compartmentalization and epigenetics, greatly benefits from having a contiguous and high-quality, preferably even complete and gapless, genome assembly. Here we discuss a number of important reasons to produce gapless, finished, genome assemblies to help answer important biological questions. Copyright © 2015 Elsevier Inc. All rights reserved.


July 19, 2019

An improved genome reference for the African cichlid, Metriaclima zebra.

Problems associated with using draft genome assemblies are well documented and have become more pronounced with the use of short read data for de novo genome assembly. We set out to improve the draft genome assembly of the African cichlid fish, Metriaclima zebra, using a set of Pacific Biosciences SMRT sequencing reads corresponding to 16.5× coverage of the genome. Here we characterize the improvements that these long reads allowed us to make to the state-of-the-art draft genome previously assembled from short read data.Our new assembly closed 68 % of the existing gaps and added 90.6Mbp of new non-gap sequence to the existing draft assembly of M. zebra. Comparison of the new assembly to the sequence of several bacterial artificial chromosome clones confirmed the accuracy of the new assembly. The closure of sequence gaps revealed thousands of new exons, allowing significant improvement in gene models. We corrected one known misassembly, and identified and fixed other likely misassemblies. 63.5 Mbp (70 %) of the new sequence was classified as repetitive and the new sequence allowed for the assembly of many more transposable elements.Our improvements to the M. zebra draft genome suggest that a reasonable investment in long reads could greatly improve many comparable vertebrate draft genome assemblies.


July 19, 2019

Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16?kilobases) reads with random errors, we assembled 99% (244?megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4?megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.


July 19, 2019

A supergene determines highly divergent male reproductive morphs in the ruff.

Three strikingly different alternative male mating morphs (aggressive ‘independents’, semicooperative ‘satellites’ and female-mimic ‘faeders’) coexist as a balanced polymorphism in the ruff, Philomachus pugnax, a lek-breeding wading bird. Major differences in body size, ornamentation, and aggressive and mating behaviors are inherited as an autosomal polymorphism. We show that development into satellites and faeders is determined by a supergene consisting of divergent alternative, dominant and non-recombining haplotypes of an inversion on chromosome 11, which contains 125 predicted genes. Independents are homozygous for the ancestral sequence. One breakpoint of the inversion disrupts the essential CENP-N gene (encoding centromere protein N), and pedigree analysis confirms the lethality of homozygosity for the inversion. We describe new differences in behavior, testis size and steroid metabolism among morphs and identify polymorphic genes within the inversion that are likely to contribute to the differences among morphs in reproductive traits.


July 19, 2019

The power of Single Molecule Real-Time sequencing technology in the de novo assembly of a eukaryotic genome.

Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. However, a large portion of the genomes still remains unassembled. We reconstructed azuki bean (Vigna angularis) genome using single molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced 100 times longer contigs with 100 times smaller amount of gaps compared to the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled a more comprehensive gene annotation than the SGS-based assemblies where thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on the high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though still needed support of SGS data, achieved a near-complete assembly of a eukaryotic genome.


July 19, 2019

AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences.

Transcription activator-like effectors (TALEs) are virulence factors, produced by the bacterial plant-pathogen Xanthomonas, that function as gene activators inside plant cells. Although the contribution of individual TALEs to infectivity has been shown, the specific roles of most TALEs, and the overall TALE diversity in Xanthomonas spp. is not known. TALEs possess a highly repetitive DNA-binding domain, which is notoriously difficult to sequence. Here, we describe an improved method for characterizing TALE genes by the use of PacBio sequencing. We present ‘AnnoTALE’, a suite of applications for the analysis and annotation of TALE genes from Xanthomonas genomes, and for grouping similar TALEs into classes. Based on these classes, we propose a unified nomenclature for Xanthomonas TALEs that reveals similarities pointing to related functionalities. This new classification enables us to compare related TALEs and to identify base substitutions responsible for the evolution of TALE specificities.


July 19, 2019

The epigenomic landscape of prokaryotes.

DNA methylation acts in concert with restriction enzymes to protect the integrity of prokaryotic genomes. Studies in a limited number of organisms suggest that methylation also contributes to prokaryotic genome regulation, but the prevalence and properties of such non-restriction-associated methylation systems remain poorly understood. Here, we used single molecule, real-time sequencing to map DNA modifications including m6A, m4C, and m5C across the genomes of 230 diverse bacterial and archaeal species. We observed DNA methylation in nearly all (93%) organisms examined, and identified a total of 834 distinct reproducibly methylated motifs. This data enabled annotation of the DNA binding specificities of 620 DNA Methyltransferases (MTases), doubling known specificities for previously hard to study Type I, IIG and III MTases, and revealing their extraordinary diversity. Strikingly, 48% of organisms harbor active Type II MTases with no apparent cognate restriction enzyme. These active ‘orphan’ MTases are present in diverse bacterial and archaeal phyla and show motif specificities and methylation patterns consistent with functions in gene regulation and DNA replication. Our results reveal the pervasive presence of DNA methylation throughout the prokaryotic kingdoms, as well as the diversity of sequence specificities and potential functions of DNA methylation systems.


July 19, 2019

Major improvements to the Heliconius melpomene genome assembly used to confirm 10 chromosome fusion events in 6 million years of butterfly evolution.

The Heliconius butterflies are a widely studied adaptive radiation of 46 species spread across Central and South America, several of which are known to hybridize in the wild. Here, we present a substantially improved assembly of the Heliconius melpomene genome, developed using novel methods that should be applicable to improving other genome assemblies produced using short read sequencing. First, we whole-genome-sequenced a pedigree to produce a linkage map incorporating 99% of the genome. Second, we incorporated haplotype scaffolds extensively to produce a more complete haploid version of the draft genome. Third, we incorporated ~20x coverage of Pacific Biosciences sequencing, and scaffolded the haploid genome using an assembly of this long-read sequence. These improvements result in a genome of 795 scaffolds, 275 Mb in length, with an N50 length of 2.1 Mb, an N50 number of 34, and with 99% of the genome placed, and 84% anchored on chromosomes. We use the new genome assembly to confirm that the Heliconius genome underwent 10 chromosome fusions since the split with its sister genus Eueides, over a period of about 6 million yr. Copyright © 2016 Davey et al.


July 19, 2019

Large genomic differences between Moraxella bovoculi isolates acquired from the eyes of cattle with infectious bovine keratoconjunctivitis versus the deep nasopharynx of asymptomatic cattle.

Moraxella bovoculi is a recently described bacterium that is associated with infectious bovine keratoconjunctivitis (IBK) or “pinkeye” in cattle. In this study, closed circularized genomes were generated for seven M. bovoculi isolates: three that originated from the eyes of clinical IBK bovine cases and four from the deep nasopharynx of asymptomatic cattle. Isolates that originated from the eyes of IBK cases profoundly differed from those that originated from the nasopharynx of asymptomatic cattle in genome structure, gene content and polymorphism diversity and consequently placed into two distinct phylogenetic groups. These results suggest that there are genetically distinct strains of M. bovoculi that may not associate with IBK.


July 19, 2019

The complete genome sequence of the murine pathobiont Helicobacter typhlonius.

Immuno-compromised mice infected with Helicobacter typhlonius are used to model microbially inducted inflammatory bowel disease (IBD). The specific mechanism through which H. typhlonius induces and promotes IBD is not fully understood. Access to the genome sequence is essential to examine emergent properties of this organism, such as its pathogenicity. To this end, we present the complete genome sequence of H. typhlonius MIT 97-6810, obtained through single-molecule real-time sequencing.The genome was assembled into a single circularized contig measuring 1.92 Mbp with an average GC content of 38.8%. In total 2,117 protein-encoding genes and 43 RNA genes were identified. Numerous pathogenic features were found, including a putative pathogenicity island (PAIs) containing components of type IV secretion system, virulence-associated proteins and cag PAI protein. We compared the genome of H. typhlonius to those of the murine pathobiont H. hepaticus and human pathobiont H. pylori. H. typhlonius resembles H. hepaticus most with 1,594 (75.3%) of its genes being orthologous to genes in H. hepaticus. Determination of the global methylation state revealed eight distinct recognition motifs for adenine and cytosine methylation. H. typhlonius shares four of its recognition motifs with H. pylori.The complete genome sequence of H. typhlonius MIT 97-6810 enabled us to identify many pathogenic features suggesting that H. typhlonius can act as a pathogen. Follow-up studies are necessary to evaluate the true nature of its pathogenic capabilities. We found many methylated sites and a plethora of restriction-modification systems. The genome, together with the methylome, will provide an essential resource for future studies investigating gene regulation, host interaction and pathogenicity of H. typhlonius. In turn, this work can contribute to unraveling the role of Helicobacter in enteric disease.


July 19, 2019

PacBio SMRT assembly of a complex multi-replicon genome reveals chlorocatechol degradative operon in a region of genome plasticity.

We have sequenced a Burkholderia genome that contains multiple replicons and large repetitive elements that would make it inherently difficult to assemble by short read sequencing technologies. We illustrate how the integrated long read correction algorithms implemented through the PacBio Single Molecule Real-Time (SMRT) sequencing technology successfully provided a de novo assembly that is a reasonable estimate of both the gene content and genome organization without making any further modifications. This assembly is comparable to related organisms assembled by more labour intensive methods. Our assembled genome revealed regions of genome plasticity for further investigation, one of which harbours a chlorocatechol degradative operon highly homologous to those previously identified on globally ubiquitous plasmids. In an ideal world, this assembly would still require experimental validation to confirm gene order and copy number of repeated elements. However, we submit that particularly in instances where a polished genome is not the primary goal of the sequencing project, PacBio SMRT sequencing provides a financially viable option for generating a biologically relevant genome estimate that can be utilized by other researchers for comparative studies. Copyright © 2016. Published by Elsevier B.V.


July 19, 2019

Chromosomal-level assembly of the Asian seabass genome using long sequence reads and multi-layered scaffolding.

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species’ native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.


July 19, 2019

Next generation sequencing of Actinobacteria for the discovery of novel natural products.

Like many fields of the biosciences, actinomycete natural products research has been revolutionised by next-generation DNA sequencing (NGS). Hundreds of new genome sequences from actinobacteria are made public every year, many of them as a result of projects aimed at identifying new natural products and their biosynthetic pathways through genome mining. Advances in these technologies in the last five years have meant not only a reduction in the cost of whole genome sequencing, but also a substantial increase in the quality of the data, having moved from obtaining a draft genome sequence comprised of several hundred short contigs, sometimes of doubtful reliability, to the possibility of obtaining an almost complete and accurate chromosome sequence in a single contig, allowing a detailed study of gene clusters and the design of strategies for refactoring and full gene cluster synthesis. The impact that these technologies are having in the discovery and study of natural products from actinobacteria, including those from the marine environment, is only starting to be realised. In this review we provide a historical perspective of the field, analyse the strengths and limitations of the most relevant technologies, and share the insights acquired during our genome mining projects.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.