SMRT-Cappable-seq combines the isolation of full-length prokaryotic primary transcripts with long read sequencing technology. It is the first experimental methodology to sequence entire prokaryotic transcripts. It identifies the transcription start site and termination site, thereby directly defines the operon structures genome-wide in prokaryotes. Applied to E.coli, SMRT-Cappable-seq identifies a total of ~2300 operons, among which ~900 are novel. Importantly, our result reveals a pervasive read-through of previous experimentally validated transcription termination sites. Termination read-through represents a powerful strategy to control gene expression. Taken together this data provides a first glance at the complexity of the ‘operome’ in bacteria and presents an invaluable resource for understanding gene regulation and function in bacteria.
Next-generation sequencing has become a useful tool for studying transcriptomes. However, these methods typically rely on sequencing short fragments of cDNA, then attempting to assemble the pieces into full-length transcripts. Here, we describe a method that uses PacBio long reads to sequence full-length cDNAs from individual transcriptomes and metatranscriptome samples. We have adapted the PacBio Iso-Seq protocol for use with prokaryotic samples by incorporating RNA polyadenylation and rRNA-depletion steps. In conjunction with SMRT Sequencing, which has average readlengths of 10-15 kb, we are able to sequence entire transcripts, including polycistronic RNAs, in a single read. Here, we show full-length bacterial transcriptomes with the ability to visualize transcription of operons. In the area of metatranscriptomics, long reads reveal unambiguous gene sequences without the need for post-sequencing transcript assembly. We also show full-length bacterial transcripts sequenced after being treated with NEB’s Cappable-Seq, which is an alternative method for depleting rRNA and enriching for full-length transcripts with intact 5’ ends. Combining Cappable-Seq with PacBio long reads allows for the detection of transcription start sites, with the additional benefit of sequencing entire transcripts.
Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
Evolution of a 72-kb cointegrant, conjugative multiresistance plasmid from early community-associated methicillin-resistant Staphylococcus aureus isolates.
Horizontal transfer of plasmids encoding antimicrobial-resistance and virulence determinants has been instrumental in Staphylococcus aureus evolution, including the emergence of community-associated methicillin-resistant S. aureus (CA-MRSA). In the early 1990s the first CA-MRSA isolated in Western Australia (WA), WA-5, encoded cadmium, tetracycline and penicillin-resistance genes on plasmid pWBG753 (~30 kb). WA-5 and pWBG753 appeared only briefly in WA, however, fusidic-acid-resistance plasmids related to pWBG753 were also present in the first European CA-MRSA at the time. Here we characterized a 72-kb conjugative plasmid pWBG731 present in multiresistant WA-5-like clones from the same period. pWBG731 was a cointegrant formed from pWBG753 and a pWBG749-family conjugative plasmid. pWBG731 carried mupirocin, trimethoprim, cadmium and penicillin-resistance genes. The stepwise evolution of pWBG731 likely occurred through the combined actions of IS257, IS257-dependent miniature inverted-repeat transposable elements (MITEs) and the BinL resolution system of the ß-lactamase transposon Tn552 An evolutionary intermediate ~42-kb non-conjugative plasmid pWBG715, possessed the same resistance genes as pWBG731 but retained an integrated copy of the small tetracycline-resistance plasmid pT181. IS257 likely facilitated replacement of pT181 with conjugation genes on pWBG731, thus enabling autonomous transfer. Like conjugative plasmid pWBG749, pWBG731 also mobilized non-conjugative plasmids carrying oriT mimics. It seems likely that pWBG731 represents the product of multiple recombination events between the WA-5 pWBG753 plasmid and other mobile genetic elements present in indigenous CA-MSSA. The molecular evolution of pWBG731 saliently illustrates how diverse mobile genetic elements can together facilitate rapid accrual and horizontal dissemination of multiresistance in S. aureus CA-MRSA.Copyright © 2019 American Society for Microbiology.
Evolution and global transmission of a multidrug-resistant, community-associated MRSA lineage from the Indian subcontinent
The evolution and global transmission of antimicrobial resistance has been well documented in Gram-negative bacteria and healthcare-associated epidemic pathogens, often emerging from regions with heavy antimicrobial use. However, the degree to which similar processes occur with Gram-positive bacteria in the community setting is less well understood. Here, we trace the recent origins and global spread of a multidrug resistant, community-associated Staphylococcus aureus lineage from the Indian subcontinent, the Bengal Bay clone (ST772). We generated whole genome sequence data of 340 isolates from 14 countries, including the first isolates from Bangladesh and India, to reconstruct the evolutionary history and genomic epidemiology of the lineage. Our data shows that the clone emerged on the Indian subcontinent in the early 1970s and disseminated rapidly in the 1990s. Short-term outbreaks in community and healthcare settings occurred following intercontinental transmission, typically associated with travel and family contacts on the subcontinent, but ongoing endemic transmission was uncommon. Acquisition of a multidrug resistance integrated plasmid was instrumental in the divergence of a single dominant and globally disseminated clade in the early 1990s. Phenotypic data on biofilm, growth and toxicity point to antimicrobial resistance as the driving force in the evolution of ST772. The Bengal Bay clone therefore combines the multidrug resistance of traditional healthcare-associated clones with the epidemiological transmission of community-associated MRSA. Our study demonstrates the importance of whole genome sequencing for tracking the evolution of emerging and resistant pathogens. It provides a critical framework for ongoing surveillance of the clone on the Indian subcontinent and elsewhere.Importance The Bengal Bay clone (ST772) is a community-acquired and multidrug-resistant Staphylococcus aureus lineage first isolated from Bangladesh and India in 2004. In this study, we show that the Bengal Bay clone emerged from a virulent progenitor circulating on the Indian subcontinent. Its subsequent global transmission was associated with travel or family contact in the region. ST772 progressively acquired specific resistance elements at limited cost to its fitness and continues to be exported globally resulting in small-scale community and healthcare outbreaks. The Bengal Bay clone therefore combines the virulence potential and epidemiology of community-associated clones with the multidrug-resistance of healthcare-associated S. aureus lineages. This study demonstrates the importance of whole genome sequencing for the surveillance of highly antibiotic resistant pathogens, which may emerge in the community setting of regions with poor antibiotic stewardship and rapidly spread into hospitals and communities across the world.
Pecan (Carya illinoinensis) and Chinese hickory (C. cathayensis) are important commercially cultivated nut trees in the genus Carya (Juglandaceae), with high nutritional value and substantial health benefits.We obtained >187.22 and 178.87 gigabases of sequence, and ~288× and 248× genome coverage, to a pecan cultivar (“Pawnee”) and a domesticated Chinese hickory landrace (ZAFU-1), respectively. The total assembly size is 651.31 megabases (Mb) for pecan and 706.43 Mb for Chinese hickory. Two genome duplication events before the divergence from walnut were found in these species. Gene family analysis highlighted key genes in biotic and abiotic tolerance, oil, polyphenols, essential amino acids, and B vitamins. Further analyses of reduced-coverage genome sequences of 16 Carya and 2 Juglans species provide additional phylogenetic perspective on crop wild relatives.Cooperative characterization of these valuable resources provides a window to their evolutionary development and a valuable foundation for future crop improvement. © The Author(s) 2019. Published by Oxford University Press.
Detection of VIM-1-Producing Enterobacter cloacae and Salmonella enterica Serovars Infantis and Goldcoast at a Breeding Pig Farm in Germany in 2017 and Their Molecular Relationship to Former VIM-1-Producing S. Infantis Isolates in German Livestock Production.
In 2011, VIM-1-producing Salmonella enterica serovar Infantis and Escherichia coli were isolated for the first time in four German livestock farms. In 2015/2016, highly related isolates were identified in German pig production. This raised the issue of potential reservoirs for these isolates, the relation of their mobile genetic elements, and potential links between the different affected farms/facilities. In a piglet-producing farm suspicious for being linked to some blaVIM-1 findings in Germany, fecal and environmental samples were examined for the presence of carbapenemase-producing Enterobacteriaceae and Salmonella spp. Newly discovered isolates were subjected to Illumina whole-genome sequencing (WGS) and S1 pulsed-field gel electrophoresis (PFGE) hybridization experiments. WGS data of these isolates were compared with those for the previously isolated VIM-1-producing Salmonella Infantis isolates from pigs and poultry. Among 103 samples, one Salmonella Goldcoast isolate, one Salmonella Infantis isolate, and one Enterobacter cloacae isolate carrying the blaVIM-1 gene were detected. Comparative WGS analysis revealed that the blaVIM-1 gene was part of a particular Tn21-like transposable element in all isolates. It was located on IncHI2 (ST1) plasmids of ~290 to 300?kb with a backbone highly similar (98 to 100%) to that of reference pSE15-SA01028. SNP analysis revealed a close relationship of all VIM-1-positive S Infantis isolates described since 2011. The findings of this study demonstrate that the occurrence of the blaVIM-1 gene in German livestock is restricted neither to a certain bacterial species nor to a certain Salmonella serovar but is linked to a particular Tn21-like transposable element located on transferable pSE15-SA01028-like IncHI2 (ST1) plasmids, being present in all of the investigated isolates from 2011 to 2017.IMPORTANCE Carbapenems are considered one of few remaining treatment options against multidrug-resistant Gram-negative pathogens in human clinical settings. The occurrence of carbapenemase-producing Enterobacteriaceae in livestock and food is a major public health concern. Particularly the occurrence of VIM-1-producing Salmonella Infantis in livestock farms is worrisome, as this zoonotic pathogen is one of the main causes for human salmonellosis in Europe. Investigations on the epidemiology of those carbapenemase-producing isolates and associated mobile genetic elements through an in-depth molecular characterization are indispensable to understand the transmission of carbapenemase-producing Enterobacteriaceae along the food chain and between different populations to develop strategies to prevent their further spread.Copyright © 2019 Roschanski et al.
Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes.We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assembly. This tool utilizes long reads generated from TGS sequencing platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed a higher number of gaps faster and with a lower error rate and a much lower memory usage than two existing, state-of-the art tools. This tool utilized raw reads to fill more gaps than when using error-corrected reads. It is applicable to gaps in the assemblies by different approaches and from large and complex genomes. After performing gap-closure using this tool, the contig N50 size of the human CHM1 genome was improved from 143 kb to 19 Mb, a 132-fold increase. We also closed the gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size was increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. A proposed and optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome.LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. A proposed hybrid assembly including this tool promises reference-grade assemblies. The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.
Genome Sequence of Rhizobium jaguaris CCGE525T, a Strain Isolated from Calliandra grandiflora Nodules from a Rain Forest in Mexico.
We present the genome sequence of Rhizobium jaguaris CCGE525T, a nitrogen-fixing bacterium isolated from nodules of Calliandra grandiflora. CCGE525T belongs to Rhizobium tropici group A, represents the symbiovar calliandrae, and forms nitrogen-fixing nodules in Phaseolus vulgaris. Genome-based metrics and phylogenomic approaches support Rhizobium jaguaris as a novel species.
Members of the genus Nocardia are widespread in diverse environments; a wide range of Nocardia species are known to cause nocardiosis in several animals, including cat, dog, fish, and humans. Of the pathogenic Nocardia species, N. seriolae is known to cause disease in cultured fish, resulting in major economic loss. We isolated two N. seriolae strains, CK-14008 and EM15050, from diseased fish and sequenced their genomes using the PacBio sequencing platform. To identify their genomic features, we compared their genomes with those of other Nocardia species. Phylogenetic analysis showed that N. seriolae shares a common ancestor with a putative human pathogenic Nocardia species. Moreover, N. seriolae strains were phylogenetically divided into four clusters according to host fish families. Through genome comparison, we observed that the putative pathogenic Nocardia strains had additional genes for iron acquisition. Dozens of antibiotic resistance genes were detected in the genomes of N. seriolae strains; most of the antibiotics were involved in the inhibition of the biosynthesis of proteins or cell walls. Our results demonstrated the virulence features and antibiotic resistance of fish pathogenic N. seriolae strains at the genomic level. These results may be useful to develop strategies for the prevention of fish nocardiosis. © 2018 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
Genome mining identifies cepacin as a plant-protective metabolite of the biopesticidal bacterium Burkholderia ambifaria.
Beneficial microorganisms are widely used in agriculture for control of plant pathogens, but a lack of efficacy and safety information has limited the exploitation of multiple promising biopesticides. We applied phylogeny-led genome mining, metabolite analyses and biological control assays to define the efficacy of Burkholderia ambifaria, a naturally beneficial bacterium with proven biocontrol properties but potential pathogenic risk. A panel of 64 B.?ambifaria strains demonstrated significant antimicrobial activity against priority plant pathogens. Genome sequencing, specialized metabolite biosynthetic gene cluster mining and metabolite analysis revealed an armoury of known and unknown pathways within B.?ambifaria. The biosynthetic gene cluster responsible for the production of the metabolite cepacin was identified and directly shown to mediate protection of germinating crops against Pythium damping-off disease. B.?ambifaria maintained biopesticidal protection and overall fitness in the soil after deletion of its third replicon, a non-essential plasmid associated with virulence in Burkholderia?cepacia complex bacteria. Removal of the third replicon reduced B.?ambifaria persistence in a murine respiratory infection model. Here, we show that by using interdisciplinary phylogenomic, metabolomic and functional approaches, the mode of action of natural biological control agents related to pathogens can be systematically established to facilitate their future exploitation.
Arthropod Mycoplasma are little known endosymbionts in insects, primarily known as plant disease vectors. Mycoplasma in other arthropods such as arachnids are unknown. We report the first complete Mycoplasma genome sequenced, identified, and annotated from a scorpion, Centruroides vittatus, and designate it as Mycoplasma vittatus We find the genome is at least a 683,827 bp single circular chromosome with a GC content of 42.7% and with 987 protein-coding genes. The putative virulence determinants include 11 genes associated with the virulence operon associated with protein synthesis or DNA transcription and ten genes with antibiotic and toxic compound resistance. Comparative analysis revealed that the M. vittatus genome is smaller than other Mycoplasma genomes and exhibits a higher GC content. Phylogenetic analysis shows M. vittatus as part of the Hominis group of Mycoplasma As arthropod genomes accumulate, further novel Mycoplasma genomes may be identified and characterized. Copyright © 2019 Yamashita et al.
Polysaccharide utilization loci of North Sea Flavobacteriia as basis for using SusC/D-protein expression for predicting major phytoplankton glycans.
Marine algae convert a substantial fraction of fixed carbon dioxide into various polysaccharides. Flavobacteriia that are specialized on algal polysaccharide degradation feature genomic clusters termed polysaccharide utilization loci (PULs). As knowledge on extant PUL diversity is sparse, we sequenced the genomes of 53 North Sea Flavobacteriia and obtained 400 PULs. Bioinformatic PUL annotations suggest usage of a large array of polysaccharides, including laminarin, a-glucans, and alginate as well as mannose-, fucose-, and xylose-rich substrates. Many of the PULs exhibit new genetic architectures and suggest substrates rarely described for marine environments. The isolates’ PUL repertoires often differed considerably within genera, corroborating ecological niche-associated glycan partitioning. Polysaccharide uptake in Flavobacteriia is mediated by SusCD-like transporter complexes. Respective protein trees revealed clustering according to polysaccharide specificities predicted by PUL annotations. Using the trees, we analyzed expression of SusC/D homologs in multiyear phytoplankton bloom-associated metaproteomes and found indications for profound changes in microbial utilization of laminarin, a-glucans, ß-mannan, and sulfated xylan. We hence suggest the suitability of SusC/D-like transporter protein expression within heterotrophic bacteria as a proxy for the temporal utilization of discrete polysaccharides.
The Genome of Armadillidium vulgare (Crustacea, Isopoda) Provides Insights into Sex Chromosome Evolution in the Context of Cytoplasmic Sex Determination.
The terrestrial isopod Armadillidium vulgare is an original model to study the evolution of sex determination and symbiosis in animals. Its sex can be determined by ZW sex chromosomes, or by feminizing Wolbachia bacterial endosymbionts. Here, we report the sequence and analysis of the ZW female genome of A. vulgare. A distinguishing feature of the 1.72 gigabase assembly is the abundance of repeats (68% of the genome). We show that the Z and W sex chromosomes are essentially undifferentiated at the molecular level and the W-specific region is extremely small (at most several hundreds of kilobases). Our results suggest that recombination suppression has not spread very far from the sex-determining locus, if at all. This is consistent with A. vulgare possessing evolutionarily young sex chromosomes. We characterized multiple Wolbachia nuclear inserts in the A. vulgare genome, none of which is associated with the W-specific region. We also identified several candidate genes that may be involved in the sex determination or sexual differentiation pathways. The A. vulgare genome serves as a resource for studying the biology and evolution of crustaceans, one of the most speciose and emblematic metazoan groups. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Newly designed 16S rRNA metabarcoding primers amplify diverse and novel archaeal taxa from the environment.
High-throughput studies of microbial communities suggest that Archaea are a widespread component of microbial diversity in various ecosystems. However, proper quantification of archaeal diversity and community ecology remains limited, as sequence coverage of Archaea is usually low owing to the inability of available prokaryotic primers to efficiently amplify archaeal compared to bacterial rRNA genes. To improve identification and quantification of Archaea, we designed and validated the utility of several primer pairs to efficiently amplify archaeal 16S rRNA genes based on up-to-date reference genes. We demonstrate that several of these primer pairs amplify phylogenetically diverse Archaea with high sequencing coverage, outperforming commonly used primers. Based on comparing the resulting long 16S rRNA gene fragments with public databases from all habitats, we found several novel family- to phylum-level archaeal taxa from topsoil and surface water. Our results suggest that archaeal diversity has been largely overlooked due to the limitations of available primers, and that improved primer pairs enable to estimate archaeal diversity more accurately. © 2018 The Authors. Environmental Microbiology Reports published by Society for Applied Microbiology and John Wiley & Sons Ltd.