AGBT 2013 Presentation Slides: Cold Spring Harbor Laboratory’s Michael Schatz presented strategies for de novo assembly of crop genomes with PacBio technology.
The assembly of metagenomes is dramatically improved by the long read lengths of SMRT Sequencing. This is demonstrated in an experimental design to sequence a mock community from the Human Microbiome Project, and assemble the data using the hierarchical genome assembly process (HGAP) at Pacific Biosciences. Results of this analysis are promising, and display much improved contiguity in the assembly of the mock community as compared to publicly available short-read data sets and assemblies. Additionally, the use of base modification information to make further associations between contigs provides additional data to improve assemblies, and to distinguish between members within a microbial community. The epigenetic approach is a novel validation method unique to SMRT Sequencing. In addition to whole-genome shotgun sequencing, SMRT Sequencing also offers improved classification resolution and reliability of metagenomic and microbiome samples by the full-length sequencing of 16S rRNA (~1500 bases long). Microbial communities can be detected at the species level in some cases, rather than being limited to the genus taxonomic classification as constrained by short-read technologies. The performance of SMRT Sequencing for these metagenomic samples achieved >99% predicted concordance to reference sequences in cecum, soil, water, and mock control investigations for bacterial 16S. Community samples are estimated to contain between 2.3 and 15 times as many species, with abundance levels as low as 0.05%, compared to the identification of phyla groups.
For comprehensive metabolic reconstructions and a resulting understanding of the pathways leading to natural products, it is desirable to obtain complete information about the genetic blueprint of the organisms used. Traditional Sanger and next-generation, short-read sequencing technologies have shortcomings with respect to read lengths and DNA-sequence context bias, leading to fragmented and incomplete genome information. The development of long-read, single molecule, real-time (SMRT) DNA sequencing from Pacific Biosciences, with >10,000 bp average read lengths and a lack of sequence context bias, now allows for the generation of complete genomes in a fully automated workflow. In addition to the genome sequence, DNA methylation is characterized in the process of sequencing. PacBio® sequencing has also been applied to microbial transcriptomes. Long reads enable sequencing of full-length cDNAs allowing for identification of complete gene and operon sequences without the need for transcript assembly. We will highlight several examples where these capabilities have been leveraged in the areas of industrial microbiology, including biocommodities, biofuels, bioremediation, new bacteria with potential commercial applications, antibiotic discovery, and livestock/plant microbiome interactions.
High-throughput sequencing of the complete 16S rRNA gene has become a valuable tool for characterizing microbial communities. However, the short reads produced by second-generation sequencing cannot provide taxonomic classification below the genus level. In this study, we demonstrate the capability of PacBio’s Single Molecule, Real-Time (SMRT) Sequencing to generate community profiles using mock microbial community samples from BEI Resources. We also evaluate multiplexing capabilities using PacBio barcodes on pooled samples comprising heterogeneous 16S amplicon populations representing soil, fecal, and mock communities.
Comparison of sequencing approaches applied to complex soil metagenomes to resolve proteins of interest
Background: Long-read sequencing presents several potential advantages for providing more complete gene profiling of metagenomic samples. Long reads can capture multiple genes in a single read, and longer reads typically result in assemblies with better contiguity, especially for higher abundance organisms. However, a major challenge with using long reads has been the higher cost per base, which may lead to insufficient coverage of low-abundance species. Additionally, lower single-pass accuracy can make gene discovery for low-abundance organisms difficult. Methods: To evaluate the pros and cons of long reads for metagenomics, we directly compared PacBio and Illumina sequencing on a soil-derived sample, which included spike-in controls of known concentrations of pure referenced samples. For PacBio sequencing, a 10 kb library was sequenced on the Sequel System with 3.0 chemistry. Highly accurate long reads (HiFi reads) with Q20 and higher were generated for downstream analyses using PacBio Circular Consensus Sequencing (CCS) mode. Results were assessed according to the following criteria: DNA extraction capacity, bioinformatics pipeline status, % of proteins with ambiguous AA’s, total unique error-free genes/$1000, total proteins observed in spike-ins/$1000, proteins of interest/$1000, median length of contigs with proteins, and assembly requirements. Results: Both methods had areas of superior performance. DNA extraction capacity was higher for Illumina, the bioinformatics pipeline is well-tested, and there was a lower proportion of proteins with ambiguous AA’s. On the other hand, with PacBio, twice as many unique error-free genes, twice as many total proteins from spike-ins, and ~6 times more proteins of interest were found per $1000 cost. PacBio data produced on average 5 times longer contigs capturing proteins of interest. Additionally, assembly was not required for gene or protein finding, as was the case with Illumina data. 
Conclusions: In this comparison of PacBio Sequel System with Illumina NextSeq on a complex microbiome, we conclude that the sequencing system of choice may vary, depending on the goals and resources for the project. PacBio sequencing requires a longer DNA extraction method, and the bioinformatics pipeline may require development. On the other hand, the Sequel System generates hundreds of thousands of long HiFi reads per SMRT Cell, producing more genes, more proteins, and longer contigs, thereby offering more information about the metagenomic samples for a lower cost.
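For orientation, the Q20 cutoff for HiFi reads and the per-$1000 normalization used in the comparison above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names and every number are hypothetical, and the only thing assumed is the standard Phred relation Q = -10·log10(p), under which Q20 corresponds to a 1% predicted error rate.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to an error probability (Q = -10*log10(p))."""
    return 10 ** (-q / 10)


def is_hifi(read_quality: float, min_q: float = 20.0) -> bool:
    """Return True when a read's predicted quality meets the Q20 cutoff."""
    return read_quality >= min_q


def per_1000_dollars(count: float, cost_dollars: float) -> float:
    """Normalize a count (genes, proteins, ...) to a $1000 sequencing budget."""
    if cost_dollars <= 0:
        raise ValueError("cost_dollars must be positive")
    return count / cost_dollars * 1000.0


if __name__ == "__main__":
    # Hypothetical values for illustration only; these are not data from the study.
    print(phred_to_error_prob(20))       # predicted error rate at the Q20 cutoff
    print(is_hifi(27.5))                 # a read above the cutoff
    print(per_1000_dollars(500, 2000))   # count observed on a $2000 run, scaled to $1000
```

Normalizing by cost rather than by run is what makes the two platforms comparable here, since a single run differs in both price and yield.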
User Group Meeting: New genotype to phenotype associations in viral metagenomes enabled by SMRT Sequencing
In this PacBio User Group Meeting lightning talk, Shawn Polson of the University of Delaware speaks about viral metagenomes, which are more challenging to distinguish than their bacterial counterparts because…
Understanding interactions among plants and the complex communities of organisms living on, in and around them requires more than one experimental approach. A new method for de novo metagenome assembly,…
Mark Blaxter, project lead of the Sanger Institute’s Darwin Tree of Life, shared an update of the ambitious effort to sequence all 60,000 species believed to be on the British…
Domestication of clonally propagated crops such as pineapple from South America was hypothesized to be a ‘one-step operation’. We sequenced the genome of Ananas comosus var. bracteatus CB5 and assembled 513 Mb into 25 chromosomes with 29,412 genes. Comparison of the genomes of CB5, F153 and MD2 elucidated the genomic basis of fiber production, color formation, sugar accumulation and fruit maturation. We also resequenced 89 Ananas genomes. Cultivars ‘Smooth Cayenne’ and ‘Queen’ exhibited ancient and recent admixture, while ‘Singapore Spanish’ supported a one-step operation of domestication. We identified 25 selective sweeps, including a strong sweep containing a pair of tandemly duplicated bromelain inhibitors. Four candidate genes for self-incompatibility were linked in F153, but were not functional in self-compatible CB5. Our findings support the coexistence of sexual recombination and a one-step operation in the domestication of clonally propagated crops. This work guides the exploration of sexual and asexual domestication trajectories in other clonally propagated crops.
Analyses of the Complete Genome Sequence of the Strain Bacillus pumilus ZB201701 Isolated from Rhizosphere Soil of Maize under Drought and Salt Stress.
Bacillus pumilus ZB201701 is a rhizobacterium with the potential to promote plant growth and tolerance to drought and salinity stress. We herein present the complete genome sequence of the Gram-positive bacterium B. pumilus ZB201701, which consists of a linear chromosome with 3,640,542 base pairs, 3,608 protein-coding sequences, 24 ribosomal RNAs, and 80 transfer RNAs. Genome analyses using bioinformatics revealed some of the putative gene clusters involved in defense mechanisms. In addition, activity analyses of the strain under salt and simulated drought stress suggested its potential tolerance to abiotic stress. Plant growth-promoting bacteria-based experiments indicated that the strain promotes the salt tolerance of maize. The complete genome of B. pumilus ZB201701 provides valuable insights into rhizobacteria-mediated salt and drought tolerance and rhizobacteria-based solutions for abiotic stress in agriculture.
Characterization of Extracellular Biosurfactants Expressed by a Pseudomonas putida Strain Isolated from the Interior of Healthy Roots from Sida hermaphrodita Grown in a Heavy Metal Contaminated Soil.
Pseudomonas putida E41, isolated from the root interior of Sida hermaphrodita (grown on a field contaminated with heavy metals), showed high biosurfactant activity. In this paper, we describe data from mass spectrometry and genome analysis to improve our understanding of the phenotypic properties of the strain. Supernatant derived from P. putida E41 liquid culture exhibited a strong decrease in surface tension, accompanied by the ability to stabilize emulsions. We identified expression of the extracellular lipopeptides putisolvin I and II but did not detect rhamnolipids. Their presence was confirmed by the matrix-assisted laser desorption and ionization (MALDI) TOF/TOF technique. Moreover, ten phospholipids (mainly phosphatidylethanolamines PE 33:1 and PE 32:1), which were excreted by vesicles, were also detected. In contrast, the bacterial cell pellet was dominated by phosphatidylglycerols (PGs), which were almost absent in the supernatant. The composition of extracellular (secreted to the environment) and cellular lipids in this strain therefore appears to differ. Long-read sequencing and complete genome reconstruction allowed the identification of a complete putisolvin biosynthesis pathway. All genes involved in glycerophospholipid biosynthesis were also found in the genome of P. putida E41, and they are likely responsible for the production of the detected phospholipids. Overall, this is the first report describing the expression of extracellular lipopeptides (identified as putisolvins) and phospholipids by a P. putida strain, which might be explained by the need to adapt to the highly contaminated environment.
Cupriavidus sp. strain Ni-2 resistant to high concentration of nickel and its genes responsible for the tolerance by genome comparison.
The widespread use of metals has prompted many researchers to examine the relationship between heavy metal toxicity and bacterial resistance. In this study, we inoculated heavy metal-contaminated soil from the Janghang region of South Korea into nickel-containing media (20 mM Ni2+) for enrichment. Among dozens of colonies obtained after several transfers and serial dilutions at the same Ni concentration, strain Ni-2 was chosen for further study. The isolates were assigned phylogenetic affiliations using 16S rRNA gene analysis. Strain Ni-2 was close to Cupriavidus metallidurans and was found, by the disk diffusion method, to be resistant to the antibiotics vancomycin, erythromycin, chloramphenicol, ampicillin, gentamicin, streptomycin, and kanamycin. Of the isolated strains, Ni-2 was selected for whole-genome sequencing because its Ni resistance appeared to be greater than that of the other strains. The genome sequence contained a total of 89 metal-resistance-related genes, including 11 Ni-resistance genes, 41 heavy metal (As, Cd, Zn, Hg, Cu, and Co)-resistance genes, 22 cation-efflux genes, 4 metal pumping ATPase genes, and 11 metal transporter genes.
A novel Gram-stain-positive, motile, white-colored and endospore-forming bacterium, designated 18JY67-1T, was isolated from soil in Jeju Island, Korea. The strain grew at 15-42 °C (optimum 30 °C) in R2A medium at pH 6.0-9.5 (optimum 7.5). Phylogenetic analysis based on 16S rRNA gene sequences indicated that strain 18JY67-1T formed a distinct lineage within the family Paenibacillaceae (order Bacillales, class Bacilli), and was closely related to Paenibacillus rhizoryzae (KP675984; 96.9% 16S rRNA gene sequence similarity). The major cellular fatty acids of strain 18JY67-1T were C16:0 and anteiso-C15:0. The predominant respiratory quinone was MK-7. The major polar lipid was identified as diphosphatidylglycerol. Phenotypic, chemotaxonomic and genotypic properties clearly indicated that isolate 18JY67-1T represents a novel species within the genus Paenibacillus, for which the name Paenibacillus flavus sp. nov. is proposed. The type strain of Paenibacillus flavus is 18JY67-1T (= KCTC 33959T = JCM 33184T).
Tigecycline is one of the last-resort antibiotics to treat complicated infections caused by both multidrug-resistant Gram-negative and Gram-positive bacteria [1]. Tigecycline resistance has sporadically occurred in recent years, primarily due to chromosome-encoded mechanisms, such as overexpression of efflux pumps and ribosome protection [2,3]. Here, we report the emergence of the plasmid-mediated mobile tigecycline resistance mechanism Tet(X4) in Escherichia coli isolates from China, which is capable of degrading all tetracyclines, including tigecycline and eravacycline, newly approved by the US FDA. The tet(X4)-harbouring IncQ1 plasmid is highly transferable, and can be successfully mobilized and stabilized in recipient clinical and laboratory strains of Enterobacteriaceae bacteria. It is noteworthy that tet(X4)-positive E. coli strains, including isolates co-harbouring mcr-1, have been widely detected in pigs, chickens, soil and dust samples in China. In vivo murine models demonstrated that the presence of Tet(X4) led to tigecycline treatment failure. Consequently, the emergence of plasmid-mediated Tet(X4) challenges the clinical efficacy of the entire family of tetracycline antibiotics. Importantly, our study raises concern that plasmid-mediated tigecycline resistance may further spread into various ecological niches and into clinical high-risk pathogens. Collective efforts are urgently needed to preserve the potency of these essential antibiotics.
Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.