June 1, 2021  |  

Developments in PacBio metagenome sequencing: Shotgun whole genomes and full-length 16S.

The assembly of metagenomes is dramatically improved by the long read lengths of SMRT Sequencing. This is demonstrated in an experimental design to sequence a mock community from the Human Microbiome Project, and assemble the data using the hierarchical genome assembly process (HGAP) at Pacific Biosciences. Results of this analysis are promising, and display much improved contiguity in the assembly of the mock community as compared to publicly available short-read data sets and assemblies. Additionally, the use of base modification information to make further associations between contigs provides additional data to improve assemblies, and to distinguish between members within a microbial community. The epigenetic approach is a novel validation method unique to SMRT Sequencing. In addition to whole-genome shotgun sequencing, SMRT Sequencing also offers improved classification resolution and reliability of metagenomic and microbiome samples by the full-length sequencing of 16S rRNA (~1500 bases long). Microbial communities can be detected at the species level in some cases, rather than being limited to the genus taxonomic classification as constrained by short-read technologies. The performance of SMRT Sequencing for these metagenomic samples achieved >99% predicted concordance to reference sequences in cecum, soil, water, and mock control investigations for bacterial 16S. Community samples are estimated to contain from 2.3 to 15 times as many species, with abundance levels as low as 0.05%, compared to the identification of phyla groups.


June 1, 2021  |  

Genome analysis of a bacterium that causes lameness.

Lameness is a significant problem resulting in millions of dollars in lost revenue annually. In commercial broilers, the most common cause of lameness is bacterial chondronecrosis with osteomyelitis (BCO). We are using a wire flooring model to induce lameness attributable to BCO. We used 16S ribosomal DNA sequencing to determine that Staphylococcus spp. were the main species associated with BCO. Staphylococcus agnetis, which previously had not been isolated from poultry, was the principal species isolated from the majority of the bone lesion samples. Administering S. agnetis in the drinking water to broilers reared on wire flooring increased the incidence of BCO three-fold when compared with broilers drinking tap water (P = 0.001). We found that the minimum effective dose of S. agnetis to induce BCO in broilers grown on wire flooring is 10^5 cfu/ml. We used PacBio and Illumina sequencing to assemble a 2.4 Mbp contig representing the genome and a 34 kbp contig for the largest plasmid of S. agnetis. Annotation of this genome is underway through comparative genomics with other Staphylococcus genomes, and identification of virulence factors. Our goal is to elucidate genetic diversity, toxins, and pathogenicity determinants for this poorly characterized species. Isolating pathogenic bacterial species, defining their likely route of transmission to broilers, and genomic analyses will contribute substantially to the development of measures for mitigating BCO losses in poultry.


June 1, 2021  |  

Comparative genome analysis of Clavibacter michiganensis subsp. michiganensis strains provides insights into genetic diversity and virulence.

Clavibacter michiganensis subsp. michiganensis (Cmm) is a Gram-positive actinomycete that causes bacterial canker of tomato (Solanum lycopersicum), a disease that can cause significant losses in tomato production. In this study, we determined the complete genome sequence of 13 California Cmm strains and one saprophytic Clavibacter strain using a combination of Illumina and PacBio sequencing. The California Cmm strains have a genome size (3.2-3.3 Mb) similar to the reference strain NCPPB382 (3.3 Mb) with ≥98% sequence identity. Cmm strains from California share ≥92% of genes (8-10% are novel genes) with the reference Cmm strain NCPPB382. Despite this similarity, we detected significant alterations in California strains with respect to plasmid number, plasmid composition, and genomic island presence, indicating acquisition of unique mechanisms controlling virulence. Plasmids pCM1 and pCM2, which were previously demonstrated to be required for NCPPB382 virulence, also differ in their presence and gene content across Cmm strains. pCM2 is absent in some Cmm strains that still retain virulence in tomato. The saprophytic Clavibacter strain possesses a novel plasmid, pSCM, and lacks the majority of characterized virulence factors. Genome sequence information was also used to design specific and sensitive primer pairs for Cmm detection. A mechanistic understanding of how genomic changes have impacted Cmm virulence and survival across diverse strains will be necessary for developing robust disease control strategies for bacterial canker of tomato.


June 1, 2021  |  

A workflow for the analysis of contigs from the metagenomic shotgun assembly of SMRT Sequencing data

The throughput of SMRT Sequencing and long reads allows microbial communities to be analyzed using a shotgun sequencing approach. Key to leveraging this data is the ability to cluster sequences belonging to the same member of a community. Long reads of up to 40 kb provide a unique capability in identifying those relationships, and pave the way towards finished assemblies of community members. Long reads are highly valuable when samples are more complex and contain lower inter-species variation, such as a larger number of closely related species, or high intra-species variation. Here, we present a collection of tools tailored for the analysis of PacBio metagenomic assemblies. These tools allow for improvements in the assembly results, and greater insight into the complexity of the study communities. Supervised classification is applied to a large set of sequence characteristics (e.g. GC content, raw read coverage, k-mer frequency, and gene prediction information) to cluster contigs from single or highly related species. Assembly in isolation of the raw data associated with these contigs is shown to improve assembly statistics. A unique feature of SMRT Sequencing is the ability to leverage simultaneously collected base modification / methylation data to aid the clustering of contigs expected to comprise a single or very closely related species. We demonstrate the added value of base modification information to distinguish and study variation within metagenomic samples based on differences in the methylated DNA motifs involved in the restriction modification system. Application of these techniques is demonstrated on a mock community and a monkey intestinal microbiome sample.
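
Two of the sequence characteristics named above, GC content and k-mer frequency, can be computed per contig with a few lines of Python. The sketch below is illustrative only and is not the tool set described in the abstract: the function names, the choice of k = 4, and the handling of ambiguous bases are assumptions, and the classification step itself is not shown.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous (A/C/G/T) bases of a contig."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_frequencies(seq: str, k: int = 4) -> list[float]:
    """Normalized k-mer frequencies in a fixed (lexicographic) k-mer order.

    k-mers containing ambiguous bases such as N are ignored (an assumption
    made for this sketch).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def feature_vector(seq: str, k: int = 4) -> list[float]:
    """GC content followed by the 4**k k-mer frequencies for one contig."""
    return [gc_content(seq)] + kmer_frequencies(seq, k)
```

Vectors like these, one per contig, could then be combined with read coverage and gene prediction features and passed to whichever classifier is used.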


June 1, 2021  |  

Complete microbial genomes, epigenomes, and transcriptomes using long-read PacBio Sequencing.

For comprehensive metabolic reconstructions and a resulting understanding of the pathways leading to natural products, it is desirable to obtain complete information about the genetic blueprint of the organisms used. Traditional Sanger and next-generation, short-read sequencing technologies have shortcomings with respect to read lengths and DNA-sequence context bias, leading to fragmented and incomplete genome information. The development of long-read, single molecule, real-time (SMRT) DNA sequencing from Pacific Biosciences, with >10,000 bp average read lengths and a lack of sequence context bias, now allows for the generation of complete genomes in a fully automated workflow. In addition to the genome sequence, DNA methylation is characterized in the process of sequencing. PacBio® sequencing has also been applied to microbial transcriptomes. Long reads enable sequencing of full-length cDNAs allowing for identification of complete gene and operon sequences without the need for transcript assembly. We will highlight several examples where these capabilities have been leveraged in the areas of industrial microbiology, including biocommodities, biofuels, bioremediation, new bacteria with potential commercial applications, antibiotic discovery, and livestock/plant microbiome interactions.


June 1, 2021  |  

Low-input long-read sequencing for complete microbial genomes and metagenomic community analysis.

Microbial genome sequencing can be done quickly, easily, and efficiently with the PacBio sequencing instruments, resulting in complete de novo assemblies. Alternative protocols have been developed to reduce the amount of purified DNA required for SMRT Sequencing, to broaden applicability to lower-abundance samples. If 50-100 ng of microbial DNA is available, a 10-20 kb SMRTbell library can be made. A 2 kb SMRTbell library only requires a few ng of gDNA when carrier DNA is added to the library. The resulting libraries can be loaded onto multiple SMRT Cells, yielding more than enough data for complete assembly of microbial genomes using the SMRT Portal assembly program HGAP, plus base-modification analysis. The entire process can be done in less than 3 days by standard laboratory personnel. This approach is particularly important for the analysis of metagenomic communities, in which genomic DNA is often limited. From these samples, full-length 16S amplicons can be generated, prepped with the standard SMRTbell library prep protocol, and sequenced. Alternatively, a 2 kb sheared library, made from a few ng of input DNA, can also be used to elucidate the microbial composition of a community, and may provide information about biochemical pathways present in the sample. In both these cases, 1-2 kb reads with >99% accuracy can be obtained from Circular Consensus Sequencing.


June 1, 2021  |  

Metagenomes of native and electrode-enriched microbial communities from the Soudan Iron Mine.

Despite apparent carbon limitation, anoxic deep subsurface brines at the Soudan Underground Iron Mine harbor active microbial communities. To characterize these assemblages, we performed shotgun metagenomics of native and enriched samples. Following enrichment on poised electrodes and long read sequencing, we recovered from the metagenome the closed, circular genome of a novel Desulfuromonas sp. with remarkable genomic features that were not fully resolved by short read assembly alone. This organism was essentially absent in unenriched Soudan communities, indicating that electrodes are highly selective for putative metal reducers. Native community metagenomes suggest that carbon cycling is driven by methyl-C1 metabolism, in particular methylotrophic methanogenesis. Our results highlight the promising potential for long reads in metagenomic surveys of low-diversity environments.


June 1, 2021  |  

Profiling metagenomic communities using circular consensus and Single Molecule, Real-Time Sequencing.

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR amplification. Whole-sample shotgun experiments generally use short-read, second-generation sequencing, which results in data processing difficulties. For example, reads less than 1 kb in length will likely not cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, it also requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, single molecule, real-time (SMRT) Sequencing reads in the 1-2 kb range, with >99% accuracy, can be efficiently generated for low amounts of input DNA. 10 ng of input DNA sequenced in 4 SMRT Cells would generate >100,000 such reads. While throughput is low compared to second-generation sequencing, the reads are a true random sampling of the underlying community, since SMRT Sequencing has been shown to have no sequence-context bias. Long read lengths mean that it would be reasonable to expect a high number of the reads to include gene fragments useful for analysis.


June 1, 2021  |  

Analysis of full-length metagenomic 16S genes by Single Molecule, Real-Time Sequencing

High-throughput sequencing of the complete 16S rRNA gene has become a valuable tool for characterizing microbial communities. However, the short reads produced by second-generation sequencing cannot provide taxonomic classification below the genus level. In this study, we demonstrate the capability of PacBio’s Single Molecule, Real-Time (SMRT) Sequencing to generate community profiles using mock microbial community samples from BEI Resources. We also evaluate multiplexing capabilities using PacBio barcodes on pooled samples comprising heterogeneous 16S amplicon populations representing soil, fecal, and mock communities.


June 1, 2021  |  

Making the most of long reads: towards efficient assemblers for reference quality, de novo reconstructions

2015 SMRT Informatics Developers Conference Presentation Slides: Gene Myers, Ph.D., Founding Director, Systems Biology Center, Max Planck Institute delivered the keynote presentation. He talked about building efficient assemblers, the importance of random error distribution in sequencing data, and resolving tricky repeats with very long reads. He also encouraged developers to release assembly modules openly, and noted that data should be straightforward to parse since sharing data interfaces is easier than sharing software interfaces.


June 1, 2021  |  

Profiling metagenomic communities using circular consensus and Single Molecule, Real-Time Sequencing

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR amplification. Whole-sample shotgun experiments require a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT) Sequencing reads in the 1-2 kb range, with >99% consensus accuracy, can be efficiently generated for low amounts of input DNA, e.g. as little as 10 ng of input DNA sequenced in 4 SMRT Cells can generate >100,000 such reads. While throughput is low compared to second-generation sequencing, the reads are a true random sampling of the underlying community. Long read lengths translate to a high number of the reads harboring full genes or even full operons for downstream analysis. Here we present the results of circular-consensus sequencing on a mock metagenomic community with an abundance range of multiple orders of magnitude, and compare the results with both 16S and shotgun assembly methods. We show that even with relatively low sequencing depth, the long-read, assembly-free, random sampling allows us to elucidate meaningful information from the very low-abundance community members. For example, given the above low-input sequencing approach, a community member at 1/1,000 relative abundance would generate 100 1-2 kb sequence fragments having 99% consensus accuracy, with a high probability of containing a gene fragment useful for taxonomic classification or functional insight.
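
The arithmetic behind the 1/1,000 example can be written out as a short Python sketch. It assumes, as the abstract does, that reads are an unbiased random sample of the community, so the expected read count for a member is simply total reads times relative abundance. The function names are hypothetical, and the detection probability uses a simple independent-draws model that is an assumption of this sketch rather than something stated in the abstract.

```python
def expected_reads(total_reads: int, relative_abundance: float) -> float:
    """Expected number of reads from one community member under random sampling."""
    return total_reads * relative_abundance


def detection_probability(total_reads: int, relative_abundance: float) -> float:
    """Probability of sampling at least one read from the member.

    Assumes each read is an independent draw, so the chance of missing the
    member in every read is (1 - abundance) ** total_reads.
    """
    return 1.0 - (1.0 - relative_abundance) ** total_reads


# 100,000 reads and a member at 1/1,000 relative abundance, as in the text:
reads_for_rare_member = expected_reads(100_000, 1 / 1_000)  # about 100 reads
```

Under the same assumptions, a member at 1/1,000,000 abundance would be expected to contribute only about 0.1 reads, which is why detection at that level becomes unlikely at this sequencing depth.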


June 1, 2021  |  

Diploid genome assembly and comprehensive haplotype sequence reconstruction

Outside of the simplest cases (haploid, bacteria, or inbreds), genomic information is not carried in a single reference per individual, but rather has higher ploidy (n ≥ 2) for almost all organisms. The existence of two or more highly related sequences within an individual makes it extremely difficult to build high quality, highly contiguous genome assemblies from short DNA fragments. Based on the earlier work on a polyploidy-aware assembler, FALCON (https://github.com/PacificBiosciences/FALCON), we developed new algorithms and software (“FALCON-unzip”) for de novo haplotype reconstructions from SMRT Sequencing data. We generated two datasets for developing the algorithms and the prototype software: (1) whole genome sequencing data from a highly repetitive diploid fungus (Clavicorona pyxidata) and (2) whole genome sequencing data from an F1 hybrid from two inbred Arabidopsis strains: Cvi-0 and Col-0. For the fungal genome, we achieved an N50 of 1.53 Mb (of the 1n assembly contigs) of the ~42 Mb 1n genome and an N50 of the haplotigs (haplotype specific contigs) of 872 kb from a 95X dataset with a read length N50 of ~16 kb. We found that ~45% of the genome was highly heterozygous and ~55% of the genome was highly homozygous. We developed methods to assess the base-level accuracy and local haplotype phasing accuracy of the assembly with short-read data from the Illumina® platform. For the Arabidopsis F1 hybrid genome, we found that 80% of the genome could be separated into haplotigs. The long range accuracy of phasing haplotigs was evaluated by comparing them to the assemblies from the two inbred parental lines. We show that a more complete view of all haplotypes could provide useful biological insights through improved annotation, characterization of heterozygous variants of all sizes, and resolution of differential allele expression.
The current FALCON-unzip method will help us understand how to solve more difficult polyploid genome assembly problems and improve the computational efficiency of large genome assemblies. Based on this work, we can develop a pipeline for routinely assembling diploid or polyploid genomes as haplotigs, representing a comprehensive view of the genomes that can be studied with the information at hand.
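
The contig, haplotig, and read length figures above are reported as N50 values. As a reminder of what that statistic measures, the following is a minimal Python sketch of the standard N50 definition: the length L such that sequences of length L or longer together contain at least half of the total bases. It is not taken from FALCON or FALCON-unzip, and the function name is hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Lengths are visited from longest to shortest; the N50 is the length at
    which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")
```

Applied to contig lengths this gives the assembly N50; applied to read lengths it gives the read length N50 quoted for the dataset.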


June 1, 2021  |  

Profiling the microbiome in fecal microbiota transplantation using circular consensus and Single Molecule, Real-Time Sequencing

There are many sequencing-based approaches to understanding complex metagenomic communities spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR. Whole-sample shotgun experiments generally use short-read sequencing, which results in data processing difficulties. For example, reads less than 500bp in length will rarely cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, it requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, single molecule, real-time (SMRT®) Sequencing reads in the 1-3kb range, with >99% accuracy can be efficiently generated for low amounts of input DNA. 10 ng of input DNA sequenced in 4 SMRT Cells on the PacBio RS II would generate >100,000 such reads. While throughput is lower compared to short-read sequencing methods, the reads are a true random sampling of the underlying community since SMRT Sequencing has been shown to have very low sequence-context bias. With reads >1 kb at >99% accuracy it is reasonable to expect a high percentage of reads include gene fragments useful for analysis without the need for de novo assembly. Here we present the results of circular consensus sequencing for an individual’s microbiome, before and after undergoing fecal microbiota transplantation (FMT) in order to treat a chronic Clostridium difficile infection. We show that even with relatively low sequencing depth, the long-read, assembly-free, random sampling allows us to profile low abundance community members at the species level. 
We also show that using shotgun sampling with long reads allows a level of functional insight not possible with classic targeted 16S, or short read sequencing, due to entire genes being covered in single reads.

