Pacific Biosciences

International Team Publishes Comprehensive DNA Analysis of German E. coli Outbreak Strain and 11 Related Strains in New England Journal of Medicine

Thursday, July 28, 2011

Yesterday, we published a paper in the New England Journal of Medicine on the origin of the E. coli strain behind the outbreak of hemolytic-uremic syndrome in Germany. The paper is based on the sequencing data we recently made publicly available for several E. coli isolates, including the German outbreak strain as well as 7 additional strains from different outbreaks (in Africa) that share the outbreak strain's serotype, O104:H4. To pull this off, we teamed up with leaders in the infectious disease space at Harvard/HHMI (Matt Waldor), UMD IGS (Dave Rasko and his lab), SSI (Karen Angeliki Krogfelt and Flemming Scheutz and their labs), and UVA School of Medicine (Jim Nataro). A number of other groups had already secured samples, sequenced isolates from the German outbreak, and made those data available near the beginning of June; our data and analyses nicely complement that earlier work. Together, these efforts provide a more accurate view of the outbreak strain's origins and of the hypotheses about its increased virulence and drug resistance.

In the paper, we describe how very long reads generated on the PacBio RS provided not only the first-ever PacBio-only de novo assembly, but also the most comprehensive characterization of the outbreak strain's genome published to date. For the most accurate interpretation of an outbreak such as this, the gold standard is a complete (or near-complete) genome assembly. We achieved this by combining the really long reads with accurate circular consensus sequencing data, a strategy that yielded long contigs covering 99.7% of the genome. This assembly of the outbreak strain, together with the sequence data for the other 11 strains we sequenced, provided a more informative context for unambiguously resolving structural variations (in addition to single-nucleotide variations). We showed that these variations matter not only for understanding virulence and drug resistance, but also for understanding the relationships among enteroaggregative E. coli (EAEC) strains, EAEC strains of the O104:H4 serotype, enterohemorrhagic E. coli (EHEC) strains, and a number of other E. coli strains. Our analyses highlighted a number of interesting genomic features (including larger-scale deletions, insertions, and inversions), some shared among O104:H4 strains and others specific to the German outbreak strain. This information, combined with a phylogenetic analysis of 53 E. coli strains based on whole-genome information rather than SNPs alone (Figure 2 in the paper), enabled us to unambiguously classify the outbreak strain as an EAEC strain. Horizontal genetic exchange with a Shiga toxin-producing EHEC strain most likely explains the emergence of this highly virulent, Shiga toxin-producing O104:H4 EAEC outbreak strain.
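As a rough illustration of what a figure like "contigs covering 99.7% of the genome" measures, the sketch below computes the fraction of a genome covered by the union of contig alignment intervals. It is a minimal Python example, not the method used in the paper, and it rests on assumptions of its own: contigs have already been aligned to a reference of known length, coordinates are 0-based half-open `(start, end)` spans, and wrap-around on a circular chromosome is ignored. The function name and the example intervals are hypothetical.

```python
from typing import Iterable, Tuple


def covered_fraction(intervals: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Return the fraction of a genome covered by the union of intervals.

    Each interval is a 0-based, half-open (start, end) span, e.g. where a
    contig aligns to a reference. Overlaps are counted once; circular
    wrap-around is not handled (hypothetical simplification).
    """
    covered = 0
    run_start = run_end = None
    for start, end in sorted(intervals):
        if run_end is None or start > run_end:
            # Gap before this interval: close the current run and start a new one.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlapping or abutting interval: extend the current run.
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start
    return covered / genome_length


# Made-up contig alignments on a made-up 200 bp "genome".
print(covered_fraction([(0, 50), (40, 100), (150, 200)], 200))  # 0.75
```

Merging overlapping spans before summing keeps a region covered by two overlapping contigs from being counted twice, which would otherwise inflate the percentage.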
Further, consistent with the predictions from our comparative analyses, we found that expression of the Shiga toxin gene (stx2) was significantly increased by certain antibiotics, including ciprofloxacin. This suggests that caution is warranted before treating infections caused by this strain with such antibiotics.

My favorite part of the paper is Figure 1, which uses Circos (a really wonderful tool for visualizing large-scale, high-dimensional data; see Genome Res (2009) 19:1639-1645) to visualize the results of a comparative analysis that included all of the data we generated on the O104:H4 strains. This figure is packed with nuggets of information that quickly reveal the regions of the genome most likely involved in the increased virulence of the outbreak strain. In fact, the NEJM staff put together a really cool animation that describes how the Circos plot was constructed, which I think goes even further in helping less genomics-savvy researchers and medical professionals understand the analyses behind that figure. In addition, the PacBio team has made all of the relevant *raw* data behind the highlights in the NEJM animation available through our genome browsing tool, SMRT™ View, so you can directly examine the very long reads and see how they elucidate structural variations in the genome (very cool, I think, if you have time to take a look).

One thing we weren’t able to highlight in the paper, given length constraints, was the utility of the long reads beyond assembly. For example, when we took single long raw reads (> 5 kb) and simply BLASTed them against the NCBI nucleotide database, the top hits that came back most often included E. coli strain 55989, an EAEC strain. In other words, single reads alone were enough to classify the outbreak strain! (The NCBI BLAST hits for reads of 9 kb or longer are shown below.)

This is perhaps unnecessary in this context, since in the time it took to generate the really long reads we already had enough data to begin whole-genome assembly. But it suggests how, in more complex bacterial communities (metagenomics projects), long reads could help resolve the makeup of the community at very low coverage. Single long reads can also span whole genes, virulence factors, and structural variations, and can pin down the exact position of the mobile elements they cover, again demonstrating utility beyond de novo assembly.
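The single-read classification idea described above is easy to sketch: keep the best-scoring BLAST hit for each long read, then count how often each subject wins. The Python below is a minimal illustration, not the pipeline used for this analysis. It assumes BLAST results in tab-separated form with the columns query ID, query length, subject title and bit score (for example, a BLAST+ custom tabular format such as `-outfmt "6 qseqid qlen stitle bitscore"`); the function name, the `min_read_len` threshold parameter and the example rows are all hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Tuple


def tally_top_hits(lines: Iterable[str], min_read_len: int = 5000) -> Counter:
    """Count how often each subject is the best-scoring BLAST hit per read.

    Assumes tab-separated rows of: query ID, query length, subject title,
    bit score (a hypothetical column layout; see the lead-in).
    """
    best: Dict[str, Tuple[float, str]] = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue
        query, qlen, subject, bitscore = row[0], int(row[1]), row[2], float(row[3])
        if qlen <= min_read_len:
            continue  # classify only reads longer than min_read_len bases
        # Keep the highest-scoring hit seen so far for this read.
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return Counter(subject for _, subject in best.values())


# Made-up example rows; read4 is too short and is skipped.
rows = [
    "read1\t9200\tstrain A\t1500",
    "read1\t9200\tstrain B\t1200",
    "read2\t12000\tstrain A\t1800",
    "read3\t9500\tstrain C\t900",
    "read3\t9500\tstrain A\t850",
    "read4\t3000\tstrain B\t600",
]
print(tally_top_hits(rows).most_common())  # [('strain A', 2), ('strain C', 1)]
```

Reducing each read to its single best hit before counting keeps a read with many database hits from being counted many times. Applied to a mixed sample, the same tally would give only a rough, database-dependent picture of which organisms are present.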

Finally, I thought it worth commenting on the worldwide effort undertaken to sequence the outbreak strain (of which we were just one part), which illustrates well how emerging high-throughput DNA sequencing technologies will transform the infectious disease space and beyond. In fact, we demonstrated this power late last year with rapid sequencing of the Haitian cholera outbreak strain, setting the stage for a new kind of molecular epidemiology based on whole-genome sequencing carried out in a matter of hours. The rapid release of data by BGI and HPA has definitely set a new standard, enabling communities of researchers to analyze outbreaks with great speed while the expert groups generating the data carry out more definitive, peer-reviewed analyses. (That said, we may be closer than we think to rapid analyses being published on the web, with communities of researchers providing rapid “reviews” that lead to updates, refinements, and ultimately a community consensus on how to interpret the data.) While I wish we had pushed to get samples from this German E. coli outbreak earlier (alas, there is only so much we can do in a day!), the data we ultimately provided demonstrated that it is indeed possible to get to a near-complete genome de novo using sequence data generated in a matter of hours. Exciting times ahead!!
