UPDATE: Our R&D team has added a new dataset for the MCF-7 human breast cancer transcriptome, originally released in 2013. The new results were produced using 28 SMRT® Cells with 4-hour movies and P5-C3 chemistry. Sizing was performed with the SageELF™ platform (fractions collected: 1-2 kb, 2-3 kb, 3-5 kb, and 5-10 kb). Sequencing the larger fractions with our newer, longer-read chemistry added longer transcripts (up to 10 kb) to the MCF-7 dataset, which previously contained only transcripts up to 4 kb. New FASTA and GFF files representing the combined dataset are available. Raw…
A paper just released in BMC Genomics details what the authors call “the most complete filarial nematode assembly published thus far at a fraction of the cost of previous efforts.” The project was performed using the PacBio® RS II DNA Sequencing System by scientists at the University of Maryland School of Medicine’s Institute for Genome Sciences and the Laboratory of Parasitic Diseases at the National Institute of Allergy and Infectious Diseases. In this genome sequencing effort, scientists generated a de novo assembly of Loa loa, a roundworm that infects humans. L. loa, transmitted to humans by deer flies, causes loiasis. The…
We are pleased to make publicly available a new shotgun sequence dataset of long PacBio® reads from a human DNA sample. We previously released ~10x coverage of this sample, generated with Single Molecule, Real-Time (SMRT®) Sequencing and sufficient for reference-based detection of structural variation. Today we expand on that release with additional data that increases the total sequencing coverage to ~54x. This long-read data has enabled the generation of the first de novo human genome assembly from PacBio-only sequence reads. Download the 54x long-read coverage dataset. The dataset was generated from sequencing a well-studied human cell line (CHM1htert), which…
A new Genome Research paper describes the application of Single Molecule, Real-Time (SMRT®) Sequencing to resolve repeat-heavy genomic regions in important reference genomes such as human and chimpanzee. In the process, the authors drew some important conclusions about cost, pooling, and coverage requirements for this type of work. “Reconstructing complex regions of genomes using long-read sequencing technology” comes from lead author John Huddleston and senior author Evan Eichler at the University of Washington, along with collaborators at Washington University, the University of Bari, Bilkent University, and Pacific Biosciences. In the paper, Eichler and his collaborators note the steep cost of…
By Jonas Korlach, Chief Scientific Officer
2013 was an eventful and exciting year for PacBio. As I described in the 2013 roadmap post a year ago, we have applied numerous improvements to SMRT® Sequencing, resulting in longer read lengths, greater sequencing throughput, new and improved data-analysis methods, and more efficient workflows. We are very pleased that these advances resulted in so many publications, conference presentations, and social media contributions, with the number of peer-reviewed scientific publications from the scientific community now exceeding 100. On behalf of all of us at Pacific Biosciences, I would like to express my heartfelt gratitude…
Update 1/13/14: A new data release of Arabidopsis using P5-C3 chemistry is available.
Advances in our chemistries, throughput, and read length are pushing the envelope in the way we tackle larger genomes. We recently sequenced the Landsberg erecta ecotype (Ler-0) of Arabidopsis thaliana and produced a successful assembly solely using PacBio® data. The data set resulting from this sequencing effort and assembly using SMRT® Portal is now available via DevNet for anyone who wants to give it a test drive. A few stats on Arabidopsis and the assembly using PacBio sequence data: Genome size: 124.6 Mb; GC content: 33.92%; Raw…
As PacBio customers upgrade to the new PacBio® RS II System, some of our core lab users have already begun blogging about the improved results. At the University of Maryland’s Institute for Genome Sciences (IGS), for example, one blogger posted data comparing read length, read count, and throughput for the PacBio RS and PacBio RS II. The post compares an 8 kb Mycobacterium project run on the PacBio RS and again on the PacBio RS II, finding that with the upgrade, “we see an almost 3x increase in total yield [per SMRT Cell], while read lengths…
A paper recently published in Nature Methods offers a deep dive into the use of our HGAP and Quiver tools to generate a high-quality genome assembly with an automated, simplified workflow. (“Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data,” Chin et al., advance online publication.) The publication, which includes lead author Chen-Shan Chin and others at Pacific Biosciences as well as collaborators at the Joint Genome Institute and the Eichler lab at the University of Washington, uses Single Molecule, Real-Time (SMRT®) Sequencing on three microorganisms and one human BAC to compare PacBio-only sequencing to existing high-quality reference genomes.…
A newly reported Salmonella genome showcases the utility of Single Molecule, Real-Time (SMRT®) Sequencing for characterizing a foodborne outbreak pathogen. The outbreak strain, Salmonella enterica subsp. enterica serovar Javiana (S. Javiana), which represents one of the five most common forms of Salmonella associated with fresh-cut produce, was sequenced and analyzed late last year; its genome was published this month in Genome Announcements, a journal from the American Society for Microbiology. The study was led by the US Food & Drug Administration’s Center for Food Safety & Applied Nutrition. Scientists from Pacific Biosciences and New England Biolabs participated in the study,…
Even as attendees’ energy was waning from three marathon days at AGBT, spirits were still high as we gathered for the final day’s Genomic Technologies session on Saturday morning. This session included two speakers presenting on applications using SMRT Sequencing: Eric Antoniou from Cold Spring Harbor Laboratory and Jonas Korlach, our CSO. Antoniou, a research investigator and manager of the genome sequencing center at Cold Spring Harbor Laboratory, presented on “Increased Read Length and Sequence Quality with Pacific Biosciences Magbead Loading System and a New DNA Polymerase.” In it, he reported on the sequencing and assembly of the 470-Mb rice…
Plus: Accuracy Boost, Integrated Full-Length cDNA Analysis, and Barcoding Support
We’re pleased to announce the release of a new software upgrade, SMRT® Analysis v1.4.0, that achieves higher quality genome assemblies with near-perfect base-level accuracy. You can read documentation, check out data, and download the new software from DevNet. SMRT Analysis v1.4.0 includes a new hierarchical de novo genome assembly process (HGAP), which allows researchers to assemble entire microbial and fungal genomes using just PacBio® long reads. As a result, users can generate better assemblies with a single library preparation and fewer SMRT Cells than previous approaches that also required short-read sequencing…
By Jonas Korlach, Chief Scientific Officer
Single Molecule, Real-Time (SMRT®) DNA sequencing achieves highly accurate sequencing results, exceeding 99.999% (Q50) accuracy, regardless of the DNA’s sequence context or GC content. This is possible because SMRT sequencing excels in all three categories that are relevant when considering accuracy in DNA sequencing:
1. Consensus accuracy
2. Sequence context bias
3. Mappability of sequence reads
Since there has been some confusion in the community about accuracy in SMRT sequencing, I would like to describe how our system performs and how such high accuracy is achieved. You can download the full perspective, complete with…
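For readers less familiar with the "Q" notation in the accuracy post above, the following is a minimal sketch of the standard Phred-scale conversion, Q = -10 * log10(error rate), which is how 99.999% accuracy corresponds to Q50. The function names are illustrative only and are not part of any PacBio software.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a fractional accuracy (e.g. 0.99999) to a Phred quality value."""
    error_rate = 1.0 - accuracy
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    # Phred scale: Q = -10 * log10(probability of error)
    return -10.0 * math.log10(error_rate)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality value back to a fractional accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    for acc in (0.99, 0.999, 0.99999):
        print(f"{acc:.5%} accuracy is about Q{accuracy_to_phred(acc):.0f}")
```

Each additional 10 points on this scale means a tenfold lower error rate, so Q50 corresponds to roughly one error per 100,000 bases.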