PacBio blog

This blog features voices from PacBio — and our partners and colleagues — discussing the latest research, publications, and updates about SMRT Sequencing. Check back regularly or sign up to have our blog posts delivered directly to your inbox.

Wednesday, May 4, 2016

Eco-friendly Soil Remediation Gets a Boost with the Latest SMRT Grant Program Winner

Congratulations to the winner of the first-ever SMRT Grant program decided by the community: Renying Zhuo of the Chinese Academy of Forestry!

We ran polling through our Genome Galaxy Initiative on the Experiment crowdfunding platform and were amazed to see how much it galvanized the genomics community. There were 30,000+ responses to the competition across the five finalists for our “Explore Your Most Interesting Genome” grant opportunity. Zhuo garnered the most support for his project to sequence two sister species of Sedum alfredii for a comparative genomics investigation to identify key genes important for remediating soil contaminated by heavy metals. This project could be applied to address cadmium ion pollution, a growing concern within rapidly industrializing nations.

The remaining finalists have a second chance to earn funding by launching their projects through a special crowdfunding event. The runners-up will kick off their campaigns through Experiment, and we sincerely hope that supporters of these worthy projects come back to help. Every donation makes a difference! And, donations are only accepted when projects meet their crowdfunding goal. Here’s a quick look at the projects now open for contributions:

The Amazing & Enigmatic Alpaca

Investigator: Kylie Munyard, Curtin University

According to Munyard’s proposal, the economically important alpacas are of great scientific interest on a number of fronts, and producing a reference genome will enable new studies in both agricultural and biomedical research. Alpacas are a good model for diabetes research; they have innate mechanisms to stay free of parasites; their distant relationship to other agricultural animals makes them good for comparative study; and much more.

Sequencing an Extremophile Earthworm

Investigator: Luis Cunha, Cardiff University

This project would sequence the extraordinary earthworm Pontoscolex corethrurus, which lives in a volcanic geothermal field with high exposure to toxic gases, extreme temperatures, and very little oxygen. Cunha’s proposal notes that preliminary work with draft assemblies indicates significant levels of horizontal gene transfer that could be better characterized with SMRT Sequencing.

Scar-Free Regeneration in the Spiny Mouse

Investigator: William Barbazuk, University of Florida

According to this proposal, the adult spiny mouse is the only known mammal with the unique ability to regenerate skin and organs after wounds without any scars or other indications of trauma, making this organism interesting for regenerative medicine. Barbazuk hypothesizes novel genes, alternatively-spliced isoforms, and gene expression regulators are responsible. He aims to use SMRT Sequencing to study the transcriptome of spiny mouse and its wound-healing properties.

Highlighting Firefly: A Genome Resource

Investigator: Jing-Ke Weng, MIT

This project would help a large consortium of researchers generate a high-quality genome assembly for Photinus pyralis, an American firefly. Weng’s proposal notes that the 2,000-plus species of these charismatic flashing beetles have been understudied, and that the biological mechanisms behind important traits such as bioluminescence remain unknown.

The PacBio team is very grateful to the scientific community and their supporters, as well as our co-sponsors for this grant program: Sage Science, Computomics, RTL Genomics, Texas A&M AgriLife Genomics and Bioinformatics Service, and Experiment.

Monday, May 2, 2016

Join the SMRT Community: User Meetings in Europe, Asia, and the US

There are several PacBio user meetings coming up, and with locations around the world we hope you’ll be able to attend one of them. These meetings are a great way to meet fellow customers, exchange tips, and learn about new applications. If you are interested, please register as soon as possible to reserve your seat.


The Netherlands: SMRT Leiden Symposium & Informatics Developers Meeting, June 6-8

This meeting, organized and hosted by Leiden University Medical Center’s human genetics department, includes a scientific symposium and the first SMRT Informatics Developers meeting in Europe. There’s an impressive agenda for the two-day symposium, with topics covering genomics, transcriptomics, and epigenomics of organisms ranging from microbes to humans. Keynote speakers include Evan Eichler, Steven Marsh, Shinichi Morishita, and Hagen Tilgner. On June 8, the informatics conference will kick off with a keynote talk from Gene Myers and provide plenty of opportunity for brainstorming about approaches to de novo assembly, structural variant detection, genome phasing, the Iso-Seq method, and more. Registration is free.


Maryland: East Coast User Group Meeting & Workshops, June 7-9

PacBio’s fourth annual East Coast User Group Meeting will be hosted by the University of Maryland’s Institute for Genome Sciences. In addition to the day-long meeting on June 8, attendees may also participate in half-day workshops on sample prep (June 7) and bioinformatics (June 9). The sample prep workshop will cover best practices and basic data analysis, with group breakout discussions for deeper dives into specific topics of interest. The bioinformatics event will introduce SMRT Analysis 3.x, including the SMRT Link GUI and command line examples. Data management for both the PacBio RS II and the new Sequel System will be discussed. Reserve your seat now.


Singapore: Asia User Group Meeting & Workshop, June 8-10

This event, held at the Grand Copthorne Waterfront Hotel, will feature two days of presentations for the user group meeting and an additional day-long bioinformatics workshop. The general meeting will include topics from de novo genome assembly and targeted sequencing to Iso-Seq full-length RNA sequencing and epigenomics, covering microbes, plants, animals, and humans. The bioinformatics workshop will focus on genome assembly and quality control, targeted sequencing and phasing, detection of minor variants or structural variants, and more. Learn more and sign up for the meeting.

Wednesday, April 27, 2016

Upcoming Webinars on Biomedical Research, Data Analysis, and Structural Variation

We’ve got several educational webinars coming up, and we hope you can join us!

Our first event will be hosted by Front Line Genomics on April 28 (4:00 p.m. BST / 11:00 a.m. EST / 8:00 a.m. PST). “Applying PacBio Long-Read Sequencing for Human Biomedical Research” will include Adam Ameur of the National Genomics Infrastructure in Sweden; Giancarlo Russo from the Functional Genomics Center Zurich; and our CSO Jonas Korlach. Each participant will offer a brief presentation, with audience Q&A at the end.

We’ve also teamed up with DNAnexus to offer two webinars on best practices for SMRT Sequencing data analysis. The first, on May 4 (5:00 p.m. CET / 11:00 a.m. EST / 8:00 a.m. PST), features DNAnexus Computational Biology Project Leader Brett Hannigan discussing rapid assembly for reference-quality genomes. The webinar will include a look at the challenges involved in assembling the 4.5 Gb tobacco genome and walk through running the FALCON assembler on the DNAnexus platform.

The second webinar, focused on discovering structural variants in SMRT Sequencing data, will take place on June 16 (5:00 p.m. CET / 11:00 a.m. EST / 8:00 a.m. PST). Andrew Carroll, Director of Science at DNAnexus, will talk about using cloud-optimized apps such as PBHoney, Parliament, and Sniffles to improve the accuracy of calling structural variation.

All webinars will be recorded. If you cannot attend live, please sign up and we will send you the recording following the event.

Monday, April 25, 2016

On DNA Day Honoring Discoveries – Y chromosome, Reference Grade De Novo Assemblies & Methylation

Happy DNA Day, everyone! This scientific celebration has us reflecting on the many advancements the community has made in the past year. For a molecule that is sequenced thousands of times a day all over the world, there is still much to learn. Today we’d like to honor some of the remarkable science enabled by SMRT Sequencing since last year’s DNA Day.


Scientists have continued to make progress exploring regions of the genome that have long been considered intractable. Two of our favorite stories this year came from the always-challenging Y chromosome. Researchers studying the mosquitoes that carry malaria — Anopheles gambiae — delivered the first detailed analysis of their Y chromosome, which is essentially a giant string of repeat sequences. The information may prove essential for efforts to shift the sex ratio of mosquito populations toward males, which do not transmit disease. In a separate study, scientists analyzed the Y chromosome in Drosophila and found evidence of an ancient gene duplication from an autosome; the gene had since acquired a new function on the Y chromosome. The gene had never been discovered before because of its location in a highly repetitive, complex genomic region that was inaccessible to other sequencers.


We’ve also seen a number of great examples of reference-grade de novo genome assemblies in the past year. A large team of scientists produced what they called “the most contiguous clone-free human genome assembly to date” using SMRT Sequencing along with single-molecule genome maps from BioNano Genomics. A similar strategy was used to generate “a gapless telomere-to-telomere genome assembly” of the filamentous fungus Verticillium dahliae, according to the publication in mBio. Just recently researchers published a new assembly for the gorilla genome, representing better than 150-fold improvement over the previous assembly. We loved the story of Oropetium thomaeum, a resurrection grass that was sequenced for one of our SMRT Grant winners and resulted in a virtually complete assembly.


DNA methylation has been another area of interesting developments as scientists delve into this poorly understood genetic mechanism. A recent Joint Genome Institute project involved a sweeping analysis of 230 prokaryotes that revealed more methylation, and more complex patterns, than ever suspected. A separate study detected methylation for the first time in C. elegans, proving that even well-characterized organisms still have secrets to reveal. Scientists also made progress in understanding the role of epigenetics in virulence and antibiotic resistance; this study found an epigenetic switch in non-typeable Haemophilus influenzae that alters the organism’s pathogenicity and drug resistance.


The past year has also been an exciting time for the PacBio team. In October, we launched our new Sequel System, a sequencer one-third the size and half the cost of the PacBio RS II with nearly seven-fold higher throughput. And right now we’re gathering votes for our first-ever community poll to award a new SMRT Grant. If you haven’t voted yet, now’s the time!

Wednesday, April 20, 2016

Benchmarking Study: Full-Length 16S Sequencing Offers Better Phylogenetic Resolution

Scientists from the Joint Genome Institute and other institutions recently reported a new SMRT Sequencing approach to microbial profiling using full-length sequencing of the 16S rRNA gene. In a benchmarking study, they demonstrate that this method allows for more accurate taxonomic classification than is possible with typical short-read sequencing methods.

Lead author Esther Singer, senior author Tanja Woyke, and collaborators at USDA-ARS, the University of British Columbia, and other research groups published “High-resolution phylogenetic microbial community profiling” in The ISME Journal earlier this year. The scientists note that while 16S phylogenetic analysis has traditionally been performed with gold-quality Sanger sequencing, the need for a more cost-effective solution drove the field to short-read sequencing technologies, which have produced most of the 16S sequences in GenBank. However, that shift came at the cost of quality. “Reference sequences with low read accuracy, chimeric sequences and partial rRNA gene sequences with reduced phylogenetic resolution generated on short-read sequencing platforms such as 454 and Illumina remain problematic, resulting in incorrect or less accurate classification of environmental sequences,” the authors report.

The team thought long reads from SMRT Sequencing could provide an appealing alternative. In this project, they generated full-length 16S sequences from microbial communities using a PacBio instrument and compared results to those from a short-read platform. They first tested the approach on a mock community of 26 bacterial and archaeal species including E. coli and strains of Salmonella and Clostridium, generating full-length 16S sequences called PhyloTags in a successful validation of the method.

Next they went to the field, using PacBio and short-read sequencing to analyze microbial communities from a lake in British Columbia, with water samples taken at eight different depths. They determined that partial sequences from the 16S gene — the information generated by sequencers that can’t cover the full gene in a single read — were less likely to resolve phylogeny and were more likely to lead to incorrect matches, particularly in more complex microbial communities. As many as 4% of short-read results “were taxonomically unresolved at the phylum level, whereas all PhyloTags were classified into distinct bacterial phyla,” the scientists report. In an analysis of unclustered sequence data, they note that short-read sequence results were “more often either impossible or incorrect, significantly altering community profiles across all taxonomic levels.” They also found that certain phyla were more likely to be misclassified when only partial gene coverage was available. “PhyloTag sequencing … offers the highest contig accuracy without discrimination against GC-rich or -poor regions, which further reduces bias in amplicon-based profiling,” the authors write.

“A resurgence of [full-length] sequences used as ‘gold standards’ has the potential to yet again transform microbial community studies, increasing the accuracy of taxonomic assignments for known and novel branches in the tree of life on previously unobtainable scales,” Singer et al. report.

Monday, April 18, 2016

First Comprehensive Analysis of Mosquito Y Chromosome Offers Clues for Vector Control

A new PNAS paper offers the first detailed analysis of the Anopheles gambiae Y chromosome, which could prove critical for biological and infectious disease research. The report uncovered extensive remodeling of the Y chromosome, which consists almost entirely of highly repetitive sequence. The authors say this study “provides a long-awaited foundation for studying male mosquito biology, and will inform novel mosquito control strategies based on the manipulation of Y chromosomes.”

“Radical remodeling of the Y chromosome in a recent radiation of malaria mosquitoes” comes from lead authors Andrew Brantley Hall, Philippos-Aris Papathanos, Atashi Sharma, and Changde Cheng, along with senior author Nora Besansky. This large collaboration combines scientific expertise from Virginia Polytechnic Institute and State University, the University of Notre Dame, NHGRI, Indiana University, and several other institutes. The project was formed to address the lack of information about the mosquito Y chromosome, which has hindered vector-control efforts. Previous sequencing initiatives had analyzed A. gambiae but reported only 180 kb of unordered sequence data for the Y chromosome; a related mosquito genome project revealed 57 short sequences, while some 200 kb of sequence data was generated from BAC clones for the chromosome.

The challenge lies in the highly repetitive sequences found in Y chromosomes across organisms. “The Y chromosome remains one of the most recalcitrant and poorly characterized portions of any genome more than a decade into the postgenomic era,” the authors write. Among mosquitoes, the Y chromosome is particularly interesting because males do not transmit disease, so shifting the sex ratio of populations is a promising vector-control approach for reducing the incidence of malaria, Zika virus, and other mosquito-borne disease.

The team used SMRT Sequencing to tackle the A. gambiae Y chromosome, first generating a 294 Mb de novo assembly followed by sequencing and completely assembling BAC clones. “We find that the A. gambiae Y consists almost entirely of a few massively amplified, tandemly arrayed repeats, some of which can recombine with similar repeats on the X chromosome,” the scientists report.

For further analysis, the scientists incorporated genome resequencing data from a recent species radiation and determined that the Y chromosome experiences rapid sequence turnover. They also used RNA-seq data to identify a small number of genes on the chromosome that had no homologs on the X chromosome. In addition, they found YG2, a conserved gene that may have a role in sex determination in the mosquito.

The authors note that SMRT Sequencing has been a game-changing development for analyzing Y chromosomes in many organisms, “promising a resource-efficient alternative” to the laborious processes used in the past. “Single-molecule sequencing reads were able to reveal complex repeat structures from whole-genome data and completely assemble heterochromatic BACs without manual finishing,” the scientists conclude. “These results suggest that continued single-molecule read length and throughput improvements may soon enable the complete reconstruction of Y chromosomes from whole-genome data alone.”

Friday, April 15, 2016

Japanese Scientists Find Gene Fusion Driving B Cell Leukemia

In a new Nature Genetics paper, scientists from the University of Tokyo and several other Japanese institutes and hospitals present results of a sweeping study of gene fusions driving a form of leukemia in teenagers and young adults. They used SMRT Sequencing to validate the gene fusion.

“Recurrent DUX4 fusions in B cell acute lymphoblastic leukemia of adolescents and young adults” comes from lead author Takahiko Yasuda and senior author Hiroyuki Mano, along with many collaborators. The team embarked on the search for new oncogenes responsible for acute lymphoblastic leukemia (ALL) in subjects from 15 to 39 years of age because the mechanisms responsible for this cancer “remain largely elusive,” they write.

From a large RNA-seq analysis, they found frequent insertion of a D4Z4 repeat that includes the DUX4 gene into the IGH locus, creating a DUX4-IGH gene fusion that produces high expression levels of an aberrant form of DUX4. The scientists transplanted this gene fusion into mice, where it led to the generation of B cell leukemia. They report that fusion-driven oncogenes are more important for causing ALL in this age range than previously thought. “Our data thus show that DUX4 can become an oncogenic driver as a result of somatic chromosomal rearrangements and that [ALL in adolescents and young adults] may be a clinical entity distinct from ALL at other ages,” Yasuda et al. write.

The team used SMRT Sequencing to confirm the full sequence of the gene fusion, which could not be done with short-read sequencing. “Given that the average read length in our next-generation sequencing approach was 104 bp, it was difficult to determine how many copies of DUX4 had been inserted into the IGH locus,” they report. They performed whole genome sequencing of a B cell line cultured from a 19-year-old ALL patient, generating about 69 Gb of data.

“Our analysis confirmed that one full-length and one partial copy of D4Z4 were translocated to the IGH locus, accompanied by minor rearrangements within the IGHD2-15 and IGHVII-60-1 regions,” the scientists report. This figure shows SMRT Sequencing data confirming the presence of the DUX4-IGH gene fusion.

To learn more about applying SMRT Sequencing to cancer research, check out our AACR conference preview.

Thursday, April 14, 2016

Accelerating Cancer Research Discovery with SMRT Sequencing at AACR 2016

The PacBio team is gearing up for the annual meeting of the American Association for Cancer Research (AACR), which will be held April 16-20 in New Orleans. We’re looking forward to introducing the AACR community to the Sequel System, our new SMRT Sequencing platform that’s half the price and a third of the size of our PacBio RS II System. With 5-10 Gb of throughput per SMRT Cell, we think the Sequel System will be a great fit for the cancer research world.



Sunday, April 17, 4:15 – 6:15 p.m., Room 243, Morial Convention Center

MS.BSB01.01. Minisymposium: Novel and Integrative Analyses of Cancer Genome Data

Sun, Apr 17, 4:50 – 5:05 p.m., Room 243, Morial Convention Center
848 – Proteogenomic Analysis of Alternative Splicing: The Search for Novel Biomarkers for Colorectal Cancer

Malgorzata Komor, Netherlands Cancer Institute

Sun, Apr 17, 5:20 – 5:35 p.m., Room 243, Morial Convention Center
850 – Comprehensive Genome and Transcriptome Structural Analysis of a Breast Cancer Cell Line using Single Molecule Sequencing

Maria Nattestad, Cold Spring Harbor Laboratory (don’t miss our case study about this project!)


We’re also hosting four Meet the Expert sessions at our booth (#257) where attendees will hear quick presentations on targeted sequencing or the Iso-Seq method for transcriptome studies, followed by a Q&A. Our experts Roberto Lleras and Anand Sethuraman will be available on Monday the 18th and Tuesday the 19th.

  • Full-Length Isoform Sequencing: Monday 1:00 – 1:30 p.m.; Tuesday 2:30 – 3:00 p.m.
  • Targeted Sequencing with Long Reads: Monday 2:30 – 3:00 p.m.; Tuesday 1:00 – 1:30 p.m.

Finally, there will be a number of posters demonstrating the use of SMRT Sequencing for cancer research or analysis tools to help users make the most of their PacBio data. Here’s a quick peek at a few:



Sun, Apr 17, 1:00 – 5:00 p.m.
LB-012/12 – Autonomous, antigen-independent B-cell receptor signaling as a novel pathogenetic mechanism in non-GCB DLBCL
Marvyn T. Koning et al., Leiden University Medical Center

Tue, Apr 19, 8:00 a.m. – 12:00 p.m.

3438/12 – DNA methylation profiles of Helicobacter pylori strains from patients with gastric cancer and gastritis

Constanza Camargo et al., National Cancer Institute

Tue, Apr 19, 1:00 – 5:00 p.m.

3646/16 – Highly sensitive and cost-effective detection of somatic cancer variants using single-molecule, real-time sequencing

Anand Sethuraman et al., PacBio

3611/10 – SMRT® Sequencing of DNA samples extracted from formalin-fixed and paraffin embedded tissues

Primo Baybayan et al., PacBio

LB-286/3 – Dynamic alternative splicing correlates with drug synergy and induces novel gene regulatory networks in MCF7
Xintong Chen et al., Icahn School of Medicine at Mount Sinai

Wed, Apr 20, 8:00 a.m. – 12:00 p.m.
5281/18 – Fast and scalable software for comparative variant analysis and visualization of massive next-generation sequencing data
Riku Katainen et al., University of Helsinki


We look forward to seeing you in New Orleans!

Wednesday, April 13, 2016

Genome and Transcriptome Analysis Help Scientists Deconstruct Cancer Complexity

At Cold Spring Harbor Laboratory, scientists used SMRT® Sequencing to decode one of the most challenging cancer genomes ever encountered. Along the way, they built a portfolio of open-access analysis tools that will help researchers everywhere make structural variation discoveries with long-read sequencing data.

When Mike Schatz realized a few years ago that his PacBio® System had reached the throughput needed to process human genomes, he decided to give it a real challenge: the incredibly complicated, massively rearranged SK-BR-3 breast cancer cell line. The genome consists of 80 chromosomes, and that’s just the tip of the complexity iceberg.

“We were really interested in sequencing a human genome that would be maximally impactful and that was aligned with our research interest in cancer genomes, where it’s been well documented that structural variations play a major role,” says Schatz, now an associate research professor of computer science at Johns Hopkins University and an adjunct associate professor of quantitative biology at Cold Spring Harbor Laboratory, where the analysis took place. He notes that despite its importance, structural variation has not been thoroughly studied because short-read sequencers cannot reliably identify these large genomic elements. “One of the really special properties about the PacBio Sequencer is, in addition to being able to call SNPs or small variants, we also get to look for large variants such as structural variation,” he says.

But as Schatz and his collaborators at Cold Spring Harbor Laboratory and the Ontario Institute for Cancer Research delved into this work, they realized that existing variant callers were tailored to short-read data. To make the most of the large amount of long-read information they were generating, the team wrote a suite of new analysis tools optimized for SMRT Sequencing data. “The tools catering to short-read data just aren’t made to capture the awesome information that we can now take advantage of,” says Maria Nattestad, a graduate student in Schatz’s lab who wrote several of the new algorithms. “Building our own tools was really the only way to go here.”

Those tools, which are especially important for understanding structural variation, are now being publicly released to fuel further SMRT Sequencing studies of human genomes. Also coming out soon is the team’s detailed analysis of the SK-BR-3 genome and transcriptome, which includes a high-quality assembly as well as a new understanding of gene fusions, the evolutionary history of this cell line, and more.

De novo sequencing and assembly were the first steps in making sense of the SK-BR-3 genome. With 72-fold SMRT Sequencing coverage, “we got an outstanding assembly of this genome even though it’s so complicated,” Schatz says, citing a contig N50 size of 2.5 Mb compared to a state-of-the-art short-read assembly with a contig N50 of just 3 kb. “That’s nearly a thousand-fold more contiguous going from short-read to long-read assemblies, and it’s through that improved assembly that the majority of structural variants were detected.”
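For readers less familiar with the contiguity metric Schatz cites, here is a minimal sketch of how a contig N50 is commonly computed, followed by the fold-change arithmetic behind the "nearly a thousand-fold" comparison. The function name and the toy contig lengths are illustrative assumptions and do not come from the SK-BR-3 analysis; only the 2.5 Mb and 3 kb figures are taken from the text above.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (not real data): total 32, so half is 16.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(contig_n50(toy_contigs))  # 8

# Fold difference between the two N50 values quoted in the post.
long_read_n50 = 2_500_000   # 2.5 Mb, from the text
short_read_n50 = 3_000      # 3 kb, from the text
fold = long_read_n50 / short_read_n50
print(round(fold))  # 833
```

The ratio works out to roughly 830-fold, consistent with the rounded "nearly a thousand-fold" figure in the quote.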

Using custom-built analysis tools, including variant callers Sniffles, by Schatz lab member Fritz Sedlazeck, and Assemblytics, by Nattestad, the scientists found more than 10,000 structural variants in the SK-BR-3 genome ranging in size from 50 bases to millions of base pairs long. Another major discovery involved meticulously characterizing the complicated process that led to the cell line’s Her2 oncogene amplification.

The team also used the Iso-Seq™ method to analyze the full transcriptome of SK-BR-3, finding as much complexity at the RNA level as they saw in the DNA. “In the Iso-Seq analysis, we see many tens of thousands of novel isoforms,” Schatz says. “That’s a really strong testament to the long reads, which fully capture an isoform in one sequence — unlike short reads, where you have to infer isoform structure.”

To learn more about the project, which included novel findings about gene fusions in cancer, check out the full case study.

Monday, April 11, 2016

From Earthworms to Alpacas: Vote Now to Choose the Next SMRT Grant Program Winner!

For the first time ever, the winner of this year’s “Explore Your Most Interesting Genome” SMRT Grant program will be decided by the community. We’ll be using our new Genome Galaxy Initiative and Experiment’s dedicated-to-science crowdfunding platform for this worldwide event.

Here’s how it works: our top five finalists will be engaging with you directly through their project pages on the Genome Galaxy Initiative via Experiment, where you will have the opportunity to learn more and ask scientists about their projects. We will be conducting daily polls so you can cast your vote for the project you feel should be supported by the SMRT Grant program (see FAQ). The four runners-up will then have a second chance at seeing their projects kick off through promotion and public donations on Experiment.

Voting starts today and remains open until May 1. Brief descriptions of the projects follow. Who will you vote for today?


Project: The Amazing & Enigmatic Alpaca

Investigator: Kylie Munyard, Curtin University

According to Munyard’s proposal, the economically important alpacas are of great scientific interest on a number of fronts, and producing a reference genome will enable new studies in both agricultural and biomedical research. Alpacas are a good model for diabetes research; they have innate mechanisms to stay free of parasites; their distant relationship to other agricultural animals makes them good for comparative study; and much more.


Project: Plant Heroes for Remediation of Soils Contaminated with Heavy Metals

Investigator: Renying Zhuo, Chinese Academy of Forestry

With this project, Zhuo aims to produce high-quality genomes for two strains of Sedum alfredii: one heavily accumulates cadmium ions from polluted soil while the other doesn’t, although both are found in the same ecosystem. Scientists hope to improve on fragmented short-read assemblies and use comparative genomics to understand the plant’s mechanism for processing heavy metals, with the ultimate goal of using this information for remediating contaminated soil.


Project: Sequencing an Extremophile Earthworm

Investigator: Luis Cunha, Cardiff University

This project would sequence the extraordinary earthworm Pontoscolex corethrurus, which lives in a volcanic geothermal field with high exposure to toxic gases, extreme temperatures, and very little oxygen. Cunha’s proposal notes that preliminary work with draft assemblies indicates significant levels of horizontal gene transfer that could be better characterized with SMRT Sequencing.


Project: Scar-Free Regeneration in the Spiny Mouse

Investigator: William Barbazuk, University of Florida

According to this proposal, the adult spiny mouse is the only known mammal able to regenerate skin and organs after wounds without any scars or other indications of trauma, making this organism interesting for regenerative medicine. Barbazuk hypothesizes that novel genes, alternatively spliced isoforms, and gene expression regulators are responsible. He aims to use SMRT Sequencing to study the transcriptome of the spiny mouse and its wound-healing properties.


Project: Highlighting Firefly: A Genome Resource

Investigator: Jing-Ke Weng, MIT

This project would help a large consortium of researchers generate a high-quality genome assembly for Photinus pyralis, an American firefly. Weng’s proposal notes that the 2,000-plus species of these charismatic flashing beetles have been understudied, and that the biological mechanisms behind important traits such as bioluminescence remain unknown.


We thank our co-sponsors for their support of this event: Sage Science, Computomics, Experiment, and RTL Genomics.

Thursday, March 31, 2016

With Greater Contiguity, New Gorilla Genome Assembly Offers Insights into Gene Content, SVs, and More

In a Science paper published today, scientists from the University of Washington, the McDonnell Genome Institute, and other organizations present a new gorilla genome assembly generated with PacBio long-read sequencing, representing an over 150-fold improvement over previous assemblies.

From lead authors David Gordon, John Huddleston, Mark Chaisson, and Christopher Hill, and senior author Evan Eichler, the paper reports that the new assembly recovers nearly all reference exons missing from the previous assembly, and provides an unprecedented look at structural variation, genetic diversity, ancestral evolution, repeat structures, and more.

The project was launched to address shortcomings with the existing gorilla assembly, which was built with short-read and Sanger sequencing data. While short-read sequencing has been instrumental for genomics, the authors write, “assemblies have become increasingly more incomplete and fragmented in large part because the underlying sequence reads are too short (<200 bp) to traverse complex repeat structures. This has led to incomplete gene models, less accurate representation of repeats, and biases in our understanding of genome biology.” The previous gorilla assembly was highly fragmented, with more than 400,000 gaps, and had been assembled using the human genome as a guiding reference.

The team used SMRT Sequencing on a western lowland gorilla named Susie, followed by assembly and polishing with FALCON and Quiver, respectively. The resulting assembly size is 3.1 Gb, with a contig N50 length of 9.6 Mb. The assembly closes 93% of the gaps, many of which are characterized by GC-rich content, and provides at least 148 Mbp of additional euchromatic sequence.

The scientists incorporated additional genome data from six gorillas, generating a reference genome called Susie3. A gene content analysis determined that nearly 95% of RefSeq exons missing from the original assembly were recovered in this assembly, and that 96% of previously incomplete gorilla genes were represented in at least one isoform. They also looked at structural variation, finding that 86% of the indels and inversion variants detected had never been seen before. “These analyses provide a comprehensive catalog of mobile element differences between human and gorilla (24.1% of all structural variation events),” the authors report.

The assembly also suggests that previous estimates of evolutionary divergence and population sizes were not as accurate as expected. “Although the difference was subtle, we found that human versus gorilla sequence alignments were significantly less divergent with Susie3 (1.60% divergent) when compared to the published gorilla assembly (1.65% divergent),” the scientists write. “We found a strong correlation with the difference in divergence and regions enriched for Alu and G+C content … suggesting that mismapping, collapse or underrepresentation within these regions of the Illumina-based assemblies may be contributing to this excess of divergence.” They also report that previous estimates of the most recent population bottleneck for western lowland gorillas “may have been underestimated by a factor of ~1.5, highlighting the importance of using higher quality assemblies when fitting demographic models.”

The scientists note that SMRT Sequencing has put high-quality de novo mammalian assemblies within reach of individual labs. “Our results demonstrate the utility of long-read sequence technology to generate high-quality working draft genomes of complex vertebrate genomes without guidance from preexisting reference genomes,” they conclude. “The genome assembly that results from using the long-read data provides a more complete picture of gene content, structural variation and repeat biology as well as allows us to refine population genetic and evolutionary inferences.”

This exciting advance was also presented by Christopher Hill at AGBT — check out the recording of his presentation.

Wednesday, March 30, 2016

New Study Uses SMRT-ChIP Method to Find Novel Methylation in Mouse Embryonic Stem Cells

In a new Nature publication, scientists from Yale and other institutions report the discovery of N6-methyladenine (N6-mA) in mouse embryonic stem cells (ESCs), contrary to the conventional wisdom that the only form of methylation in mammals is 5-methylcytosine. Through the project, the team also developed a new method for pairing chromatin immunoprecipitation (ChIP) with SMRT Sequencing. Both of these developments have significant implications for the genomics community.

“DNA methylation on N6-adenine in mammalian embryonic stem cells” comes from lead author Tao Wu and senior author Andrew Xiao, both at Yale School of Medicine. The team also included collaborators from the University of Arkansas for Medical Sciences, the University of North Carolina, the Icahn School of Medicine at Mount Sinai, and PacBio. “The discovery of N6-mA in mammalian ES cells sheds new light on epigenetic regulation during early embryogenesis and may have impacts in the fields of epigenetics, stem cells and developmental biology,” Wu et al. write.

To conduct this study, the team developed a SMRT-ChIP method to study DNA modifications at specific histone variant regions. The team focused on regions of H2A.X deposition, which has been associated with cell fate transitions, and sequenced the enriched, unamplified DNA from those regions. The SMRT Sequencing data demonstrate the presence of N6-mA at nearly 400 sites in the genomic regions studied, a finding that was confirmed with mass spec analysis. They also compared SMRT-ChIP results to those from DIP-seq, an orthogonal method, and found strong concordance.

The scientists identified Alkbh1 as the demethylase that regulates adenine methylation and went on to create cell lines with this gene knocked out, showing that N6-mA levels increased by a significant degree without the demethylase. The team also used Alkbh1 to shed light on how these N6-mA sites function; in the knockout cells, the expression of 550 genes was downregulated compared to the original cell line. That contrasts with other recent discoveries of N6-mA in organisms including C. elegans and D. melanogaster. “Intriguingly, [those] studies implicated N6-mA in gene activation, instead of repression, as is the case for 5mC repression,” the scientists write.

They also report a strong location bias for the methylated sites, with the greatest enrichment on the X chromosome. “N6-methyladenine deposition is inversely correlated with the evolutionary age of LINE-1 transposons; its deposition is strongly enriched at young (<1.5 million years old) but not old (>6 million years old) L1 elements,” the authors write, noting that young L1s are important in the beginning of embryogenesis. “We favour the view that N6-mA-mediated silencing plays an important role in safeguarding active L1 elements in mammalian genomes. The levels of N6-mA are controlled precisely by Alkbh1 in ES cells such that they favour L1 transcription while preventing it from succumbing to overactivation and genomic instability.”

Thursday, March 24, 2016

CSHL Scientists Discuss Long-Read Sequencing for More Contiguous Assemblies and Complex Genomes

Much like the “sharpen” tool in Photoshop brings a picture into tighter focus and enhances the fine detail, long-read sequencing offers enhanced resolution of genomic information, according to Cold Spring Harbor Laboratory colleagues Mike Schatz and Maria Nattestad. The scientists spoke with Mendelspod’s Theral Timpson about how long-read sequencing is advancing their research in unique and powerful ways; a brief recap of their conversation follows.

Schatz uses PacBio sequencing to establish incredibly accurate assemblies of microbial, crop, animal, and human genomes. Indeed, SMRT technology has significantly improved his work on the flatworm Macrostomum lignano, an organism with regenerative powers. With only a few reference genomes and limited functional studies available, the flatworm proved to be particularly challenging to sequence with short-read solutions. “We were quite frustrated by the results that we were getting, where the assembly was of very poor quality,” Schatz says. “It was also missing something like half of the genome that we expected to be there; it just wasn’t present at all in the assembly that took place.” At this point, the team realized that long reads would help them achieve a much improved reference genome. By collaborating with algorithm developers, PacBio, and the NIH, the team created an assembly that was about 100 times more contiguous than assemblies based on short-read data.

Long reads also appeal to Nattestad, who is using de novo assembly of the SK-BR-3 breast cancer cell line as a way to fully characterize not just SNPs, but also major structural variations. One of her interests in SK-BR-3 is to better understand Her2 oncogene amplification, and she has undertaken a historical, step-by-step reconstruction of its mutations using software she developed for that purpose. “Our focus here is not just to see how many copies of Her2 there are, or to see that it is Her2-amplified like you would in a diagnostic setting. Instead, we wanted to see how that amplification has happened over time in the genome, and try to reconstruct a history of steps that took place,” she says. Schatz notes that in SK-BR-3, the region around Her2 has undergone what they call ‘genome gymnastics,’ a very complicated series of amplifications, inverted duplications, and translocation events. He says that “trying to capture that level of complication and sophistication just from standard variant calling approaches is very challenging.” Nattestad plans to follow up with analyses of other oncogenes known to be amplified in this cell line.

This year, Schatz expects to see a number of reference-grade human genomes published using PacBio technology to create high-quality de novo assemblies. He says, “If you’re interested to do a de novo assembly of an entirely novel species, my strong recommendation — without any hesitation — is to do long-range PacBio sequencing, and I would advocate for 100x coverage of the longest reads you can possibly generate. … This will give you the most successful assembly.” Structural variation studies are similar, he says: “You really want to use the long-read technology in order to capture those structural variations as accurately as possible.”

Tuesday, March 15, 2016

Genome Galaxy Initiative: On a Mission to Sequence the Beautiful and Mysterious Kākāpō

Photo courtesy of Andrew Digby, DOC New Zealand

New Zealand is more than an amazing vacation destination or the setting of the Lord of the Rings movies; it’s also home to a wealth of fascinating species that evolved in isolation for millions of years. The critically endangered kākāpō bird is one such species, and it needs your help now.

David Iorns, a native New Zealander and founder of the Genetic Rescue Foundation, has launched a crowdfunding campaign to raise money and pursue a grand vision: saving kākāpōs from extinction. With a high-quality genome already underway using SMRT Sequencing, Iorns wants to resequence all remaining 125 kākāpōs. If funded, this project would be the first to digitally capture the genetics of every extant member of a species. Iorns hopes the data generated from these resequencing efforts, combined with the high-quality reference genome needed for interpretation, will help scientists better understand the genetic diversity of kākāpōs and ultimately prevent this species from going extinct.

Kākāpōs are part of the parrot family, but they’re nocturnal, flightless, and very heavy, making them unlike most of their better-known cousins. “They’re special and they’re worth saving,” Iorns says. Because the population has dwindled to so few members, conservation and breeding efforts have proven challenging. Sequencing every bird will provide an “incredibly rich genetic dataset that will help us get to the bottom of some of these fertility and genetic bottleneck-related problems,” he adds.

This project turned to PacBio’s Genome Galaxy Initiative on Experiment, a scientific crowdfunding platform, for public support. Iorns, who previously used crowdfunding to sequence the extinct moa bird, says he gravitated to Experiment’s platform because he’s a citizen scientist who doesn’t follow traditional research funding methods. For the kākāpō project, he hopes to raise $45,000 – you can be a part of this effort and follow this scientific expedition with a small contribution to the cause.

PacBio launched the Genome Galaxy Initiative to help support researchers looking for alternative ways to fund research breakthroughs propelled by SMRT Sequencing. Even a small donation can help scientists begin to address critical, underfunded issues. All donations are refunded if projects do not reach their funding campaign goals. We’re excited to see growing support for this kākāpō project and look forward to many more Genome Galaxy Initiative proposals to come.

Thursday, March 3, 2016

Prevalent Methylation in Prokaryotic Genomes Suggests Regulatory Functions

A new publication from scientists at Lawrence Berkeley National Laboratory, the Joint Genome Institute, and other organizations reports a landmark study of genome-wide methylation in prokaryotes. The analyses of 230 bacteria and archaea species revealed both more methylation than expected and novel epigenetic mechanisms.

“The Epigenomic Landscape of Prokaryotes” from lead author Matthew Blow, senior author Richard Roberts, and collaborators was recently published in PLoS Genetics. The team used SMRT Sequencing to detect 6-methyladenosine (m6A), 4-methylcytosine (m4C), and 5-methylcytosine (5mC) across the 230 genomes. “Bisulfite sequencing has enabled genome-wide surveys of 5mC methylation, but a historic absence of tools for studying m6A and m4C modifications that predominate in prokaryotic DNA has precluded more comprehensive studies,” the authors write, noting that the unique ability of SMRT Sequencing to capture all of these methylation states made a much more comprehensive study possible for the first time.

The authors reported widespread methylation in these genomes, with 93% of organisms harboring at least some methylated DNA. The scientists went on to identify methylated motifs, finding more than 800 distinct patterns, and also annotated the binding specificities of the 600+ methyltransferases detected. Of particular interest were the evolutionarily conserved orphan methyltransferases — or Type II methyltransferases with no obvious restriction enzyme — found in nearly half of all prokaryotes analyzed. Overall, these findings suggest that methylation has an important role in genome regulation for these organisms in addition to the well-established function of genome protection.

The team sequenced prokaryotes to an average 130X coverage, generating a total of 105 Gb of sequence data across all organisms. They report an average of three methylated motifs per organism, with m6A methylation accounting for 75% of all base modifications observed. “SMRT sequencing offers a powerful approach to determine the recognition specificities of several Types of [restriction-modification] systems that have previously been very difficult to decipher,” Blow et al. write. “Type I RM systems cleave DNA at large distances from their binding site, while both Type IIG and Type III systems sometimes have difficulties in producing complete cleavage patterns. This can make them difficult to study using traditional approaches that rely on analysis of patterns of restriction digestion.”
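As a rough sanity check on those numbers, sequencing coverage is usually estimated as total sequenced bases divided by genome size. The short Python sketch below rearranges that relationship to ask what average genome size the quoted figures would imply. The helper names are hypothetical, and the calculation assumes coverage was roughly even across the 230 genomes, which the study does not state, so treat the result as a back-of-the-envelope estimate rather than a figure from the paper.

```python
def mean_coverage(total_bases, genome_size):
    """Estimated fold coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_mean_genome_size(total_bases, n_genomes, mean_cov):
    """Average genome size implied by a total yield, a genome count, and an
    average coverage. Assumes roughly even coverage across genomes."""
    return total_bases / (n_genomes * mean_cov)


# Figures quoted in the post: 105 Gb total, 230 genomes, 130X average coverage.
size = implied_mean_genome_size(105e9, 230, 130)
print(f"{size / 1e6:.1f} Mb")  # about 3.5 Mb per genome
```

An implied average of about 3.5 Mb is in the range typical of bacterial and archaeal genomes, so the quoted totals hang together.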

Novel restriction-modification systems as well as new forms of existing systems, including Type IIG systems, were discovered throughout the data set, suggesting alternative functions including genome regulation. The scientists also found evidence of methylation pattern conservation. “Given the extensive amount of methylation present in the majority of the genomes we have examined, it is tempting to believe that methylation is a very important modification of bacterial and archaeal DNA perhaps providing regulatory functions that we have yet to fully appreciate,” the team reports. “Additionally, it is reasonable to assume that the evolution of DNA methylation was an early event that was important for the viability of primitive organisms.”

If you like JGI studies as much as we do, don’t miss the institute’s user meeting starting on March 21st.

Monday, February 29, 2016

On Rare Disease Day, Celebrating the Contributions of Genomics

Today we are celebrating Rare Disease Day with like-minded folks all over the world. The tribute kicked off in 2008 and has gathered so much momentum that people in more than 80 countries are expected to participate in 2016. Each disease is rare — affecting fewer than 1 in 1,500 people — but because there are so many of these diseases, together they affect millions of people globally.

Here at PacBio, many of our team members have their own stories about dealing with rare disease, and we imagine the same is true of our blog readers. We’re so proud that leading scientists have already begun using SMRT Sequencing to make important new DNA and RNA discoveries about the genetics and disease mechanisms of rare diseases. In the future, we anticipate even more of these studies will lead to novel breakthroughs as scientists expand their use of PacBio sequencing for human disease studies. Together, we can have a real impact in helping families struggling with these diseases.

Here are some examples of how researchers have shed light on rare diseases with SMRT Sequencing:

Baylor’s Jim Lupski, who studies and has been diagnosed with Charcot-Marie-Tooth neuropathy, recently spoke about a de novo PacBio assembly of his genome that found much more structural variation, especially copy number changes, than previous assemblies from short-read data. He also described how long reads can better resolve and characterize the breakpoints associated with these disease-causing structural variants and resolve sequence context to provide base-level resolution of specific genotypes.

In a separate presentation, Richard Gibbs from Baylor College of Medicine noted that just 25% of Mendelian disorders have been solved with short-read sequence data, and suggested that the success rate may be limited by the inability of these platforms to detect structural variation, repeat regions, and complex events. With SMRT Sequencing and structural variation analysis algorithms created at his genome center, scientists may be able to uncover the genetic basis of many more Mendelian disorders using low-coverage, long-read PacBio sequencing.

Paul Hagerman from the University of California, Davis, led the first team in the world to completely sequence a fully expanded pathogenic ‘CGG repeat allele’ in the FMR1 gene on the X chromosome that is associated with Fragile X syndrome. PacBio sequencing of repeat expansions in the FMR1 gene, a region previously thought to be “unsequenceable,” is shedding new light on pathogenic variants and interruptions that are meaningful for screening and carrier counseling, and that may lead to improved diagnostic and intervention strategies for families affected by Fragile X syndrome.

In a related project, follow-up work from Flora Tassone and other UC Davis researchers applied the Iso-Seq method to characterize alternative splicing in the FMR1 gene for a different disorder called Fragile X-associated tremor/ataxia syndrome (FXTAS). They found differential expression for certain gene isoforms suggesting a functional relevance for these in the pathology of FMR1-associated disorders.

Scientists in North Carolina generated the first high-quality sequence of MUC5AC, a gene that has been implicated in a range of diseases, including cystic fibrosis. The gene had long been represented as a gap in the human reference genome because of its complex and highly repetitive central exon. Characterization of the MUC5AC gene and the sequence variation in the central exon will facilitate genetic and functional studies for this critical airway mucin.

In a recent talk at AGBT, Bobby Sebra from the Icahn School of Medicine presented results from targeted PacBio sequencing of the C9orf72 locus, which contains a GGGGCC repeat expansion now known to cause familial ALS (also known as Lou Gehrig’s disease). He presented sequencing data from both the PacBio RS II platform and the new Sequel System, showing the ability to fully characterize the sequence of this locus and provide novel insights into the genetics underlying this debilitating disease.

Tetsuo Ashizawa and Karen McFarland from the University of Florida are making progress understanding the genetics of spinocerebellar ataxia type 10 (SCA10). In a recently published study, they describe sequencing through a pentanucleotide repeat allele known to cause this disorder, and characterizing various repeat interruption motifs associated with different SCA10 clinical phenotypes.

Shinichi Morishita’s lab at the University of Tokyo has described similar methods for characterizing tandem repeats associated with the SCA31 brain disease using a hybrid long- and short-read approach.

At Stanford University, Ayal Hendel is working in collaboration with John Day and the Myotonic Dystrophy Foundation to study the CTG/CAG repeat tracts that represent the genetic basis for myotonic dystrophy type 1 (DM1), and explore the cellular and molecular pathological mechanisms involved in DM — including aberrant alternative splicing.

We’d like to congratulate these scientists, along with all the others around the world who are working hard to make a difference in the lives of people burdened by rare disease. Whether you’re using our technology or any other, we thank you and wish you all the best!

PacBio is proud to be an official partner of Rare Disease Day. Get involved with global efforts or US-based initiatives to honor those dealing with rare diseases.

Thursday, February 25, 2016

New Views of Microbial Communities Call for Updates to Infectious Disease Tenets


Robert Koch

In a perspective recently published in Science magazine, scientists Allyson Byrd and Julie Segre from the National Human Genome Research Institute used recent advances in microbial analysis to look at Koch’s postulates through a new lens.

Published by Robert Koch in 1890, these principles have become widely accepted in microbiology as the definitive means to prove that a specific pathogen is the cause of an infectious disease. As summarized by Byrd and Segre, the postulates dictate that: “First, the microorganism occurs in every case of the disease; second, it is not found in healthy organisms; and third, after the microorganism has been isolated from a diseased organism and propagated in pure culture, the proposed pathogen can induce disease anew.”

The authors point out that Koch lived long before the discovery of antibiotics and nucleic acids, noting that recent revelations in infectious disease research call for an update of these principles — specifically, the role of microbial communities in causing or preventing disease. “In light of recent appreciation of microbial consortia, the scientific community should consider infectious disease causation in a broader systems biology context in which host genetic variability, health status, past exposure history, and microbial strains and communities are all important,” Byrd and Segre write. “As technology advances and new scientific discoveries are made, we must dynamically adapt Koch’s postulates so today’s science maintains the integrity that Koch originally fostered.”

The authors review several recent infectious disease papers, noting that hospital-acquired infections and microbiome studies have both shed new light on the association between microbes and disease. For instance, certain commensal microbes appear to protect against infections like those caused by Clostridium difficile, Salmonella, and other pathogens. Microbes can also work together to prevent infection, as seen in recent work demonstrating that a six-member microbial community ameliorated the effects of C. difficile infection, the authors report. “These findings force us to consider under what circumstances a consortium of microbes can fulfill Koch’s postulates,” they add. “For example, do all members of the community have to be grown in pure culture and tested individually, or is it sufficient to grow and test a group culture?”

Byrd and Segre note that new sequencing technologies have made it possible to study and analyze microorganisms that cannot be cultured. They also recommend updates to Koch’s postulates that would expand the rules to cover microbial communities.

SMRT Sequencing provides a high-resolution view that allows scientists to interrogate microbes — both individually and in communities — with greater accuracy and completeness than has ever been possible before. It’s an honor to see so many PacBio users delivering new insights that surely would have made Koch proud.

Tuesday, February 23, 2016

15th Anniversary of the Human Genome Publication; A Conversation with Mike Hunkapiller

This month serves as the 15th anniversary of the first publication of the human genome by both public and private efforts. PacBio CEO Mike Hunkapiller was a central player in both efforts as the leader of Applied Biosystems, the company that developed and supplied the automated Sanger-based sequencing technology that made the projects possible.

In honor of the occasion, Mendelspod host Theral Timpson asked Mike to join him in a commemorative conversation to discuss his memories of the project, as well as how genome sequencing technology has developed since.

Mike talked about what was happening behind the scenes of these historic efforts, and the ways in which public and private efforts collaborated. He said the DNA sequencing technology was in some ways ahead of other aspects of the project needed to make it work — such as sample handling and informatics tools to assemble the data. The informatics was the final hurdle, he said, which came together “literally a few days before the famous announcement at the White House.”

In the years since, the cost of sequencing has gone down, while the technology has become faster and better. This includes the cost of conducting projects with PacBio sequencing, which has gone down “dramatically,” Mike said. While hesitant to predict what the next 15 years will bring, he noted that the company’s newest product, the Sequel System, is designed to reduce the cost of SMRT Sequencing even further.

There exists today a “quality versus quantity camp” when it comes to genome sequencing, Mike and Theral agreed. Mike explained that while short-read technologies have been good for generating lots of single nucleotide variant information from a large number of individuals, “what’s become clear in the last year or so is how much other kind of variation — that are in some sense structurally more important — you just don’t pick up with the short-read technology.” As awareness of this fact has grown, Mike said, interest among scientists in going back to get more complete sequence information has increased considerably. (See this Nature paper by Evan Eichler and colleagues.)

Mike explained that PacBio’s goal over time is to lower the cost of SMRT Sequencing so that scientists can get not only a SNP map (which is what the short-read “$1000 genome” is today) but also structural information, high-quality de novo assemblies, haplotype phasing, and more — all for the same cost. “If we can get the cost down to where you get all of that information in one experiment, for the same price or roughly the same price as it takes to get just one area of that — say the single nucleotide variation — then we think we have a very, very compelling offering,” Mike said.

Theral asked Mike how much he thinks the tool makers have been at the steering wheel of the whole genomics revolution, and Mike responded that most of the technology has been driven by the private sector, while the scientific community has driven the applications of how to use the data to solve all types of biological questions. “It’s a virtuous circle,” he said, “where technology drives science and science drives technology.”

To listen to the entire 30-minute conversation, visit: http://mendelspod.com/podcasts/human-genome-turns-15-mike-hunkapiller-ceo-pacbio/.

Monday, February 22, 2016

AGBT Day 4: A Better Gorilla Assembly, and Data from the Sequel System

On the final day of AGBT, attendees strapped in for the last talks of the conference before the ’80s-themed dance party to close out the meeting. Two of those talks focused on SMRT Sequencing, one including new data from our Sequel System.

Christopher Hill from the Eichler lab at the University of Washington gave a fascinating talk on creating reference-grade assemblies for the great ape species. These resources will be incredibly helpful for shedding light on biological mechanisms behind speech, disease, neurological behavior, and other traits that separate us from our closest primate relatives. Current assemblies for these apes — including bonobo, chimpanzee, gorilla, and orangutan — are highly fragmented, with contig N50s in the tens of kilobases, Hill noted. He and his team are using SMRT Sequencing to resolve repetitive and highly complex regions to build a new gorilla assembly.

With PacBio sequencing and the FALCON assembler, the new assembly has just 16,000 contigs (compared to more than 460,000 in the existing assembly) and a contig N50 length of 9.6 Mb (compared to 11.7 kb in the existing assembly). The new gorilla assembly closed 94% of gaps from the existing assembly, added 164 Mb of new euchromatic sequence, and corrected previous misassemblies. Hill noted that structural variation in particular can be detected more robustly with this new resource. He also said that the gorilla reference is now more in line with the human reference thanks to this marked increase in contiguity. His team is currently working to bring the chimpanzee genome up to the same standard.
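For readers less familiar with the metric, contig N50 is the length at which contigs of that size or longer make up at least half of the total assembly. The sketch below is a minimal Python illustration of that definition. The function name `n50` and the toy contig lengths are our own assumptions for illustration; they are not data or code from Hill's talk.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk the contigs from longest to shortest until the running sum
    # covers at least half of the total assembly length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assemblies (lengths in bp), both 1 Mb in total.
fragmented = [10_000] * 100               # 100 equal 10 kb pieces
contiguous = [600_000, 300_000, 100_000]  # the same 1 Mb in three pieces

print(n50(fragmented))
print(n50(contiguous))
```

Both toy assemblies contain the same amount of sequence, but the one built from fewer, longer pieces has a far higher N50, which is why the metric is commonly used as a shorthand for contiguity.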

Our own CSO, Jonas Korlach, also gave a talk in the closing session of AGBT on the value of SMRT Sequencing for addressing complex diseases. He briefed attendees on the new, higher-throughput Sequel System and showed comparisons of Sequel data with data from the PacBio RS II system across a variety of applications. He noted the strong concordance between the platforms in studies such as highly multiplexed targeted sequencing of breast and ovarian cancer samples, a de novo E. coli genome assembly, and Iso-Seq analyses of full-length mRNA in control and cancer samples. Korlach stressed the value of long reads for high-quality DNA sequencing and assembly, but noted that read length alone isn’t enough; other essential elements include lack of GC bias and high consensus accuracy, he said.

We hope that you enjoyed this year’s AGBT as much as we did. We’re already looking forward to next year’s meeting!

Friday, February 19, 2016

AGBT Day 3: Human Genomes and Their Microbial Friends

We’ve been in the genomics world long enough to remember when it was a big deal to see a great single-gene assembly or microbial genome assembly reported in an AGBT talk. It’s really something to attend this year and see some beautifully assembled whole human genomes.

Several of the Friday talks really captured our interest, but we can only cover a couple of them here. NCBI’s Valerie Schneider spoke about efforts through the Genome Reference Consortium to improve assembly of the human reference genome, noting that one challenge has been the shift from a clone-based approach during the Human Genome Project to whole-genome sequences today. While these new sequences are adding tremendously valuable information to the reference assembly and are shaping how it is curated, she said, they also introduce different assembly issues that have to be reconciled with existing information.

Schneider noted that considerable improvements have occurred for highly repetitive regions, such as the mucin genes. SMRT Sequencing has made it possible to fully resolve many of these regions, which had long appeared intractable. She also presented recent work on the CHM1 and CHM13 hydatidiform moles, which have haploid human genomes that are helping make sense of some complex regions in the assembly thanks to long-read sequencing. Schneider illustrated the challenges of choosing which sequence to add to the reference when she presented a number of quality metrics indicating that some assemblies were better for, say, contiguity, while others were better for QV score. “No one assembly is excelling for all metrics,” she said.

During another talk, Karyn Meltz Steinberg from the McDonnell Genome Institute at Washington University reported the first African reference genome assembly, a Yoruban sample analyzed with 70x coverage of SMRT Sequencing data. She told attendees that the best strategy to achieve a gold-quality genome is to use PacBio sequencing, which offers a vast improvement over short-read approaches. The team used a BioNano Genomics genome map to add extremely long-range scaffold information, boosting the already impressive contig N50 of 6 Mb to a scaffold N50 of nearly 15 Mb.

Also in the informatics session, Maria Nattestad from Cold Spring Harbor Laboratory presented an algorithm called SplitThreader for analyzing highly amplified or rearranged cancer genomes. Inspired by examples like a commonly used HER2-amplified breast cancer cell line, which has a full complement of 80 chromosomes, the SplitThreader algorithm analyzes complex events to find the most likely evolutionary path that created them. With PacBio sequencing data, the tool was able to uncover and visualize new candidate fusion genes.

In human microbiome work, Gregory Buck from Virginia Commonwealth University presented data from two projects designed to elucidate the profiles of vaginal microbial communities by studying thousands of women. This particular microbiome may be associated with preterm birth, HIV risk, and more. Buck noted that some of the microbes discovered have been sequenced with PacBio systems to produce remarkably high-quality, fully closed assemblies in very little time. The projects have identified 20 vagitypes, or typical microbial community profiles, some of which appear to be influenced by genetics and ancestry.

We recorded some of the AGBT talks this year, and will be making those videos available on the blog shortly. Stay tuned!
