PacBio blog

This blog features voices from PacBio — and our partners and colleagues — discussing the latest research, publications, and updates about SMRT Sequencing. Check back regularly or sign up to have our blog posts delivered directly to your inbox.

Wednesday, May 18, 2016

Discover New Insights into Human Genetic Variation at ESHG in Barcelona

The PacBio team is looking forward to joining 3,000 other scientists in Barcelona May 21-24 for the European Human Genetics Conference, better known as ESHG. Organized by the European Society of Human Genetics, this is the 49th year of a high-quality meeting where the latest developments in human and medical genetics are discussed.

This year, we’ll be showcasing our new Sequel System at booth #260 in the exhibit hall. Come visit us and learn more about it! With higher throughput than our previous instrument, we think the Sequel System will be a great fit for the genomics community on projects such as multiplex targeted sequencing and RNA isoform sequencing.

To learn how scientists are already using PacBio sequencing to address unanswered questions in genomics, don’t miss the SMRT Sequencing workshop hosted by Roche Sequencing. The luncheon event will be held on Monday, May 23, from 11:15 a.m. to 12:45 p.m. in rooms 120 and 121. Christine Beck from Baylor College of Medicine will discuss the use of long fragment capture and sequencing techniques to reveal structural variation at clinically relevant loci. Robert Sebra from the Icahn School of Medicine at Mount Sinai will discuss how his lab uses long-read sequencing to gain a more comprehensive view of complex regions of the genome, including pharmacologically important sites, oncogenes, and structural variants linked to genetic disease.

There will also be ESHG talks and posters featuring SMRT Sequencing data in a wide range of applications, including whole genome assembly and haplotyping, immunology, repeat expansion disorders, and non-coding RNAs. Please join us at the following presentations:


Saturday, May 21, 2016, 10:30 a.m. – 12:00 p.m.

Talk Title: E01.1 Long-read sequencing of complex genomes

Speaker: Evan Eichler, University of Washington


Saturday, May 21, 2016, 6:30 p.m. – 8:00 p.m.

Talk Title: A distinct class of chromoanagenesis events characterized by focal copy number gains

Speaker: Matthew Hestand, Leuven, Belgium


Sunday, May 22, 2016, 1:00 p.m. – 2:30 p.m.

Talk Title: Enrichment of unamplified DNA and long-read SMRT Sequencing to unlock repeat expansion disorders

Speaker: Tyson Clark, Pacific Biosciences


Sunday, May 22, 2016, 1:00 p.m. – 2:30 p.m.

Talk Title: C07.6 – Detection of AGG interruptions in FMR1 premutation females by single-molecule sequencing

Speaker: S. Ardui, KU Leuven


Monday, May 23, 2016, 8:30 a.m. – 10:00 a.m.

Talk Title: S11.3 – Mapping Human Long Noncoding RNAs

Speaker: Rory Johnson, Barcelona, Spain


Tuesday, May 24, 2016, 11:00 a.m. – 12:30 p.m.

Talk Title: C23.6 – Identifying novel long non-coding RNAs in the Human genome.

Speaker: M. P. Hardy, Wellcome Trust Sanger Institute




Saturday, May 21, 12:00 p.m. – 2:00 p.m.

Monday, May 23, 10:15 a.m. – 11:15 a.m.

Presentation: P16.07C – Application Specific Barcoding Strategies for SMRT Sequencing


Saturday, May 21, 12:00 p.m. – 2:00 p.m.

Sunday, May 22, 4:45 p.m. – 5:45 p.m.

Presentation: P16.02B – Highly Contiguous de novo Human Genome Assembly and Long-Range Haplotype Phasing Using SMRT Sequencing


Sunday, May 22, 10:15 a.m. – 11:15 a.m.

Presentation: P07.17A – Resolving KIR genotypes and haplotypes simultaneously using Single-Molecule, Real-Time Sequencing


Sunday, May 22, 4:45 p.m. – 5:45 p.m.

Presentation: P15.14B – Full-length and phased CYP2D6 variant genotyping using the PacBio RS II

Tuesday, May 17, 2016

Arizona Scientists Deploy BAC Expertise and SMRT Sequencing for Crop Genomes

At the University of Arizona, a leading genomics research facility benefits from decades of BAC-based sequencing expertise, original studies of crop genomes, and a unique emphasis on high molecular weight DNA. Rod Wing, founding director of the Arizona Genomics Institute (AGI) and a professor in the School of Plant Sciences, Ecology & Evolutionary Biology at the university, was a pioneer in building BAC-based reference genomes in the ’90s. Today, that carefully honed expertise in isolating large DNA fragments gives him and his lab a real advantage for making the most of long-read sequencing.

Wing’s efforts primarily focus on plant genomes, but his service facility performs genomic studies on a wide variety of organisms for investigators at the university and around the world. He says the PacBio platform is useful for customers interested in sequencing any kind of crop, as well as animals and other organisms. “If your genome is littered with repetitive elements that are highly similar, PacBio allows you to get through those elements and back into some more unique sequence for a better assembly,” he notes. “Our facility can work on pretty much any organism — we just have to have some good DNA.”

To meet his own research goal of building high-quality reference genomes for every species of rice, 23 in total, Wing chose SMRT Sequencing. “PacBio is revolutionizing our approach for whole genome shotgun, BAC, and targeted sequencing,” he says. The rice genome assemblies his team is building will be essential to improving rice crops for higher yield, expanded growing areas, and stress resistance.

In recent work to sequence the African rice genome, a few highly repetitive and rearranged BACs were particularly challenging to get through with other platforms. “It was taking us months to try to get through this one region,” Wing recalls. Using SMRT Sequencing, he produced the full sequence of this region “in a nice single piece within a couple of days.” For particularly difficult regions like that one, PacBio sequencing is a very powerful targeted approach, he adds. With a few other rice genomes, his team is pooling BACs, 32 at a time, in individual SMRT Cells. “Around 85 to 90 percent of those BACs are completely circularized,” he says. “It’s pretty easy to go through a genome using this approach.”

Beyond the highly accurate, long-read sequence data he obtains from SMRT Sequencing, Wing also likes the PacBio platform for full-length isoform sequencing and its ability to characterize the methylome. He notes that now they can take rice tissues at several developmental stages and under many different environmental conditions, isolate RNA, and do Iso-Seq analysis on those samples to enable whole plant transcriptome analysis. This could help the community map gene networks and pinpoint the biological mechanisms behind traits such as bigger leaves, water uptake, and more.

To learn more about Wing’s rice genome projects and his service facility, check out this case study.

Thursday, May 12, 2016

From Seabass to Salmon: Swimming in High-Quality Genomes

Image: Asian seabass (CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=1654077)

A global collaboration of researchers has produced what is likely the most contiguous assembly of a fish genome to date. “Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding,” published in PLoS Genetics, comes from lead author Shubha Vij and senior author László Orbán with collaborators at nearly two dozen labs.

The team set out to sequence Lates calcarifer, the Asian seabass, which has a genome of about 670 Mb grouped into 24 A chromosomes and as many as 10 B chromosomes. They used SMRT Sequencing from PacBio to overcome the fragmented and incomplete assemblies associated with short-read data, and incorporated optical and genetic mapping to add further layers of information to the assembly. In the end, they generated an incredibly valuable resource that has been shared with the community. “The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics,” the authors write.

That quality largely came from 90x SMRT Sequencing coverage of the genome, which on its own produced an assembly with about 3,900 contigs and a contig N50 length greater than 1 Mb. Layering in optical mapping and genetic markers allowed the team to place contigs into larger scaffolds, ultimately achieving 24 individual chromosomal scaffolds. The scientists anticipate this resource will be particularly useful for comparative genomics and for development of assays such as GWAS, allele mining, and genomic selection.
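For readers new to the metric, contig N50 is the contig length at which contigs of that size or longer together cover at least half of the assembly. The sketch below shows one common way to compute it. The function name and the toy contig lengths are illustrative assumptions, not values or code from the seabass study.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that pushes us to half the total.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(contig_n50(toy_contigs))
```

In this toy example the two longest contigs (80 and 70) already cover half of the 300 total units, so the N50 is 70. A higher N50 for the same total size means the assembly sits in fewer, longer pieces.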

The scientists used the MHC region for a close-up inspection of the genome quality and contiguity. They compared the seabass MHC locus to that of stickleback, which had “the most complete published fish genome assembly available in the public repositories” at the time of the study, the team reports. In Asian seabass, they identified 14 MHC class I genes, spread across eight contigs, half of which were longer than 1 Mb. “By contrast, the MHC-class I genes from stickleback, were located on almost double the number of contigs, of which all except one were ≤ 113 kb in length,” they write.

The remarkable assembly quality gave researchers the opportunity to delve into repeat sequences, which represent nearly 20% of the genome and included several kinds of complex tandem repeat sequences. In addition, the team studied duplicated genes and found they were enriched for functions that could help explain how the fish transforms from male to female after maturity.

The scientists note that while most eukaryotic genomes published so far have been produced with short-read data, a strategy using only PacBio data “seems ideal for assembling mid-to-large eukaryotic genomes since it ensures contiguity, less ambiguity and assembly metrics surpassing all of the fish genomes sequenced thus far.”

And if you just can’t get enough fish genomes, don’t miss this publication reporting a new salmon assembly. The team used PacBio sequencing as part of a multi-platform assembly process; the findings shed light on the rediploidization of salmon.

Wednesday, May 4, 2016

Eco-friendly Soil Remediation Gets a Boost with the Latest SMRT Grant Program Winner

Congratulations to the winner of the first-ever SMRT Grant program decided by the community: Renying Zhuo of the Chinese Academy of Forestry!

We ran polling through our Genome Galaxy Initiative on the Experiment crowdfunding platform and were amazed to see how much it galvanized the genomics community. There were 30,000+ responses to the competition across the five finalists for our “Explore Your Most Interesting Genome” grant opportunity. Zhuo garnered the most support for his project to sequence two highly related strains of Sedum alfredii for a comparative genomics investigation to identify key genes important for remediating soil contaminated by heavy metals. This project could be applied to address cadmium ion pollution, a growing concern within rapidly industrializing nations.

The remaining finalists have a second chance to earn funding by launching their projects through a special crowdfunding event. The runners-up will kick off their campaigns through Experiment, and we sincerely hope that supporters of these worthy projects come back to help. Every donation makes a difference! And, donations are only accepted when projects meet their crowdfunding goal. Here’s a quick look at the projects now open for contributions:

The Amazing & Enigmatic Alpaca

Investigator: Kylie Munyard, Curtin University

According to Munyard’s proposal, the economically important alpacas are of great scientific interest on a number of fronts, and producing a reference genome will enable new studies in both agricultural and biomedical research. Alpacas are a good model for diabetes research; they have innate mechanisms to stay free of parasites; their distant relationship to other agricultural animals makes them good for comparative study; and much more.

Sequencing an Extremophile Earthworm

Investigator: Luis Cunha, Cardiff University

This project would sequence the extraordinary earthworm Pontoscolex corethrurus, which lives in a volcanic geothermal field with high exposure to toxic gases, extreme temperatures, and very little oxygen. Cunha’s proposal notes that preliminary work with draft assemblies indicates significant levels of horizontal gene transfer that could be better characterized with SMRT Sequencing.

Scar-Free Regeneration in the Spiny Mouse

Investigator: William Barbazuk, University of Florida

According to this proposal, the adult spiny mouse is the only known mammal with the unique ability to regenerate skin and organs after wounds without any scars or other indications of trauma, making this organism interesting for regenerative medicine. Barbazuk hypothesizes novel genes, alternatively-spliced isoforms, and gene expression regulators are responsible. He aims to use SMRT Sequencing to study the transcriptome of spiny mouse and its wound-healing properties.

Highlighting Firefly: A Genome Resource

Investigator: Jing-Ke Weng, MIT

This project would help a large consortium of researchers generate a high-quality genome assembly for Photinus pyralis, an American firefly. Weng’s proposal notes that the 2,000-plus species of these charismatic flashing beetles have been understudied, and that the biological mechanisms behind important traits such as bioluminescence remain unknown.

The PacBio team is very grateful to the scientific community and their supporters, as well as our co-sponsors for this grant program: Sage Science, Computomics, RTL Genomics, Texas A&M AgriLife Genomics and Bioinformatics Service, and Experiment.

Monday, May 2, 2016

Join the SMRT Community: User Meetings in Europe, Asia, and the US

There are several PacBio user meetings coming up, and with locations around the world we hope you’ll be able to attend one of them. These meetings are a great way to meet fellow customers, exchange tips, and learn about new applications. If you are interested, please register as soon as possible to reserve your seat.


The Netherlands: SMRT Leiden Symposium & Informatics Developers Meeting, June 6-8

This meeting, organized and hosted by Leiden University Medical Center’s human genetics department, includes a scientific symposium and the first SMRT Informatics Developers meeting in Europe. There’s an impressive agenda for the two-day symposium, with topics covering genomics, transcriptomics, and epigenomics of organisms ranging from microbes to humans. Keynote speakers include Evan Eichler, Steven Marsh, Shinichi Morishita, and Hagen Tilgner. On June 8, the informatics conference will kick off with a keynote talk from Gene Myers and provide plenty of opportunity for brainstorming about approaches to de novo assembly, structural variant detection, genome phasing, the Iso-Seq method, and more. Registration is free.


Maryland: East Coast User Group Meeting & Workshops, June 7-9

PacBio’s fourth annual East Coast User Group Meeting will be hosted by the University of Maryland’s Institute for Genome Sciences. In addition to the day-long meeting on June 8, attendees may also participate in half-day workshops on sample prep (June 7) and bioinformatics (June 9). The sample prep workshop will cover best practices and basic data analysis, with group breakout discussions for deeper dives into specific topics of interest. The bioinformatics event will introduce SMRT Analysis 3.x, including the SMRT Link GUI and command line examples. Data management for both the PacBio RS II and the new Sequel System will be discussed. Reserve your seat now.


Singapore: Asia User Group Meeting & Workshop, June 8-10

This event, held at the Grand Copthorne Waterfront Hotel, will feature two days of presentations for the user group meeting and an additional day-long bioinformatics workshop. The general meeting will include topics from de novo genome assembly and targeted sequencing to Iso-Seq full-length RNA sequencing and epigenomics, covering microbes, plants, animals, and humans. The bioinformatics workshop will focus on genome assembly and quality control, targeted sequencing and phasing, detection of minor variants or structural variants, and more. Learn more and sign up for the meeting.

Wednesday, April 27, 2016

Upcoming Webinars on Biomedical Research, Data Analysis, and Structural Variation

We’ve got several educational webinars coming up, and we hope you can join us!

Our first event will be hosted by Front Line Genomics on April 28 (4:00 p.m. BST / 11:00 a.m. EST / 8:00 a.m. PST). “Applying PacBio Long-Read Sequencing for Human Biomedical Research” will include Adam Ameur of the National Genomics Infrastructure in Sweden; Giancarlo Russo from the Functional Genomics Center Zurich; and our CSO Jonas Korlach. Each participant will offer a brief presentation, with audience Q&A at the end.

We’ve also teamed up with DNAnexus to offer two webinars on best practices for SMRT Sequencing data analysis. The first, on May 4 (5:00 p.m. CET / 11:00 a.m. EST / 8:00 a.m. PST), features DNAnexus Computational Biology Project Leader Brett Hannigan discussing rapid assembly for reference-quality genomes. The webinar will include a look at the challenges involved in assembling the 4.5 Gb tobacco genome and walk through running the FALCON assembler on the DNAnexus platform.

The second webinar, focused on discovering structural variants in SMRT Sequencing data, will take place on June 16 (5:00 p.m. CET / 11:00 a.m. EST / 8:00 a.m. PST). Andrew Carroll, Director of Science at DNAnexus, will talk about using cloud-optimized apps such as PBHoney, Parliament, and Sniffles to improve the accuracy of calling structural variation.

All webinars will also be recorded. If you cannot attend live, please sign up and we will send you the recording following the event.

Monday, April 25, 2016

On DNA Day Honoring Discoveries – Y chromosome, Reference Grade De Novo Assemblies & Methylation

Happy DNA Day, everyone! This scientific celebration has us reflecting on the many advancements the community has made in the past year. For a molecule that is sequenced thousands of times a day all over the world, there is still much to learn. Today we’d like to honor some of the remarkable science enabled by SMRT Sequencing since last year’s DNA Day.


Scientists have continued to make progress exploring regions of the genome that have long been considered intractable. Two of our favorite stories this year came from the always-challenging Y chromosome. Researchers studying the mosquitoes that carry malaria — Anopheles gambiae — delivered the first detailed analysis of their Y chromosome, which is essentially a giant string of repeat sequences. The information may prove essential for efforts to shift the sex ratio of mosquito populations toward males, which do not transmit disease. In a separate study, scientists analyzed the Y chromosome in Drosophila and found evidence of an ancient gene duplication from an autosome; the gene had since acquired a new function on the Y chromosome. The gene had never been discovered before because of its location in a highly repetitive, complex genomic region that was inaccessible to other sequencers.


We’ve also seen a number of great examples of reference-grade de novo genome assemblies in the past year. A large team of scientists produced what they called “the most contiguous clone-free human genome assembly to date” using SMRT Sequencing along with single-molecule genome maps from BioNano Genomics. A similar strategy was used to generate “a gapless telomere-to-telomere genome assembly” of the filamentous fungus Verticillium dahliae, according to the publication in mBio. Just recently researchers published a new assembly for the gorilla genome, representing a better than 150-fold improvement over the previous assembly. We loved the story of Oropetium thomaeum, a resurrection grass that was sequenced for one of our SMRT Grant winners and resulted in a virtually complete assembly.


DNA methylation has been another area of interesting developments as scientists delve into this poorly understood genetic mechanism. A recent Joint Genome Institute project involved a sweeping analysis of 230 prokaryotes that revealed more methylation, and more complex patterns, than ever suspected. A separate study detected methylation for the first time in C. elegans, proving that even well-characterized organisms still have secrets to reveal. Scientists also made progress in understanding the role of epigenetics in virulence and antibiotic resistance; this study found an epigenetic switch in non-typeable Haemophilus influenzae that alters the organism’s pathogenicity and drug resistance.


The past year has also been an exciting time for the PacBio team. In October, we launched our new Sequel System, a sequencer one-third the size and half the cost of the PacBio RS II with nearly seven-fold higher throughput. And right now we’re gathering votes for our first-ever community poll to award a new SMRT Grant. If you haven’t voted yet, now’s the time!

Wednesday, April 20, 2016

Benchmarking Study: Full-Length 16S Sequencing Offers Better Phylogenetic Resolution

Scientists from the Joint Genome Institute and other institutions recently reported a new SMRT Sequencing approach to microbial profiling using full-length sequencing of the 16S rRNA gene. In a benchmarking study, they demonstrate that this method allows for more accurate taxonomic classification than is possible with typical short-read sequencing methods.

Lead author Esther Singer, senior author Tanja Woyke, and collaborators at USDA-ARS, the University of British Columbia, and other research groups published “High-resolution phylogenetic microbial community profiling” in The ISME Journal earlier this year. The scientists note that while 16S phylogenetic analysis has traditionally been performed with gold-quality Sanger sequencing, the need for a more cost-effective solution drove the field to short-read sequencing technologies, which have produced most of the 16S sequences in GenBank. However, that shift came at the cost of quality. “Reference sequences with low read accuracy, chimeric sequences and partial rRNA gene sequences with reduced phylogenetic resolution generated on short-read sequencing platforms such as 454 and Illumina remain problematic, resulting in incorrect or less accurate classification of environmental sequences,” the authors report.

The team thought long reads from SMRT Sequencing could provide an appealing alternative. In this project, they generated full-length 16S sequences from microbial communities using a PacBio instrument and compared results to those from a short-read platform. They first tested the approach on a mock community of 26 bacterial and archaeal species including E. coli and strains of Salmonella and Clostridium, generating full-length 16S sequences called PhyloTags in a successful validation of the method.

Next they went to the field, using PacBio and short-read sequencing to analyze microbial communities from a lake in British Columbia, with water samples taken at eight different depths. They determined that partial sequences from the 16S gene — the information generated by sequencers that can’t cover the full gene in a single read — were less likely to resolve phylogeny and were more likely to lead to incorrect matches, particularly in more complex microbial communities. As many as 4% of short-read results “were taxonomically unresolved at the phylum level, whereas all PhyloTags were classified into distinct bacterial phyla,” the scientists report. In an analysis of unclustered sequence data, they note that short-read sequence results were “more often either impossible or incorrect, significantly altering community profiles across all taxonomic levels.” They also found that certain phyla were more likely to be misclassified when only partial gene coverage was available. “PhyloTag sequencing … offers the highest contig accuracy without discrimination against GC-rich or -poor regions, which further reduces bias in amplicon-based profiling,” the authors write.

“A resurgence of [full-length] sequences used as ‘gold standards’ has the potential to yet again transform microbial community studies, increasing the accuracy of taxonomic assignments for known and novel branches in the tree of life on previously unobtainable scales,” Singer et al. report.

Monday, April 18, 2016

First Comprehensive Analysis of Mosquito Y Chromosome Offers Clues for Vector Control

A new PNAS paper offers the first detailed analysis of the Anopheles gambiae Y chromosome, which could prove critical for biological and infectious disease research. The report uncovered extensive remodeling of the Y chromosome, which consists almost entirely of highly repetitive sequence. The authors say this study “provides a long-awaited foundation for studying male mosquito biology, and will inform novel mosquito control strategies based on the manipulation of Y chromosomes.”

“Radical remodeling of the Y chromosome in a recent radiation of malaria mosquitoes” comes from lead authors Andrew Brantley Hall, Philippos-Aris Papathanos, Atashi Sharma, and Changde Cheng, along with senior author Nora Besansky. This large collaboration combines scientific expertise from Virginia Polytechnic Institute and State University, the University of Notre Dame, NHGRI, Indiana University, and several other institutes. The project was formed to address the lack of information about the mosquito Y chromosome, which has hindered vector-control efforts. Previous sequencing initiatives had analyzed A. gambiae but reported only 180 kb of unordered sequence data for the Y chromosome; a related mosquito genome project revealed 57 short sequences, while some 200 kb of sequence data was generated from BAC clones for the chromosome.

The challenge lies in the highly repetitive sequences found in Y chromosomes across organisms. “The Y chromosome remains one of the most recalcitrant and poorly characterized portions of any genome more than a decade into the postgenomic era,” the authors write. Among mosquitoes, the Y chromosome is particularly interesting because males do not transmit disease, so shifting the sex ratio of populations is a promising vector-control approach for reducing the incidence of malaria, Zika virus, and other mosquito-borne diseases.

The team used SMRT Sequencing to tackle the A. gambiae Y chromosome, first generating a 294 Mb de novo assembly followed by sequencing and completely assembling BAC clones. “We find that the A. gambiae Y consists almost entirely of a few massively amplified, tandemly arrayed repeats, some of which can recombine with similar repeats on the X chromosome,” the scientists report.

For further analysis, the scientists incorporated genome resequencing data from a recent species radiation and determined that the Y chromosome experiences rapid sequence turnover. They also used RNA-seq data to identify a small number of genes on the chromosome that had no homologs on the X chromosome. In addition, they found YG2, a conserved gene that may have a role in sex determination in the mosquito.

The authors note that SMRT Sequencing has been a game-changing development for analyzing Y chromosomes in many organisms, “promising a resource-efficient alternative” to the laborious processes used in the past. “Single-molecule sequencing reads were able to reveal complex repeat structures from whole-genome data and completely assemble heterochromatic BACs without manual finishing,” the scientists conclude. “These results suggest that continued single-molecule read length and throughput improvements may soon enable the complete reconstruction of Y chromosomes from whole-genome data alone.”

Friday, April 15, 2016

Japanese Scientists Find Gene Fusion Driving B Cell Leukemia

In a new Nature Genetics paper, scientists from the University of Tokyo and several other Japanese institutes and hospitals present results of a sweeping study of gene fusions driving a form of leukemia in teenagers and young adults. They used SMRT Sequencing to validate the gene fusion.

“Recurrent DUX4 fusions in B cell acute lymphoblastic leukemia of adolescents and young adults” comes from lead author Takahiko Yasuda and senior author Hiroyuki Mano, along with many collaborators. The team embarked on the search for new oncogenes responsible for acute lymphoblastic leukemia (ALL) in subjects from 15 to 39 years of age because the mechanisms responsible for this cancer “remain largely elusive,” they write.

From a large RNA-seq analysis, they found frequent insertion of a D4Z4 repeat that includes the DUX4 gene into the IGH locus, creating a DUX4-IGH gene fusion that produces high expression levels of an aberrant form of DUX4. The scientists transplanted this gene fusion into mice, where it led to the generation of B cell leukemia. They report that fusion-driven oncogenes are more important for causing ALL in this age range than previously thought. “Our data thus show that DUX4 can become an oncogenic driver as a result of somatic chromosomal rearrangements and that [ALL in adolescents and young adults] may be a clinical entity distinct from ALL at other ages,” Yasuda et al. write.

The team used SMRT Sequencing to confirm the full sequence of the gene fusion, which could not be done with short-read sequencing. “Given that the average read length in our next-generation sequencing approach was 104 bp, it was difficult to determine how many copies of DUX4 had been inserted into the IGH locus,” they report. They performed whole genome sequencing of a B cell line cultured from a 19-year-old ALL patient, generating about 69 Gb of data.

“Our analysis confirmed that one full-length and one partial copy of D4Z4 were translocated to the IGH locus, accompanied by minor rearrangements within the IGHD2-15 and IGHVII-60-1 regions,” the scientists report. This figure shows SMRT Sequencing data confirming the presence of the DUX4-IGH gene fusion.

To learn more about applying SMRT Sequencing to cancer research, check out our AACR conference preview.


Thursday, April 14, 2016

Accelerating Cancer Research Discovery with SMRT Sequencing at AACR 2016

The PacBio team is gearing up for the annual meeting of the American Association for Cancer Research (AACR), which will be held April 16-20 in New Orleans. We’re looking forward to introducing the AACR community to the Sequel System, our new SMRT Sequencing platform that’s half the price and a third of the size of our PacBio RS II System. With 5-10 Gb of throughput per SMRT Cell, we think the Sequel System will be a great fit for the cancer research world.



Sunday, April 17, 4:15 – 6:15 p.m., Room 243, Morial Convention Center

MS.BSB01.01. Minisymposium: Novel and Integrative Analyses of Cancer Genome Data

Sun, Apr 17, 4:50 – 5:05 p.m., Room 243, Morial Convention Center
848 – Proteogenomic Analysis of Alternative Splicing: The Search for Novel Biomarkers for Colorectal Cancer

Malgorzata Komor, Netherlands Cancer Institute

Sun, Apr 17, 5:20 – 5:35 p.m., Room 243, Morial Convention Center
850 – Comprehensive Genome and Transcriptome Structural Analysis of a Breast Cancer Cell Line using Single Molecule Sequencing

Maria Nattestad, Cold Spring Harbor Laboratory (don’t miss our case study about this project!)


We’re also hosting four Meet the Expert sessions at our booth (#257) where attendees will hear quick presentations on targeted sequencing or the Iso-Seq method for transcriptome studies, followed by a Q&A. Our experts Roberto Lleras and Anand Sethuraman will be available on Monday the 18th and Tuesday the 19th.

  • Full-Length Isoform Sequencing: Monday 1:00 – 1:30 p.m.; Tuesday 2:30 – 3:00 p.m.
  • Targeted Sequencing with Long Reads: Monday 2:30 – 3:00 p.m.; Tuesday 1:00 – 1:30 p.m.

Finally, there will be a number of posters demonstrating the use of SMRT Sequencing for cancer research or analysis tools to help users make the most of their PacBio data. Here’s a quick peek at a few:



Sun, Apr 17, 1:00 – 5:00 p.m.
LB-012/12 – Autonomous, antigen-independent B-cell receptor signaling as a novel pathogenetic mechanism in non-GCB DLBCL
Marvyn T. Koning et al., Leiden University Medical Center

Tue, Apr 19, 8:00 a.m. – 12:00 p.m.

3438/12 – DNA methylation profiles of Helicobacter pylori strains from patients with gastric cancer and gastritis

Constanza Camargo et al., National Cancer Institute

Tue, Apr 19, 1:00 – 5:00 p.m.

3646/16 – Highly sensitive and cost-effective detection of somatic cancer variants using single-molecule, real-time sequencing

Anand Sethuraman et al., PacBio

3611/10 – SMRT® Sequencing of DNA samples extracted from formalin-fixed and paraffin embedded tissues

Primo Baybayan et al., PacBio

LB-286/3 – Dynamic alternative splicing correlates with drug synergy and induces novel gene regulatory networks in MCF7
Xintong Chen et al., Icahn School of Medicine at Mount Sinai

Wed, Apr 20, 8:00 a.m. – 12:00 p.m.
5281/18 – Fast and scalable software for comparative variant analysis and visualization of massive next-generation sequencing data
Riku Katainen et al., University of Helsinki


We look forward to seeing you in New Orleans!


Wednesday, April 13, 2016

Genome and Transcriptome Analysis Help Scientists Deconstruct Cancer Complexity

At Cold Spring Harbor Laboratory, scientists used SMRT® Sequencing to decode one of the most challenging cancer genomes ever encountered. Along the way, they built a portfolio of open-access analysis tools that will help researchers everywhere make structural variation discoveries with long-read sequencing data.

When Mike Schatz realized a few years ago that his PacBio® System had reached the throughput needed to process human genomes, he decided to give it a real challenge: the incredibly complicated, massively rearranged SK-BR-3 breast cancer cell line. The genome consists of 80 chromosomes, and that’s just the tip of the complexity iceberg.

“We were really interested in sequencing a human genome that would be maximally impactful and that was aligned with our research interest in cancer genomes, where it’s been well documented that structural variations play a major role,” says Schatz, now an associate research professor of computer science at Johns Hopkins University and an adjunct associate professor of quantitative biology at Cold Spring Harbor Laboratory, where the analysis took place. He notes that despite its importance, structural variation has not been thoroughly studied because short-read sequencers cannot reliably identify these large genomic elements. “One of the really special properties about the PacBio Sequencer is, in addition to being able to call SNPs or small variants, we also get to look for large variants such as structural variation,” he says.

But as Schatz and his collaborators at Cold Spring Harbor Laboratory and the Ontario Institute for Cancer Research delved into this work, they realized that existing variant callers were tailored to short-read data. To make the most of the large amount of long-read information they were generating, the team wrote a suite of new analysis tools optimized for SMRT Sequencing data. “The tools catering to short-read data just aren’t made to capture the awesome information that we can now take advantage of,” says Maria Nattestad, a graduate student in Schatz’s lab who wrote several of the new algorithms. “Building our own tools was really the only way to go here.”

Those tools, which are especially important for understanding structural variation, are now being publicly released to fuel further SMRT Sequencing studies of human genomes. Also coming out soon is the team’s detailed analysis of the SK-BR-3 genome and transcriptome, which includes a high-quality assembly as well as a new understanding of gene fusions, the evolutionary history of this cell line, and more.

De novo sequencing and assembly were the first steps in making sense of the SK-BR-3 genome. With 72-fold SMRT Sequencing coverage, “we got an outstanding assembly of this genome even though it’s so complicated,” Schatz says, citing a contig N50 size of 2.5 Mb compared to a state-of-the-art short-read assembly with a contig N50 of just 3 kb. “That’s nearly a thousand-fold more contiguous going from short-read to long-read assemblies, and it’s through that improved assembly that the majority of structural variants were detected.”
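For readers less familiar with the contig N50 metric Schatz cites, here is a minimal sketch of how it is commonly computed. The contig lengths are toy values chosen for illustration, not SK-BR-3 data, and the function name `n50` is our own. The last lines simply redo the fold-improvement arithmetic from the two N50 figures quoted above.

```python
def n50(lengths):
    """Return the contig N50: the largest length L such that contigs of
    length >= L together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical values, for illustration only)
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))

# Fold improvement implied by the quoted N50 values: 2.5 Mb vs. 3 kb
fold = 2_500_000 / 3_000
print(f"{fold:.0f}-fold")
```

With the quoted figures the ratio comes out a little above 800, consistent with the “nearly a thousand-fold” description.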

Using custom-built analysis tools, including variant callers Sniffles, by Schatz lab member Fritz Sedlazeck, and Assemblytics, by Nattestad, the scientists found more than 10,000 structural variants in the SK-BR-3 genome ranging in size from 50 bases to millions of base pairs. Another major discovery involved meticulously characterizing the complicated process that led to the cell line’s Her2 oncogene amplification.

The team also used the Iso-Seq™ method to analyze the full transcriptome of SK-BR-3, finding as much complexity at the RNA level as they saw in the DNA. “In the Iso-Seq analysis, we see many tens of thousands of novel isoforms,” Schatz says. “That’s a really strong testament to the long reads, which fully capture an isoform in one sequence — unlike short reads, where you have to infer isoform structure.”

To learn more about the project, which included novel findings about gene fusions in cancer, check out the full case study.


Monday, April 11, 2016

From Earthworms to Alpacas: Vote Now to Choose the Next SMRT Grant Program Winner!

For the first time ever, the winner of this year’s “Explore Your Most Interesting Genome” SMRT Grant program will be decided by the community. We’ll be using our new Genome Galaxy Initiative and Experiment’s dedicated-to-science crowdfunding platform for this worldwide event.

Here’s how it works: our top five finalists will engage with you directly through their project pages on the Genome Galaxy Initiative via Experiment, where you will have the opportunity to learn more and ask the scientists about their projects. We will conduct daily polls so you can cast your vote for the project you feel should be supported by the SMRT Grant program (see FAQ). The four runners-up will then have a second chance at seeing their projects kick off through promotion and public donations on Experiment.

Voting starts today and remains open until May 1. Brief descriptions of the projects follow. Who will you vote for today?


Project: The Amazing & Enigmatic Alpaca

Investigator: Kylie Munyard, Curtin University

According to Munyard’s proposal, the economically important alpacas are of great scientific interest on a number of fronts, and producing a reference genome will enable new studies in both agricultural and biomedical research. Alpacas are a good model for diabetes research; they have innate mechanisms to stay free of parasites; their distant relationship to other agricultural animals makes them good for comparative study; and much more.


Project: Plant Heroes for Remediation of Soils Contaminated with Heavy Metals

Investigator: Renying Zhuo, Chinese Academy of Forestry

With this project, Zhuo aims to produce high-quality genomes for two strains of Sedum alfredii: one heavily accumulates cadmium ions from polluted soil while the other doesn’t, although both are found in the same ecosystem. Scientists hope to improve on fragmented short-read assemblies and use comparative genomics to understand the plant’s mechanism for processing heavy metals, with the ultimate goal of using this information for remediating contaminated soil.


Project: Sequencing an Extremophile Earthworm

Investigator: Luis Cunha, Cardiff University

This project would sequence the extraordinary earthworm Pontoscolex corethrurus, which lives in a volcanic geothermal field with high exposure to toxic gases, extreme temperatures, and very little oxygen. Cunha’s proposal notes that preliminary work with draft assemblies indicates significant levels of horizontal gene transfer that could be better characterized with SMRT Sequencing.


Project: Scar-Free Regeneration in the Spiny Mouse

Investigator: William Barbazuk, University of Florida

According to this proposal, the adult spiny mouse is the only known mammal able to regenerate skin and organs after wounds without any scars or other indications of trauma, making this organism interesting for regenerative medicine. Barbazuk hypothesizes that novel genes, alternatively spliced isoforms, and gene expression regulators are responsible. He aims to use SMRT Sequencing to study the transcriptome of the spiny mouse and its wound-healing properties.


Project: Highlighting Firefly: A Genome Resource

Investigator: Jing-Ke Weng, MIT

This project would help a large consortium of researchers generate a high-quality genome assembly for Photinus pyralis, an American firefly. Weng’s proposal notes that the 2,000-plus species of these charismatic flashing beetles have been understudied, and that the biological mechanisms behind important traits such as bioluminescence remain unknown.


We thank our co-sponsors for their support of this event: Sage Science, Computomics, Experiment, and RTL Genomics.


Thursday, March 31, 2016

With Greater Contiguity, New Gorilla Genome Assembly Offers Insights into Gene Content, SVs, and More

In a Science paper published today, scientists from the University of Washington, the McDonnell Genome Institute, and other organizations present a new gorilla genome assembly generated with PacBio long-read sequencing, representing an over 150-fold improvement over previous assemblies.

From lead authors David Gordon, John Huddleston, Mark Chaisson, and Christopher Hill, and senior author Evan Eichler, the paper reports that the new assembly recovers nearly all reference exons missing from the previous assembly, and provides an unprecedented look at structural variation, genetic diversity, ancestral evolution, repeat structures, and more.

The project was launched to address shortcomings with the existing gorilla assembly, which was built with short-read and Sanger sequencing data. While short-read sequencing has been instrumental for genomics, the authors write, “assemblies have become increasingly more incomplete and fragmented in large part because the underlying sequence reads are too short (<200 bp) to traverse complex repeat structures. This has led to incomplete gene models, less accurate representation of repeats, and biases in our understanding of genome biology.” The previous gorilla assembly was highly fragmented, with more than 400,000 gaps, and had been assembled using the human genome as a guiding reference.

The team used SMRT Sequencing on a western lowland gorilla named Susie, followed by assembly and polishing with FALCON and Quiver, respectively. The resulting assembly size is 3.1 Gb, with a contig N50 length of 9.6 Mb. The assembly closes 93% of the gaps, many of which are characterized by GC-rich content, and provides at least 148 Mbp of additional euchromatic sequence.

The scientists incorporated additional genome data from six gorillas, generating a reference genome called Susie3. A gene content analysis determined that nearly 95% of RefSeq exons missing from the original assembly were recovered in this assembly, and that 96% of previously incomplete gorilla genes were represented in at least one isoform. They also looked at structural variation, finding that 86% of the indels and inversion variants detected had never been seen before. “These analyses provide a comprehensive catalog of mobile element differences between human and gorilla (24.1% of all structural variation events),” the authors report.

The assembly also suggests that previous estimates of evolutionary divergence and population sizes were not as accurate as expected. “Although the difference was subtle, we found that human versus gorilla sequence alignments were significantly less divergent with Susie3 (1.60% divergent) when compared to the published gorilla assembly (1.65% divergent),” the scientists write. “We found a strong correlation with the difference in divergence and regions enriched for Alu and G+C content … suggesting that mismapping, collapse or underrepresentation within these regions of the Illumina-based assemblies may be contributing to this excess of divergence.” They also report that previous estimates of the most recent population bottleneck for western lowland gorillas “may have been underestimated by a factor of ~1.5, highlighting the importance of using higher quality assemblies when fitting demographic models.”

The scientists note that SMRT Sequencing has put high-quality de novo mammalian assemblies within reach of individual labs. “Our results demonstrate the utility of long-read sequence technology to generate high-quality working draft genomes of complex vertebrate genomes without guidance from preexisting reference genomes,” they conclude. “The genome assembly that results from using the long-read data provides a more complete picture of gene content, structural variation and repeat biology as well as allows us to refine population genetic and evolutionary inferences.”

This exciting advance was also presented by Christopher Hill at AGBT — check out the recording of his presentation.


Wednesday, March 30, 2016

New Study Uses SMRT-ChIP Method to Find Novel Methylation in Mouse Embryonic Stem Cells

In a new Nature publication, scientists from Yale and other institutions report the discovery of N6-methyladenine (N6-mA) in mouse embryonic stem cells (ESCs), contrary to the conventional wisdom that the only form of methylation in mammals is 5-methylcytosine. Through the project, the team also developed a new method for pairing chromatin immunoprecipitation (ChIP) with SMRT Sequencing. Both of these developments have significant implications for the genomics community.

“DNA methylation on N6-adenine in mammalian embryonic stem cells” comes from lead author Tao Wu and senior author Andrew Xiao, both at Yale School of Medicine. The team also included collaborators from the University of Arkansas for Medical Sciences, the University of North Carolina, the Icahn School of Medicine at Mount Sinai, and PacBio. “The discovery of N6-mA in mammalian ES cells sheds new light on epigenetic regulation during early embryogenesis and may have impacts in the fields of epigenetics, stem cells and developmental biology,” Wu et al. write.

To conduct this study, the team developed a SMRT-ChIP method to study DNA modifications at specific histone variant regions. The SMRT Sequencing data demonstrate the presence of N6-mA at nearly 400 sites in the genomic regions studied, a finding that was confirmed with mass spec analysis. The team focused on H2A.X deposition, which has been associated with cell fate transitions, sequencing the enriched, unamplified DNA from those regions. They also compared SMRT-ChIP results to those from DIP-seq, an orthogonal method, and found strong concordance.

The scientists identified Alkbh1 as the demethylase that regulates adenine methylation and went on to create cell lines with this gene knocked out, showing that N6-mA levels increased by a significant degree without the demethylase. The team also used Alkbh1 to shed light on how these N6-mA sites function; in the knockout cells, the expression of 550 genes was downregulated compared to the original cell line. That contrasts with other recent discoveries of N6-mA in organisms including C. elegans and D. melanogaster. “Intriguingly, [those] studies implicated N6-mA in gene activation, instead of repression, as is the case for 5mC repression,” the scientists write.

They also report a strong location bias for the methylated sites, with the greatest enrichment on the X chromosome. “N6-methyladenine deposition is inversely correlated with the evolutionary age of LINE-1 transposons; its deposition is strongly enriched at young (<1.5 million years old) but not old (>6 million years old) L1 elements,” the authors write, noting that young L1s are important in the beginning of embryogenesis. “We favour the view that N6-mA-mediated silencing plays an important role in safeguarding active L1 elements in mammalian genomes. The levels of N6-mA are controlled precisely by Alkbh1 in ES cells such that they favour L1 transcription while preventing it from succumbing to overactivation and genomic instability.”


Thursday, March 24, 2016

CSHL Scientists Discuss Long-Read Sequencing for More Contiguous Assemblies and Complex Genomes

Much like the “sharpen” tool in Photoshop brings a picture into tighter focus and enhances the fine detail, long-read sequencing offers enhanced resolution of genomic information, according to Cold Spring Harbor Laboratory colleagues Mike Schatz and Maria Nattestad. The scientists spoke with Mendelspod’s Theral Timpson about how long-read sequencing is advancing their research in unique and powerful ways; a brief recap of their conversation follows.

Schatz uses PacBio sequencing to establish incredibly accurate assemblies of microbial, crop, animal, and human genomes. Indeed, SMRT technology has significantly improved his work on the flatworm Macrostomum lignano, an organism with regenerative powers. With only a few reference genomes and limited functional studies available, the flatworm proved to be particularly challenging to sequence with short-read solutions. “We were quite frustrated by the results that we were getting, where the assembly was of very poor quality,” Schatz says. “It was also missing something like half of the genome that we expected to be there; it just wasn’t present at all in the assembly that took place.” At this point, the team realized that long reads would help them achieve a much improved reference genome. By collaborating with algorithm developers, PacBio, and the NIH, the team created an assembly that was about 100 times more contiguous than assemblies based on short-read data.

Long reads also appeal to Nattestad, who is using de novo assembly of the SK-BR-3 breast cancer cell line as a way to fully characterize not just SNPs, but also major structural variations. One of her interests in SK-BR-3 is to better understand Her2 oncogene amplification, and she has undertaken a historical, step-by-step reconstruction of its mutations using software she developed for that purpose. “Our focus here is not just to see how many copies of Her2 there are, or to see that it is Her2-amplified like you would in a diagnostic setting. Instead, we wanted to see how that amplification has happened over time in the genome, and try to reconstruct a history of steps that took place,” she says. Schatz notes that in SK-BR-3, the region around Her2 has undergone what they call ‘genome gymnastics,’ a very complicated series of amplifications, inverted duplications, and translocation events. He says that “trying to capture that level of complication and sophistication just from standard variant calling approaches is very challenging.” Nattestad plans to follow up with analyses of other oncogenes known to be amplified in this cell line.

This year, Schatz expects to see a number of reference-grade human genomes published using PacBio technology to create high-quality de novo assemblies. He says, “If you’re interested to do a de novo assembly of an entirely novel species, my strong recommendation — without any hesitation — is to do long-range PacBio sequencing, and I would advocate for 100x coverage of the longest reads you can possibly generate. … This will give you the most successful assembly.” Structural variation studies are similar, he says: “You really want to use the long-read technology in order to capture those structural variations as accurately as possible.”


Tuesday, March 15, 2016

Genome Galaxy Initiative: On a Mission to Sequence the Beautiful and Mysterious Kākāpō

Photo courtesy of Andrew Digby, DOC New Zealand


New Zealand is more than an amazing vacation destination or the setting of the Lord of the Rings movies; it’s also home to a wealth of fascinating species that evolved in isolation for millions of years. The critically endangered kākāpō bird is one such species, and it needs your help now.

David Iorns, a native New Zealander and founder of the Genetic Rescue Foundation, has launched a crowdfunding campaign to raise money and pursue a grand vision: saving kākāpōs from extinction. With a high-quality genome already underway using SMRT Sequencing, Iorns wants to resequence all 125 remaining kākāpōs. If funded, this project would be the first to digitally capture the genetics of every extant member of a species. Iorns hopes the data generated from these resequencing efforts, combined with the high-quality reference genome needed for interpretation, will help scientists better understand the genetic diversity of kākāpōs and ultimately prevent this species from going extinct.

Kākāpōs are part of the parrot family, but they’re nocturnal, flightless, and very heavy, making them unlike most of their better-known cousins. “They’re special and they’re worth saving,” Iorns says. Because the population has dwindled to so few members, conservation and breeding efforts have proven challenging. Sequencing every bird will provide an “incredibly rich genetic dataset that will help us get to the bottom of some of these fertility and genetic bottleneck-related problems,” he adds.

This project turned to PacBio’s Genome Galaxy Initiative on Experiment, a scientific crowdfunding platform, for public support. Iorns, who previously used crowdfunding to sequence the extinct moa bird, says he gravitated to Experiment’s platform because he’s a citizen scientist who doesn’t follow traditional research funding methods. For the kākāpō project, he hopes to raise $45,000 – you can be a part of this effort and follow this scientific expedition with a small contribution to the cause.

PacBio launched the Genome Galaxy Initiative to help support researchers looking for alternative ways to fund research breakthroughs propelled by SMRT Sequencing. Even a small donation can help scientists begin to address critical, underfunded issues. All donations are refunded if projects do not reach their funding campaign goals. We’re excited to see growing support for this kākāpō project and look forward to many more Genome Galaxy Initiative proposals to come.


Thursday, March 3, 2016

Prevalent Methylation in Prokaryotic Genomes Suggests Regulatory Functions

A new publication from scientists at Lawrence Berkeley National Laboratory, the Joint Genome Institute, and other organizations reports a landmark study of genome-wide methylation in prokaryotes. The analyses of 230 bacteria and archaea species revealed both more methylation than expected and novel epigenetic mechanisms.

“The Epigenomic Landscape of Prokaryotes” from lead author Matthew Blow, senior author Richard Roberts, and collaborators was recently published in PLoS Genetics. The team used SMRT Sequencing to detect 6-methyladenosine (m6A), 4-methylcytosine (m4C), and 5-methylcytosine (5mC) across the 230 genomes. “Bisulfite sequencing has enabled genome-wide surveys of 5mC methylation, but a historic absence of tools for studying m6A and m4C modifications that predominate in prokaryotic DNA has precluded more comprehensive studies,” the authors write, noting that the unique ability of SMRT Sequencing to capture all of these methylation states made a much more comprehensive study possible for the first time.

The authors reported widespread methylation in these genomes, with 93% of organisms harboring at least some methylated DNA. The scientists went on to identify methylated motifs, finding more than 800 distinct patterns, and also annotated the binding specificities of the 600+ methyltransferases detected. Of particular interest were the evolutionarily conserved orphan methyltransferases — or Type II methyltransferases with no obvious restriction enzyme — found in nearly half of all prokaryotes analyzed. Overall, these findings suggest that methylation has an important role in genome regulation for these organisms in addition to the well-established function of genome protection.

The team sequenced prokaryotes to an average 130X coverage, generating a total of 105 Gb of sequence data across all organisms. They report an average of three methylated motifs per organism, with m6A methylation accounting for 75% of all base modifications observed. “SMRT sequencing offers a powerful approach to determine the recognition specificities of several Types of [restriction-modification] systems that have previously been very difficult to decipher,” Blow et al. write. “Type I RM systems cleave DNA at large distances from their binding site, while both Type IIG and Type III systems sometimes have difficulties in producing complete cleavage patterns. This can make them difficult to study using traditional approaches that rely on analysis of patterns of restriction digestion.”
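As a rough sanity check on the coverage numbers above, the sketch below backs out the average genome size they imply. It assumes the simple relation coverage = sequenced bases / genome size and treats the 130X figure as if it applied uniformly to all 230 genomes, which is a simplification of ours and not something the paper states. The function name is hypothetical.

```python
def implied_mean_genome_size(total_bases, n_genomes, mean_coverage):
    """Back-of-envelope estimate: if each genome were sequenced to the same
    coverage, total_bases = n_genomes * mean_coverage * genome_size."""
    return total_bases / (n_genomes * mean_coverage)


# Figures quoted in the post: 105 Gb total, 230 organisms, ~130X average
size = implied_mean_genome_size(105e9, 230, 130)
print(f"{size / 1e6:.1f} Mb")
```

The result lands in the low single-digit megabase range, which is in line with typical prokaryotic genome sizes.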

Novel restriction-modification systems as well as new forms of existing systems, including Type IIG systems, were discovered throughout the data set, suggesting alternative functions including genome regulation. The scientists also found evidence of methylation pattern conservation. “Given the extensive amount of methylation present in the majority of the genomes we have examined, it is tempting to believe that methylation is a very important modification of bacterial and archaeal DNA perhaps providing regulatory functions that we have yet to fully appreciate,” the team reports. “Additionally, it is reasonable to assume that the evolution of DNA methylation was an early event that was important for the viability of primitive organisms.”

If you like JGI studies as much as we do, don’t miss the institute’s user meeting starting on March 21st.


Monday, February 29, 2016

On Rare Disease Day, Celebrating the Contributions of Genomics

Today we are celebrating Rare Disease Day with like-minded folks all over the world. The tribute kicked off in 2008 and has gathered so much momentum that people in more than 80 countries are expected to participate in 2016. Each disease is rare — affecting fewer than 1 in 1,500 people — but because there are so many of these diseases, together they affect millions of people globally.

Here at PacBio, many of our team members have their own stories about dealing with rare disease, and we imagine the same is true of our blog readers. We’re so proud that leading scientists have already begun using SMRT Sequencing to make important new DNA and RNA discoveries about the genetics and disease mechanisms of rare diseases. In the future, we anticipate even more of these studies will lead to novel breakthroughs as scientists expand their use of PacBio sequencing for human disease studies. Together, we can have a real impact in helping families struggling with these diseases.

Here are some examples of how researchers have shed light on rare diseases with SMRT Sequencing:

Baylor’s Jim Lupski, who studies and has been diagnosed with Charcot-Marie-Tooth neuropathy, recently spoke about a de novo PacBio assembly of his genome that found much more structural variation, especially copy number changes, than previous assemblies from short-read data. He also described how long reads better resolve and characterize the break points associated with these disease-causing structural variants and resolve sequence context to provide base-level resolution of specific genotypes.

In a separate presentation, Richard Gibbs from Baylor College of Medicine noted that just 25% of Mendelian disorders have been solved with short-read sequence data, and suggested that the success rate may be limited by the inability of these platforms to detect structural variation, repeat regions, and complex events. With SMRT Sequencing and structural variation analysis algorithms created at his genome center, scientists may be able to uncover the genetic basis of many more Mendelian disorders using low-coverage, long-read PacBio sequencing.

Paul Hagerman from the University of California, Davis, led the first team in the world to completely sequence a fully expanded pathogenic ‘CGG repeat allele’ in the FMR1 gene on the X chromosome that is associated with Fragile X syndrome. Repeat expansions in the FMR1 gene were previously thought to be “unsequenceable”; PacBio sequencing of these expansions is shedding new light on pathogenic variants and interruptions that are meaningful for screening and carrier counseling, and that may lead to improved diagnostic and intervention strategies for families affected by Fragile X syndrome.

In a related project, follow-up work from Flora Tassone and other UC Davis researchers applied the Iso-Seq method to characterize alternative splicing in the FMR1 gene for a different disorder called Fragile X-associated tremor/ataxia syndrome (FXTAS). They found differential expression for certain gene isoforms, suggesting that these isoforms are functionally relevant to the pathology of FMR1-associated disorders.

Scientists in North Carolina generated the first high-quality sequence of MUC5AC, a gene that has been implicated in a range of diseases, including cystic fibrosis. The gene had long been represented as a gap in the human reference genome because of its complex and highly repetitive central exon. Characterization of the MUC5AC gene and the sequence variation in the central exon will facilitate genetic and functional studies for this critical airway mucin.

In a recent talk at AGBT, Bobby Sebra from the Icahn School of Medicine presented results from targeted PacBio sequencing of the C9orf72 locus, which contains a GGGGCC repeat expansion now known to cause familial ALS (also known as Lou Gehrig’s disease). He presented sequencing data from both the PacBio RS II platform and the new Sequel System, showing the ability to fully characterize the sequence of this locus and provide novel insights into the genetics underlying this debilitating disease.

Tetsuo Ashizawa and Karen McFarland from the University of Florida are making progress understanding the genetics of spinocerebellar ataxia type 10 (SCA10). In a recently published study, they describe sequencing through a pentanucleotide repeat allele known to cause this disorder, and characterizing various repeat interruption motifs associated with different SCA10 clinical phenotypes.

Shinichi Morishita’s lab at the University of Tokyo has described similar methods for characterizing tandem repeats associated with the SCA31 brain disease using a hybrid long- and short-read approach.

At Stanford University, Ayal Hendel is working in collaboration with John Day and the Myotonic Dystrophy Foundation to study the CTG/CAG repeat tracts that represent the genetic basis for myotonic dystrophy type 1 (DM1), and explore the cellular and molecular pathological mechanisms involved in DM — including aberrant alternative splicing.

We’d like to congratulate these scientists, along with all the others around the world who are working hard to make a difference in the lives of people burdened by rare disease. Whether you’re using our technology or any other, we thank you and wish you all the best!

PacBio is proud to be an official partner of Rare Disease Day. Get involved with global efforts or US-based initiatives to honor those dealing with rare diseases.


Thursday, February 25, 2016

New Views of Microbial Communities Call for Updates to Infectious Disease Tenets



In a perspective recently published in Science magazine, scientists Allyson Byrd and Julie Segre from the National Human Genome Research Institute used recent advances in microbial analysis to look at Koch’s postulates through a new lens.

Published by Robert Koch in 1890, these principles have become widely accepted in microbiology as the definitive means to prove that a specific pathogen is the cause of an infectious disease. As summarized by Byrd and Segre, the postulates dictate that: “First, the microorganism occurs in every case of the disease; second, it is not found in healthy organisms; and third, after the microorganism has been isolated from a diseased organism and propagated in pure culture, the proposed pathogen can induce disease anew.”

The authors point out that Koch lived long before the discovery of antibiotics and nucleic acids, and argue that recent revelations in infectious disease research call for an update of these principles, specifically to account for the role of microbial communities in causing or preventing disease. “In light of recent appreciation of microbial consortia, the scientific community should consider infectious disease causation in a broader systems biology context in which host genetic variability, health status, past exposure history, and microbial strains and communities are all important,” Byrd and Segre write. “As technology advances and new scientific discoveries are made, we must dynamically adapt Koch’s postulates so today’s science maintains the integrity that Koch originally fostered.”

The authors review several recent infectious disease papers, noting that hospital-acquired infections and microbiome studies have both shed new light on the association between microbes and disease. For instance, certain commensal microbes appear to protect against infections like those caused by Clostridium difficile, Salmonella, and other pathogens. Microbes can also work together to prevent infection, as seen in recent work demonstrating that a six-member microbial community ameliorated the effects of C. difficile infection, the authors report. “These findings force us to consider under what circumstances a consortium of microbes can fulfill Koch’s postulates,” they add. “For example, do all members of the community have to be grown in pure culture and tested individually, or is it sufficient to grow and test a group culture?”

Byrd and Segre note that new sequencing technologies have made it possible to study and analyze microorganisms that cannot be cultured. They also recommend updates to Koch’s postulates that would expand the rules to cover microbial communities.

SMRT Sequencing provides a high-resolution view that allows scientists to interrogate microbes — both individually and in communities — with greater accuracy and completeness than has ever been possible before. It’s an honor to see so many PacBio users delivering new insights that surely would have made Koch proud.
