We’re proud to announce the release of the most contiguous diploid human genome assembly of a single individual to date, representing the nearly complete DNA sequence from all 46 chromosomes inherited from both parents. The sample used was derived from a Puerto Rican female who has been included in population genetics studies such as the 1000 Genomes Project. The phased diploid assembly will give unprecedented views of population-specific variation through the long-range resolution of maternal and paternal haplotypes. This work is part of a larger effort in the field of personalized medicine and human genomics to add ethnic diversity to…
We’re pleased to release a new data set, along with an allele-phasing software workflow on GitHub, for those interested in exploring SMRT Sequencing data from an Alzheimer’s disease candidate gene study. Our team collaborated with Integrated DNA Technologies (IDT) to design a 35-gene panel targeting candidate Alzheimer’s disease genes identified as potential genetic risk loci across many GWAS and linkage studies. Long-read PacBio sequencing was applied to brain and skeletal tissue from two individuals diagnosed with Alzheimer’s disease, and a wide range of variants was detected, from SNPs and indels to larger structural variations up to several kilobases in size.…
Anna’s hummingbird photo by Pat Durkin

If you’re interested in avian vocal learning or want to explore a PacBio Iso-Seq data set generated with the Sequel System, we have good news. We’ve just released data from Iso-Seq interrogations of brain tissue from two avian models of vocal learning, Anna’s hummingbird (Calypte anna) and zebra finch (Taeniopygia guttata), sequenced in collaboration with the Erich Jarvis and Olivier Fedrigo labs at the Rockefeller University. If you’re not familiar with the Iso-Seq method, it’s the long-read sequencing answer to short-read RNA-seq studies. By using SMRT Sequencing for a transcriptome project, scientists can generate…
Recent de novo assemblies of individual human genomes have uncovered thousands of structural variants, many of which are accessible only with PacBio long reads [1-3].

Personal Genome   PacBio Coverage   Deletions ≥50 bp   Insertions ≥50 bp
CHM1 [1]          41-fold           6,111              9,638
HX1 [2]           103-fold          9,891              10,284
AK1 [3]           101-fold          7,358              10,077

A similar increase in structural variant sensitivity relative to short-read methods has been demonstrated with low-fold coverage PacBio sequencing interpreted against the reference genome [4]. To demonstrate and evaluate the low-fold coverage approach on the PacBio Sequel System, we generated approximately 10-fold coverage of the well-studied human sample NA12878.…
Today we are pleased to release the first Arabidopsis thaliana (Ler-0) dataset and de novo genome assembly generated with the Sequel System, using two SMRT Cells and 12 hours of runtime. Only three years ago, we released our first genome assembly [1] for Arabidopsis produced on the PacBio RS II using P4-C2 chemistry, 85 SMRT Cells and 255 hours of runtime. Four months later, we released a second Arabidopsis dataset [1] using the improved P5-C3 chemistry, which reduced the number of SMRT Cells to 46 and runtime to 138 hours. We produced this Sequel dataset using our latest chemistry enhancements which significantly…
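To put the three Arabidopsis runs side by side, here is a minimal Python sketch that uses only the SMRT Cell counts and runtimes quoted in the excerpt above. The run labels and the `summarize` helper are illustrative names, not part of any PacBio tool.

```python
# Figures quoted in the post: (label, SMRT Cells used, total runtime in hours).
# The labels are shorthand chosen for this sketch.
runs = [
    ("RS II, P4-C2", 85, 255),
    ("RS II, P5-C3", 46, 138),
    ("Sequel", 2, 12),
]


def summarize(runs):
    """Compare each run against the first (oldest) run in the list."""
    _, base_cells, base_hours = runs[0]
    rows = []
    for name, cells, hours in runs:
        rows.append({
            "name": name,
            "hours_per_cell": hours / cells,          # runtime per SMRT Cell
            "cell_reduction": base_cells / cells,     # fold fewer SMRT Cells
            "runtime_reduction": base_hours / hours,  # fold less runtime
        })
    return rows


summary = summarize(runs)
for row in summary:
    print(
        f"{row['name']}: {row['hours_per_cell']:.1f} h per cell, "
        f"{row['cell_reduction']:.1f}x fewer cells, "
        f"{row['runtime_reduction']:.2f}x less runtime than the first run"
    )
```

On these numbers, the Sequel run used 42.5 times fewer SMRT Cells and 21.25 times less runtime than the original P4-C2 run, while runtime per SMRT Cell rose from 3 to 6 hours.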
NOTE: This blog was updated with the latest data from the Sequel II System on March 9, 2020. Alzheimer’s disease (AD) is a devastating neurodegenerative disease that affects ~44 million people worldwide, making it the most common form of dementia. Pathologically it is defined by severe neuronal loss, aggregation of amyloid β (Aβ) in extracellular senile plaques in the brain, and formation of intraneuronal neurofibrillary tangles consisting of hyperphosphorylated tau protein. Studies looking into disease mechanism have shown that changes in gene expression due to alternative splicing likely contribute to the initiation and progression of AD. Hence, efforts have been…
As part of our effort to support the National Institutes of Health and the Genome Reference Consortium (GRC) in creating platinum genomes for the research community and improving the reference genome, in 2014 we generated 54X SMRT® Sequencing coverage of the CHM1 cell line, derived from a human haploid hydatidiform mole, using our P5-C3 chemistry, and made it publicly available through the SRA database at NCBI. The CHM1 dataset was quickly taken up by researchers eager to use long, unbiased reads to identify regions of the genome prone to structural variation and to fill in sequence gaps in the GRC-maintained…
UPDATE: Our R&D team has added a new dataset for the MCF-7 human breast cancer transcriptome, originally released in 2013. The new results were produced using 28 SMRT® Cells with 4-hour movies and P5-C3 chemistry. Sizing was performed with the SageELF™ platform (fractions collected: 1-2 kb, 2-3 kb, 3-5 kb, and 5-10 kb). Sequencing of the larger fractions with our newer sequencing chemistry that generates longer reads added longer transcripts (up to 10 kb) to the MCF-7 dataset, which previously had only transcripts up to 4 kb. New FASTA and GFF files are available, representing the new combined dataset. Raw…
In higher eukaryotic organisms, like humans, RNA transcripts from the vast majority of genes are alternatively spliced. Alternative splicing dramatically increases the protein-coding potential of eukaryotic genomes and its regulation is often specific to a given tissue or developmental stage. Using our updated Iso-Seq™ sample preparation protocol, we have generated a dataset containing the full-length whole transcriptome from three diverse human tissues (brain, heart, and liver). The updated version of the Iso-Seq method incorporates the use of a new PCR polymerase that improves the representation of larger transcripts, enabling sequencing of cDNAs of nearly 10 kb in length. The inclusion…
We are pleased to announce the launch of our new reagent kit, P6-C4, which represents the next generation of our polymerase as well as our chemistry. This kit replaces the P5-C3 chemistry and is recommended for all SMRT® Sequencing applications, including de novo assembly, targeted sequencing, isoform sequencing, minor variant detection, scaffolding, long-repeat spanning, SNP phasing, and structural variant analysis. P6-C4 continues the steady read length improvement our users have seen since the instrument first launched. With this new chemistry, average read lengths increase to 10 kb – 15 kb, with half of all data in reads 14 kb or…
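A statistic of the form "half of all data in reads of a given length or longer" is commonly called the read length N50. The Python sketch below shows one way to compute it. The function name `n50` and the toy read lengths are assumptions for illustration, not P6-C4 data.

```python
def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half_of_bases = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_bases:
            return length


# Hypothetical read lengths in bases, chosen only to exercise the function.
toy_reads = [2000, 5000, 8000, 14000, 20000, 25000]
print(n50(toy_reads))
```

Because N50 weights each read by the bases it contributes, it is usually higher than the mean read length for the same data, which is why the two figures are reported separately.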
AGBT 2014 is off to a roaring start – the opening reception was hastily moved indoors when an impressive thunderstorm joined the party. Wednesday’s kickoff plenary session offered an insightful view of the recently released human genome reference, known as GRCh38, which is available with GenBank accession GCA_000001405.15. Valerie Schneider from the National Center for Biotechnology Information gave a presentation on the latest build, highlighting improvements that range from alternate loci to modeled centromeres to error correction of individual bases. The Genome Reference Consortium resolved more than 1,000 reported issues from build 37 with the release of this new build…
We are pleased to make publicly available a new shotgun sequence dataset of long PacBio® reads from a human DNA sample. We previously released sequence data using Single Molecule, Real-Time (SMRT®) Sequencing of ~10x coverage of this sample, sufficient for reference-based detection of structural variation. Today we expand on that release with additional data that increases the total sequencing coverage to ~54x. This long-read data has enabled the generation of the first de novo human genome assembly from PacBio-only sequence reads. Download the 54x long-read coverage dataset. The dataset was generated from sequencing a well-studied human cell line (CHM1htert), which…
By Jonas Korlach, Chief Scientific Officer

2013 was an eventful and exciting year for PacBio. As I described in the 2013 roadmap post a year ago, we have applied numerous improvements to SMRT® Sequencing, resulting in longer read lengths, greater sequencing throughput, new and improved data-analysis methods, and more efficient workflows. We are very pleased that these advances resulted in so many publications, conference presentations, and social media contributions, with the number of peer-reviewed scientific publications from the scientific community now exceeding 100. On behalf of all of us at Pacific Biosciences, I would like to express my heartfelt gratitude…
Model organisms such as yeast, Arabidopsis and Drosophila have been essential to progress in genetic and biomedical research for more than 100 years. Model organisms are the best, fastest, most effective way to advance science, especially when human experimentation may not be feasible. Numerous biological principles have been elucidated using model organisms, including Nobel Prize-winning discoveries by Thomas Hunt Morgan, that genes are carried on chromosomes; by Hermann Muller, that X-ray irradiation causes mutations; and by Edward B. Lewis, Christiane Nüsslein-Volhard, and Eric Wieschaus, revealing the genetic control of early embryonic development – all…
To help evaluate the utility of long, unbiased sequence reads for characterizing structural variation in the human genome using our recently released P5-C3 scaffolding sequencing chemistry, we have collected 10x long-read shotgun coverage of a human genome sample. The human genome harbors many structural variations, including variable number tandem repeats, deletions, insertions, inversions, and repetitive mobile elements, which are often difficult to resolve using short-read technologies. We hope this data set will be of value to the bioinformatics and scientific communities studying various forms of structural variation across the human genome. To access the full data set, simply…