Single Molecule, Real-Time (SMRT) Sequencing uses the natural process of DNA replication to sequence long fragments of native DNA. As such, starting with high-quality, high molecular weight (HMW) genomic DNA (gDNA) will result in better sequencing performance across difficult-to-sequence regions of the genome. To obtain the highest-quality long DNA, it is important to start with sample types compatible with HMW DNA extraction methods. This technical note is intended to give general guidance on sample collection, preparation, and storage across a range of commonly encountered sample types used for SMRT Sequencing whole genome projects. It is important to…
In this webinar, we will give an introduction to Pacific Biosciences’ Single Molecule, Real-Time (SMRT) Sequencing. After showing how the system works, we will discuss the main features of the technology, with an emphasis on the difference between systematic error and random error and on how SMRT Sequencing produces better consensus accuracy than other systems. Following this, we will discuss several ground-breaking discoveries in medical science that were made possible by the long reads and high accuracy of SMRT Sequencing.
In this webinar, Emily Hatas of PacBio shares information about the applications and benefits of SMRT Sequencing in plant and animal biology, agriculture, and industrial research fields. This session contains an overview of several applications: whole-genome sequencing for de novo assembly; transcript isoform sequencing (Iso-Seq) method for genome annotation; targeted sequencing solutions; and metagenomics and microbial interactions. High-level workflows and best practices are discussed for key applications.
In this webinar, Jonas Korlach, Chief Scientific Officer, PacBio, provides an overview of the features and advantages of the new Sequel II System. Kiran Garimella, Senior Computational Scientist, Broad Institute of MIT and Harvard University, describes his work sequencing humans with HiFi reads, enabling discovery of structural variants undetectable in short reads. Luke Tallon, Scientific Director, Genomics Resource Center, Institute for Genome Sciences, University of Maryland School of Medicine, covers the GRC’s work on bacterial multiplexing, 16S microbiome profiling, and shotgun metagenomics. Finally, Shane McCarthy, Senior Research Associate, University of Cambridge, focuses on the scaling and affordability of high-quality…
To start Day 1 of the PacBio User Group Meeting, Jonas Korlach, PacBio CSO, provides an update on the latest releases and performance metrics for the Sequel II System. The longest reads generated on this system with the SMRT Cell 8M now go beyond 175,000 bases, while maintaining extremely high accuracy. HiFi mode, for example, uses circular consensus sequencing to achieve accuracy of Q40 or even Q50.
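The Q40 and Q50 figures above are Phred-scaled quality values, where Q = -10 · log10(error probability). As a minimal sketch of that conversion (the function names `qv_to_accuracy` and `accuracy_to_qv` are illustrative and not part of any PacBio tool):

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to per-base accuracy."""
    # Phred definition: error probability = 10^(-Q/10)
    return 1.0 - 10 ** (-qv / 10.0)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (0 <= accuracy < 1) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    for qv in (20, 30, 40, 50):
        print(f"Q{qv}: accuracy = {qv_to_accuracy(qv):.5%}")
```

Under this definition, Q40 corresponds to one expected error per 10,000 bases and Q50 to one per 100,000 bases.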
In this SMRT Leiden 2020 Online Virtual Event presentation, Pedro Oliveira of Mount Sinai shares his research on Clostridioides difficile, a leading cause of nosocomial diarrhea and colitis across the developed world. In this study, Oliveira and coworkers performed the first comprehensive DNA methylome analysis of 36 human C. difficile isolates from a hospital setting using SMRT Sequencing and comparative epigenomics.
In this SMRT Leiden 2020 Online Virtual Event presentation, Erich Jarvis of Rockefeller University shares an update on the Vertebrate Genome Project and a few exciting developments related to using the new Platinum-quality genomes to study functional evolutionary traits.
Haplotype-resolved genomes are important for understanding how combinations of variants impact phenotypes. Studies of disease, quantitative traits, forensics, and organ donor matching are aided by phased genomes. Phase is commonly resolved using familial data, population-based imputation, or by isolating and sequencing single haplotypes using fosmids, BACs, or haploid tissues. Because these methods can be prohibitively expensive, or samples may not be available, alternative approaches are required. De novo genome assembly with PacBio Single Molecule, Real-Time (SMRT) data produces highly contiguous, accurate assemblies. For non-inbred samples, including humans, the separate resolution of haplotypes results in higher base accuracy and more…
De novo assemblies of human genomes from accurate (85-90%), continuous long reads (CLR) now approach the human reference genome in contiguity, but the assembly base pair accuracy is typically below QV40 (99.99%), an order-of-magnitude lower than the standard for finished references. The base pair errors complicate downstream interpretation, particularly false positive indels that lead to false gene loss through frameshifts. PacBio HiFi sequence data, which are both long (>10 kb) and very accurate (>99.9%) at the individual sequence read level, enable a new paradigm in human genome assembly. Haploid human assemblies using HiFi data achieve similar contiguity to those using…
The kakapo (Strigops habroptila) is a large, flightless parrot endemic to New Zealand. It is highly endangered with only ~150 individuals remaining, and intensive conservation efforts are underway to save this iconic species from extinction. These include genetic studies to understand critical genes relevant to fertility, adaptation and disease resistance, and genetic diversity across the remaining population for future breeding program decisions. To aid with these efforts, we have generated a high-quality de novo genome assembly using PacBio long-read sequencing. Using the new diploid-aware FALCON-Unzip assembler, the resulting genome of 1.06 Gb has a contig N50 of 5.6 Mb (largest…
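For readers unfamiliar with the contig N50 statistic quoted above: it is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. A minimal sketch follows, assuming contig lengths are already available as a list of integers; the function name `n50` is illustrative, and the toy lengths are made up rather than taken from the kakapo assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig where the running sum covers half the assembly.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6]  # hypothetical lengths, for illustration only
    print(n50(toy_contigs))
```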
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and…
Background: Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions…
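The read-classification idea behind trio binning can be sketched in a few lines. This is a deliberately simplified illustration, not the published implementation: it assumes error-free reads, ignores reverse complements and k-mer frequency filtering, uses a tiny k, and all function names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def haplotype_specific_kmers(
    maternal_reads: Iterable[str], paternal_reads: Iterable[str], k: int
) -> Tuple[Set[str], Set[str]]:
    """Return k-mers found only in maternal reads and only in paternal reads."""
    maternal = set().union(*(kmers(r, k) for r in maternal_reads))
    paternal = set().union(*(kmers(r, k) for r in paternal_reads))
    return maternal - paternal, paternal - maternal


def classify_read(read: str, mat_only: Set[str], pat_only: Set[str], k: int) -> str:
    """Assign an offspring read to the parent whose specific k-mers it shares more of."""
    read_kmers = kmers(read, k)
    mat_hits = len(read_kmers & mat_only)
    pat_hits = len(read_kmers & pat_only)
    if mat_hits > pat_hits:
        return "maternal"
    if pat_hits > mat_hits:
        return "paternal"
    return "unassigned"  # no haplotype-specific signal, or a tie


if __name__ == "__main__":
    # Toy parental short reads differing at a single position (illustrative only).
    mat_only, pat_only = haplotype_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], k=4)
    for read in ("GTACGT", "GTTCGT", "ACGT"):
        print(read, classify_read(read, mat_only, pat_only, k=4))
```

Reads overlapping a heterozygous site carry parent-specific k-mers and can be binned, which is why heterozygosity helps this approach; reads from regions identical in both parents remain unassigned.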
After nearly two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no one chromosome has been finished end to end, and hundreds of unresolved gaps persist. The remaining gaps include ribosomal rDNA arrays, large near-identical segmental duplications, and satellite DNA arrays. These regions harbor largely unexplored variation of unknown consequence, and their absence from the current reference genome can lead to experimental artifacts and hide true variants when re-sequencing additional human genomes. Here we present a de novo human genome assembly that surpasses the continuity of GRCh38, along…
CRISPR-Cas9 and base editor (BE) systems are poised to become the gene-editing tools of choice in clinical contexts; however, large fragment deletions have been found in Cas9-mutated cells, without validation at the animal level. By analyzing 16 gene-edited rabbit lines (comprising 112 rabbits) generated using SpCas9, BEs, xCas9, and xCas9-BEs with long-range PCR genotyping and long-read sequencing on the PacBio platform, we show that fragment deletions extending over thousands of bases occur in rabbits mutated with the single-guide RNA/Cas9 and xCas9 systems, whereas few large deletions were found in rabbits with BE-induced mutations. We provide the first animal-level validation that the BE system induces no large fragment deletions, suggesting…
Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and allow for annotation of TEs. There are numerous methods for each class of elements with unknown relative performance metrics. We benchmarked existing programs based on a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotations of structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA. List of abbreviations: TE, Transposable Elements; LTR, Long Terminal…