As the foundation for scientific discoveries in genetic diversity, sequencing data must be accurate and complete. With highly accurate long-read sequencing, or HiFi sequencing, there is no longer a compromise between read length and accuracy. HiFi sequencing enables some of the highest quality de novo genome assemblies available today as well as comprehensive variant detection in human samples. PacBio HiFi libraries constructed using our standard library workflows require at least 3 µg of DNA input per 1 Gb of genome length, or ~10 µg for a human sample. For some samples it is not possible to extract this amount of…
Yunfei Guo, from the University of Southern California, presents his ASHG 2015 poster on a de novo assembly of a diploid Asian genome. The uniform coverage of long-read sequencing helped access regions previously unresolvable due to high GC bias or long repeats. The assembly allowed scientists to fill some 400 gaps in the latest human reference genome, including some as long as 50 kb.
PacBio Sequencing is characterized by very long sequence reads (averaging > 10,000 bases), lack of GC-bias, and high consensus accuracy. These features have allowed the method to provide a new gold standard in de novo genome assemblies, producing highly contiguous (contig N50 > 1 Mb) and accurate (> QV 50) genome assemblies. We will briefly describe the technology and then highlight the full workflow, from sample preparation through sequencing to data analysis, using examples of insect genome assemblies. We will also illustrate the difference these high-quality genomes make to biological insights, compared to fragmented draft assemblies generated by short-read sequencing.
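The two quality metrics quoted above, contig N50 and QV, can be made concrete with a short sketch. This is only an illustration under the usual definitions (N50 as the contig length at which the longest contigs together reach half of the total assembly length, and QV as a Phred-scaled error rate); the function names are hypothetical and are not taken from any PacBio tool.

```python
import math


def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (hypothetical helper).

    N50 is the length of the contig at which the running sum of lengths,
    taken from longest to shortest, first reaches half of the total.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert an error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # Toy contig lengths, purely illustrative (not real assembly data).
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print("N50:", n50(toy_contigs))
    # Under this definition, QV 50 corresponds to about one error per 100,000 bases.
    print("Error rate at QV 50:", qv_to_error_rate(50))
```

Read this way, an assembly with contig N50 > 1 Mb has at least half of its bases in contigs longer than 1 Mb, and QV 50 corresponds to roughly one error per 100,000 bases.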
In this PacBio User Group Meeting presentation, PacBio scientist Meredith Ashby shared several examples of analysis — from full-length 16S sequencing to shotgun sequencing — showing how SMRT Sequencing enables accurate representation for metagenomics and microbiome characterization, in some cases even without fully assembling genomes. New updates will provide users with a dedicated microbial assembly pipeline, optimized for all classes of bacteria, as well as increased multiplexing on the Sequel II System, now with 48 validated barcoded adapters. That throughput could reduce the cost of microbial analysis substantially.
Understanding interactions among plants and the complex communities of organisms living on, in and around them requires more than one experimental approach. A new method for de novo metagenome assembly, PacBio HiFi sequencing, has unique strengths for determining the functional capacity of metagenomes. With HiFi sequencing, the accuracy and median read length of unassembled data outperform the quality metrics for many existing assemblies generated with other technologies, enabling cost-competitive recovery of full-length genes and operons even from rare species. When paired with the ability to close the genomes of even challenging isolates like Xanthomonas, the PacBio Sequel II System is…
In this webinar you will hear how several researchers have overcome the challenges of sequencing organisms with small body size using the new low and ultra-low DNA input methods from PacBio. Learn about the advantages of using highly accurate long reads (HiFi reads) to sequence and de novo assemble genomes of single individuals.
Introduction: Long-read sequencing has been applied successfully to assemble genomes and detect structural variants. However, due to high raw-read error rates (10-15%), it has remained difficult to call small variants from long reads. Recent improvements in library preparation and sequencing chemistry have increased length, accuracy, and throughput of PacBio circular consensus sequencing (CCS) reads, resulting in 10-20kb reads with average read quality above 99%. Materials and Methods: We sequenced a 12kb library from human reference sample HG002 to 18-fold coverage on the PacBio Sequel II System with three SMRT Cells 8M. The CCS algorithm was used to generate highly accurate (average…
Recent work comparing metagenomic sequencing methods indicates that a comprehensive picture of the taxonomic and functional diversity of complex communities will be difficult to achieve with short-read technology alone. While the lower cost of short reads has enabled greater sequencing depth, the greater contiguity of long-read assemblies and lack of GC bias in SMRT Sequencing have enabled better gene finding. However, since long-read assembly requires high coverage for error correction, the benefits of unbiased coverage have in the past been lost for low-abundance species. SMRT Sequencing performance improvements and the introduction of the Sequel II System have enabled a…
Recent work comparing metagenomic sequencing methods indicates that a comprehensive picture of the taxonomic and functional diversity of complex communities will be difficult to achieve with one sequencing technology alone. While the lower cost of short reads has enabled greater sequencing depth, the greater contiguity of long-read assemblies and lack of GC bias in SMRT Sequencing have enabled better gene finding. However, since long-read assembly typically requires high coverage for error correction, these benefits have in the past been lost for low-abundance species. The introduction of the Sequel II System has enabled a new, higher throughput, assembly-optional data type that…
Introduction: Long-read sequencing has been applied successfully to assemble genomes and detect structural variants. However, due to high raw-read error rates (10-15%), it has remained difficult to call small variants from long reads. Recent improvements in library preparation and sequencing chemistry have increased length, accuracy, and throughput of PacBio circular consensus sequencing (CCS) reads, resulting in 15-20kb reads with average read quality above 99%. Materials and Methods: We sequenced a library from human reference sample HG002 to 18-fold coverage on the PacBio Sequel II with two SMRT Cells 8M. The CCS algorithm was used to generate highly accurate (average 99.9%)…
A first look at Pacific Biosciences RS data: Pacific Biosciences technology offers a fundamentally new data type with the potential to overcome these limitations by providing significantly longer reads (now averaging >1kb), enabling more unique seeds for reference alignment. In addition, the lack of amplification in the library construction step avoids a common source of base composition bias. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical resequencing projects by assessing the quality of the raw sequencing data, as well as its use for SNP discovery and genotyping…
In addition to the genome and transcriptome, epigenetic information is essential to understand biological processes and their regulation, as well as the misregulation underlying disease. Traditionally, epigenetic DNA modifications are detected using upfront sample preparation steps such as bisulfite conversion, followed by sequencing. Bisulfite sequencing has provided a wealth of knowledge about human epigenetics; however, it does not access the entire genome due to limitations in read length and GC bias of the sequencing technologies used. In contrast, Single Molecule, Real-Time (SMRT) DNA Sequencing is unique in that it can detect DNA base modifications as part of the sequencing process. It can…
Target enrichment capture methods allow scientists to rapidly interrogate important genomic regions of interest for variant discovery, including SNPs, gene isoforms, and structural variation. Custom targeted sequencing panels are important for characterizing heterogeneous, complex diseases and uncovering the genetic basis of inherited traits with more uniform coverage when compared to PCR-based strategies. With the increasing availability of high-quality reference genomes, customized gene panels are readily designed with high specificity to capture genomic regions of interest, thus enabling scientists to expand their research scope from a single individual to larger cohort studies or population-wide investigations. Coupled with PacBio® long-read sequencing, these…
Genes associated with several neurological disorders have been shown to be highly polymorphic. Targeted sequencing of these genes using NGS technologies is a powerful way to increase the cost-effectiveness of variant discovery and detection. However, for a comprehensive view of these target genes, it is necessary to have complete and uniform coverage across regions of interest. Unfortunately, short-read sequencing technologies are not ideal for these types of studies as they are prone to mis-mapping and often fail to span repetitive regions. Targeted sequencing with PacBio long reads provides the unique advantage of single-molecule observations of complex genomic regions. PacBio long…
Satellite repeats are a structural component of centromeres and telomeres, and in some instances their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently…