Stanford University Archives

June 1, 2021

Genomic Architecture of the KIR and MHC-B and -C Regions in Orangutan

PacBio 2013 User Group Meeting Presentation Slides: Lisbeth Guethlein from Stanford University School of Medicine looked at highly repetitive and variable immune regions of the orangutan genome. Guethlein reported that “PacBio managed to accomplish in a week what I have been working on for a couple years” (with Sanger sequencing), and the results were concordant. “Long story short, I was a happy customer.”

June 1, 2021

CONVEX: de novo transcriptome error correction by convexification

2015 SMRT Informatics Developers Conference Presentation Slides: David Tse of Stanford University presented on a method his team is developing for de novo transcriptome error correction by convexification.

June 1, 2021

Understanding methylome, metagenome, structural variants using SMRT Sequencing

2015 SMRT Informatics Developers Conference Presentation Slides: Shinichi Morishita of the University of Tokyo presented on how his team has been using SMRT Sequencing to better understand methylomes, metagenomes and structural variation of various eukaryotic genomes.

June 1, 2021

Reference materials for clinical applications of human genome sequencing

The Genome in a Bottle Consortium is developing the reference materials, reference methods, and reference data n…

June 1, 2021

Genome in a Bottle: You’ve sequenced. How well did you do?

Purpose: Clinical laboratories, research laboratories and technology developers all need DNA samples with reliably known genotypes in order to help validate and improve their methods. The Genome in a Bottle Consortium (genomeinabottle.org) has been developing Reference Materials with high-accuracy whole genome sequences to support these efforts.

Methodology: Our pilot reference material is based on Coriell sample NA12878 and was released in May 2015 as NIST RM 8398 (tinyurl.com/giabpilot). To minimize bias and improve accuracy, 11 whole-genome and 3 exome data sets produced using 5 different technologies were integrated using a systematic arbitration method [1]. The Genome in a Bottle Analysis Group is adapting these methods and developing new methods to characterize 2 families, one Asian and one Ashkenazi Jewish from the Personal Genome Project, which are consented for public release of sequencing and phenotype data. We have generated a larger and even more diverse data set on these samples, including high-depth Illumina paired-end and mate-pair, Complete Genomics, and Ion Torrent short-read data, as well as Moleculo, 10X, Oxford Nanopore, PacBio, and BioNano Genomics long-read data. We are analyzing these data to provide an accurate assessment of not just small variants but also large structural variants (SVs) in both “easy” regions of the genome and in some “hard” repetitive regions. We have also made all of the input data sources publicly available for download, analysis, and publication.

Results: Our arbitration method produced a reference data set of 2,787,291 single nucleotide variants (SNVs), 365,135 indels, 2744 SVs, and 2.2 billion homozygous reference calls for our pilot genome. We found that our call set is highly sensitive and specific in comparison to independent reference data sets. We have also generated preliminary assemblies and structural variant calls for the next 2 trios from long read data and are currently integrating and validating these.

Discussion: We combined the strengths of each of our input datasets to develop a comprehensive and accurate benchmark call set. In the short time it has been available, over 20 published or submitted papers have used our data. Many challenges exist in comparing to our benchmark calls, and thus we have worked with the Global Alliance for Genomics and Health to develop standardized methods, performance metrics, and software to assist in its use.

[1] Zook et al, Nat Biotech. 2014.

June 1, 2021

An improved circular consensus algorithm with an application to detection of HIV-1 Drug-Resistance Associated Mutations (DRAMs)

Scientists who require confident resolution of heterogeneous populations across complex regions have been unable to transition to short-read sequencing methods. They continue to depend on Sanger Sequencing despite its cost and time inefficiencies. Here we present a redesigned algorithm that allows the generation of circular consensus sequences (CCS) from individual SMRT Sequencing reads. With this new algorithm, dubbed CCS2, it is possible to reach arbitrarily high quality across longer insert lengths at a lower cost and higher throughput than Sanger Sequencing. We apply CCS2 to the characterization of the HIV-1 K103N drug-resistance associated mutation, which is clinically important and challenging to resolve because of its regional sequence context. A mutation was introduced into the 3rd nucleotide position of codon 103 (A>C substitution) of the RT gene on a pNL4-3 backbone by site-directed mutagenesis. Regions spanning ~1,300 bp were PCR amplified from both the non-mutated and mutant (K103N) plasmids, and were sequenced individually and as a 50:50 mixture. Sequencing data were analyzed using the new CCS2 algorithm, which uses a fully generative probabilistic model of our SMRT Sequencing process to polish consensus sequences to arbitrarily high accuracy. This result, previously demonstrated for multi-molecule consensus sequences with the Quiver algorithm, is made possible by incorporating per-Zero Mode Waveguide (ZMW) characteristics, thus accounting for the intrinsic changes in the sequencing process that are unique to each ZMW. With CCS2, we are able to achieve a per-read empirical quality of QV30 with 19X coverage. This yields ~5000 1.3 kb consensus sequences with a collective empirical quality of ~QV40. Additionally, we demonstrate a 0% miscall rate in both unmixed samples, and estimate a 48:52% frequency for the K103N mutation in the mixed sample, consistent with data produced by orthogonal platforms.
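For readers unfamiliar with the QV figures quoted above, the following is a minimal sketch of the standard Phred relationship between a quality value and an error probability, plus a simple read-count frequency estimate. It is not the CCS2 algorithm; the function names and the read counts in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def qv_to_error_prob(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_prob_to_qv(p: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if p <= 0:
        raise ValueError("error probability must be positive")
    return -10 * math.log10(p)


def expected_errors(qv: float, length_bp: int) -> float:
    """Expected number of errors in a sequence of the given length at this QV."""
    return qv_to_error_prob(qv) * length_bp


def variant_frequency(variant_reads: int, total_reads: int) -> float:
    """Fraction of consensus reads that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


# QV30 corresponds to about 1 error per 1,000 bases,
# QV40 to about 1 error per 10,000 bases.
p30 = qv_to_error_prob(30)
p40 = qv_to_error_prob(40)

# Expected errors in a 1.3 kb consensus read at QV30.
errs = expected_errors(30, 1300)

# Hypothetical counts: 480 of 1,000 consensus reads carry the variant.
freq = variant_frequency(480, 1000)
```

Under this relationship, each 10-point increase in QV corresponds to a tenfold drop in error rate, which is why a shift from QV30 to QV40 is substantial for minor-variant detection.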

June 1, 2021

De novo assembly and preliminary annotation of the Schizocardium californicum genome

Animals in the phylum Hemichordata have provided key understanding of the origins and development of body patterning and nervous system organization. However, efforts to sequence and assemble the genomes of highly heterozygous non-model organisms have proven to be difficult with traditional short read approaches. Long repetitive DNA structures, extensive structural variation between haplotypes in polyploid species, and large genome sizes are limiting factors to achieving highly contiguous genome assemblies. Here we present the highly contiguous de novo assembly and preliminary annotation of an indirect-developing hemichordate genome, Schizocardium californicum, using SMRT Sequencing long reads.

February 5, 2021

Podcast: Short read sequencing not up to the task of characterizing transcriptome says Mike Snyder of Stanford

Mike Snyder from Stanford University has published recent papers in Nature Biotechnology and PNAS using SMRT Sequencing for transcriptome analysis and demonstrated that long reads enable full coverage of RNA molecules. He discusses that…

February 5, 2021

ASHG PacBio Workshop: Towards precision medicine

Euan Ashley from Stanford University started with the premise that while current efforts in the field of genomics medicine address 30% of patient cases, there’s a need for new approaches…

February 5, 2021

ASHG PacBio Workshop: A future of high-quality genomes, transcriptomes, and epigenomes

Jonas Korlach spoke about recent SMRT Sequencing updates, such as the latest Sequel System chemistry release (1.2.1) and updates to the Integrative Genomics Viewer, which is now optimized for PacBio data…

February 5, 2021

Webinar: SMRT Sequencing applications in plant and animal sciences: an overview

In this webinar, Emily Hatas of PacBio shares information about the applications and benefits of SMRT Sequencing in plant and animal biology, agriculture, and industrial research fields. This session contains…

February 5, 2021

Video: Structural variant detection with SMRT Sequencing

In this video, Aaron Wenger, a research scientist at PacBio, describes the use of long-read SMRT Sequencing to detect structural variants in the human genome. He shares that structural variations…

February 5, 2021

Webinar: Sequencing structural variants for disease gene discovery and population genetics

Structural variants (SVs, differences >50 base pairs) account for most of the base pairs that differ between two human genomes, and are known to cause over 1,000 genetic disorders including…

February 5, 2021

ASHG PacBio Workshop: Latest product and application updates

In this ASHG 2020 PacBio Workshop, Jonas Korlach, CSO, shares how the new PacBio Sequel IIe System makes highly accurate long-read sequencing easy and affordable so all scientists can gain comprehensive…

April 21, 2020

A robust benchmark for germline structural variant detection

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods to routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identification of both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based, from short-, linked-, and long-read sequencing, as well as optical and electronic mapping. The final benchmark set contains 12745 isolated, sequence-resolved insertion and deletion calls ≥50 base pairs (bp) discovered by at least 2 technologies or 5 callsets, genotyped as heterozygous or homozygous variants by long reads. The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.66 Gbp and 9641 SVs supported by at least one diploid assembly. Support for SVs was assessed using svviz with short-, linked-, and long-read sequence data. In general, there was strong support from multiple technologies for the benchmark SVs, with 90% of the Tier 1 SVs having support in reads from more than one technology. The Mendelian genotype error rate was 0.3%, and genotype concordance with manual curation was >98.7%. We demonstrate the utility of the benchmark set by showing it reliably identifies both false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.
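As a rough illustration of how a benchmark that identifies both false negatives and false positives is typically used, the sketch below computes precision, recall, and F1 for a callset from its true-positive, false-positive, and false-negative counts. It is not the GIAB comparison tooling; the function name and the counts in the usage lines are hypothetical.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Summarize a callset against a benchmark.

    tp: calls that match a benchmark variant
    fp: calls inside benchmark regions with no matching benchmark variant
    fn: benchmark variants that the callset missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for one SV callset restricted to benchmark regions.
metrics = benchmark_metrics(tp=9000, fp=500, fn=641)
print(f"precision={metrics['precision']:.3f} "
      f"recall={metrics['recall']:.3f} f1={metrics['f1']:.3f}")
```

Precision is only meaningful inside regions where extra calls can be treated as putative false positives, which is the role the Tier 1 regions play in the abstract above.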
