Webinar: Detecting structural variants in PacBio reads – tools and applications
Most of the base pairs that differ between two human genomes are in intermediate-sized structural variants (50 bp to 5 kb), which are too small to detect with array CGH but too large to reliably discover with short-read NGS. PacBio Single Molecule, Real-Time (SMRT) Sequencing fills this technology gap. SMRT Sequencing detects tens of thousands of structural variants in a human genome, which is approximately five times the sensitivity of short-read NGS. To discover variants using SMRT Sequencing, we have developed pbsv, which is available in version 5 of the PacBio SMRT Link software suite. The pbsv algorithm applies a sequence of stages: 1) identify reads with signatures of structural variation, 2) cluster nearby reads with similar signatures, 3) summarize each cluster into a consensus variant, and 4) filter for variants with sufficient read support. The pbsv algorithm is designed for individuals, trios, and population cohorts. For visualization, we have extended the popular genome browser IGV to better support structural variants and PacBio long reads. The improvements are available in IGV 2.4. To evaluate pbsv, we generated high-coverage sequencing of a diploid human genome and then titrated the data down to lower coverage levels. The false discovery rate for pbsv is low at all coverage levels. Sensitivity is high even at modest coverage, above 85% at 10-fold coverage and 95% at 20-fold coverage. We also applied pbsv to identify structural variants in an individual with Carney complex for whom short-read whole-genome sequencing was non-diagnostic. Filtering for rare, genic structural variants left six candidates, one of which was determined to be likely causative. These applications demonstrate the ability of pbsv to detect structural variants in low-coverage PacBio sequencing and suggest the importance of considering structural variants in any study of human genetic variation.
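
As a rough illustration of the four stages listed above, the following Python sketch runs a toy version of the same sequence: find signatures, cluster them, summarize each cluster, and filter by read support. It is not the pbsv implementation. The input format (simplified CIGAR tuples), the function names, and the thresholds `max_dist`, `max_len_ratio`, and `min_support` are all assumptions made for this example, and a median stands in for a true consensus step.

```python
from dataclasses import dataclass
from statistics import median

MIN_SV_LEN = 50  # lower size bound for a structural variant, as stated in the abstract


@dataclass(frozen=True)
class Signature:
    read_id: str
    chrom: str
    pos: int      # reference position where the event starts
    svtype: str   # "INS" or "DEL"
    length: int


def find_signatures(read_id, chrom, ref_start, cigar):
    """Stage 1: scan one alignment's CIGAR for large insertions or deletions."""
    sigs = []
    pos = ref_start
    for op, n in cigar:
        if op in ("I", "D") and n >= MIN_SV_LEN:
            svtype = "INS" if op == "I" else "DEL"
            sigs.append(Signature(read_id, chrom, pos, svtype, n))
        if op in ("M", "D", "=", "X", "N"):
            pos += n  # these operations consume reference bases
    return sigs


def cluster_signatures(sigs, max_dist=200, max_len_ratio=2.0):
    """Stage 2: greedily group nearby signatures of the same type and similar length.

    max_dist and max_len_ratio are hypothetical thresholds chosen for this sketch.
    """
    clusters = []
    for s in sorted(sigs, key=lambda s: (s.chrom, s.svtype, s.pos)):
        last = clusters[-1][-1] if clusters else None
        similar = (
            last is not None
            and last.chrom == s.chrom
            and last.svtype == s.svtype
            and s.pos - last.pos <= max_dist
            and max(s.length, last.length) / min(s.length, last.length) <= max_len_ratio
        )
        if similar:
            clusters[-1].append(s)
        else:
            clusters.append([s])
    return clusters


def summarize(cluster):
    """Stage 3: collapse a cluster into one call (median used in place of a consensus)."""
    return {
        "chrom": cluster[0].chrom,
        "pos": int(median(s.pos for s in cluster)),
        "svtype": cluster[0].svtype,
        "length": int(median(s.length for s in cluster)),
        "support": len({s.read_id for s in cluster}),  # number of distinct reads
    }


def call_variants(alignments, min_support=2):
    """Run stages 1 to 4; min_support is a hypothetical read-support cutoff."""
    sigs = [s for aln in alignments for s in find_signatures(*aln)]
    calls = [summarize(c) for c in cluster_signatures(sigs)]
    return [v for v in calls if v["support"] >= min_support]  # Stage 4


# Toy alignments: (read_id, chrom, ref_start, [(cigar_op, length), ...])
alignments = [
    ("read1", "chr1", 1000, [("M", 500), ("D", 120), ("M", 400)]),
    ("read2", "chr1", 1100, [("M", 405), ("D", 118), ("M", 300)]),
    ("read3", "chr1", 1200, [("M", 300), ("D", 122), ("M", 300)]),
    ("read4", "chr1", 5000, [("M", 200), ("I", 75), ("M", 200)]),  # only one supporting read
    ("read5", "chr1", 1000, [("M", 100), ("D", 3), ("M", 100)]),   # below the 50 bp bound
]

calls = call_variants(alignments)
print(calls)
```

In this toy input, three reads carry a deletion of roughly 120 bp near position 1500, so they form one cluster that passes the support filter. The lone 75 bp insertion is dropped for lack of support, and the 3 bp deletion never becomes a signature.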