Nature Webinar and SMRT Grant Winner Explore Structural Variation for Disease Gene Discovery
Tuesday, April 24, 2018
Structural variants account for most of the base pairs that differ between human genomes and are known to cause more than 1,000 genetic disorders, including ALS, schizophrenia, and hereditary cancer. Yet they remain overlooked in human genetic research because short-read sequencing methods struggle to resolve complex variants, which often involve repetitive DNA.
At a recent webinar co-hosted by Nature Research, Professor Alexander Hoischen joined Principal Scientist Aaron Wenger to discuss how advances in long-read sequencing and structural variant calling algorithms have made it possible to affordably detect the more than 20,000 such variants that are now known to exist in a human genome.
Wenger described methods for calling and visualizing structural variants from low-coverage, long-read sequencing of human genomes, and presented optimal study designs for both gene discovery and population genetics, while Hoischen shared case studies.
New Insights into Neurodevelopmental Disorders
Hoischen’s team at Radboud University Medical Center uses intellectual disability as a model for severe, sporadic neurodevelopmental disorders. Extensive research suggests that more than 60% of moderate to severe cases of intellectual disability are caused by de novo mutations, but even after the application of microarrays, exome sequencing, and short-read whole genome sequencing, 38% of patients remain undiagnosed because no causal variant can be found.
To address these unexplained cases, Hoischen and his colleagues adopted SMRT Sequencing to see if long-read technology could offer new insight. They already knew that information about structural variants associated with intellectual disability was lacking compared to single-nucleotide variants.
In a pilot project that is still underway, Hoischen selected five patient/parent trios and sequenced them to high coverage with the Sequel System. While all 15 people had previously been analyzed with microarrays, exome sequencing, and short-read WGS, SMRT Sequencing still uncovered 21 Mb of genomic sequence in each sample that was essentially new data. Of that, 7 Mb of sequence falls in genic space.
Using PBSV and a new joint calling tool the team beta-tested, they found as many as 23,000 structural variants larger than 50 bp per genome. Nearly 70% of those variants were missed by short-read sequencing: 80% of insertions and 55% of deletions were novel, Hoischen said. By broadening their search to variants as small as 20 bp, the team expanded its variant calls to as many as 40,000 in each genome, with similar proportions of novel findings. They have used PCR and other approaches to validate many of the calls, showing that the PacBio data is highly accurate.
For analysis, Hoischen and his team are focusing on de novo mutations. They use data from the parents to rapidly filter out inherited mutations, getting the patient’s universe of potentially causative variants down to very manageable numbers for follow-up study. In one example, he showed that just 40 structural variants were left to investigate in the patient after this filtering process. Hoischen said this approach is likely to be powerful for clinical applications.
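The parent-based filtering described above can be illustrated with a minimal sketch. It is not the team's joint calling tool. The `SV` record, the `matches` rule, and the tolerance values (`max_dist`, `min_size_ratio`) are hypothetical choices made for illustration. The idea is simply that a variant called in the child is dropped if a similar variant of the same type is found near the same position in either parent.

```python
from typing import Iterable, List, NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal structural variant record."""
    chrom: str
    pos: int      # start coordinate
    svtype: str   # e.g. "INS" or "DEL"
    length: int   # variant size in bp


def matches(a: SV, b: SV, max_dist: int = 100,
            min_size_ratio: float = 0.8) -> bool:
    """Treat two calls as the same variant if they share chromosome and
    type, start within max_dist bp, and have similar sizes.
    The thresholds are illustrative assumptions, not published values."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    smaller, larger = sorted((a.length, b.length))
    return larger > 0 and smaller / larger >= min_size_ratio


def candidate_de_novo(child: Iterable[SV], mother: Iterable[SV],
                      father: Iterable[SV]) -> List[SV]:
    """Keep only child calls that have no matching call in either parent."""
    parental = list(mother) + list(father)
    return [sv for sv in child
            if not any(matches(sv, p) for p in parental)]


# Toy trio (made-up coordinates, for illustration only).
child = [SV("chr1", 1000, "DEL", 300),
         SV("chr2", 5000, "INS", 120),
         SV("chr3", 700, "DEL", 60)]
mother = [SV("chr1", 1010, "DEL", 310)]   # same deletion, slightly shifted
father = [SV("chr3", 700, "INS", 60)]     # same site but different type

remaining = candidate_de_novo(child, mother, father)
for sv in remaining:
    print(sv)
```

In this toy trio the chr1 deletion is removed as inherited, while the chr2 insertion and the chr3 deletion remain as candidates. A variant that survives this kind of filter is only a candidate: a parental call can be missed for technical reasons, which is one reason validation of the remaining calls matters.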
Deeper Dive into Autism Spectrum Disorder
Another project was announced during the presentation. Stephen Scherer, director of The Centre for Applied Genomics at The Hospital for Sick Children (SickKids) and Professor of Medicine at the University of Toronto, was named the recipient of the Structural Variant SMRT Grant Program, launched in partnership with GENEWIZ during the American Society of Human Genetics Annual Meeting in October 2017. He will receive sequencing on the Sequel System and bioinformatics support to pursue the project entitled “Using Low-Coverage PacBio SMRT Sequencing to Find Structural Variation Mutations in Autism Families with Multiple Affected Individuals.”
Scherer previously published results of a study in which he used short-read whole-genome sequencing to detect single-nucleotide variations, indels, and copy number variations in more than 5,000 samples from families with autism spectrum disorder (ASD). Although he identified many variants and associated genes affecting autism risk, the study did not report on structural variation findings, and for most families the genetic determinants are still to be resolved.
Congratulations to Dr. Scherer. We look forward to applying low-coverage whole genome SMRT Sequencing to this important and ground-breaking research.