Precision Medicine Review Highlights Need for Accuracy & Comprehensiveness in Genome Sequencing
Thursday, August 25, 2016
Stanford’s Euan Ashley wrote a terrific review about the clinical use of genome sequencing for Nature Reviews Genetics. “Towards Precision Medicine” is well worth a read, covering topics from the ethnic background of the human reference genome to public interest in precision medicine. He also covers technical angles such as mapping of sequence reads for variant calling across challenging regions of the genome with known clinical significance.
Ashley’s premise is that many of the current standards in genomics — from sequencers to analysis tools and more — were developed for use in basic research, where the consequences of inaccurate information are less severe than they would be in a clinical setting. Throughout the review, he considers what challenges need to be overcome “to bring genomics up to clinical grade.”
What caught our attention was Ashley’s excellent description of the genomic elements that make the human genome so difficult to interpret accurately: repetitive sequence, structural variants, segmental duplications, and so on. “Much of this genomic complexity is only challenging because of the prevailing technology used to assess it: short-read sequencing,” he writes. “With extensive paralogy, originating in gene families, segmental duplication or pseudogenes, the genomic location of many short reads cannot be determined with confidence.” Repeat expansion disorders, such as Huntington disease, are marked by long tracts of simple repeats that can far exceed the length of a short read, making it all but impossible to reconstruct these regions accurately with short-read sequencers.
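The mapping problem described above can be illustrated with a toy sketch. It assumes exact string matching in place of a real aligner, and the flanking sequences, the repeat count, and the helper name find_placements are all invented for illustration; none of them come from the review or from an actual gene. A read that lies entirely inside a repeat tract fits at many offsets, while a read long enough to reach unique sequence on both sides fits at only one.

```python
def find_placements(reference: str, read: str) -> list[int]:
    """Return every 0-based offset where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one base so overlapping matches are also found.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical toy reference: unique flank, a CAG repeat tract, unique flank.
LEFT_FLANK = "ATGGCGACCCTGGAAAAG"   # made-up sequence
RIGHT_FLANK = "CCGCCACCGCCGCCG"     # made-up sequence
REPEAT_UNITS = 10
reference = LEFT_FLANK + "CAG" * REPEAT_UNITS + RIGHT_FLANK

# A "short read" that falls entirely inside the repeat tract.
short_read = "CAG" * 4

# A "long read" that spans the whole tract plus a few unique bases on each side.
long_read = LEFT_FLANK[-6:] + "CAG" * REPEAT_UNITS + RIGHT_FLANK[:6]

short_hits = find_placements(reference, short_read)
long_hits = find_placements(reference, long_read)

print(f"short read ({len(short_read)} bases): {len(short_hits)} candidate placements")
print(f"long read ({len(long_read)} bases): {len(long_hits)} candidate placement(s)")
```

The short read also carries no information about how many repeat units the tract contains, since it would fit equally well in a longer or shorter tract. Only the spanning read pins down both the location and the repeat count. Real aligners tolerate mismatches and score candidate placements, so this is a simplification of the idea rather than a description of any particular tool.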
In another example, he cites regions like the famously polymorphic major histocompatibility complex (MHC) as stumbling blocks for short-read sequencers. “The MHC is challenging to resolve using only short-read approaches because of the lack of a comprehensive catalogue of haplotypes and the intrinsic lack of phase information — that is, knowledge of the parental chromosome of origin — in short reads,” he notes, adding that phasing data is important for a variety of clinical applications, including phasing of the HLA genes housed in this region, which are associated with more than 100 diseases and many drug reactions.
Ashley sees long-read sequencing as a potential solution to many of these problems. “Long-read sequencing facilitates de novo assembly that automatically provides phase information,” he writes. “Such sequencing provides a more complete picture of the genome.” Long reads can easily span structural variants and even long stretches of repeats, making it possible to fully reconstruct these clinically relevant regions. Ashley notes that short-read methods call these larger structural variants much less accurately, both because of the variants' size and because of mapping ambiguity. He also points out that “variants that are more disruptive of the open reading frame, such as structural variants (SVs), are generally more likely to cause disease,” and lists more than 25 clinical disorders caused by pathogenic structural variants.
Ashley ends by providing a path forward for improved accuracy in clinical genomics: “Reducing reliance on reference sequences, making phasing routine, improving calling of indels and structural variants, characterizing complex areas of the genome through long-read sequencing and maximizing the cost effectiveness of genomic coverage.” He also reminds us of how far we’ve come and what the future holds. “Fueled by technological advancement, fundamental discovery of genetic elements related to health and disease has been the engine of human genetics for decades,” Ashley concludes. “Building on this foundation, precision medicine will use the knowledge gained to redefine disease, to realize new therapies and to provide hope for generations of patients to come.”