Long-read sequencing has substantial advantages for structural variant discovery and phasing of vari- ants compared to short-read technologies, but the required and optimal read length has not been as- sessed. In this work, we used long reads simulated from human genomes and evaluated structural vari- ant discovery and variant phasing using current best practicebioinformaticsmethods.Wedeterminedthatoptimal discovery of structural variants from human genomes can be obtained with reads of minimally 20 kb. Haplotyping variants across genes only reaches its optimum from reads of 100 kb. These findings are important for the design of future long-read sequenc- ing projects.
Journal: NAR Genomics and Bioinformatics