Paper Title
Statistical Aspects of SHAP: Functional ANOVA for Model Interpretation
Paper Authors
Paper Abstract
SHAP is a popular method for measuring variable importance in machine learning models. In this paper, we study the algorithm used to estimate SHAP scores and outline its connection to the functional ANOVA decomposition. We use this connection to show that the challenges in approximating SHAP largely relate to the choice of a feature distribution and the number of $2^p$ ANOVA terms estimated. We argue that the connection between machine learning explainability and sensitivity analysis is illuminating in this case, but that the immediate practical consequences are not obvious, since the two fields face different sets of constraints. Machine learning explainability concerns models which are inexpensive to evaluate but often have hundreds, if not thousands, of features. Sensitivity analysis typically deals with models from physics or engineering which may be very time-consuming to run, but which operate on a comparatively small space of inputs.
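To make the $2^p$ scaling concrete: the functional ANOVA decomposition writes a model $f$ of $p$ features as a sum over all subsets of features,

$$f(x) = f_\emptyset + \sum_{i} f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j) + \cdots + f_{1\cdots p}(x_1, \ldots, x_p),$$

one term per subset, hence $2^p$ terms in total. Exact Shapley values face the same combinatorics: each feature's score averages its marginal contribution over every subset of the remaining features. The sketch below is a minimal, illustrative Python implementation of exact Shapley computation by subset enumeration; it is not the paper's algorithm, and the helper `exact_shap`, the value function `v`, and the toy linear model are assumptions introduced here for illustration. How `v(S)` handles the features outside $S$ (marginal vs. conditional averaging) is exactly the "choice of a feature distribution" the abstract refers to.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shap(v, p):
    """Exact Shapley values by enumerating, for each feature i, every
    subset of the other p - 1 features: O(2^p) calls to the value
    function v, which is the scaling bottleneck noted in the abstract."""
    phi = np.zeros(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):  # subset sizes 0 .. p-1
            for S in combinations(others, size):
                # Shapley weight |S|! (p - |S| - 1)! / p!
                w = factorial(size) * factorial(p - size - 1) / factorial(p)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

# Hypothetical value function: a linear model f(x) = 2*x0 + 1*x1 at
# x = (1, 1) with a zero baseline, so v(S) sums the coefficients in S.
coefs = np.array([2.0, 1.0])
x = np.array([1.0, 1.0])
v = lambda S: float(sum(coefs[j] * x[j] for j in S))
print(exact_shap(v, p=2))  # -> [2. 1.]
```

For this toy linear model with an independent (marginal) value function, the Shapley values recover coefficient times feature deviation from the baseline, as expected; for general models, practical SHAP estimators must sample rather than enumerate the $2^p$ subsets.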