Paper Title
Calibration Meets Explanation: A Simple and Effective Approach for Model Confidence Estimates
Paper Authors
Paper Abstract
Calibration strengthens the trustworthiness of black-box models by producing more accurate confidence estimates on given examples. However, little is known about whether model explanations can help confidence calibration. Intuitively, humans examine important feature attributions and decide whether the model is trustworthy. Similarly, explanations can tell us when the model may or may not know. Inspired by this, we propose a method named CME that leverages model explanations to make the model less confident with non-inductive attributions. The idea is that when the model is not highly confident, it is difficult to identify strong indications of any class, so the tokens accordingly do not have high attribution scores for any class, and vice versa. We conduct extensive experiments on six datasets with two popular pre-trained language models in both in-domain and out-of-domain settings. The results show that CME improves calibration performance in all settings. The expected calibration errors are further reduced when CME is combined with temperature scaling. Our findings highlight that model explanations can help calibrate posterior estimates.
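A note on the two calibration concepts the abstract relies on: expected calibration error (ECE) measures the average gap between a model's confidence and its accuracy over confidence bins, and temperature scaling is the standard post-hoc method the abstract says CME is combined with. The sketch below is a minimal NumPy illustration of these standard quantities, not the paper's implementation; the function names expected_calibration_error and temperature_scale are ours.

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # ECE: weighted average, over equal-width confidence bins, of the
    # absolute gap between mean confidence and empirical accuracy.
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of examples in the bin
    return ece

def temperature_scale(logits, temperature):
    # Post-hoc temperature scaling: divide logits by a scalar T before the
    # softmax; T > 1 softens the distribution and lowers over-confident scores.
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max(axis=-1, keepdims=True)  # for numerical stability
    probs = np.exp(z)
    return probs / probs.sum(axis=-1, keepdims=True)

In the usual recipe, the temperature T is fit on a held-out set by minimizing negative log-likelihood and then applied to test-time logits; the resulting maximum softmax probabilities and correctness labels can be passed to expected_calibration_error to report ECE.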