Paper Title
A Typology for Exploring the Mitigation of Shortcut Behavior
Paper Authors
Paper Abstract
As machine learning models grow ever larger and are trained with weak supervision on large, possibly uncurated data sets, it becomes increasingly important to establish mechanisms for inspecting, interacting with, and revising models in order to mitigate shortcut learning and to guarantee that their learned knowledge is aligned with human knowledge. The recently proposed XIL framework was developed for this purpose, and several such methods have been introduced, each with individual motivations and methodological details. In this work, we unify various XIL methods into a single typology by establishing a common set of basic modules. In doing so, we pave the way for a principled comparison of existing and, importantly, also future XIL approaches. In addition, we discuss existing measures and benchmarks, and introduce novel ones, for evaluating the overall abilities of an XIL method. Given this extensive toolbox, including our typology, measures, and benchmarks, we finally compare several recent XIL methods both methodologically and quantitatively. In our evaluations, all methods prove able to revise a model successfully. However, we find notable differences across individual benchmark tasks, revealing valuable application-relevant aspects that argue for integrating these benchmarks into the development of future methods.