Paper title

Generation of Consistent Sets of Multi-Label Classification Rules with a Multi-Objective Evolutionary Algorithm

Authors

Thiago Zafalon Miranda, Diorge Brognara Sardinha, Márcio Porto Basgalupp, Yaochu Jin, Ricardo Cerri

Abstract

Multi-label classification consists of assigning an instance to two or more classes simultaneously. It is a very challenging task that arises in many real-world applications, such as the classification of biological data, images, video, audio, and text. Recently, interest in interpretable classification models has grown, partly as a consequence of regulations such as the General Data Protection Regulation. In this context, we propose a multi-objective evolutionary algorithm that generates multiple rule-based multi-label classification models, allowing users to choose among models that offer different trade-offs between predictive power and interpretability. An important contribution of this work is that, unlike most algorithms, which usually generate models based on lists (ordered collections) of rules, our algorithm generates models based on sets (unordered collections) of rules, which increases interpretability. In addition, because a conflict avoidance algorithm is employed during rule creation, every rule within a given model is guaranteed to be consistent with every other rule in the same model. Thus, no conflict resolution strategy is required, and simpler models can be evolved. We conducted experiments on synthetic and real-world datasets and compared our results with state-of-the-art algorithms in terms of predictive performance (F-Score) and interpretability (model size). The results show that our best models had comparable F-Score and smaller model sizes.
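
The abstract does not say how rules are encoded or how a conflict is defined, so the following Python sketch is only an illustration under stated assumptions, not the authors' algorithm. It assumes each rule tests closed numeric intervals on named features, and that two rules conflict when some instance could satisfy both while they predict different label sets. The names Rule, antecedents_overlap, conflicts, add_if_consistent, and predict are hypothetical. The sketch shows why a consistent rule set needs no ordering: any rule that covers an instance yields the same prediction.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical encoding: feature name -> (low, high) closed interval.
    conditions: dict
    # Set of labels the rule predicts when all of its conditions hold.
    labels: frozenset


def antecedents_overlap(a: Rule, b: Rule) -> bool:
    """Return True if some instance could satisfy both antecedents."""
    for feature in a.conditions.keys() & b.conditions.keys():
        lo_a, hi_a = a.conditions[feature]
        lo_b, hi_b = b.conditions[feature]
        if max(lo_a, lo_b) > min(hi_a, hi_b):
            # Disjoint intervals on a shared feature: both rules cannot fire.
            return False
    return True


def conflicts(a: Rule, b: Rule) -> bool:
    """Assumed conflict definition: overlapping coverage, different labels."""
    return antecedents_overlap(a, b) and a.labels != b.labels


def add_if_consistent(rule_set: list, candidate: Rule) -> bool:
    """Add the candidate only if it conflicts with no rule already present."""
    if any(conflicts(candidate, rule) for rule in rule_set):
        return False
    rule_set.append(candidate)
    return True


def covers(rule: Rule, instance: dict) -> bool:
    """Check whether the instance satisfies every condition of the rule."""
    return all(lo <= instance[f] <= hi for f, (lo, hi) in rule.conditions.items())


def predict(rule_set: list, instance: dict, default=frozenset()) -> frozenset:
    """Return the labels of any covering rule; order is irrelevant when
    the rule set is consistent, because all covering rules agree."""
    for rule in rule_set:
        if covers(rule, instance):
            return rule.labels
    return default


# Illustrative data only; these rules and labels are made up.
model = []
r1 = Rule({"x": (0.0, 5.0)}, frozenset({"A", "B"}))
r2 = Rule({"x": (3.0, 8.0)}, frozenset({"C"}))  # overlaps r1, different labels
r3 = Rule({"x": (6.0, 9.0)}, frozenset({"C"}))  # disjoint from r1
accepted = [add_if_consistent(model, r) for r in (r1, r2, r3)]
```

In this sketch r2 is rejected because it overlaps r1 on the interval [3, 5] while predicting different labels, so the resulting model can be evaluated in any order without a conflict resolution step.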
