Paper Title
Discover, Explain, Improve: An Automatic Slice Detection Framework for Natural Language Processing
Authors
Abstract
Pretrained natural language processing (NLP) models achieve high overall performance but still make systematic errors. As an alternative to manual error analysis, research on slice detection models (SDMs), which automatically identify underperforming groups of datapoints, has attracted growing attention in computer vision, both for understanding model behavior and for informing future model training and design. However, little research on SDMs, and little quantitative evaluation of their effectiveness, has been conducted for NLP tasks. Our paper fills this gap by proposing a benchmark named "Discover, Explain, Improve (DEIM)" for NLP classification tasks, along with a new SDM, Edisa. Edisa discovers coherent, underperforming groups of datapoints; DEIM then unites them under human-understandable concepts and provides comprehensive evaluation tasks and corresponding quantitative metrics. Evaluation on DEIM shows that Edisa accurately selects error-prone datapoints with informative semantic features that summarize error patterns. Detecting difficult datapoints directly boosts model performance without tuning any original model parameters, showing that the discovered slices are actionable for users.
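The abstract does not spell out Edisa's discovery step. As a hedged illustration of the general idea behind slice detection — grouping datapoints into coherent clusters and surfacing the clusters where the model underperforms — the sketch below clusters datapoint embeddings with a small k-means loop and ranks clusters by error rate. The function name `find_slices`, the clustering choice, and the toy data are assumptions for illustration only, not the paper's actual algorithm:

```python
import numpy as np

def find_slices(embeddings, correct, n_clusters=3, seed=0, min_size=2):
    """Toy slice detection (illustrative, NOT the paper's Edisa method):
    k-means cluster the embeddings, then rank clusters by error rate.

    embeddings: (N, D) float array of datapoint representations
    correct:    (N,) 0/1 array, 1 if the model got the datapoint right
    Returns a list of (cluster_id, error_rate, size), highest error first.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen datapoints.
    centroids = embeddings[rng.choice(len(embeddings), n_clusters, replace=False)]
    for _ in range(20):  # a few Lloyd iterations
        dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centroids[k] = embeddings[labels == k].mean(axis=0)
    slices = []
    for k in range(n_clusters):
        mask = labels == k
        if mask.sum() >= min_size:
            slices.append((k, float(1.0 - correct[mask].mean()), int(mask.sum())))
    # Highest error rate first: candidate underperforming slices.
    return sorted(slices, key=lambda s: -s[1])
```

In this framing, the top-ranked clusters are the "slices" a user would inspect: coherent (nearby in embedding space) and underperforming (high error rate), which is the property the benchmark's evaluation tasks quantify.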