Hypothesis Engineering for Zero-Shot Hate Speech Detection

Paper title

Hypothesis Engineering for Zero-Shot Hate Speech Detection

Authors

Janis Goldzycher, Gerold Schneider

Abstract

Standard approaches to hate speech detection rely on sufficient available hate speech annotations. Extending previous work that repurposes natural language inference (NLI) models for zero-shot text classification, we propose a simple approach that combines multiple hypotheses to improve English NLI-based zero-shot hate speech detection. We first conduct an error analysis for vanilla NLI-based zero-shot hate speech detection and then develop four strategies based on this analysis. The strategies use multiple hypotheses to predict various aspects of an input text and combine these predictions into a final verdict. We find that the zero-shot baseline used for the initial error analysis already outperforms commercial systems and fine-tuned BERT-based hate speech detection models on HateCheck. The combination of the proposed strategies further increases the zero-shot accuracy of 79.4% on HateCheck by 7.9 percentage points (pp), and the accuracy of 69.6% on ETHOS by 10.0pp.
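The idea of predicting several aspects of an input with separate hypotheses and merging them into one verdict can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `EntailmentScorer` interface, the wording of the two hypotheses, the 0.5 threshold, and the AND combination rule are hypothetical and are not the four strategies developed in the paper. In practice the scorer would wrap an NLI model's entailment probability; a lookup table stands in for it here so the example runs on its own.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interface: returns the entailment probability in [0, 1]
# for a (premise, hypothesis) pair. A real scorer would wrap an NLI model.
EntailmentScorer = Callable[[str, str], float]

# Illustrative hypotheses (assumed wording, not taken from the paper).
MAIN_HYPOTHESIS = "This text contains hate speech."
AUX_HYPOTHESIS = "This text is about a group of people."


def predict_hate(text: str, score: EntailmentScorer, threshold: float = 0.5) -> bool:
    """Combine the predictions for two hypotheses into a final verdict."""
    # Vanilla zero-shot decision: is the main hypothesis entailed?
    main = score(text, MAIN_HYPOTHESIS) >= threshold
    # Auxiliary hypothesis predicting a different aspect of the input.
    aux = score(text, AUX_HYPOTHESIS) >= threshold
    # Assumed combination rule: both hypotheses must be entailed.
    return main and aux


def make_table_scorer(table: Dict[Tuple[str, str], float]) -> EntailmentScorer:
    """Build a stand-in scorer from a lookup table (unknown pairs score 0.0)."""
    def score(premise: str, hypothesis: str) -> float:
        return table.get((premise, hypothesis), 0.0)
    return score


if __name__ == "__main__":
    scorer = make_table_scorer({
        ("example input", MAIN_HYPOTHESIS): 0.9,
        ("example input", AUX_HYPOTHESIS): 0.2,
    })
    # The main hypothesis is entailed but the auxiliary one is not,
    # so the combined verdict is negative.
    print(predict_hate("example input", scorer))  # False
```

Passing the scorer in as a function keeps the combination logic independent of any particular NLI model, so different hypothesis sets can be compared without changing the decision code.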
