Paper Title

Reducing Non-Normative Text Generation from Language Models

Authors

Xiangyu Peng, Siyan Li, Spencer Frazier, Mark Riedl

Abstract


Large-scale, transformer-based language models such as GPT-2 are pretrained on diverse corpora scraped from the internet. Consequently, they are prone to generating non-normative text (i.e. in violation of social norms). We introduce a technique for fine-tuning GPT-2, using a policy gradient reinforcement learning technique and a normative text classifier to produce reward and punishment values. We evaluate our technique on five data sets using automated and human participant experiments. The normative text classifier is 81-90% accurate when compared to gold-standard human judgments of normative and non-normative generated text. Our normative fine-tuning technique is able to reduce non-normative text by 27-61%, depending on the data set.
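The abstract's core idea — fine-tuning a generator with policy-gradient reinforcement learning, where a normativity classifier supplies reward and punishment values — can be illustrated with a toy sketch. The miniature "language model" (independent per-token logits) and keyword-based "classifier" below are hypothetical stand-ins for the paper's actual GPT-2 and trained classifier, not the authors' implementation:

```python
# A minimal REINFORCE sketch of classifier-rewarded fine-tuning.
# Assumptions: a toy vocabulary, a keyword blacklist as the "normative
# text classifier", and per-token logits as the "language model".
import math
import random

random.seed(0)

VOCAB = ["kind", "help", "rude", "insult"]   # toy vocabulary
NON_NORMATIVE = {"rude", "insult"}           # toy classifier's blacklist

# Policy parameters: one logit per token (stand-in for GPT-2's decoder).
logits = {w: 0.0 for w in VOCAB}

def softmax(ls):
    m = max(ls.values())
    exp = {w: math.exp(v - m) for w, v in ls.items()}
    z = sum(exp.values())
    return {w: e / z for w, e in exp.items()}

def sample_sentence(n=5):
    probs = softmax(logits)
    return random.choices(list(probs), weights=list(probs.values()), k=n)

def classifier_reward(words):
    # Reward normative text (+1), punish non-normative text (-1).
    return -1.0 if any(w in NON_NORMATIVE for w in words) else 1.0

def reinforce_step(words, reward, lr=0.5):
    probs = softmax(logits)
    for w in words:
        # d log pi(w) / d logit_v = 1[v == w] - p(v)
        for v in VOCAB:
            grad = (1.0 if v == w else 0.0) - probs[v]
            logits[v] += lr * reward * grad

def non_normative_rate(n=500):
    return sum(classifier_reward(sample_sentence()) < 0 for _ in range(n)) / n

before = non_normative_rate()
for _ in range(200):
    words = sample_sentence()
    reinforce_step(words, classifier_reward(words))
after = non_normative_rate()
print(f"non-normative rate: {before:.2f} -> {after:.2f}")
```

After training, the policy shifts probability mass away from tokens that trigger the punishment signal, mirroring (in miniature) the reported 27-61% reduction in non-normative generations; the real method applies the same reward shape to GPT-2's sequence log-probabilities.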
