EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation

Paper Title

EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation

Authors

Sedrick Scott Keh, Rohit K. Bharadwaj, Emmy Liu, Simone Tedeschi, Varun Gangal, Roberto Navigli

Abstract

We introduce EUREKA, an ensemble-based approach for performing automatic euphemism detection. We (1) identify and correct potentially mislabelled rows in the dataset, (2) curate an expanded corpus called EuphAug, (3) leverage model representations of Potentially Euphemistic Terms (PETs), and (4) explore using representations of semantically close sentences to aid in classification. Using our augmented dataset and kNN-based methods, EUREKA was able to achieve state-of-the-art results on the public leaderboard of the Euphemism Detection Shared Task, ranking first with a macro F1 score of 0.881. Our code is available at https://github.com/sedrickkeh/EUREKA.
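To make the two technical ingredients named in the abstract concrete, the following is a minimal sketch of kNN classification over sentence vectors together with a macro F1 computation. It is not the authors' implementation: the two-dimensional vectors, the labels (1 = euphemistic, 0 = literal), the cosine-similarity majority vote, and the helper names `cosine`, `knn_predict`, and `macro_f1` are all assumptions made for illustration. In practice the vectors would presumably come from a trained language model rather than being written by hand.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_vecs, train_labels, k=3):
    """Label a query vector by majority vote among its k most similar neighbours."""
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(query, train_vecs[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: stand-ins for sentence embeddings and their labels.
train_vecs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
train_labels = [1, 1, 0, 0]
test_vecs = [[0.8, 0.3], [0.3, 0.8], [0.6, 0.5]]
test_labels = [1, 0, 0]

preds = [knn_predict(v, train_vecs, train_labels, k=3) for v in test_vecs]
score = macro_f1(test_labels, preds)
print(preds, round(score, 3))
```

Macro F1 averages the per-class scores without weighting by class frequency, so errors on the minority class count as much as errors on the majority class.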
