Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction

Paper Title

Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction

Authors

Wang, Yidong; Wu, Hao; Liu, Ao; Hou, Wenxin; Wu, Zhen; Wang, Jindong; Shinozaki, Takahiro; Okumura, Manabu; Zhang, Yue

Abstract

Target-oriented Opinion Words Extraction (TOWE) is a fine-grained sentiment analysis task that aims to extract the opinion words corresponding to a given opinion target from a sentence. Recently, deep learning approaches have made remarkable progress on this task. Nevertheless, the TOWE task still suffers from a scarcity of training data due to the expensive data annotation process. Limited labeled data increases the risk of distribution shift between test data and training data. In this paper, we propose exploiting massive unlabeled data to reduce this risk by increasing the model's exposure to varying distribution shifts. Specifically, we propose a novel Multi-Grained Consistency Regularization (MGCR) method to make use of unlabeled data, and we design two filters specifically for TOWE to filter noisy data at different granularities. Extensive experimental results on four TOWE benchmark datasets indicate the superiority of MGCR over current state-of-the-art methods. The in-depth analysis also demonstrates the effectiveness of the different-granularity filters. Our code is available at https://github.com/TOWESSL/TOWESSL.
