Semi-Supervised Models via Data Augmentation for Classifying Interactive Affective Responses

Paper Title

Semi-Supervised Models via Data Augmentation for Classifying Interactive Affective Responses

Authors

Jiaao Chen, Yuwei Wu, Diyi Yang

Abstract

We present semi-supervised models with data augmentation (SMDA), a semi-supervised text classification system for classifying interactive affective responses. SMDA uses recent transformer-based models to encode each sentence and employs back-translation to paraphrase given sentences as augmented data. For labeled sentences, we performed data augmentation to make the label distribution uniform and computed a supervised loss during training. For unlabeled sentences, we explored self-training by treating low-entropy predictions over unlabeled sentences as pseudo-labels, so that high-confidence predictions serve as labeled data for training. We further introduced consistency regularization as an unsupervised loss after data augmentation on unlabeled data, based on the assumption that the model should predict similar class distributions for an original unlabeled sentence and for its augmented version. Through a set of experiments, we demonstrated that our system outperformed baseline models in terms of F1 score and accuracy.
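The two unsupervised components described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the KL divergence used for consistency, the entropy threshold `max_entropy`, and all function names here are assumptions chosen for illustration. Inputs are plain lists of class probabilities, standing in for model predictions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pseudo_labels(unlabeled_probs, max_entropy=0.5):
    """Self-training step: keep only low-entropy (confident) predictions.

    Returns (sentence_index, predicted_class) pairs.
    max_entropy is a hypothetical threshold, not a value from the paper.
    """
    selected = []
    for i, probs in enumerate(unlabeled_probs):
        if entropy(probs) < max_entropy:
            best = max(range(len(probs)), key=probs.__getitem__)
            selected.append((i, best))
    return selected


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two class distributions, smoothed by eps."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def consistency_loss(orig_probs, aug_probs):
    """Mean KL between predictions on original and augmented sentences.

    Assumes the two lists are aligned: aug_probs[i] is the prediction for
    the back-translated paraphrase of sentence i.
    """
    pairs = list(zip(orig_probs, aug_probs))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)
```

With predictions such as `[[0.95, 0.05], [0.5, 0.5], [0.1, 0.9]]`, only the first and third sentences pass the entropy filter and receive pseudo-labels, while the uniform prediction is left unlabeled. The consistency term is zero when the model predicts the same distribution for a sentence and its paraphrase, and grows as the two predictions diverge.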


Paper Title