DATScore: Evaluating Translation with Data Augmented Translations

Paper Title

DATScore: Evaluating Translation with Data Augmented Translations

Paper Authors

Eddine, Moussa Kamal; Shang, Guokan; Vazirgiannis, Michalis

Abstract

The rapid development of large pretrained language models has revolutionized not only the field of Natural Language Generation (NLG) but also its evaluation. Inspired by BARTScore, a recent metric that leverages the BART language model to evaluate the quality of generated text from various aspects, we introduce DATScore. DATScore uses data augmentation techniques to improve the evaluation of machine translation. Our main finding is that introducing data-augmented translations of the source and reference texts greatly helps in evaluating the quality of the generated translation. We also propose two novel strategies, score averaging and term weighting, to improve the original score computation process of BARTScore. Experimental results on WMT show that DATScore correlates better with human meta-evaluations than other recent state-of-the-art metrics, especially for low-resource languages. Ablation studies demonstrate the value added by our new scoring strategies. Moreover, in extended experiments we report the performance of DATScore on three NLG tasks other than translation.
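BARTScore, which DATScore builds on, scores a text as a weighted sum of its token log-probabilities under a sequence-to-sequence model conditioned on another text. The toy sketch below only illustrates that general idea and is not the paper's method. Several parts are assumptions. `token_logprobs` is a hypothetical stand-in for a real BART model. `toy_token_logprobs` is a made-up word-overlap scorer. The weights across tokens are uniform. The plain average over both directions and over several conditioning texts (for example the source, the reference, and their data-augmented translations) is also an assumption. It is not the score averaging or term weighting strategy proposed in the paper.

```python
import math
from typing import Callable, List, Optional, Sequence

# Hypothetical interface: per-token log-probabilities of `target` given `context`.
TokenLogProbs = Callable[[str, str], List[float]]


def bart_style_score(context: str, target: str, token_logprobs: TokenLogProbs,
                     weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of token log-probabilities (uniform weights by default)."""
    lps = token_logprobs(context, target)
    if not lps:
        raise ValueError("target must contain at least one token")
    if weights is None:
        weights = [1.0 / len(lps)] * len(lps)  # assumption: plain token average
    return sum(w * lp for w, lp in zip(weights, lps))


def augmented_score(hypothesis: str, conditioning_texts: Sequence[str],
                    token_logprobs: TokenLogProbs) -> float:
    """Average the score in both directions over every conditioning text.

    Assumption: a simple mean, used here only for illustration.
    """
    scores = []
    for text in conditioning_texts:
        scores.append(bart_style_score(text, hypothesis, token_logprobs))  # text -> hypothesis
        scores.append(bart_style_score(hypothesis, text, token_logprobs))  # hypothesis -> text
    return sum(scores) / len(scores)


def toy_token_logprobs(context: str, target: str) -> List[float]:
    """Made-up scorer: a target word is likely if it also appears in the context."""
    ctx = set(context.split())
    return [math.log(0.9) if tok in ctx else math.log(0.1) for tok in target.split()]
```

With a real model, `token_logprobs` would wrap a BART model's per-token log-likelihoods of the target given the context. The toy scorer only makes the example self-contained. A hypothesis that shares words with the conditioning texts scores higher than one that shares none.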
