Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance

Paper Title

Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance

Authors

Jingwei Ni, Zhijing Jin, Markus Freitag, Mrinmaya Sachan, Bernhard Schölkopf

Abstract

Human-translated text displays distinct features from naturally written text in the same language. This phenomenon, known as translationese, has been argued to confound machine translation (MT) evaluation. Yet, we find that existing work on translationese neglects some important factors, and its conclusions are mostly correlational rather than causal. In this work, we collect CausalMT, a dataset where the MT training data are also labeled with the human translation directions. We inspect two critical factors: the train-test direction match (whether the human translation directions in the training and test sets are aligned) and the data-model direction match (whether the model learns in the same direction as the human translation direction in the dataset). We show that these two factors have a large causal effect on MT performance, in addition to the test-model direction mismatch highlighted by existing work on the impact of translationese. In light of our findings, we provide a set of suggestions for MT training and evaluation. Our code and data are at https://github.com/EdisonNi-hku/CausalMT
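The three direction factors named in the abstract can be made concrete with a small sketch. It is only an illustration of one reading of the definitions, not code from the CausalMT repository: the `Setup` class, the `direction_matches` function, and the representation of a direction as an (original language, translated language) pair are all hypothetical names chosen here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumption: a direction is (original language, translated language).
Direction = Tuple[str, str]


@dataclass(frozen=True)
class Setup:
    """Hypothetical description of one MT experiment."""
    train_human: Direction  # human translation direction of the training data
    test_human: Direction   # human translation direction of the test data
    model: Direction        # direction in which the model translates


def direction_matches(setup: Setup) -> Dict[str, bool]:
    """Return the three direction-match indicators for a setup."""
    return {
        # Are training and test sets human-translated in the same direction?
        "train_test_match": setup.train_human == setup.test_human,
        # Does the model learn in the direction the training data was translated?
        "data_model_match": setup.train_human == setup.model,
        # Does the model translate in the direction the test data was translated?
        "test_model_match": setup.test_human == setup.model,
    }


# Example: a German-to-English model trained on text originally written in
# German, but tested on text originally written in English.
example = Setup(train_human=("de", "en"), test_human=("en", "de"), model=("de", "en"))
flags = direction_matches(example)
```

In this example only the data-model match holds: the test set was human-translated in the opposite direction from both the training data and the model.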
