Domain Adaptation of Machine Translation with Crowdworkers

Paper title

Domain Adaptation of Machine Translation with Crowdworkers

Authors

Makoto Morishita, Jun Suzuki, Masaaki Nagata

Abstract

Although a machine translation model trained on a large in-domain parallel corpus achieves remarkable results, it still performs poorly when no in-domain data are available. This restricts the applicability of machine translation when the target domain's data are limited. However, there is great demand for high-quality domain-specific machine translation models in many domains. We propose a framework that efficiently and effectively collects parallel sentences in a target domain from the web with the help of crowdworkers. With the collected parallel data, we can quickly adapt a machine translation model to the target domain. Our experiments show that the proposed method can collect target-domain parallel data over a few days at a reasonable cost. We tested it with five domains, and the domain-adapted model improved BLEU scores by up to +19.7 points, and by +7.8 points on average, compared to a general-purpose translation model.


Paper title