Learning synchronous context-free grammars with multiple specialised non-terminals for hierarchical phrase-based translation

Paper title

Learning synchronous context-free grammars with multiple specialised non-terminals for hierarchical phrase-based translation

Authors

Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz, Rafael C. Carrasco

Abstract

Translation models based on hierarchical phrase-based statistical machine translation (HSMT) have shown better performance than their non-hierarchical phrase-based counterparts for some language pairs. The standard approach to HSMT learns and applies a synchronous context-free grammar with a single non-terminal. The hypothesis behind the grammar refinement algorithm presented in this work is that this single non-terminal is overloaded and insufficiently discriminative, and that adequately splitting it into more specialised symbols could therefore lead to improved models. This paper presents a method to learn synchronous context-free grammars with a huge number of initial non-terminals, which are then grouped via a clustering algorithm. Our experiments show that the resulting smaller set of non-terminals correctly captures the contextual information, which makes it possible to obtain a statistically significant improvement in BLEU score over the standard HSMT approach.

Paper title