Paper title
Learning synchronous context-free grammars with multiple specialised non-terminals for hierarchical phrase-based translation
Paper authors
Paper abstract
Translation models based on hierarchical phrase-based statistical machine translation (HSMT) have shown better performance than their non-hierarchical phrase-based counterparts for some language pairs. The standard approach to HSMT learns and applies a synchronous context-free grammar with a single non-terminal. The hypothesis behind the grammar refinement algorithm presented in this work is that this single non-terminal is overloaded and insufficiently discriminative, and that an adequate split of it into more specialised symbols could therefore lead to improved models. This paper presents a method to learn synchronous context-free grammars with a huge number of initial non-terminals, which are then grouped via a clustering algorithm. Our experiments show that the resulting smaller set of non-terminals correctly captures the contextual information that makes it possible to achieve a statistically significant improvement in BLEU score over the standard HSMT approach.