Paper Title
Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model
Paper Authors
Paper Abstract
This paper presents a novel fusion method for integrating an external language model (LM) into the Transformer-based sequence-to-sequence (seq2seq) model. While paired data are generally required to train the seq2seq model, the external LM can be trained with only unpaired data. Thus, it is important to leverage the knowledge memorized in the external LM when building the seq2seq model, since it is hard to prepare a large amount of paired data. However, existing fusion methods assume that the LM is integrated with recurrent neural network-based seq2seq models rather than the Transformer. Therefore, this paper proposes a fusion method that can explicitly utilize network structures in the Transformer. The proposed method, called {\bf memory attentive fusion}, leverages the Transformer-style attention mechanism that repeats source-target attention in a multi-hop manner for reading the memorized knowledge in the LM. Our experiments on two text-style conversion tasks demonstrate that the proposed method performs better than conventional fusion methods.
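To make the idea concrete, below is a minimal PyTorch-style sketch of a decoder layer that, after the usual masked self-attention and source-target attention, repeatedly attends over the hidden states of an external LM (treated as a memory) and fuses the read-out with a gate. This is an illustration of the multi-hop memory-attention idea only, not the authors' exact architecture; the module names, the number of hops `n_hops`, the gating scheme, and all hyper-parameters are assumptions.

```python
import torch
import torch.nn as nn


class MemoryAttentiveFusionLayer(nn.Module):
    """Illustrative decoder layer with multi-hop attention over external-LM states.

    The LM hidden states (`lm_memory`) act as a memory that is read with
    Transformer-style source-target attention, repeated `n_hops` times.
    """

    def __init__(self, d_model=256, n_heads=4, n_hops=2, dim_ff=1024):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.src_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # One attention module per hop over the LM memory (assumed design).
        self.mem_attn = nn.ModuleList(
            [nn.MultiheadAttention(d_model, n_heads, batch_first=True)
             for _ in range(n_hops)]
        )
        # Gate that fuses decoder features with the memory read-out (assumed design).
        self.gate = nn.Linear(2 * d_model, d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model)
        )
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, tgt, enc_out, lm_memory, tgt_mask=None):
        # 1) Masked self-attention over the decoder inputs.
        x = self.norms[0](tgt + self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)[0])
        # 2) Source-target attention over the encoder outputs.
        x = self.norms[1](x + self.src_attn(x, enc_out, enc_out)[0])
        # 3) Multi-hop attention that repeatedly reads the LM hidden states.
        m = x
        for attn in self.mem_attn:
            m = m + attn(m, lm_memory, lm_memory)[0]
        # 4) Gated fusion of decoder features and memory read-out.
        g = torch.sigmoid(self.gate(torch.cat([x, m], dim=-1)))
        x = self.norms[2](g * x + (1.0 - g) * m)
        # 5) Position-wise feed-forward block.
        return self.norms[3](x + self.ff(x))
```

In this sketch the external LM is run once over the decoder-side tokens to produce `lm_memory`, and each decoder layer reads that memory with its own stack of attention hops; the gate decides, per position and dimension, how much of the LM read-out to mix into the decoder state.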