Show, Recall, and Tell: Image Captioning with Recall Mechanism

Paper Title

Show, Recall, and Tell: Image Captioning with Recall Mechanism

Authors

Wang, Li; Bai, Zechen; Zhang, Yonghua; Lu, Hongtao

Abstract

Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: the recall unit, the semantic guide (SG), and the recalled-word slot (RWS). The recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed to make the best use of the recalled words. The SG branch generates a recalled context, which guides the process of generating the caption. The RWS branch is responsible for copying recalled words to the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, which surpass the results of other state-of-the-art methods.
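The soft switch mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes the convex mixture commonly used in pointer-style models: a scalar switch in [0, 1] weights a generation distribution against a copy distribution. The function name, the direction of the switch, and the toy distributions are all hypothetical and are not taken from the paper.

```python
def mix_with_soft_switch(p_generate, p_copy, switch):
    """Blend two word distributions with a scalar soft switch in [0, 1].

    Assumption (not from the paper): `switch` weights the generation
    branch and `1 - switch` weights the copy branch.
    """
    if not 0.0 <= switch <= 1.0:
        raise ValueError("switch must lie in [0, 1]")
    # Words missing from one distribution get probability 0 there.
    vocab = set(p_generate) | set(p_copy)
    return {
        word: switch * p_generate.get(word, 0.0)
        + (1.0 - switch) * p_copy.get(word, 0.0)
        for word in vocab
    }


# Hypothetical toy distributions for a single decoding step.
p_sg = {"a": 0.5, "dog": 0.3, "cat": 0.2}   # stands in for the SG branch
p_rws = {"dog": 0.75, "frisbee": 0.25}      # stands in for the RWS branch

mixed = mix_with_soft_switch(p_sg, p_rws, switch=0.6)
```

Because both inputs sum to 1 and the weights sum to 1, the blended result is again a valid distribution. A word such as "frisbee" that only the copy branch proposes still receives nonzero probability, which is the usual motivation for this kind of mixture.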
