Paper Title

Show, Recall, and Tell: Image Captioning with Recall Mechanism

Authors

Li Wang, Zechen Bai, Yonghua Zhang, Hongtao Lu

Abstract

Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: recall unit, semantic guide (SG) and recalled-word slot (RWS). The recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed for the best use of recalled words. The SG branch can generate a recalled context, which can guide the process of generating the caption. The RWS branch is responsible for copying recalled words to the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, which surpass the results of other state-of-the-art methods.
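The abstract does not give the formula for the soft switch, so the following is only a minimal sketch of how such a switch could work. It assumes the pointer-generator form from text summarization: the final word probability is a convex mix of a generation distribution (standing in for the SG branch) and a copy distribution over the recalled words (standing in for the RWS branch). The function name, its arguments and the toy numbers are all hypothetical and are not taken from the paper.

```python
def mix_with_soft_switch(p_vocab, recalled_ids, copy_weights, switch):
    """Blend a generation distribution with a copy distribution.

    All names here are illustrative assumptions, not the paper's notation.
    p_vocab      : probabilities over the vocabulary (sums to 1)
    recalled_ids : vocabulary ids of the recalled words
    copy_weights : attention weights over the recalled words (sums to 1)
    switch       : scalar in [0, 1], weight given to the generation branch
    """
    if not 0.0 <= switch <= 1.0:
        raise ValueError("switch must lie in [0, 1]")
    if len(recalled_ids) != len(copy_weights):
        raise ValueError("recalled_ids and copy_weights must align")

    # Scatter the copy weights onto vocabulary positions. A word that was
    # recalled more than once accumulates all of its weights.
    p_copy = [0.0] * len(p_vocab)
    for word_id, weight in zip(recalled_ids, copy_weights):
        p_copy[word_id] += weight

    # Convex combination: the result is still a probability distribution.
    return [switch * g + (1.0 - switch) * c for g, c in zip(p_vocab, p_copy)]


# Toy example: a 4-word vocabulary, with word 2 recalled twice and word 3 once.
p_vocab = [0.5, 0.25, 0.125, 0.125]
recalled_ids = [2, 3, 2]
copy_weights = [0.5, 0.25, 0.25]
p_final = mix_with_soft_switch(p_vocab, recalled_ids, copy_weights, switch=0.5)
print(p_final)
```

In this toy case the recalled word with id 2 ends up with a higher probability than the generation branch alone assigned to it, which is the intended effect of copying. In a trained model the switch value would presumably be predicted at every decoding step rather than fixed by hand.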
