Paper Title

Visual Question Answering as a Multi-Task Problem

Authors

Pollard, Amelia Elizabeth; Shapiro, Jonathan L.

Abstract

Visual Question Answering (VQA) is a highly complex problem set, relying on many sub-problems to produce reasonable answers. In this paper, we present the hypothesis that Visual Question Answering should be viewed as a multi-task problem, and provide evidence to support this hypothesis. We demonstrate this by reformatting two commonly used Visual Question Answering datasets, COCO-QA and DAQUAR, into a multi-task format and training two baseline networks on the reformatted datasets, one of which is designed specifically to eliminate other possible causes of performance changes resulting from the reformatting. Though the networks demonstrated in this paper do not achieve strongly competitive results, we find that the multi-task approach to Visual Question Answering increases performance by 5-9% over the single-task formatting, and that the networks reach convergence much faster than in the single-task case. Finally, we discuss possible reasons for the observed difference in performance, and perform additional experiments which rule out causes not associated with learning the dataset as a multi-task problem.
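
The abstract does not say how the datasets are split into tasks, so the following is only a minimal sketch of what a "multi-task format" could look like. It assumes that each question type forms one task with its own answer vocabulary. The function names, the keyword heuristic in `task_of`, and the toy samples are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def to_multitask(samples, task_of):
    """Group (image_id, question, answer) triples into per-task subsets.

    Assumption: a task is identified by task_of(question), and every task
    keeps its own answer vocabulary (as a separate output head would).
    """
    tasks = defaultdict(lambda: {"samples": [], "answers": {}})
    for image_id, question, answer in samples:
        task = tasks[task_of(question)]
        # Assign a task-local label index the first time an answer is seen.
        label = task["answers"].setdefault(answer, len(task["answers"]))
        task["samples"].append((image_id, question, label))
    return dict(tasks)


def task_of(question):
    """Hypothetical keyword heuristic that maps a question to a task name."""
    q = question.lower()
    if q.startswith("how many"):
        return "number"
    if "color" in q:
        return "color"
    if q.startswith("where"):
        return "location"
    return "object"


# Toy data for illustration only; not drawn from COCO-QA or DAQUAR.
samples = [
    ("img1", "what is the color of the bus", "red"),
    ("img2", "how many chairs are there", "two"),
    ("img3", "what is on the table", "cup"),
    ("img4", "what is the color of the cat", "black"),
]

multitask = to_multitask(samples, task_of)
```

In the single-task format, all four answers would share one label space. Here each task has its own smaller label space, which is one plausible reading of the reformatting the abstract describes.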
