Paper Title
DePlot: One-shot visual language reasoning by plot-to-table translation
Paper Authors
Paper Abstract
Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples, and their reasoning capabilities remain quite limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key component in this method is a modality conversion module, named DePlot, which translates the image of a plot or chart into a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over the finetuned SOTA on human-written queries from the chart QA task.
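The abstract describes a two-step, plug-and-play pipeline: DePlot converts a chart image into a linearized text table, and that table is placed into a one-shot prompt for an off-the-shelf LLM. Below is a minimal sketch of this pipeline, assuming the publicly released `google/deplot` checkpoint available through Hugging Face Transformers; the image path, exemplar table, and question are placeholders for illustration, and the final LLM call is omitted since any few-shot-capable LLM can consume the prompt.

```python
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

# Step 1: plot-to-table translation with DePlot.
processor = Pix2StructProcessor.from_pretrained("google/deplot")
model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")

image = Image.open("chart.png")  # placeholder: any chart or plot image
inputs = processor(
    images=image,
    text="Generate underlying data table of the figure below:",
    return_tensors="pt",
)
table_ids = model.generate(**inputs, max_new_tokens=512)
linearized_table = processor.decode(table_ids[0], skip_special_tokens=True)

# Step 2: reasoning over the translated text. Build a one-shot prompt
# pairing a (hypothetical) exemplar with the table DePlot just produced.
exemplar = (
    "Table:\nYear | Sales\n2020 | 10\n2021 | 15\n"
    "Question: How much did sales grow from 2020 to 2021?\n"
    "Answer: 5\n\n"
)
question = "Which category has the highest value?"  # placeholder query
prompt = (
    "Read the table and answer the question.\n\n"
    + exemplar
    + f"Table:\n{linearized_table}\nQuestion: {question}\nAnswer:"
)
# `prompt` can now be sent to any pretrained LLM for one-shot reasoning.
```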