Paper Title
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Paper Authors
Paper Abstract
Providing explanations in the context of Visual Question Answering (VQA) presents a fundamental problem in machine learning. To obtain detailed insights into the process of generating natural language explanations for VQA, we introduce the large-scale CLEVR-X dataset that extends the CLEVR dataset with natural language explanations. For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations which are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information that is necessary to answer a given question. We conducted a user study to confirm that the ground-truth explanations in our proposed dataset are indeed complete and relevant. We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset. Furthermore, we provide a detailed analysis of the explanation generation quality for different question and answer types. Additionally, we study the influence of using different numbers of ground-truth explanations on the convergence of natural language generation (NLG) metrics. The CLEVR-X dataset is publicly available at \url{https://explainableml.github.io/CLEVR-X/}.
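To make the idea of deriving structured textual explanations from scene graphs more concrete, the following minimal Python sketch builds a templated answer and explanation for a toy counting question over a hand-written CLEVR-style scene graph. The scene_graph structure, the explain_count helper, and the example question are illustrative assumptions only and do not reproduce the actual CLEVR-X generation pipeline.

# Hypothetical sketch: answering a toy counting question and producing a
# templated explanation from a CLEVR-style scene graph. All object
# attributes and the question below are invented for illustration.

scene_graph = [
    {"shape": "cube", "color": "red", "size": "large", "material": "metal"},
    {"shape": "sphere", "color": "blue", "size": "small", "material": "rubber"},
    {"shape": "cube", "color": "blue", "size": "small", "material": "rubber"},
]

def explain_count(scene, attribute, value):
    """Return a count answer plus a templated explanation built from the scene graph."""
    matches = [obj for obj in scene if obj[attribute] == value]
    answer = len(matches)
    descriptions = [
        f"a {obj['size']} {obj['color']} {obj['material']} {obj['shape']}"
        for obj in matches
    ]
    explanation = (
        f"There are {answer} {value} objects: " + ", ".join(descriptions) + "."
        if matches
        else f"There are no {value} objects in the scene."
    )
    return answer, explanation

# Example question: "How many blue things are there?"
answer, explanation = explain_count(scene_graph, "color", "blue")
print(answer)       # 2
print(explanation)  # There are 2 blue objects: a small blue rubber sphere, a small blue rubber cube.

Because the explanation is assembled directly from the scene-graph attributes of the matching objects, it is correct by construction and names exactly the visual evidence needed to justify the answer, mirroring the property the abstract claims for CLEVR-X explanations.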