Paper Title
Multi-modal Cooking Workflow Construction for Food Recipes
Paper Authors
Paper Abstract
Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow. This is a non-trivial task that involves common-sense reasoning. However, because large-scale labeled datasets are lacking, existing efforts rely on hand-crafted features to extract the workflow graph from recipes. Moreover, they fail to utilize cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps are multi-modal, featuring both text instructions and cooking images. We then propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow, achieving a performance gain of over 20% over existing hand-crafted baselines.
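To make the notion of a workflow graph concrete, the following is a minimal sketch of how a multi-modal recipe could be represented as a directed graph of steps. It is an illustration only: the class names (`CookingStep`, `Workflow`), the `image_path` field, and the example recipe are assumptions made here, not the MM-ReS schema or the proposed model. It relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Optional


@dataclass
class CookingStep:
    """One multi-modal step: a text instruction plus an optional image."""
    step_id: int
    text: str
    image_path: Optional[str] = None  # hypothetical field, not the MM-ReS schema


@dataclass
class Workflow:
    """Directed graph; edge (a, b) means step a must finish before step b."""
    steps: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def add_step(self, step: CookingStep) -> None:
        self.steps[step.step_id] = step

    def add_dependency(self, before: int, after: int) -> None:
        if before not in self.steps or after not in self.steps:
            raise KeyError("both steps must be added before linking them")
        self.edges.add((before, after))

    def valid_order(self) -> list:
        """Return one execution order consistent with all dependencies."""
        sorter = TopologicalSorter({sid: set() for sid in self.steps})
        for before, after in self.edges:
            sorter.add(after, before)  # 'before' is a predecessor of 'after'
        return list(sorter.static_order())  # raises CycleError on a cycle


def example_workflow() -> Workflow:
    """A made-up three-step recipe in which steps 1 and 2 are independent."""
    wf = Workflow()
    wf.add_step(CookingStep(1, "Boil the pasta.", "step1.jpg"))
    wf.add_step(CookingStep(2, "Chop the tomatoes.", "step2.jpg"))
    wf.add_step(CookingStep(3, "Toss the pasta with the tomatoes.", "step3.jpg"))
    wf.add_dependency(1, 3)
    wf.add_dependency(2, 3)
    return wf
```

In this toy recipe, steps 1 and 2 can be carried out in either order, while step 3 depends on both; this is the kind of temporal structure that the written order of a recipe leaves implicit.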