OpenD: A Benchmark for Language-Driven Door and Drawer Opening

Paper Title

OpenD: A Benchmark for Language-Driven Door and Drawer Opening

Authors

Zhao, Yizhou; Gao, Qiaozi; Qiu, Liang; Thattai, Govind; Sukhatme, Gaurav S.

Abstract

We introduce OPEND, a benchmark for learning how to use a hand to open cabinet doors or drawers in a photo-realistic and physics-reliable simulation environment driven by language instruction. To solve the task, we propose a multi-step planner composed of a deep neural network and rule-based controllers. The network is used to capture spatial relationships from images and to understand semantic meaning from language instructions. The controllers efficiently execute the plan based on this spatial and semantic understanding. We evaluate our system by measuring its zero-shot performance on the test data set. Experimental results demonstrate the effectiveness of decision planning by our multi-step planner for different hands, while suggesting that there is significant room for developing better models to address the challenges posed by language understanding, spatial reasoning, and long-term manipulation. We will release OPEND and host challenges to promote future research in this area.
