Paper Title

Spatial Language Understanding for Object Search in Partially Observed City-scale Environments

Authors

Kaiyu Zheng, Deniz Bayazit, Rebecca Mathew, Ellie Pavlick, Stefanie Tellex

Abstract

Humans use spatial language to naturally describe object locations and their relations. Interpreting spatial language not only adds a perceptual modality for robots, but also reduces the barrier of interfacing with humans. Previous work primarily considers spatial language as goal specification for instruction following tasks in fully observable domains, often paired with reference paths for reward-based learning. However, spatial language is inherently subjective and potentially ambiguous or misleading. Hence, in this paper, we consider spatial language as a form of stochastic observation. We propose SLOOP (Spatial Language Object-Oriented POMDP), a new framework for partially observable decision making with a probabilistic observation model for spatial language. We apply SLOOP to object search in city-scale environments. To interpret ambiguous, context-dependent prepositions (e.g. front), we design a simple convolutional neural network that predicts the language provider's latent frame of reference (FoR) given the environment context. Search strategies are computed via an online POMDP planner based on Monte Carlo Tree Search. Evaluation based on crowdsourced language data, collected over areas of five cities in OpenStreetMap, shows that our approach achieves faster search and higher success rate compared to baselines, with a wider margin as the spatial language becomes more complex. Finally, we demonstrate the proposed method in AirSim, a realistic simulator where a drone is tasked to find cars in a neighborhood environment.
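
To make the idea of treating spatial language as a stochastic observation concrete, here is a minimal sketch of a Bayesian belief update over object locations on a small grid. It is not the SLOOP implementation: the grid, the function names (`uniform_belief`, `front_likelihood`, `update_belief`), the exponential likelihood shape, and the parameters `sharpness` and `max_dist` are all assumptions made for illustration. The frame of reference is passed in as a fixed angle here, whereas the abstract describes predicting it with a convolutional network.

```python
import math


def uniform_belief(width, height):
    """Uniform prior over grid cells (x, y)."""
    p = 1.0 / (width * height)
    return {(x, y): p for x in range(width) for y in range(height)}


def front_likelihood(cell, landmark, for_angle, sharpness=2.0, max_dist=4.0):
    """Hypothetical likelihood of hearing "the object is in front of the
    landmark" if the object were at `cell`.

    `for_angle` is the assumed frame-of-reference direction in radians.
    The functional form is an illustrative assumption, not the paper's model.
    """
    dx, dy = cell[0] - landmark[0], cell[1] - landmark[1]
    dist = math.hypot(dx, dy)
    floor = 1e-3  # never zero: the language may be ambiguous or misleading
    if dist == 0 or dist > max_dist:
        return floor
    # Cosine of the angle between the landmark-to-cell direction and the FoR.
    cos_sim = (dx * math.cos(for_angle) + dy * math.sin(for_angle)) / dist
    return floor + math.exp(sharpness * (cos_sim - 1.0))


def update_belief(belief, landmark, for_angle):
    """Bayes rule: posterior is proportional to likelihood times prior."""
    unnorm = {
        cell: prior * front_likelihood(cell, landmark, for_angle)
        for cell, prior in belief.items()
    }
    total = sum(unnorm.values())
    return {cell: value / total for cell, value in unnorm.items()}


# Example: landmark at the center of a 7x7 grid, FoR pointing along +x.
belief = update_belief(uniform_belief(7, 7), landmark=(3, 3), for_angle=0.0)
best_cell = max(belief, key=belief.get)
```

Because the likelihood keeps a small floor everywhere, cells that contradict the description are down-weighted rather than ruled out, which mirrors the abstract's point that spatial language can be ambiguous or misleading. In a full system, a posterior like this would feed an online POMDP planner rather than being read off directly.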
