Paper Title

Policy Adaptation from Foundation Model Feedback

Authors

Yuying Ge, Annabella Macaluso, Li Erran Li, Ping Luo, Xiaolong Wang

Abstract

Recent progress on vision-language foundation models has brought significant advancement to building general-purpose robots. By using the pre-trained models to encode the scene and instructions as inputs for decision making, the instruction-conditioned policy can generalize across different objects and tasks. While this is encouraging, the policy still fails in most cases given an unseen task or environment. In this work, we propose Policy Adaptation from Foundation model Feedback (PAFF). When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions to record the demonstrations. While the execution could be wrong, we can use the pre-trained foundation models to provide feedback to relabel the demonstrations. This automatically provides new pairs of demonstration-instruction data for policy fine-tuning. We evaluate our method on a broad range of experiments with a focus on generalization to unseen objects, unseen tasks, unseen environments, and sim-to-real transfer. We show PAFF improves baselines by a large margin in all cases. Our project page is available at https://geyuying.github.io/PAFF/
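The abstract describes a play-relabel-finetune loop. The following is a minimal sketch of that loop, assuming toy stand-ins for the policy, environment, and foundation model (`paff_round`, `rollout`, `describe`, and `finetune` are hypothetical names for illustration, not the paper's actual API):

```python
# Illustrative sketch of one PAFF adaptation round, as described in the
# abstract. All components (policy, env, foundation model) are assumed
# stand-ins; the real method uses a learned policy and pre-trained
# vision-language models.
import random

def paff_round(policy, foundation_model, env, candidate_instructions, n_demos=8):
    """One adaptation round: play, relabel with the foundation model, fine-tune."""
    relabeled = []
    for _ in range(n_demos):
        # 1. Let the policy play with a randomly generated instruction
        #    and record the resulting demonstration.
        instruction = random.choice(candidate_instructions)
        trajectory = env.rollout(policy, instruction)
        # 2. The execution may not match the instruction; ask the
        #    foundation model what the trajectory actually accomplished
        #    and use that description as the new label.
        achieved_instruction = foundation_model.describe(trajectory)
        relabeled.append((trajectory, achieved_instruction))
    # 3. Fine-tune the policy on the automatically relabeled
    #    demonstration-instruction pairs.
    policy.finetune(relabeled)
    return relabeled
```

The key design point the abstract emphasizes is in step 2: the demonstration is labeled by what was *achieved* rather than what was *commanded*, so even failed executions yield correct training pairs.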
