Safe Active Dynamics Learning and Control: A Sequential Exploration-Exploitation Framework

Paper Title

Safe Active Dynamics Learning and Control: A Sequential Exploration-Exploitation Framework

Authors

Thomas Lew, Apoorva Sharma, James Harrison, Andrew Bylard, Marco Pavone

Abstract

Safe deployment of autonomous robots in diverse scenarios requires agents that are capable of efficiently adapting to new environments while satisfying constraints. In this work, we propose a practical and theoretically-justified approach to maintaining safety in the presence of dynamics uncertainty. Our approach leverages Bayesian meta-learning with last-layer adaptation. The expressiveness of neural-network features trained offline, paired with efficient last-layer online adaptation, enables the derivation of tight confidence sets which contract around the true dynamics as the model adapts online. We exploit these confidence sets to plan trajectories that guarantee the safety of the system. Our approach handles problems with high dynamics uncertainty, where reaching the goal safely is potentially initially infeasible, by first exploring to gather data and reduce uncertainty, before autonomously exploiting the acquired information to safely perform the task. Under reasonable assumptions, we prove that our framework guarantees the high-probability satisfaction of all constraints at all times jointly, i.e., over the total task duration. This theoretical analysis also motivates two regularizers of last-layer meta-learning models that improve online adaptation capabilities as well as performance by reducing the size of the confidence sets. We extensively demonstrate our approach in simulation and on hardware.
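
The idea of last-layer adaptation with contracting confidence sets can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a fixed, hand-written feature map (the hypothetical `features` function, standing in for network features trained offline), a scalar output, Gaussian noise of known standard deviation, and a zero-mean Gaussian prior on the last-layer weights. All names and numerical values are illustrative assumptions. Under these assumptions the online update is ordinary Bayesian linear regression, and the log-volume of the posterior confidence ellipsoid (up to an additive constant) shrinks with every observation.

```python
import numpy as np

# Assumed ground-truth last-layer weights and noise level (illustrative only).
TRUE_W = np.array([0.5, -1.0, 2.0])
NOISE_STD = 0.1


def features(x):
    """Hypothetical fixed feature map standing in for offline-trained features."""
    return np.array([1.0, x, np.sin(x)])


def update(precision, q, phi, y, noise_std=NOISE_STD):
    """Rank-one Bayesian linear regression update of the last-layer posterior.

    precision is the posterior precision matrix; q accumulates phi * y / sigma^2,
    so that the posterior mean is solve(precision, q) for a zero-mean prior.
    """
    s2 = noise_std ** 2
    return precision + np.outer(phi, phi) / s2, q + phi * y / s2


def run_adaptation(n_steps, seed=0):
    """Adapt online on synthetic data; return the posterior mean and log-volumes."""
    rng = np.random.default_rng(seed)
    precision = np.eye(3)  # unit prior precision (assumption)
    q = np.zeros(3)
    log_volumes = []
    for _ in range(n_steps):
        x = rng.uniform(-3.0, 3.0)
        phi = features(x)
        y = phi @ TRUE_W + NOISE_STD * rng.normal()
        precision, q = update(precision, q, phi, y)
        # Ellipsoid log-volume is -0.5 * log det(precision), up to a constant.
        _, logdet = np.linalg.slogdet(precision)
        log_volumes.append(-0.5 * logdet)
    mean = np.linalg.solve(precision, q)
    return mean, log_volumes


mean, log_volumes = run_adaptation(50)
print("posterior mean:", mean)
print("log-volume, first vs. last step:", log_volumes[0], log_volumes[-1])
```

The sketch only covers the adaptation step. Planning trajectories that remain safe for every model in the confidence set, and the exploration-exploitation sequencing described in the abstract, are outside its scope.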
