Towards Safe Policy Improvement for Non-Stationary MDPs

Paper title

Towards Safe Policy Improvement for Non-Stationary MDPs

Authors

Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas

Abstract

Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several works in the past have proposed methods that are safe for deployment, they assume that the underlying problem is stationary. However, many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis. Safety is ensured using sequential hypothesis testing of a policy's forecasted performance, and confidence intervals are obtained using wild bootstrap.
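The abstract names the ingredients (forecasting a policy's future performance, testing it, and bounding it with wild bootstrap) without showing the mechanics. The following is a minimal sketch, not the authors' implementation. It assumes that per-episode performance estimates for a policy are already available (for example from off-policy evaluation), that a least-squares fit on a polynomial-in-time basis is an adequate forecaster, and that a simple percentile interval is acceptable. The function names, the basis, the Rademacher sign weights, and the single (non-sequential) test are all illustrative choices.

```python
import numpy as np


def time_basis(t, degree):
    """Polynomial features of (scaled) time: [1, t, t^2, ...]."""
    t = np.asarray(t, dtype=float)
    return np.vander(t, degree + 1, increasing=True)


def wild_bootstrap_forecast(y, horizon=1, degree=1, n_boot=2000,
                            alpha=0.05, seed=0):
    """Forecast performance `horizon` episodes ahead with a percentile CI.

    y: past per-episode performance estimates (hypothetical input).
    Returns (point_forecast, lower, upper).
    """
    y = np.asarray(y, dtype=float)
    k = len(y)
    rng = np.random.default_rng(seed)

    # Scale time to (0, 1] to keep the regression well conditioned.
    t = np.arange(1, k + 1) / k
    X = time_basis(t, degree)
    x_new = time_basis([(k + horizon) / k], degree)[0]

    # Ordinary least-squares trend fit and its residuals.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    resid = y - fitted
    point = float(x_new @ coef)

    # Wild bootstrap: flip each residual's sign at random (Rademacher
    # weights), rebuild pseudo-observations, refit, and re-forecast.
    signs = rng.choice([-1.0, 1.0], size=(n_boot, k))
    y_star = fitted + signs * resid                  # shape (n_boot, k)
    coef_star, *_ = np.linalg.lstsq(X, y_star.T, rcond=None)
    forecasts = x_new @ coef_star                    # shape (n_boot,)

    lower, upper = np.quantile(forecasts, [alpha / 2, 1 - alpha / 2])
    return point, float(lower), float(upper)


def passes_safety_test(y_candidate, baseline_value, **kwargs):
    """Accept the candidate only if the lower end of its forecast
    interval exceeds the baseline's performance (a single test, not
    the sequential procedure the abstract refers to)."""
    _, lower, _ = wild_bootstrap_forecast(y_candidate, **kwargs)
    return bool(lower > baseline_value)
```

Under these assumptions, a candidate policy would be deployed only when `passes_safety_test` returns `True`; otherwise the existing policy is kept. Flipping residual signs rather than resampling them preserves each time step's own noise scale, which is the usual motivation for a wild bootstrap when the noise is not identically distributed over time.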
