论文标题
在线控制的几何探索
Geometric Exploration for Online Control
论文作者
论文摘要
我们研究\ emph {未知}线性动力系统的控制。目的是最大程度地减少遗憾与扰动反馈控制者的类别,该机构涵盖了所有稳定的线性动力控制器。在这项工作中,我们首先考虑已知成本函数的情况,为此,我们使用$ n^3 \ sqrt {t} $设计了第一个多项式时间算法 - 遗憾的是,$ n $是状态的维度以及控制输入的维度。 $ \ sqrt {t} $ - horizon依赖性是最佳的,并且在先前最著名的$ t^{2/3} $的界面上有所改善。我们算法的主要组成部分是一种新颖的几何探索策略:我们在策略空间中自适应地构建了一系列barycentric跨度。其次,我们考虑了匪徒反馈的情况,为此,我们将第一个多项式时间算法使用$ poly(n)\ sqrt {t} $ - 遗憾的是,基于随机强盗凸出优化。
We study the control of an \emph{unknown} linear dynamical system under general convex costs. The objective is minimizing regret vs. the class of disturbance-feedback-controllers, which encompasses all stabilizing linear-dynamical-controllers. In this work, we first consider the case of known cost functions, for which we design the first polynomial-time algorithm with $n^3\sqrt{T}$-regret, where $n$ is the dimension of the state plus the dimension of control input. The $\sqrt{T}$-horizon dependence is optimal, and improves upon the previous best known bound of $T^{2/3}$. The main component of our algorithm is a novel geometric exploration strategy: we adaptively construct a sequence of barycentric spanners in the policy space. Second, we consider the case of bandit feedback, for which we give the first polynomial-time algorithm with $poly(n)\sqrt{T}$-regret, building on Stochastic Bandit Convex Optimization.