Change Point Detection Approach for Online Control of Unknown Time Varying Dynamical Systems

Paper Title

Change Point Detection Approach for Online Control of Unknown Time Varying Dynamical Systems

Authors

Deepan Muthirayan, Ruijie Du, Yanning Shen, Pramod P. Khargonekar

Abstract

We propose a novel change point detection approach for online learning control with full information feedback (state, disturbance, and cost feedback) for unknown time-varying dynamical systems. We show that our algorithm achieves sub-linear regret with respect to the class of Disturbance Action Control (DAC) policies, a widely studied class of policies for online control of dynamical systems, for any sub-linear number of changes and for a very general class of systems: (i) matched disturbance systems with general convex cost functions, and (ii) general systems with linear cost functions. Specifically, a (dynamic) regret of $\Gamma_T^{1/5}T^{4/5}$ can be achieved for these classes of systems, where $\Gamma_T$ is the number of changes of the underlying system and $T$ is the duration of the control episode. That is, the change point detection approach achieves sub-linear regret for any sub-linear number of changes, which previous algorithms such as that of \cite{minasyan2021online} cannot. Numerically, we demonstrate that the change point detection approach is superior to a standard restart approach \cite{minasyan2021online} and to standard online learning approaches for time-invariant dynamical systems. Our work presents the first regret guarantee for unknown time-varying dynamical systems in terms of a stronger notion of variability, such as the number of changes in the underlying system. The extension of our work to state and output feedback controllers is a subject of future work.
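
To see why the stated bound is sub-linear for any sub-linear number of changes, the short calculation below may help. It is only an illustrative sketch: the exponent $\alpha$ and the parametrization $\Gamma_T = T^{\alpha}$ are assumptions introduced here for the example and do not appear in the abstract, and constants and logarithmic factors are ignored.

```latex
% Assumption (illustration only): the number of changes grows as
% \Gamma_T = T^{\alpha} for some 0 \le \alpha < 1.
\begin{align*}
  \Gamma_T^{1/5}\, T^{4/5}
    &= T^{\alpha/5}\, T^{4/5}
     = T^{(4+\alpha)/5},
    & \frac{4+\alpha}{5} &< 1 \quad \text{since } \alpha < 1, \\
  \frac{\Gamma_T^{1/5}\, T^{4/5}}{T}
    &= \left(\frac{\Gamma_T}{T}\right)^{1/5}
     \longrightarrow 0
    & &\text{whenever } \Gamma_T = o(T).
\end{align*}
% Example: \alpha = 1/2 (about \sqrt{T} changes) gives a bound of order T^{9/10}.
```

The second line does not rely on the power-law assumption: the average regret per step vanishes as soon as the number of changes grows more slowly than $T$.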
