Paper Title

Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

Author

Lattimore, Tor

Abstract

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
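To give a rough sense of the gap between the two bounds, the sketch below evaluates both expressions with the constants hidden by the $O(\cdot)$ notation ignored, and assuming the natural logarithm. The function names are hypothetical and chosen only for illustration; the numbers indicate scaling in $d$ and $n$, not actual regret values.

```python
import math


def new_bound(d: int, n: int) -> float:
    """d^2.5 * sqrt(n) * log(n), with the hidden constant taken as 1 (assumption)."""
    return d**2.5 * math.sqrt(n) * math.log(n)


def old_bound(d: int, n: int) -> float:
    """d^9.5 * sqrt(n) * log(n)^7.5, with the hidden constant taken as 1 (assumption)."""
    return d**9.5 * math.sqrt(n) * math.log(n) ** 7.5


def improvement_ratio(d: int, n: int) -> float:
    """Ratio old/new, which simplifies to d^7 * log(n)^6.5 (requires n > 1)."""
    return old_bound(d, n) / new_bound(d, n)


if __name__ == "__main__":
    # Illustrative values of d and n only; they are not taken from the paper.
    for d, n in [(2, 10**4), (10, 10**6)]:
        print(d, n, new_bound(d, n), old_bound(d, n), improvement_ratio(d, n))
```

Because the $\sqrt{n}$ factor is common to both bounds, the ratio depends on $n$ only through $\log(n)^{6.5}$ and grows as $d^{7}$ in the dimension.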
