Paper Title

Arm order recognition in multi-armed bandit problem with laser chaos time series

Authors

Naoki Narisawa, Nicolas Chauvet, Mikio Hasegawa, Makoto Naruse

Abstract

By exploiting ultrafast and irregular time series generated by lasers with delayed feedback, we have previously demonstrated a scalable algorithm that solves multi-armed bandit (MAB) problems using time-division multiplexing of laser chaos time series. Although that algorithm detects the arm with the highest reward expectation, it cannot correctly recognize the order of the arms in terms of their reward expectations. Here, we present an algorithm in which the degree of exploration is adaptively controlled based on confidence intervals that represent the estimation accuracy of the reward expectations. We demonstrate numerically that our approach significantly improves arm order recognition accuracy and reduces the dependence on reward environments, while the total reward is almost maintained compared with conventional MAB methods. This study applies to sectors where order information is critical, such as the efficient allocation of resources in information and communications technology.
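The idea of controlling exploration with confidence intervals can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes Bernoulli rewards, replaces the laser chaos time series with a software pseudorandom generator, and uses Hoeffding intervals as one possible choice of confidence interval. All names (`hoeffding_radius`, `ambiguous_arms`, `run_bandit`) and the parameter `delta` are hypothetical and chosen for illustration only.

```python
import math
import random


def hoeffding_radius(n_pulls, delta=0.05):
    """Half-width of a Hoeffding confidence interval for a mean in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_pulls))


def ambiguous_arms(means, counts, delta=0.05):
    """Arms whose interval overlaps that of a rank-adjacent arm."""
    ranked = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    unclear = set()
    for upper, lower in zip(ranked, ranked[1:]):
        low_edge = means[upper] - hoeffding_radius(counts[upper], delta)
        high_edge = means[lower] + hoeffding_radius(counts[lower], delta)
        if low_edge <= high_edge:  # intervals overlap: order not yet settled
            unclear.update((upper, lower))
    return sorted(unclear)


def run_bandit(probs, n_rounds=5000, delta=0.05, seed=0):
    """Explore while the arm order is uncertain, otherwise exploit.

    Assumption: rewards are Bernoulli with success probabilities `probs`.
    Returns the estimated arm order (best first) and the total reward.
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    wins = [0] * k
    total = 0
    for t in range(n_rounds):
        if t < k:
            arm = t  # pull every arm once so each estimate is defined
        else:
            means = [wins[i] / counts[i] for i in range(k)]
            unclear = ambiguous_arms(means, counts, delta)
            if unclear:
                # Explore: sample the least-pulled arm with an unsettled rank.
                arm = min(unclear, key=lambda i: counts[i])
            else:
                # Exploit: the order is settled, so play the best arm.
                arm = max(range(k), key=lambda i: means[i])
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        wins[arm] += reward
        total += reward
    means = [wins[i] / counts[i] for i in range(k)]
    order = sorted(range(k), key=lambda i: means[i], reverse=True)
    return order, total


if __name__ == "__main__":
    order, total = run_bandit([0.2, 0.8, 0.5])
    print("estimated order (best first):", order)
    print("total reward:", total)
```

In this sketch, exploration stops on its own once every pair of rank-adjacent intervals is separated, so the number of exploratory pulls grows as the gaps between reward expectations shrink. A smaller `delta` widens the intervals and trades total reward for more reliable ordering.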
