Materials Discovery using Max K-Armed Bandit

Paper title

Materials Discovery using Max K-Armed Bandit

Authors

Kikkawa, Nobuaki; Ohno, Hiroshi

Abstract

Search algorithms for bandit problems are applicable to materials discovery. However, the objective of the conventional bandit problem differs from that of materials discovery: the conventional bandit problem aims to maximize the total reward, whereas materials discovery aims to achieve breakthroughs in material properties. The max K-armed bandit (MKB) problem, which aims to acquire the single best reward, matches discovery tasks better than the conventional bandit problem. Thus, we propose here a search algorithm for materials discovery based on the MKB problem that uses a pseudo-value of the upper confidence bound of the expected improvement of the best reward. This approach is pseudo-guaranteed to be an asymptotic oracle that does not depend on the time horizon. In addition, compared with other MKB algorithms, the proposed algorithm has only one hyperparameter, which is advantageous in materials discovery. We applied the proposed algorithm to synthetic problems and to molecular-design demonstrations using Monte Carlo tree search. According to the results, the proposed algorithm stably outperformed other bandit algorithms in the late stage of the search process when the optimal arm of the MKB could not be determined based on its expected reward.
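The abstract does not give the formula for the pseudo-value, so the following Python is only a minimal sketch of the max K-armed bandit setting, written under stated assumptions. Unlike a conventional bandit loop, it keeps only the best reward found rather than a running total. It scores each arm with a hypothetical index: the closed-form Gaussian expected improvement over the current best, evaluated at a UCB-style optimistic mean, with a single hyperparameter `c`. The Gaussian assumption, the index itself, the function names `optimistic_ei` and `max_bandit_search`, and the two example arms are illustrative stand-ins, not the authors' algorithm. The Monte Carlo tree search used in the paper is not shown.

```python
import math
import random
import statistics


def _pdf(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def optimistic_ei(samples, best, t, c):
    """Hypothetical index (not the paper's): Gaussian expected improvement
    over `best`, evaluated at a UCB-style optimistic mean. `c` is the only
    hyperparameter."""
    n = len(samples)
    mu = statistics.fmean(samples)
    sigma = max(statistics.stdev(samples), 1e-9)  # floor avoids division by zero
    mu_ucb = mu + c * sigma * math.sqrt(math.log(t) / n)
    z = (mu_ucb - best) / sigma
    # Closed-form EI of N(mu_ucb, sigma^2) above the incumbent `best`.
    return (mu_ucb - best) * _cdf(z) + sigma * _pdf(z)


def max_bandit_search(arms, horizon, c=1.0, seed=0):
    """Max K-armed bandit loop: the goal is the single best reward, not the sum."""
    rng = random.Random(seed)
    # Pull each arm twice so that its mean and standard deviation are defined.
    history = [[arm(rng), arm(rng)] for arm in arms]
    best = max(max(h) for h in history)
    for t in range(2 * len(arms) + 1, horizon + 1):
        scores = [optimistic_ei(h, best, t, c) for h in history]
        i = max(range(len(arms)), key=lambda k: scores[k])
        reward = arms[i](rng)
        history[i].append(reward)
        best = max(best, reward)  # track only the best reward found so far
    return best, [len(h) for h in history]


# Illustrative arms: arm 0 has the higher mean, arm 1 a wider spread.
arms = [lambda r: r.gauss(1.0, 0.1), lambda r: r.gauss(0.8, 1.0)]
best, pulls = max_bandit_search(arms, horizon=200)
print(best, pulls)
```

In this example, arm 0 has the higher expected reward, but arm 1 can produce larger single rewards because of its wider spread. This mirrors the case the abstract highlights, where the MKB optimal arm cannot be identified from expected reward alone. Scaling the exploration bonus by each arm's sample standard deviation is a design choice of this sketch only; it can starve an arm whose first samples happen to be nearly equal, which is one reason a more careful index is worth having.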
