使用加固学习在图上查找大地学

论文标题

使用加固学习在图上查找大地学

Finding geodesics on graphs using reinforcement learning

论文作者

Kious, Daniel, Mailler, Cécile, Schapira, Bruno

论文摘要

在生物学上众所周知，蚂蚁能够通过连续的随机探索在巢和食物之间找到最短的路径，而没有任何交流的含义，而除了他们留下的信息素外。这种引人注目的现象已经通过实验观察到，并通过生物学文献中不同的平均田间增强学习模型进行了建模。在本文中，我们介绍了该现象的第一个概率增强学习模型。在此模型中，蚂蚁探索了一个有限图，其中两个节点被区分为巢和食物来源。蚂蚁在此图上进行连续的随机步行，从巢开始，然后在第一次到达食物时停止，每个随机步行的过渡概率取决于所有以前所有步行的实现，都可以通过图形的某些动态加权。我们基于不同的强化规则讨论了该模型的不同变体，并表明该强化规则的微小变化可能导致截然不同的结果。我们证明，在该模型的两个变体中，当基础图分别是任何串联平行图和5边非系列 - 平行的LoSange图时，蚂蚁确实最终找到了巢和食物之间的最短路径。这两个证据都依赖于电网方法在加权图上随机行走，并且在连续的时间内将鲁宾嵌入。串联案例中的证明使用该图家族的递归性质，而在看似减少的Losange情况下的证明证明是非常复杂的：它依赖于对某些随机近似的精细分析，以及与标准和广义的Pólyaurns的各种耦合。

It is well-known in biology that ants are able to find shortest paths between their nest and the food by successive random explorations, without any mean of communication other than the pheromones they leave behind them. This striking phenomenon has been observed experimentally and modelled by different mean-field reinforcement-learning models in the biology literature. In this paper, we introduce the first probabilistic reinforcement-learning model for this phenomenon. In this model, the ants explore a finite graph in which two nodes are distinguished as the nest and the source of food. The ants perform successive random walks on this graph, starting from the nest and stopped when first reaching the food, and the transition probabilities of each random walk depend on the realizations of all previous walks through some dynamic weighting of the graph. We discuss different variants of this model based on different reinforcement rules and show that slight changes in this reinforcement rule can lead to drastically different outcomes. We prove that, in two variants of this model and when the underlying graph is, respectively, any series-parallel graph and a 5-edge non-series-parallel losange graph, the ants indeed eventually find the shortest path(s) between their nest and the food. Both proofs rely on the electrical network method for random walks on weighted graphs and on Rubin's embedding in continuous time. The proof in the series-parallel cases uses the recursive nature of this family of graphs, while the proof in the seemingly-simpler losange case turns out to be quite intricate: it relies on a fine analysis of some stochastic approximation, and on various couplings with standard and generalised Pólya urns.

下载PDF全文

下载文献需遵守相关版权规定

论文标题