Paper Title
Strategies for Using Proximal Policy Optimization in Mobile Puzzle Games
Paper Authors
Paper Abstract
While traditionally a labour-intensive task, the testing of game content is progressively becoming more automated. Among the many directions in which this automation is taking shape, automatic play-testing is one of the most promising, thanks in part to advances in many supervised and reinforcement learning (RL) algorithms. However, these types of algorithms, while extremely powerful, often suffer in production environments due to issues with reliability and transparency in their training and usage. In this research work, we investigate and evaluate strategies for applying the popular RL method Proximal Policy Optimization (PPO) to a casual mobile puzzle game, with a specific focus on improving its reliability in training and its generalization during game playing. We implemented and tested a number of different strategies against a real-world mobile puzzle game (Lily's Garden from Tactile Games). We isolated the conditions that lead to a failure in either training or generalization during testing, and we identified a few strategies that ensure more stable behaviour of the algorithm in this game genre.
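For readers unfamiliar with the setup the abstract describes, the sketch below shows what "applying PPO to a puzzle game" can look like in practice. It is an illustration only, not the authors' implementation: it uses the open-source stable-baselines3 PPO and a hypothetical Gym-style board environment (PuzzleBoardEnv) with placeholder dynamics and rewards, since the paper's actual game interface and hyperparameters are not given here.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

class PuzzleBoardEnv(gym.Env):
    """Hypothetical stand-in for a match-3-like puzzle level:
    a 9x9 board with 5 piece types; each action taps one cell."""
    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Box(0.0, 1.0, shape=(9, 9, 5), dtype=np.float32)
        self.action_space = spaces.Discrete(81)
        self._moves_left = 30

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._moves_left = 30
        return self._obs(), {}

    def _obs(self):
        # One-hot encode a randomly sampled board (placeholder dynamics).
        board = self.np_random.integers(0, 5, size=(9, 9))
        return np.eye(5, dtype=np.float32)[board]

    def step(self, action):
        self._moves_left -= 1
        # Placeholder objective signal in lieu of real level-goal progress.
        reward = float(self.np_random.random() < 0.1)
        terminated = self._moves_left == 0
        return self._obs(), reward, terminated, False, {}

env = PuzzleBoardEnv()
model = PPO("MlpPolicy", env, n_steps=2048, batch_size=64,
            clip_range=0.2, ent_coef=0.01, verbose=0)
model.learn(total_timesteps=50_000)
mean_r, std_r = evaluate_policy(model, env, n_eval_episodes=20)
print(f"mean reward {mean_r:.2f} +/- {std_r:.2f}")

Evaluating the trained policy on levels held out of training (rather than on the training levels themselves) is one way to probe the generalization failures the abstract refers to.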