Playing a 2D Game Indefinitely using NEAT and Reinforcement Learning

Paper Title

Playing a 2D Game Indefinitely using NEAT and Reinforcement Learning

Authors

Selvan, Jerin Paul; Game, Pravin S.

Abstract

For over a decade, robotics and artificial agents have been in common use. Testing the performance of new path-finding or search-space optimization algorithms has become a challenge, because they require a simulation or an environment in which to run. Creating artificial environments populated with artificial agents is one method of testing such algorithms, and games have also come to serve as test environments. Algorithms can be compared by observing artificial agents that behave according to each algorithm in the environment they are placed in. One performance parameter is how quickly the agent learns to differentiate between rewarding actions and hostile actions. This can be tested by placing the agent in an environment with different types of hurdles, where the agent's goal is to travel as far as possible by choosing actions that avoid all the obstacles. The environment chosen is the game "Flappy Bird". The goal of the game is to make the bird fly through a set of pipes of random heights. The bird must pass between these pipes without hitting the top, the bottom, or the pipes themselves. The bird has two possible actions: flap its wings, or drop under gravity. The algorithms applied to the artificial agents are NeuroEvolution of Augmenting Topologies (NEAT) and Reinforcement Learning. The NEAT algorithm starts from an initial population of "N" artificial agents and follows the genetic-algorithm approach, using an objective function, crossover, mutation, and augmenting topologies. Reinforcement learning, on the other hand, uses a single agent and a Deep Q-learning Network, and remembers the state, the action taken in that state, and the reward received for that action. The performance of the NEAT algorithm improves as the initial population of artificial agents is increased.
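
The genetic-algorithm steps named in the abstract (objective function, crossover, mutation) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: it evolves fixed-length weight vectors, so it omits the topology augmentation and speciation that distinguish NEAT. The names `evolve` and `toy_fitness`, the target vector, and all parameter values are assumptions made for the example. In the game setting, the objective function would presumably score how far a bird travels; here a toy objective stands in for it.

```python
import random

# Hypothetical target used only by the toy objective below.
TARGET = [0.5, -0.2, 0.8, 0.1]


def toy_fitness(genome):
    """Toy objective (assumption): higher is better, peak at TARGET."""
    return -sum((g - t) ** 2 for g, t in zip(genome, TARGET))


def evolve(fitness, genome_len, pop_size, generations,
           mutation_rate=0.1, seed=0):
    """Plain genetic algorithm over fixed-length genomes.

    Requires genome_len >= 2 and pop_size >= 2.
    Not NEAT: the genome length (network topology) never changes.
    """
    rng = random.Random(seed)
    # Initial population of N random genomes.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by the objective function, best first.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, pop_size // 2)]
        # Elitism: carry the best genome over unchanged.
        children = [ranked[0][:]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # One-point crossover.
            cut = rng.randrange(1, genome_len)
            child = a[:cut] + b[cut:]
            # Gaussian mutation applied gene by gene.
            child = [g + rng.gauss(0.0, 0.1)
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve(toy_fitness, genome_len=4, pop_size=20, generations=30)
    print(best, toy_fitness(best))
```

Because the best genome is copied unchanged into each new generation, the best fitness in this sketch never decreases from one generation to the next. Varying `pop_size` is one way to explore the abstract's remark about initial population size, although this toy objective says nothing about the game itself.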
