Paper Title
Dungeons and Data: A Large-Scale NetHack Dataset
Paper Authors
Paper Abstract
Recent breakthroughs in the development of agents to solve challenging sequential decision-making problems such as Go, StarCraft, or DOTA have relied on both simulated environments and large-scale datasets. However, progress on this research has been hindered by the scarcity of open-sourced datasets and the prohibitive computational cost of working with them. Here we present the NetHack Learning Dataset (NLD), a large and highly scalable dataset of trajectories from the popular game of NetHack, which is both extremely challenging for current methods and very fast to run. NLD consists of three parts: 10 billion state transitions from 1.5 million human trajectories collected on the NAO public NetHack server from 2009 to 2020; 3 billion state-action-score transitions from 100,000 trajectories collected from the symbolic bot winner of the NetHack Challenge 2021; and accompanying code for users to record, load and stream any collection of such trajectories in a highly compressed form. We evaluate a wide range of existing algorithms, including online and offline RL as well as learning from demonstrations, showing that significant research advances are needed to fully leverage large-scale datasets for challenging sequential decision-making tasks.
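Dividing the stated totals gives a sense of scale: on average roughly 10^10 / 1.5×10^6 ≈ 6,700 transitions per human trajectory and 3×10^9 / 10^5 = 30,000 per bot trajectory, which is why the abstract stresses compressed storage and streaming rather than loading everything into memory. The following is a minimal, standard-library-only sketch of that record/load/stream idea. It is NOT the NLD code or API: the names `Transition`, `record_trajectory`, `iter_trajectory`, `load_trajectory` and `stream_chunks` are hypothetical, each state-action-score transition is assumed to be stored as one JSON line, each trajectory is assumed to be bz2-compressed, and the game state is reduced to a single screen string.

```python
import bz2
import io
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator, List


@dataclass
class Transition:
    """One state-action-score step (hypothetical, simplified record)."""
    screen: str  # rendered terminal contents, reduced to one string here
    action: int  # index of the action / keypress taken
    score: int   # in-game score after the action


def record_trajectory(transitions: List[Transition]) -> bytes:
    """Serialize a trajectory as JSON lines, then bz2-compress it."""
    text = "\n".join(json.dumps(asdict(t)) for t in transitions)
    return bz2.compress(text.encode("utf-8"))


def iter_trajectory(blob: bytes) -> Iterator[Transition]:
    """Lazily decompress and parse one trajectory, one line at a time."""
    with bz2.open(io.BytesIO(blob), "rt", encoding="utf-8") as f:
        for line in f:
            yield Transition(**json.loads(line))


def load_trajectory(blob: bytes) -> List[Transition]:
    """Eagerly load a whole trajectory into memory."""
    return list(iter_trajectory(blob))


def stream_chunks(blobs: Iterable[bytes], seq_length: int) -> Iterator[List[Transition]]:
    """Yield chunks of at most seq_length transitions, reading each
    trajectory lazily and never mixing two trajectories in one chunk."""
    for blob in blobs:
        chunk: List[Transition] = []
        for t in iter_trajectory(blob):
            chunk.append(t)
            if len(chunk) == seq_length:
                yield chunk
                chunk = []
        if chunk:  # final, shorter chunk of this trajectory
            yield chunk


# Usage: toy trajectories whose screen never changes, mimicking how little
# a terminal frame typically changes between consecutive steps.
screen = "@" + "." * 79
traj_a = [Transition(screen, action=i % 8, score=i) for i in range(100)]
traj_b = [Transition(screen, action=0, score=0) for _ in range(30)]
blobs = [record_trajectory(traj_a), record_trajectory(traj_b)]

print([len(c) for c in stream_chunks(blobs, seq_length=32)])  # [32, 32, 32, 4, 30]
```

Chunking within each trajectory keeps episodes separate at the cost of a shorter final chunk per trajectory; an alternative design fills every chunk across episode boundaries and carries an explicit end-of-episode flag instead.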