Paper Title

SC2EGSet: StarCraft II Esport Replay and Game-state Dataset

Authors

Białecki, Andrzej; Jakubowska, Natalia; Dobrowolski, Paweł; Białecki, Piotr; Krupiński, Leszek; Szczap, Andrzej; Białecki, Robert; Gajewski, Jan

Abstract

As a relatively new form of sport, esports offers unparalleled data availability. Despite the vast amounts of data that are generated by game engines, it can be challenging to extract them and verify their integrity for the purposes of practical and scientific use. Our work aims to open esports to a broader scientific community by supplying raw and pre-processed files from StarCraft II esports tournaments. These files can be used in statistical and machine learning modeling tasks and related to various laboratory-based measurements (e.g., behavioral tests, brain imaging). We have gathered publicly available game-engine generated "replays" of tournament matches and performed data extraction and cleanup using a low-level application programming interface (API) parser library. Additionally, we open-sourced and published all the custom tools that were developed in the process of creating our dataset. These tools include PyTorch and PyTorch Lightning API abstractions to load and model the data. Our dataset contains replays from major and premiere StarCraft II tournaments since 2016. To prepare the dataset, we processed 55 tournament "replaypacks" that contained 17930 files with game-state information. Based on initial investigation of available StarCraft II datasets, we observed that our dataset is the largest publicly available source of StarCraft II esports data upon its publication. Analysis of the extracted data holds promise for further Artificial Intelligence (AI), Machine Learning (ML), psychological, Human-Computer Interaction (HCI), and sports-related studies in a variety of supervised and self-supervised tasks.
