Paper Title
PGD: A Large-scale Professional Go Dataset for Data-driven Analytics
Paper Authors
Paper Abstract
Lee Sedol is on a winning streak: is this legend rising again after his match against AlphaGo? Ke Jie is invincible in world championships: can he still win the title this time? Go is one of the most popular board games in East Asia, with a stable professional sports system that has lasted for decades in China, Japan, and Korea. Mature data-driven analysis technologies exist for many sports, such as soccer, basketball, and esports. However, developing such technology for Go remains nontrivial and challenging because of the lack of datasets, meta-information, and in-game statistics. This paper creates the Professional Go Dataset (PGD), containing 98,043 games played by 2,148 professional players from 1950 to 2021. After manual cleaning and labeling, we provide detailed meta-information for each player, game, and tournament. Moreover, the dataset includes analysis results for every move of each game, evaluated by an advanced AlphaZero-based AI. To establish a benchmark for PGD, we further analyze the data and, drawing on prior knowledge of Go, extract meaningful in-game features that indicate the game status. With the help of complete meta-information and the constructed in-game features, our game outcome prediction system achieves an accuracy of 75.30%, much higher than several state-of-the-art approaches (64%-65%). As far as we know, PGD is the first dataset for data-driven analytics in Go, and even in board games generally. Beyond this promising result, we provide more examples of tasks that benefit from our dataset. The ultimate goal of this paper is to bridge this ancient game and the modern data science community. It will advance research on Go-related analytics to enhance the fan experience, help players improve their ability, and facilitate other promising aspects. The dataset will be made publicly available.