Paper title
BABD: A Bitcoin Address Behavior Dataset for Pattern Analysis
Paper authors
Abstract
Cryptocurrencies are no longer just the preferred option for cybercriminal activities on darknets, owing to their increasing adoption in mainstream applications. This is partly due to the transparency of the underlying ledgers: any individual can access the record of a transaction on the public ledger. In this paper, we build a dataset comprising Bitcoin transactions between 12 July 2019 and 26 May 2021. This dataset (hereafter referred to as BABD-13) contains 13 types of Bitcoin addresses, 5 categories of indicators with 148 features, and 544,462 labeled samples. To our knowledge, it is the largest publicly available labeled Bitcoin address behavior dataset. We then evaluate common machine learning models on the proposed dataset, namely the k-nearest neighbors algorithm, decision tree, random forest, multilayer perceptron, and XGBoost. The results show that the accuracy of these models on the multi-class classification task ranges from 93.24% to 97.13%. We also analyze the proposed features and their relationships based on the experiments. In addition, we propose a k-hop subgraph generation algorithm that extracts a k-hop subgraph from the entire Bitcoin transaction graph, represented as a directed heterogeneous multigraph, starting from a specific Bitcoin address node (e.g., one involved in a known transaction associated with a criminal investigation). Finally, we conduct a preliminary analysis of the behavior patterns of different types of Bitcoin addresses based on the extracted features.
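The abstract does not specify how the k-hop subgraph generation algorithm works. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes that the transaction graph has two node types, address nodes and transaction nodes. Edges run address → transaction for inputs and transaction → address for outputs. Parallel edges are kept, so the graph is a multigraph; for example, one address may spend several UTXOs in the same transaction. The sketch also assumes that a hop is one edge traversed in either direction and that the result is the subgraph induced on the reached nodes. The names `TxMultiGraph`, `Edge`, `k_hop_subgraph`, and the `addr:`/`tx:` prefixes are hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    value: int  # amount in satoshis (assumed edge attribute)


class TxMultiGraph:
    """Hypothetical directed heterogeneous multigraph.

    Node ids prefixed 'addr:' are address nodes; 'tx:' are transaction nodes.
    Parallel edges between the same pair of nodes are stored separately.
    """

    def __init__(self) -> None:
        self.edges: list[Edge] = []
        self.out_edges: dict[str, list[int]] = defaultdict(list)
        self.in_edges: dict[str, list[int]] = defaultdict(list)

    def add_edge(self, src: str, dst: str, value: int) -> None:
        idx = len(self.edges)
        self.edges.append(Edge(src, dst, value))
        self.out_edges[src].append(idx)
        self.in_edges[dst].append(idx)


def k_hop_subgraph(g: TxMultiGraph, start: str, k: int) -> tuple[set[str], list[Edge]]:
    """Breadth-first search up to k hops from `start`, ignoring edge direction.

    Returns the reached nodes and every edge (including parallel edges)
    whose endpoints are both among them, i.e. the induced subgraph.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand nodes on the k-hop boundary
        # Follow both outgoing and incoming edges (direction-agnostic hops).
        for idx in g.out_edges.get(node, []) + g.in_edges.get(node, []):
            e = g.edges[idx]
            nbr = e.dst if e.src == node else e.src
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    nodes = set(dist)
    edges = [e for e in g.edges if e.src in nodes and e.dst in nodes]
    return nodes, edges


if __name__ == "__main__":
    g = TxMultiGraph()
    g.add_edge("addr:A", "tx:1", 50_000)
    g.add_edge("addr:A", "tx:1", 20_000)  # second UTXO of A spent in the same tx
    g.add_edge("tx:1", "addr:B", 69_000)
    g.add_edge("addr:B", "tx:2", 69_000)
    g.add_edge("tx:2", "addr:C", 68_000)

    nodes, edges = k_hop_subgraph(g, "addr:A", 2)
    print(sorted(nodes))  # ['addr:A', 'addr:B', 'tx:1']
    print(len(edges))     # 3
```

Traversing edges in both directions lets an investigator see where funds came from and where they went. If the original algorithm follows only one direction, restricting the loop to `out_edges` or `in_edges` would give that variant.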