Paper Title
Data Augmentation for Graph Classification
Paper Authors
Paper Abstract
Graph classification, which aims to identify the category labels of graphs, plays a significant role in drug classification, toxicity detection, protein analysis, and related tasks. However, the limited scale of benchmark datasets makes graph classification models prone to over-fitting and under-generalization. To address this, we introduce data augmentation on graphs and present two heuristic algorithms, random mapping and motif-similarity mapping, which generate additional weakly labeled data for small-scale benchmark datasets by heuristically modifying graph structures. Furthermore, we propose a generic model evolution framework, M-Evolve, which combines graph augmentation, data filtration, and model retraining to optimize pre-trained graph classifiers. Experiments on six benchmark datasets demonstrate that M-Evolve helps existing graph classification models alleviate over-fitting when trained on small-scale benchmark datasets and improves their graph classification accuracy by 3-12% on average.
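The abstract names the two augmentation heuristics but does not spell them out. The Python sketch below is one possible reading, offered under explicit assumptions rather than as the authors' algorithms. Graphs are undirected edge lists over nodes 0..n-1. Random mapping deletes `budget` random edges and adds `budget` random non-edges. Motif-similarity mapping treats open triads (u-w-v with u and v unconnected) as the motif, scores node pairs with the Resource Allocation index, adds the highest-scoring triad-closing edges, and deletes the lowest-scoring existing edges. The function names, the `budget` parameter, the choice of similarity index, and the top-k selection rule are all hypothetical.

```python
import random
from itertools import combinations


def _norm(u, v):
    # Store undirected edges as (smaller, larger) so (u, v) and (v, u) match.
    return (u, v) if u < v else (v, u)


def _adjacency(edge_set, num_nodes):
    # Build neighbor sets for nodes 0..num_nodes-1.
    adj = {n: set() for n in range(num_nodes)}
    for u, v in edge_set:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def resource_allocation(adj, u, v):
    # Resource Allocation index: sum of 1/deg(w) over common neighbors w.
    # Using it as the "similarity" score is an assumption of this sketch.
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])


def random_mapping(edges, num_nodes, budget, seed=None):
    """Hypothetical random mapping: drop `budget` random edges, add `budget` random non-edges."""
    rng = random.Random(seed)
    edge_set = {_norm(u, v) for u, v in edges}
    non_edges = [e for e in combinations(range(num_nodes), 2) if e not in edge_set]
    k = min(budget, len(edge_set), len(non_edges))
    removed = set(rng.sample(sorted(edge_set), k))
    added = set(rng.sample(non_edges, k))
    return (edge_set - removed) | added


def motif_similarity_mapping(edges, num_nodes, budget):
    """Hypothetical motif-similarity mapping based on open triads and the RA index."""
    edge_set = {_norm(u, v) for u, v in edges}
    adj = _adjacency(edge_set, num_nodes)
    # Candidate additions: endpoints of open triads u - w - v where u and v are not linked.
    candidates = {
        _norm(u, v)
        for w in range(num_nodes)
        for u, v in combinations(sorted(adj[w]), 2)
        if v not in adj[u]
    }
    # Add the most similar unlinked pairs; delete the least similar existing edges.
    # Ties are broken by node ids so the result is deterministic.
    to_add = sorted(candidates, key=lambda e: (-resource_allocation(adj, *e), e))
    to_del = sorted(edge_set, key=lambda e: (resource_allocation(adj, *e), e))
    k = min(budget, len(to_add), len(to_del))
    return (edge_set - set(to_del[:k])) | set(to_add[:k])


path = [(0, 1), (1, 2), (2, 3)]
print(sorted(motif_similarity_mapping(path, 4, budget=1)))  # → [(0, 2), (1, 2), (2, 3)]
print(len(random_mapping(path, 4, budget=2, seed=0)))  # → 3
```

Both functions keep the edge count unchanged, so an augmented graph stays close in density to its source. In an M-Evolve-style pipeline, such graphs would presumably carry their source graph's label as a weak label and pass through data filtration before retraining. Note that deleting the lowest-scoring edges can disconnect a graph; a fuller implementation might guard against that.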