Paper Title
Quick and Robust Feature Selection: the Strength of Energy-efficient Sparse Training for Autoencoders
Paper Authors
Paper Abstract
Major complications arise from the recent increase in the amount of high-dimensional data, including high computational costs and memory requirements. Feature selection, which identifies the most relevant and informative attributes of a dataset, has been introduced as a solution to this problem. Most existing feature selection methods are computationally inefficient; inefficient algorithms lead to high energy consumption, which is undesirable for devices with limited computational and energy resources. In this paper, a novel and flexible method for unsupervised feature selection is proposed. This method, named QuickSelection, introduces neuron strength in sparse neural networks as a criterion to measure feature importance. This criterion, combined with sparsely connected denoising autoencoders trained with the sparse evolutionary training procedure, derives the importance of all input features simultaneously. We implement QuickSelection in a purely sparse manner, as opposed to the typical approach of using a binary mask over connections to simulate sparsity. This results in a considerable speedup and memory reduction. When tested on several benchmark datasets, including five low-dimensional and three high-dimensional datasets, the proposed method achieves the best trade-off among classification and clustering accuracy, running time, and maximum memory usage, compared with widely used feature selection approaches. Moreover, our proposed method requires the least amount of energy among state-of-the-art autoencoder-based feature selection methods.
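To make the neuron-strength criterion described in the abstract concrete, the following is a minimal illustrative sketch (not the authors' code). It assumes a sparse input-to-hidden weight matrix from a trained sparse denoising autoencoder and ranks input features by the summed absolute weights of each input neuron's remaining connections; all names, shapes, and the density value are assumptions for illustration.

```python
import numpy as np
from scipy import sparse


def select_features(W_in: sparse.csr_matrix, k: int) -> np.ndarray:
    """Rank input features by neuron strength and return the top-k indices.

    W_in : sparse (n_features, n_hidden) weight matrix of the first layer
           of a trained sparse denoising autoencoder (hypothetical input).
    """
    # Strength of input neuron i = sum over its existing links of |W_in[i, j]|.
    strength = np.asarray(abs(W_in).sum(axis=1)).ravel()
    # The k strongest input neurons correspond to the selected features.
    return np.argsort(strength)[::-1][:k]


# Example usage with a random sparse layer standing in for a SET-trained one.
W = sparse.random(3000, 200, density=0.02, random_state=0, format="csr")
top_features = select_features(W, k=50)
```

Because the weight matrix is stored in a sparse format rather than as a dense matrix with a binary mask, the strength computation only touches the existing connections, which is the source of the speed and memory advantage mentioned above.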