Paper Title

Perturbation of Deep Autoencoder Weights for Model Compression and Classification of Tabular Data

Authors

Manar Samad, Sakib Abrar

Abstract

Fully connected deep neural networks (DNN) often include redundant weights leading to overfitting and high memory requirements. Additionally, the performance of DNN is often challenged by traditional machine learning models in tabular data classification. In this paper, we propose periodical perturbations (prune and regrow) of DNN weights, especially at the self-supervised pre-training stage of deep autoencoders. The proposed weight perturbation strategy outperforms dropout learning in four out of six tabular data sets in downstream classification tasks. The L1 or L2 regularization of weights at the same pretraining stage results in inferior classification performance compared to dropout or our weight perturbation routine. Unlike dropout learning, the proposed weight perturbation routine additionally achieves 15% to 40% sparsity across six tabular data sets for the compression of deep pretrained models. Our experiments reveal that a pretrained deep autoencoder with weight perturbation or dropout can outperform traditional machine learning in tabular data classification when fully connected DNN fails miserably. However, traditional machine learning models appear superior to any deep models when a tabular data set contains uncorrelated variables. Therefore, the success of deep models can be attributed to the inevitable presence of correlated variables in real-world data sets.
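The abstract describes a periodic prune-and-regrow perturbation of autoencoder weights during self-supervised pretraining. The paper's own implementation is not reproduced here; the following is a minimal PyTorch sketch of the general idea only: magnitude-based pruning of the smallest weights followed by random regrowth of a subset of pruned positions, applied every few epochs of reconstruction training. The `TinyAutoencoder` architecture, the prune/regrow fractions, and the perturbation schedule are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch of periodic prune-and-regrow weight perturbation
# during autoencoder pretraining. Hyperparameters are assumptions,
# not the settings used in the paper.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, n_features, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

@torch.no_grad()
def perturb_weights(model, prune_frac=0.2, regrow_frac=0.5):
    """Zero out the smallest-magnitude weights, then reinitialize a random
    subset of the pruned positions with small values (assumed behavior)."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight
            k = int(prune_frac * w.numel())
            if k == 0:
                continue
            # Prune: remove the k smallest-magnitude weights.
            threshold = w.abs().flatten().kthvalue(k).values
            pruned_mask = w.abs() <= threshold
            w[pruned_mask] = 0.0
            # Regrow: reinitialize a random fraction of pruned positions.
            regrow_mask = pruned_mask & (torch.rand_like(w) < regrow_frac)
            w[regrow_mask] = torch.randn_like(w)[regrow_mask] * 0.01

def pretrain(model, loader, epochs=50, perturb_every=5):
    """Self-supervised reconstruction pretraining with periodic perturbation."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for epoch in range(epochs):
        for x in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), x)  # reconstruct the input
            loss.backward()
            opt.step()
        if (epoch + 1) % perturb_every == 0:
            perturb_weights(model)
```

After pretraining, the encoder would typically be fine-tuned with a classification head on the downstream tabular task; any sparsity left by pruned-and-not-regrown weights is what allows the compression the abstract refers to.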
