Paper Title
Sparse tree-based initialization for neural networks
Paper Authors
Paper Abstract
Dedicated neural network (NN) architectures have been designed to handle specific data types (such as CNNs for images or RNNs for text), which ranks them among state-of-the-art methods for these data. Unfortunately, no such architecture has yet been found for tabular data, for which tree ensemble methods (tree boosting, random forests) usually show the best predictive performance. In this work, we propose a new sparse initialization technique for (potentially deep) multilayer perceptrons (MLPs): we first train a tree-based procedure to detect feature interactions and use the resulting information to initialize the network, which is subsequently trained via standard stochastic gradient strategies. Numerical experiments on several tabular data sets show that this new, simple, and easy-to-use method is a solid alternative, in terms of both generalization capacity and computation time, to the default MLP initialization and even to existing complex deep learning solutions. In fact, this informed MLP initialization raises the resulting NN methods to the level of a valid competitor to gradient boosting when dealing with tabular data. Besides, such an initialization is able to preserve, through training, the sparsity of the weights introduced in the first layers of the network. This fact suggests that this new initializer operates an implicit regularization during NN training, and emphasizes that the first layers act as a sparse feature extractor (as convolutional layers do in CNNs).
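The abstract describes the procedure only at a high level. As an illustration of the general idea, here is a minimal sketch (Python, using scikit-learn and PyTorch) of how a sparsity pattern extracted from a fitted forest might be used to initialize the first layer of an MLP. The helper `tree_based_mask`, the per-tree masking rule, and the `SparseInitMLP` architecture are illustrative assumptions, not the authors' actual construction, which derives the pattern from the trees' internal structure in its own way.

```python
# Sketch: tree-informed sparse initialization of an MLP's first layer
# (hypothetical helper names; not the paper's reference implementation).
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestRegressor

def tree_based_mask(X, y, n_hidden, n_trees=50, max_depth=3, seed=0):
    """Build a binary (n_hidden, n_features) mask: each hidden unit is
    connected only to the features used by one tree of a fitted forest."""
    forest = RandomForestRegressor(n_estimators=n_trees, max_depth=max_depth,
                                   random_state=seed).fit(X, y)
    n_features = X.shape[1]
    masks = []
    for est in forest.estimators_:
        tree = est.tree_
        # Split nodes store a feature index >= 0; leaves are marked with -2.
        used = np.unique(tree.feature[tree.feature >= 0])
        row = np.zeros(n_features)
        row[used] = 1.0
        masks.append(row)
    masks = np.array(masks)
    # Repeat / truncate the per-tree patterns so there is one per hidden unit.
    idx = np.resize(np.arange(len(masks)), n_hidden)
    return torch.tensor(masks[idx], dtype=torch.float32)

class SparseInitMLP(nn.Module):
    def __init__(self, mask, n_hidden_2=64):
        super().__init__()
        n_hidden, n_features = mask.shape
        self.mask = mask                      # fixed sparsity pattern from the forest
        self.fc1 = nn.Linear(n_features, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_hidden_2)
        self.out = nn.Linear(n_hidden_2, 1)
        with torch.no_grad():                 # zero out non-selected connections
            self.fc1.weight.mul_(self.mask)

    def forward(self, x):
        # Re-applying the mask keeps the first layer sparse throughout training.
        h = torch.relu(nn.functional.linear(x, self.fc1.weight * self.mask,
                                            self.fc1.bias))
        h = torch.relu(self.fc2(h))
        return self.out(h)
```

The key design choices left open in this sketch, and settled differently in the paper itself, are how many hidden units receive each tree's pattern, whether the initial weight values (not just the pattern) are derived from the tree splits, and whether the mask is enforced during training or only at initialization.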