Paper Title
Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks
Paper Authors
Paper Abstract
Modern deep networks have millions to billions of parameters, which leads to high memory and energy requirements during training as well as during inference on resource-constrained edge devices. Consequently, pruning techniques have been proposed that remove less significant weights in deep networks, thereby reducing their memory and computational requirements. Pruning is usually performed after training the original network, and is followed by further retraining to compensate for the accuracy loss incurred during pruning. The prune-and-retrain procedure is repeated iteratively until an optimum tradeoff between accuracy and efficiency is reached. However, such iterative retraining adds to the overall training complexity of the network. In this work, we propose a dynamic pruning-while-training procedure, wherein we prune filters of the convolutional layers of a deep network during training itself, thereby precluding the need for separate retraining. We evaluate our dynamic pruning-while-training approach with three different pre-existing pruning strategies, viz. mean activation-based pruning, random pruning, and L1 normalization-based pruning. Our results for VGG-16 trained on CIFAR10 show that L1 normalization provides the best performance among all the techniques explored in this work, with less than 1% drop in accuracy after pruning 80% of the filters compared to the original network. We further evaluate the L1 normalization-based pruning mechanism on CIFAR100. Results indicate that pruning while training yields a compressed network with almost no accuracy loss after pruning 50% of the filters compared to the original network, and ~5% loss for high pruning rates (>80%). The proposed pruning methodology yields a 41% reduction in the number of computations and memory accesses during training for CIFAR10, CIFAR100, and ImageNet compared to training with retraining for 10 epochs.
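As a rough illustration of the L1 normalization-based filter pruning criterion described in the abstract, the PyTorch sketch below ranks each convolutional layer's filters by the L1 norm of their weights, zeroes out the lowest-ranked fraction at a chosen point during training, and keeps them at zero for the remaining epochs so no separate retraining phase is needed. The mask-based implementation and the schedule parameters (prune_epoch, prune_ratio) are illustrative assumptions for this sketch, not the authors' exact procedure.

```python
import torch
import torch.nn as nn

def l1_filter_prune_masks(model, prune_ratio):
    """Build per-layer binary masks that zero out the conv filters
    with the smallest L1 norms (a prune_ratio fraction per layer)."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            w = module.weight.data                   # shape: (out_ch, in_ch, kH, kW)
            l1 = w.abs().sum(dim=(1, 2, 3))          # L1 norm of each output filter
            n_prune = int(prune_ratio * w.size(0))
            if n_prune == 0:
                continue
            prune_idx = torch.argsort(l1)[:n_prune]  # filters with smallest L1 norms
            mask = torch.ones(w.size(0), device=w.device)
            mask[prune_idx] = 0.0
            masks[name] = mask.view(-1, 1, 1, 1)
    return masks

def apply_masks(model, masks):
    """Re-zero the pruned filters after each optimizer step."""
    for name, module in model.named_modules():
        if name in masks and isinstance(module, nn.Conv2d):
            module.weight.data.mul_(masks[name])

def train_with_pruning(model, loader, epochs=60, prune_epoch=30, prune_ratio=0.5):
    # Hypothetical training loop: prune once at `prune_epoch` and keep the
    # masks fixed for the remaining epochs (pruning while training).
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    masks = None
    for epoch in range(epochs):
        if epoch == prune_epoch:
            masks = l1_filter_prune_masks(model, prune_ratio)
            apply_masks(model, masks)
        for x, y in loader:
            opt.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            opt.step()
            if masks is not None:
                apply_masks(model, masks)  # keep pruned filters at zero
    return model
```

In this sketch the pruned filters are masked to zero rather than structurally removed; an actual deployment would typically rebuild the layers with fewer output channels to realize the reported compute and memory savings.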