Paper Title
Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?
Authors
Abstract
Turning weights to zero when training a neural network helps reduce the computational complexity at inference. To progressively increase the sparsity ratio in the network without causing sharp weight discontinuities during training, our work combines soft-thresholding and straight-through gradient estimation to update the raw, i.e. non-thresholded, version of zeroed weights. Our method, named ST-3 for straight-through/soft-thresholding/sparse-training, obtains state-of-the-art (SoA) results, in terms of both accuracy/sparsity and accuracy/FLOPS trade-offs, when progressively increasing the sparsity ratio in a single training cycle. In particular, despite its simplicity, ST-3 compares favorably to the most recent methods that adopt differentiable formulations or bio-inspired neuroregeneration principles. This suggests that the key ingredients for effective sparsification primarily lie in giving the weights the freedom to evolve smoothly across the zero state while the sparsity ratio increases progressively. Source code and weights are available at https://github.com/vanderschuea/stthree
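
The abstract's core mechanism can be made concrete. Below is a minimal PyTorch-style sketch, not the authors' released code: the function name, the per-tensor threshold selection, and the `sparsity` argument are illustrative assumptions. It shows how soft-thresholding produces the sparse weights used in the forward pass while the straight-through estimator routes the gradient to the raw dense weights, so zeroed weights keep receiving updates and can smoothly re-cross zero.

```python
import torch

def soft_threshold_ste(w: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Hypothetical sketch of the soft-thresholding + straight-through idea,
    # not the authors' implementation.
    k = int(sparsity * w.numel())
    if k == 0:
        return w
    # Per-tensor threshold tau: magnitude of the k-th smallest |weight|,
    # so a `sparsity` fraction of weights falls at or below it.
    tau = w.abs().flatten().kthvalue(k).values
    # Soft-thresholding: shrink surviving magnitudes by tau and zero the rest,
    # avoiding the sharp discontinuity of hard magnitude pruning.
    w_st = torch.sign(w) * torch.relu(w.abs() - tau)
    # Straight-through trick: the forward value equals w_st, but the detached
    # residual contributes no gradient, so d(output)/dw is the identity and
    # the raw (dense) weights w are updated directly.
    return w + (w_st - w).detach()
```

In use, a layer would apply this to its weight tensor in every forward pass while the optimizer updates the raw dense weights, and `sparsity` would be ramped progressively from 0 to the target over a single training cycle, matching the gradual sparsification the abstract describes.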