Paper Title

KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning

Authors

Li Yang, Zhezhi He, Junshan Zhang, Deliang Fan

Abstract

Deep Neural Networks (DNNs) can forget knowledge about earlier tasks when learning new tasks, a phenomenon known as \textit{catastrophic forgetting}. While recent continual learning methods are capable of alleviating catastrophic forgetting on toy-sized datasets, several issues remain when applying them to real-world problems. Recently, fast mask-based learning methods (e.g., Piggyback \cite{mallya2018piggyback}) have been proposed to address these issues by learning only a binary element-wise mask in a fast manner, while keeping the backbone model fixed. However, the binary mask has limited modeling capacity for new tasks. A more recent work \cite{hung2019compacting} proposes a compress-grow-based method (CPG) that achieves better accuracy on new tasks by partially retraining the backbone model, but at an order-of-magnitude higher training cost, which makes it infeasible to deploy on popular edge/mobile learning platforms. The primary goal of this work is to simultaneously achieve fast and high-accuracy multi-task adaptation in a continual learning setting. Thus motivated, we propose a new training method called \textit{kernel-wise Soft Mask} (KSM), which learns a kernel-wise hybrid binary and real-valued soft mask for each task while using the same backbone model. Such a soft mask can be viewed as the superposition of a binary mask and a properly scaled real-valued tensor, which offers a richer representation capability without requiring low-level kernel support, thus meeting the objective of low hardware overhead. We validate KSM on multiple benchmark datasets against recent state-of-the-art methods (e.g., Piggyback, PackNet, CPG), showing good improvements in both accuracy and training cost.
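
To make the hybrid mask concrete, below is a minimal PyTorch sketch (not the authors' released code) of how a kernel-wise soft mask might be applied to a frozen convolution layer. The class name, threshold, scaling factor, and straight-through gradient trick are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftMaskedConv2d(nn.Module):
    """Sketch of a per-task kernel-wise soft mask over a frozen conv layer.

    One real-valued score is learned per 2-D kernel; the applied mask is the
    superposition of its binarized version and a scaled real-valued part, as
    described in the abstract.
    """

    def __init__(self, frozen_weight, threshold=0.0, scale=0.01):
        super().__init__()
        # Fixed backbone weights, shared across all tasks (never trained).
        self.register_buffer("weight", frozen_weight)
        out_ch, in_ch = frozen_weight.shape[:2]
        # One learnable score per kernel, broadcast over the k x k spatial dims.
        self.score = nn.Parameter(0.01 * torch.randn(out_ch, in_ch, 1, 1))
        self.threshold = threshold
        self.scale = scale

    def forward(self, x):
        # Binary part: 1 keeps a kernel for the current task, 0 drops it.
        binary = (self.score > self.threshold).float()
        # Straight-through estimator: binary value forward, identity gradient.
        binary = binary + self.score - self.score.detach()
        # Soft mask = binary mask + properly scaled real-valued part.
        soft_mask = binary + self.scale * self.score
        return F.conv2d(x, self.weight * soft_mask, padding=1)

# Usage: only the per-task scores receive gradients; the backbone stays fixed.
backbone = nn.Conv2d(64, 64, kernel_size=3, padding=1)
layer = SoftMaskedConv2d(backbone.weight.detach().clone())
out = layer(torch.randn(2, 64, 8, 8))
```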
