Paper Title
Continual Learning using a Bayesian Nonparametric Dictionary of Weight Factors
Paper Authors
Paper Abstract
Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings, where data from previous tasks are unavailable. A number of methods, using various model expansion strategies, have been proposed recently as possible solutions. However, determining how much to expand the model is left to the practitioner, and often a constant schedule is chosen for simplicity, regardless of how complex the incoming task is. Instead, we propose a principled Bayesian nonparametric approach based on the Indian Buffet Process (IBP) prior, letting the data determine how much to expand the model complexity. We pair this with a factorization of the neural network's weight matrices. Such an approach allows the number of factors of each weight matrix to scale with the complexity of the task, while the IBP prior encourages sparse weight factor selection and factor reuse, promoting positive knowledge transfer between tasks. We demonstrate the effectiveness of our method on a number of continual learning benchmarks and analyze how weight factors are allocated and reused throughout training.
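To make the construction in the abstract concrete, below is a minimal sketch (NumPy) of the general idea: a shared dictionary of rank-1 weight factors, with a per-task binary vector, drawn under a stick-breaking construction of the IBP, selecting which factors compose each task's weight matrix. The rank-1 parameterization, the truncation level `K`, and helper names such as `sample_ibp_sticks` are illustrative assumptions, not the paper's exact model or inference procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ibp_sticks(alpha, K):
    """Stick-breaking construction of IBP activation probabilities:
    pi_k = prod_{j<=k} v_j with v_j ~ Beta(alpha, 1). Probabilities decay
    with k, which encourages sparse factor selection and reuse of
    early factors across tasks."""
    v = rng.beta(alpha, 1.0, size=K)
    return np.cumprod(v)

def build_task_weight(U, V, z):
    """Assemble a task-specific weight matrix from the selected factors:
    W = sum_k z_k * u_k v_k^T, so only factors with z_k = 1 contribute."""
    return np.einsum("k,dk,ko->do", z.astype(float), U, V)

# Hypothetical sizes: a 784 -> 256 layer with a truncated dictionary of K factors.
d_in, d_out, K, alpha = 784, 256, 32, 3.0
U = rng.normal(0.0, 0.05, size=(d_in, K))   # shared left factors (dictionary)
V = rng.normal(0.0, 0.05, size=(K, d_out))  # shared right factors (dictionary)

pi = sample_ibp_sticks(alpha, K)            # prior activation probability per factor
for task in range(3):
    z = rng.random(K) < pi                  # per-task binary factor selection
    W_t = build_task_weight(U, V, z)
    print(f"task {task}: {z.sum()} of {K} factors active, W shape {W_t.shape}")
```

In the actual method the selection variables and factors would be learned per task (e.g. by variational inference under the IBP prior) rather than sampled from the prior as done here; the sketch only illustrates how the nonparametric prior lets the number of active factors grow with task complexity while reusing existing ones.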