Paper Title
Fast & Slow Learning: Incorporating Synthetic Gradients in Neural Memory Controllers
Paper Authors
Paper Abstract
Neural Memory Networks (NMNs) have received increased attention in recent years compared to deep architectures that use a constrained memory. Despite their new appeal, the success of NMNs hinges on the ability of a gradient-based optimiser to perform incremental training of the NMN controllers, determining how to leverage their high capacity for knowledge retrieval. This means that while excellent performance can be achieved when the training data is consistent and well distributed, rare data samples are hard to learn from, as the controllers fail to incorporate them effectively during model training. Drawing inspiration from human cognitive processes, in particular the role of neuromodulators in the human brain, we propose to decouple the learning process of the NMN controllers, allowing them to achieve flexible, rapid adaptation in the presence of new information. This trait is highly beneficial for meta-learning tasks, where the memory controllers must quickly grasp abstract concepts in the target domain and adapt stored knowledge. It allows the NMN controllers to quickly determine which memories should be retained and which erased, and to swiftly adapt their strategy to the new task at hand. Through both quantitative and qualitative evaluations on multiple public benchmarks, including classification and regression tasks, we demonstrate the utility of the proposed approach. Our evaluations not only highlight the ability of the proposed NMN architecture to outperform current state-of-the-art methods, but also provide insights into how the proposed augmentations help achieve such superior results. In addition, we demonstrate the practical implications of the proposed learning strategy, where the feedback path can be shared among multiple neural memory networks as a mechanism for knowledge sharing.
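The decoupled learning the abstract refers to can be illustrated with a minimal synthetic-gradient sketch: a module is updated using a *predicted* error gradient, so it need not wait for the true backpropagated feedback. The toy linear "controller", the quadratic loss, and all shapes and learning rates below are illustrative assumptions, not the paper's actual NMN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

W = 0.5 * rng.normal(size=(4, 3))   # toy "controller" weights: y = W @ x
M = np.zeros((4, 4))                # synthetic-gradient module: g_hat = M @ y
lr, sg_lr = 0.05, 0.05

X_eval = rng.normal(size=(64, 3))   # fixed inputs for measuring progress

def loss(W):
    # Toy objective L = 0.5 * ||W x||^2, averaged over the evaluation inputs.
    return 0.5 * np.mean(np.sum((X_eval @ W.T) ** 2, axis=1))

loss_before = loss(W)
for _ in range(2000):
    x = rng.normal(size=3)
    y = W @ x
    g_hat = M @ y                       # predicted dL/dy -- no backprop yet
    W = W - lr * np.outer(g_hat, x)     # decoupled ("fast") weight update

    # When the true error signal arrives (for this loss, dL/dy = y), the
    # synthetic-gradient module is regressed toward it ("slow" feedback path).
    g_true = y
    M = M - sg_lr * np.outer(g_hat - g_true, y)

loss_after = loss(W)
print(loss_before, loss_after)
```

Because the synthetic module learns to mimic the true gradient, the controller keeps improving even though its updates never wait for an end-to-end backward pass; sharing one such module among several networks is the kind of shared feedback path the abstract mentions.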