Paper Title
A sparse code increases the speed and efficiency of neuro-dynamic programming for optimal control tasks with correlated inputs
Paper Authors
Paper Abstract
Sparse codes in neuroscience have been suggested to offer certain computational advantages over other neural representations of sensory data. To explore this viewpoint, a sparse code is used to represent natural images in an optimal control task solved with neuro-dynamic programming, and its computational properties are investigated. The central finding is that when feature inputs to a linear network are correlated, an over-complete sparse code increases the memory capacity of the network in an efficient manner beyond that possible for any complete code with the same-sized input, and also increases the speed of learning the network weights. A complete sparse code is found to maximise the memory capacity of a linear network by decorrelating its feature inputs to transform the design matrix of the least-squares problem to one of full rank. It also conditions the Hessian matrix of the least-squares problem, thereby increasing the rate of convergence to the optimal network weights. Other types of decorrelating codes would also achieve this. However, an over-complete sparse code is found to be approximately decorrelated, extracting a larger number of approximately decorrelated features from the same-sized input, allowing it to efficiently increase memory capacity beyond that possible for any complete code: a 2.25 times over-complete sparse code is shown to at least double memory capacity compared with a complete sparse code using the same input. This is used in sequential learning to store a potentially large number of optimal control tasks in the network, while catastrophic forgetting is avoided using a partitioned representation, yielding a cost-to-go function approximator that generalises over the states in each partition. Sparse code advantages over dense codes and local codes are also discussed.
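The abstract's central mechanism, that decorrelating correlated feature inputs improves the conditioning of the least-squares Hessian and so speeds convergence to the optimal network weights, can be illustrated with a short numerical sketch. The snippet below is not the paper's implementation: it substitutes a generic whitening transform for a learned sparse code, and all names and parameters (n_samples, n_features, the gradient-descent step count) are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's method): compare gradient descent
# on a linear least-squares problem with correlated vs. decorrelated feature inputs.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 500, 20

# Correlated features: a random mixing of independent sources.
sources = rng.standard_normal((n_samples, n_features))
mixing = rng.standard_normal((n_features, n_features))
X_corr = sources @ mixing

# Targets from an arbitrary linear "cost-to-go" plus a little noise.
w_true = rng.standard_normal(n_features)
y = X_corr @ w_true + 0.01 * rng.standard_normal(n_samples)

# Whitening (one possible decorrelating code) via the covariance eigendecomposition.
cov = np.cov(X_corr, rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)
X_white = X_corr @ eigvec / np.sqrt(eigval)

def gd_residual(X, y, n_steps=200):
    """Plain gradient descent on the least-squares objective; return final residual norm."""
    H = X.T @ X / len(X)                      # Hessian of the quadratic objective
    lr = 1.0 / np.linalg.eigvalsh(H).max()    # stable step size
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (X @ w - y) / len(X)
        w -= lr * grad
    return np.linalg.norm(X @ w - y)

for name, X in [("correlated", X_corr), ("decorrelated", X_white)]:
    H = X.T @ X / len(X)
    print(f"{name:12s}  cond(H) = {np.linalg.cond(H):10.1f}  "
          f"residual after 200 steps = {gd_residual(X, y):.4f}")
```

With these settings the decorrelated features should show a condition number near one and a smaller residual after the same number of gradient steps, which is the conditioning effect the abstract attributes to a complete sparse code; the over-complete case discussed in the abstract additionally increases the number of approximately decorrelated features beyond the input dimension.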