Paper Title

A Computational Theory of Learning Flexible Reward-Seeking Behavior with Place Cells

Author

Gao, Yuanxiang

Abstract

An important open question in computational neuroscience is how spatially tuned neurons, such as place cells, support an animal's learning of reward-seeking behavior. Existing computational models either lack biological plausibility or fall short of behavioral flexibility when the environment changes. In this paper, we propose a computational theory that achieves behavioral flexibility with better biological plausibility. We first train a mixture of Gaussian distributions to model the ensemble of place-cell firing fields. We then propose a Hebbian-like rule to learn the matrix of synaptic strengths among place cells. This matrix is interpreted as the transition rate matrix of a continuous-time Markov chain that generates the sequential replay of place cells. During replay, the synaptic strengths from place cells to medium spiny neurons (MSNs) are learned by a temporal-difference-like rule to store place-reward associations. After replay, MSN activation ramps up as the animal approaches the rewarding place, so the animal can find that place by moving in the direction of increasing MSN activation. We implement our theory in a high-fidelity virtual rat in the MuJoCo physics simulator. In a complex maze, the rat shows significantly better learning efficiency and behavioral flexibility than a rat that implements a neuroscience-inspired reinforcement learning algorithm, the deep Q-network.
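The pipeline described in the abstract can be illustrated with a toy sketch. The code below is not the paper's implementation: it assumes a 1D track instead of a maze, uses hand-placed Gaussian place fields rather than a fitted mixture, and every name and constant in it (`place_activity`, `w_msn`, `tau`, the learning rate, the random-walk exploration) is a hypothetical choice made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a 1D track of length 1 with evenly spaced Gaussian place fields
# (the paper fits a Gaussian mixture; here the fields are fixed by hand).
n_cells = 20
centers = np.linspace(0.0, 1.0, n_cells)
sigma = 0.05

def place_activity(x):
    """Normalized Gaussian place-cell activity at position x."""
    a = np.exp(-0.5 * ((x - centers) / sigma) ** 2)
    return a / a.sum()

# 1) Hebbian-like learning (illustrative form): strengthen the synapse from
#    cells active now to cells active at the next step of a random walk.
W = np.zeros((n_cells, n_cells))
x = 0.5
for _ in range(5000):
    x_next = np.clip(x + rng.normal(0.0, 0.05), 0.0, 1.0)
    W += np.outer(place_activity(x), place_activity(x_next))
    x = x_next
W /= W.sum(axis=1).mean()  # rescale so exit rates are of order 1

# 2) Read the weights as a continuous-time Markov chain rate matrix:
#    off-diagonal entries are jump rates, each row sums to zero.
Q = W.copy()
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))

# 3) Replay: sample the chain and apply a TD-like update to the weights
#    from place cells to a single model MSN. Reward sits at the last cell.
reward = np.zeros(n_cells)
reward[-1] = 1.0
w_msn = np.zeros(n_cells)
alpha, tau = 0.1, 5.0  # hypothetical learning rate and discount time constant
for _ in range(200):                      # replay episodes
    i = rng.integers(n_cells)
    for _ in range(50):                   # jumps per episode
        rates = Q[i].copy()
        rates[i] = 0.0
        exit_rate = rates.sum()
        dwell = rng.exponential(1.0 / exit_rate)      # holding time in cell i
        j = rng.choice(n_cells, p=rates / exit_rate)  # next replayed cell
        discount = np.exp(-dwell / tau)
        w_msn[i] += alpha * (reward[j] + discount * w_msn[j] - w_msn[i])
        i = j

def msn_activation(x):
    """Model MSN activation at position x after replay."""
    return float(w_msn @ place_activity(x))

for pos in (0.05, 0.5, 0.95):
    print(f"x = {pos:.2f}  MSN activation = {msn_activation(pos):.4f}")
```

In this sketch the printed activation should be larger near the rewarded end of the track than at its middle, which is the ramping signal a simulated animal could follow uphill. How the paper defines the Hebbian-like rule, the discounting, and the readout in detail is not stated in the abstract, so those parts are placeholders.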
