Paper title
Continual Learning for Instruction Following from Realtime Feedback
Paper authors
Paper abstract
We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions. During interaction, human users instruct an agent using natural language, and provide realtime binary feedback as they observe the agent following their instructions. We design a contextual bandit learning approach, converting user feedback to immediate reward. We evaluate through thousands of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution accuracy over time. We also show our approach is robust to several design variations, and that the feedback signal is roughly equivalent to the learning signal of supervised demonstration data.
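
To make the idea of converting binary user feedback to an immediate reward concrete, here is a minimal sketch of a contextual bandit update. It is not the paper's implementation: the linear softmax policy, the class and function names (`BanditPolicy`, `feedback_to_reward`, `simulate`), the +1/-1/0 reward mapping, the learning rate, and the simulated two-context "user" are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feedback_to_reward(feedback):
    """Map a binary user signal to an immediate reward (assumed mapping)."""
    return {"positive": 1.0, "negative": -1.0}.get(feedback, 0.0)


class BanditPolicy:
    """Toy linear softmax policy with one weight vector per action (hypothetical)."""

    def __init__(self, n_actions, n_features, lr=0.1, seed=0):
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self, context):
        scores = [sum(w * x for w, x in zip(row, context)) for row in self.weights]
        return softmax(scores)

    def act(self, context):
        # Sample an action from the current policy distribution.
        p = self.probs(context)
        return self.rng.choices(range(len(p)), weights=p)[0]

    def update(self, context, action, reward):
        # Bandit policy-gradient step: reward * grad log pi(action | context).
        # For row k the gradient is (1[k == action] - p_k) * context.
        p = self.probs(context)
        for k, row in enumerate(self.weights):
            coeff = (1.0 if k == action else 0.0) - p[k]
            for j, x in enumerate(context):
                row[j] += self.lr * reward * coeff * x


def simulate(rounds=2000, seed=0):
    """Train against a simulated user who approves only the matching action."""
    rng = random.Random(seed)
    policy = BanditPolicy(n_actions=2, n_features=2, seed=seed)
    for _ in range(rounds):
        idx = rng.randrange(2)
        context = [1.0, 0.0] if idx == 0 else [0.0, 1.0]
        action = policy.act(context)
        feedback = "positive" if action == idx else "negative"
        policy.update(context, action, feedback_to_reward(feedback))
    return policy
```

Each round uses only the reward for the action actually taken, with no credit assignment across later steps, which is what distinguishes the bandit setting from full reinforcement learning. In this toy setup the policy should come to prefer the approved action in each context; a deployed system would replace the one-hot contexts with an encoding of the instruction and the observed state.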