Paper title
Unsupervised Learning of Temporal Abstractions with Slot-based Transformers
Paper authors
Paper abstract
The discovery of reusable sub-routines simplifies decision-making and planning in complex reinforcement learning problems. Previous approaches propose to learn such temporal abstractions in a purely unsupervised fashion by observing state-action trajectories gathered from executing a policy. However, a current limitation of these approaches is that they process each trajectory in an entirely sequential manner, which prevents them from revising earlier decisions about sub-routine boundary points in light of new incoming information. In this work, we propose SloTTAr, a fully parallel approach that integrates sequence-processing Transformers with a Slot Attention module and adaptive computation to learn the number of such sub-routines in an unsupervised fashion. We demonstrate that SloTTAr is capable of outperforming strong baselines in terms of boundary point discovery, even for sequences containing a variable number of sub-routines, while being up to 7x faster to train on existing benchmarks.
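To make the notion of "boundary point discovery" concrete, the following is a minimal sketch and not the SloTTAr implementation. The abstract does not say how boundaries are decoded, so the sketch assumes, purely for illustration, that a parallel model outputs one soft mask per slot over all timesteps of a trajectory. The names `masks`, `segment_assignments`, and `boundary_points` are hypothetical, and the mask values are made up.

```python
from typing import List, Sequence


def segment_assignments(masks: Sequence[Sequence[float]]) -> List[int]:
    """For each timestep, pick the slot with the largest mask weight.

    masks[k][t] is the (hypothetical) weight of slot k at timestep t.
    """
    num_steps = len(masks[0])
    return [
        max(range(len(masks)), key=lambda k: masks[k][t])
        for t in range(num_steps)
    ]


def boundary_points(assignments: Sequence[int]) -> List[int]:
    """Return the timesteps at which the assigned slot changes."""
    return [
        t for t in range(1, len(assignments))
        if assignments[t] != assignments[t - 1]
    ]


# Toy masks for 3 slots over 6 timesteps (made-up values, assumption only).
masks = [
    [0.9, 0.8, 0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.8, 0.7, 0.2, 0.1],
    [0.0, 0.1, 0.1, 0.2, 0.8, 0.8],
]
assignments = segment_assignments(masks)
boundaries = boundary_points(assignments)
```

Because every timestep is assigned from masks computed over the whole trajectory at once, no boundary is fixed before later observations have been seen. This is the property the abstract contrasts with purely sequential processing.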