Learning Physically Realizable Skills for Online Packing of General 3D Shapes

Paper Title

Learning Physically Realizable Skills for Online Packing of General 3D Shapes

Authors

Zhao, Hang; Pan, Zherong; Yu, Yang; Xu, Kai

Abstract

We study the problem of learning online packing skills for irregular 3D shapes, which is arguably the most challenging setting of bin packing problems. The goal is to consecutively move a sequence of 3D objects with arbitrary shapes into a designated container with only partial observations of the object sequence. Meanwhile, we take physical realizability into account, involving physics dynamics and constraints of a placement. The packing policy should understand the 3D geometry of the object to be packed and make effective decisions to accommodate it in the container in a physically realizable way. We propose a Reinforcement Learning (RL) pipeline to learn the policy. The complex irregular geometry and imperfect object placement together lead to a huge solution space. Direct training in such a space is prohibitively data-intensive. We instead propose a theoretically provable method for candidate action generation to reduce the action space of RL and the learning burden. A parameterized policy is then learned to select the best placement from the candidates. Equipped with an efficient method of asynchronous RL acceleration and a data preparation process of simulation-ready training sequences, a mature packing policy can be trained in a physics-based environment within 48 hours. Through extensive evaluation on a variety of real-life shape datasets and comparisons with state-of-the-art baselines, we demonstrate that our method outperforms the best-performing baseline on all datasets by at least 12.8% in terms of packing utility.

