LiteDepth: Digging into Fast and Accurate Depth Estimation on Mobile Devices

Paper Title

LiteDepth: Digging into Fast and Accurate Depth Estimation on Mobile Devices

Authors

Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang

Abstract

Monocular depth estimation is an essential task in computer vision. While many successful methods have obtained excellent results, most of them are computationally expensive and not suited to real-time on-device inference. In this paper, we aim to address more practical applications of monocular depth estimation, where the solution should consider not only precision but also inference time on mobile devices. To this end, we first develop an end-to-end learning-based model with a tiny weight size (1.4 MB) and a short inference time (27 FPS on Raspberry Pi 4). Then, we propose a simple yet effective data augmentation strategy, called R2 crop, to boost model performance. Moreover, we observe that a simple lightweight model trained with only a single loss term suffers from a performance bottleneck. To alleviate this issue, we adopt multiple loss terms to provide sufficient constraints during training. Furthermore, with a simple dynamic re-weighting strategy, we avoid the time-consuming hyper-parameter tuning of loss-term weights. Finally, we adopt structure-aware distillation to further improve model performance. Our solution, named LiteDepth, ranks 2nd in the MAI&AIM2022 Monocular Depth Estimation Challenge, with a si-RMSE of 0.311, an RMSE of 3.79, and an inference time of 37 ms measured on the Raspberry Pi 4. Notably, it is the fastest solution in the challenge. Code and models will be released at https://github.com/zhyever/LiteDepth.
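The abstract reports three figures: si-RMSE, RMSE, and inference time. As a rough illustration of how such numbers are commonly computed, here is a minimal NumPy sketch. It assumes the widely used log-space definition of scale-invariant RMSE; the challenge's exact evaluation script, depth units, and valid-pixel masking are not given in the abstract and may differ. The function names `rmse`, `si_rmse`, and `fps_from_latency_ms` are hypothetical and do not come from the LiteDepth code base.

```python
import numpy as np


def rmse(pred, gt):
    """Root mean squared error over pixels with valid (positive) ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = gt > 0  # assumption: non-positive ground truth marks invalid pixels
    return float(np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2)))


def si_rmse(pred, gt):
    """Scale-invariant RMSE: standard deviation of the log-depth error.

    Multiplying the prediction by a global scale adds a constant to every
    log-error, which this metric ignores.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = (gt > 0) & (pred > 0)  # logs are only defined for positive depths
    d = np.log(pred[mask]) - np.log(gt[mask])
    var = np.mean(d ** 2) - np.mean(d) ** 2
    return float(np.sqrt(max(var, 0.0)))  # clamp tiny negative rounding error


def fps_from_latency_ms(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    gt = np.array([[1.0, 2.0], [4.0, 8.0]])
    print(rmse(2.0 * gt, gt))        # non-zero: absolute error is sensitive to scale
    print(si_rmse(2.0 * gt, gt))     # approximately zero: a global scale is ignored
    print(fps_from_latency_ms(37))   # about 27, consistent with the reported 27 FPS
```

The last helper shows that the two speed figures in the abstract describe the same measurement: a 37 ms per-frame latency corresponds to roughly 27 frames per second.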
