Title
A Practical Stereo Depth System for Smart Glasses
Authors
Abstract
We present the design of a productionized end-to-end stereo depth sensing system that performs pre-processing, online stereo rectification, and stereo depth estimation, with a fallback to monocular depth estimation when rectification is unreliable. The output of our depth sensing system is then used in a novel view generation pipeline to create 3D computational photography effects from point-of-view images captured by smart glasses. All these steps are executed on-device within the stringent compute budget of a mobile phone, and because we expect users to have a wide range of smartphones, our design must be general and cannot depend on particular hardware or ML accelerators such as a smartphone GPU. Although each of these steps is well studied, a description of a practical system is still lacking. In such a system, all steps need to work in tandem with one another and fall back gracefully on failures within the system or on less-than-ideal input data. We show how we handle unforeseen changes to calibration, e.g., due to heat; robustly support depth estimation in the wild; and still abide by the memory and latency constraints required for a smooth user experience. We show that our trained models are fast, running in less than 1 s on the CPU of a six-year-old Samsung Galaxy S8 phone. Our models generalize well to unseen data and achieve good results on Middlebury and on in-the-wild images captured by the smart glasses.
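The abstract's central control flow, stereo depth when online rectification is trustworthy, monocular depth otherwise, can be sketched as follows. This is a minimal illustration only: every function name and the confidence threshold below are hypothetical placeholders, not the paper's actual API.

```python
def estimate_rectification(left, right):
    """Hypothetical online rectifier: returns a (nominally) rectified
    image pair and a confidence score in [0, 1]."""
    # Placeholder logic: treat rectification as reliable only when
    # both views are present.
    confidence = 1.0 if (left is not None and right is not None) else 0.0
    return (left, right), confidence

def stereo_depth(left, right):
    """Hypothetical stereo matcher; returns a depth map (stub label here)."""
    return "stereo_depth_map"

def monocular_depth(image):
    """Hypothetical monocular depth network used as the fallback."""
    return "monocular_depth_map"

def depth_with_fallback(left, right, threshold=0.5):
    # Run online rectification first; if its confidence is too low,
    # fall back gracefully to monocular depth on the left view,
    # mirroring the fallback behavior described in the abstract.
    (l, r), confidence = estimate_rectification(left, right)
    if confidence >= threshold:
        return stereo_depth(l, r)
    return monocular_depth(left)
```

The design point worth noting is that the fallback decision is made from a rectification confidence signal rather than from the depth output itself, so an unreliable calibration (e.g., after thermal deformation) degrades to monocular depth instead of producing corrupted stereo results.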