Paper Title
Shape and Viewpoint without Keypoints
Paper Authors
Paper Abstract
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image, trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoint or keypoint supervision. We approach this highly under-constrained problem in an "analysis by synthesis" framework, where the goal is to predict the likely shape, texture and camera viewpoint that could produce the image, using various learned category-specific priors. Our particular contribution in this paper is a representation of the distribution over cameras, which we call "camera-multiplex". Instead of picking a point estimate, we maintain a set of camera hypotheses that are optimized during training to best explain the image given the current shape and texture. We call our approach Unsupervised Category-Specific Mesh Reconstruction (U-CMR), and present qualitative and quantitative results on CUB, Pascal 3D and new web-scraped datasets. We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects using an image collection without any keypoint annotations or 3D ground truth. Project page: https://shubham-goel.github.io/ucmr
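To make the camera-multiplex idea concrete, here is a minimal PyTorch sketch of how a set of per-image camera hypotheses could be jointly optimized against a reconstruction loss. This is an illustration, not the authors' implementation: the `render` stub, the 7-dimensional camera parameterization, and the softmin weighting over hypotheses are all assumptions made for the example.

```python
# Sketch of a "camera-multiplex": keep K camera hypotheses per training image,
# score each by how well the current shape/texture rendered under it explains
# the image, and optimize all hypotheses with a softmin-weighted loss.
import torch

K = 8  # number of camera hypotheses kept per image (assumed value)

def render(shape, texture, camera):
    """Hypothetical stand-in for a differentiable renderer.

    A real system would rasterize the textured mesh under the given camera;
    here we return a dummy image that depends on all inputs so gradients flow.
    """
    return (shape.mean() + texture.mean() + camera.sum()) * torch.ones(3, 64, 64)

# Dummy "current" shape/texture predictions and a target image.
shape = torch.randn(642, 3)          # mesh vertex positions
texture = torch.rand(3, 128, 128)    # texture map
target = torch.rand(3, 64, 64)       # the training image to explain

# The camera-multiplex: K learnable camera parameter vectors
# (e.g. scale, 2D translation, rotation), optimized per image.
cameras = torch.randn(K, 7, requires_grad=True)
optim = torch.optim.Adam([cameras], lr=1e-2)

for step in range(100):
    optim.zero_grad()
    # Reconstruction loss of the image under each camera hypothesis.
    losses = torch.stack([
        ((render(shape, texture, cameras[k]) - target) ** 2).mean()
        for k in range(K)
    ])
    # Softmin weighting: cameras that explain the image better receive more
    # weight, but no hypothesis is pruned away during training.
    weights = torch.softmax(-losses, dim=0).detach()
    (weights * losses).sum().backward()
    optim.step()

# After optimization, the best hypothesis per image can supervise a
# feed-forward camera predictor (as the abstract's approach suggests).
best_cam = cameras[losses.argmin()].detach()
```

Maintaining the full set of hypotheses, rather than committing to a single point estimate early, is what lets training escape the bad local minima that a wrong initial viewpoint would otherwise lock in.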