Title


Autoregressive Unsupervised Image Segmentation

Authors

Ouali, Yassine, Hudelot, Céline, Tami, Myriam

Abstract

In this work, we propose a new unsupervised image segmentation approach based on mutual information maximization between different constructed views of the inputs. Taking inspiration from autoregressive generative models that predict the current pixel from past pixels in a raster-scan ordering created with masked convolutions, we propose to use different orderings over the inputs using various forms of masked convolutions to construct different views of the data. For a given input, the model produces a pair of predictions with two valid orderings, and is then trained to maximize the mutual information between the two outputs. These outputs can either be low-dimensional features for representation learning or output clusters corresponding to semantic labels for clustering. While masked convolutions are used during training, in inference, no masking is applied and we fall back to the standard convolution where the model has access to the full input. The proposed method outperforms current state-of-the-art on unsupervised image segmentation. It is simple and easy to implement, and can be extended to other visual tasks and integrated seamlessly into existing unsupervised learning methods requiring different views of the data.
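The orderings described above come from masking the convolution kernel so that each pixel only sees pixels that precede it in some scan order. As a minimal sketch (not the authors' code), the snippet below builds a PixelCNN-style raster-scan kernel mask and a second valid ordering obtained by rotating it 180° (a reversed raster scan); the function name and the choice of the rotated ordering are illustrative assumptions.

```python
import numpy as np

def raster_scan_mask(k, include_center=False):
    """Binary mask for a k x k conv kernel: keep only positions that come
    before the center in a raster scan (left-to-right, top-to-bottom),
    so the masked convolution never sees 'future' pixels."""
    mask = np.zeros((k, k), dtype=np.float32)
    c = k // 2
    mask[:c, :] = 1.0      # all rows above the center row
    mask[c, :c] = 1.0      # pixels to the left of the center
    if include_center:
        mask[c, c] = 1.0   # PixelCNN 'mask B' also keeps the center
    return mask

# Two valid orderings over the same input: a plain raster scan and the
# 180°-rotated (reversed) raster scan. Multiplying a conv kernel by each
# mask yields the two views whose outputs are compared during training;
# at inference the mask is simply dropped (standard convolution).
m1 = raster_scan_mask(3)
m2 = np.rot90(m1, 2)
```

Note that the two masks cover complementary halves of the receptive field, which is what makes the pair of predictions genuinely different views of the same input.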
