Generating unseen complex scenes: are we there yet?

Paper title

Generating unseen complex scenes: are we there yet?

Authors

Arantxa Casanova, Michal Drozdzal, Adriana Romero-Soriano

Abstract

Although recent complex scene conditional generation models generate increasingly appealing scenes, it is very hard to assess which models perform better and why. This is often because models are trained to fit different data splits and define their own experimental setups. In this paper, we propose a methodology to compare complex scene conditional generation models, and provide an in-depth analysis that assesses the ability of each model to (1) fit the training distribution and hence perform well on seen conditionings, (2) generalize to unseen conditionings composed of seen object combinations, and (3) generalize to unseen conditionings composed of unseen object combinations. As a result, we observe that recent methods are able to generate recognizable scenes given seen conditionings, and exploit compositionality to generalize to unseen conditionings with seen object combinations. However, all methods suffer from noticeable image quality degradation when asked to generate images from conditionings composed of unseen object combinations. Moreover, through our analysis, we identify the advantages of different pipeline components, and find that (1) encouraging compositionality through instance-wise spatial conditioning normalizations increases robustness to both types of unseen conditionings, (2) using semantically aware losses such as the scene-graph perceptual similarity helps improve some dimensions of the generation process, and (3) enhancing the quality of generated masks and the quality of the individual objects are crucial steps to improve robustness to both types of unseen conditionings.
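
The three evaluation regimes named in the abstract can be illustrated with a minimal sketch. The abstract does not say how a conditioning or an "object combination" is represented, so everything here is an assumption made for illustration: a conditioning is modeled as a list of (class label, bounding box) pairs, an object combination as the set of class labels it contains, and the function and group names are hypothetical rather than taken from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a conditioning is a list of (class_label, bounding_box) pairs.
Box = Tuple[float, float, float, float]
Conditioning = List[Tuple[str, Box]]


def object_combination(cond: Conditioning) -> FrozenSet[str]:
    """Assumption: an object combination is the set of class labels present."""
    return frozenset(label for label, _ in cond)


def canonical(cond: Conditioning) -> Tuple[Tuple[str, Box], ...]:
    """Order-independent key, so the same scene listed in another order matches."""
    return tuple(sorted(cond))


def split_conditionings(
    train: List[Conditioning], evaluation: List[Conditioning]
) -> Dict[str, List[Conditioning]]:
    """Partition evaluation conditionings into the three regimes of the abstract."""
    seen_conds = {canonical(c) for c in train}
    seen_combos = {object_combination(c) for c in train}
    groups: Dict[str, List[Conditioning]] = {
        "seen": [],
        "unseen_seen_combination": [],
        "unseen_unseen_combination": [],
    }
    for cond in evaluation:
        if canonical(cond) in seen_conds:
            # (1) exact conditioning appeared during training
            groups["seen"].append(cond)
        elif object_combination(cond) in seen_combos:
            # (2) new conditioning, but its set of classes was seen together
            groups["unseen_seen_combination"].append(cond)
        else:
            # (3) new conditioning whose set of classes never co-occurred
            groups["unseen_unseen_combination"].append(cond)
    return groups
```

Under these assumptions, a per-group metric could then be computed on each of the three lists separately, which is the kind of comparison the abstract describes.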
