Hierarchical Semantic Regularization of Latent Spaces in StyleGANs

Paper Title

Hierarchical Semantic Regularization of Latent Spaces in StyleGANs

Authors

Tejan Karmali, Rishubh Parihar, Susmit Agrawal, Harsh Rangwani, Varun Jampani, Maneesh Singh, R. Venkatesh Babu

Abstract

Progress in GANs has enabled the generation of high-resolution photorealistic images of astonishing quality. StyleGANs allow for compelling attribute modification on such images via mathematical operations on the latent style vectors in the W/W+ space that effectively modulate the rich hierarchical representations of the generator. Such operations have recently been generalized beyond the mere attribute swapping of the original StyleGAN paper to include interpolations. In spite of many significant improvements in StyleGANs, they are still seen to generate unnatural images. The quality of the generated images is predicated on two assumptions: (a) the richness of the hierarchical representations learnt by the generator, and (b) the linearity and smoothness of the style spaces. In this work, we propose a Hierarchical Semantic Regularizer (HSR), which aligns the hierarchical representations learnt by the generator with the corresponding powerful features learnt by networks pretrained on large amounts of data. HSR is shown to improve not only the generator representations but also the linearity and smoothness of the latent style spaces, leading to the generation of more natural-looking style-edited images. To demonstrate improved linearity, we propose a novel metric, the Attribute Linearity Score (ALS). A significant reduction in the generation of unnatural images is corroborated by a 16.19% improvement in the Perceptual Path Length (PPL) metric, averaged across different standard datasets, while simultaneously improving the linearity of attribute change in attribute editing tasks.
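As a rough illustration of the interpolation and path-length ideas mentioned in the abstract, the following Python sketch linearly interpolates between two latent style vectors and estimates a PPL-style quantity by finite differences along that path. It is not the authors' implementation: `toy_generator` and `squared_distance` are hypothetical stand-ins for a StyleGAN synthesis network and a perceptual distance such as LPIPS, and the estimate uses a single pair of latents rather than an average over many sampled pairs.

```python
import random


def lerp(w0, w1, t):
    """Linearly interpolate between two latent vectors at position t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(w0, w1)]


def toy_generator(w):
    """Hypothetical stand-in for a generator: a fixed linear map of a 2-D latent."""
    return [2.0 * w[0], 3.0 * w[1]]


def squared_distance(x, y):
    """Hypothetical stand-in for a perceptual distance between two outputs."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def path_length_estimate(generator, distance, w0, w1,
                         num_samples=100, eps=1e-4, seed=0):
    """Average of distance(G(w(t)), G(w(t + eps))) / eps^2 over random t.

    Smaller values mean the output changes more smoothly along the
    interpolation path between w0 and w1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        t = rng.uniform(0.0, 1.0 - eps)
        out_a = generator(lerp(w0, w1, t))
        out_b = generator(lerp(w0, w1, t + eps))
        total += distance(out_a, out_b) / (eps ** 2)
    return total / num_samples


if __name__ == "__main__":
    estimate = path_length_estimate(
        toy_generator, squared_distance, [0.0, 0.0], [1.0, 1.0]
    )
    print(estimate)
```

Because the toy generator is linear, the estimate is the same at every point on the path (approximately 13 for these inputs). A real generator would typically show variation along the path, which is what a smoothness metric of this kind is meant to capture.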
