Paper Title
Semantic Autoencoder and Its Potential Usage for Adversarial Attack
Paper Authors
Paper Abstract
An autoencoder can produce a suitable latent representation of its input data. However, a representation based solely on the intrinsic properties of the input data is usually inferior at expressing semantic information. A typical case is the potential inability to form clear boundaries when these representations are clustered. By encoding a latent representation that depends not only on the content of the input data but also on its semantics, such as label information, we propose an enhanced autoencoder architecture named the semantic autoencoder. Experiments on the representation distribution via t-SNE show a clear distinction between these two types of encoders and confirm the superiority of the semantic one, while the decoded samples of the two types of autoencoders exhibit only faint dissimilarity, either objectively or subjectively. Based on this observation, we consider adversarial attacks on learning algorithms that rely on latent representations obtained via autoencoders. It turns out that the latent contents of adversarial samples constructed from the semantic encoder with deliberately wrong label information exhibit a different distribution from those of the original input data, while the two kinds of samples differ only marginally. This new form of attack introduced by our work deserves attention, given the need to secure widespread deep learning applications.
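The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the single-layer encoder and decoder, the one-hot label encoding, and all names (`encode`, `decode`, `W_enc`, `W_dec`) are assumptions made for illustration, and the weights are random rather than trained. The sketch only shows how a latent code can be made to depend on both the input and its label, and how swapping in a deliberately wrong label changes that code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
INPUT_DIM, NUM_CLASSES, LATENT_DIM = 8, 3, 2

# Random, untrained weights; a real model would learn these from data.
W_enc = rng.normal(size=(INPUT_DIM + NUM_CLASSES, LATENT_DIM))
W_dec = rng.normal(size=(LATENT_DIM, INPUT_DIM))


def one_hot(label, num_classes=NUM_CLASSES):
    """Return a one-hot vector for an integer class label."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec


def encode(x, label):
    """Latent code that depends on the content x and on its label."""
    return np.tanh(np.concatenate([x, one_hot(label)]) @ W_enc)


def decode(z):
    """Reconstruct an input-sized vector from the latent code alone."""
    return z @ W_dec


x = rng.normal(size=INPUT_DIM)
z_true = encode(x, label=0)
z_wrong = encode(x, label=1)  # same input, deliberately wrong label

# Distance between the two latent codes for the same input.
shift = np.linalg.norm(z_true - z_wrong)
print("latent shift caused by the wrong label:", shift)
```

Because the weights here are untrained, the sketch says nothing about reconstruction quality or cluster separation; those claims rest on the paper's own experiments.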