Paper Title

RCC-GAN: Regularized Compound Conditional GAN for Large-Scale Tabular Data Synthesis

Authors

Mohammad Esmaeilpour, Nourhene Chaalia, Adel Abusitta, Francois-Xavier Devailly, Wissem Maazoun, Patrick Cardinal

Abstract

This paper introduces a novel generative adversarial network (GAN) for synthesizing large-scale tabular databases that contain a mix of continuous, discrete, and binary features. Technically, our GAN belongs to the category of class-conditioned generative models with a predefined conditional vector. However, we propose a new formulation for deriving such a vector that incorporates binary and discrete features simultaneously. We refer to this novel definition as the compound conditional vector and employ it for training the generator network. The core architecture of this network is a three-layered deep residual neural network with skip connections. To improve the stability of such a complex architecture, we present a regularization scheme that limits abrupt variations in its weight vectors during training. This regularization approach is compatible with the nature of adversarial training and is not computationally prohibitive at runtime. Furthermore, we constantly monitor the variation of the weight vectors to identify potential instabilities or irregularities and thereby measure the strength of our proposed regularizer. To this end, we also develop a new metric for tracking sudden perturbations in the weight vectors using singular value decomposition theory. Finally, we evaluate the performance of our proposed synthesis approach on six benchmark tabular databases, namely Adult, Census, HCDR, Cabs, News, and King. The results corroborate that, in the majority of cases, our proposed RccGAN outperforms other conventional and modern generative models in terms of accuracy, stability, and reliability.
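The abstract does not give the formula for the compound conditional vector, so the following is only a minimal sketch of one plausible reading: a single vector built by concatenating a one-hot block for each discrete column with the raw flags of the binary columns. The function names, the argument layout, and the choice to zero out unconditioned columns are assumptions made for illustration and are not taken from the paper.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def compound_conditional_vector(discrete_choice, discrete_sizes, binary_values):
    """Build one conditioning vector from discrete and binary features.

    Hypothetical layout (an assumption, not the paper's definition):
      - one block per discrete column, one-hot if the column is conditioned
        on, all zeros otherwise;
      - followed by one 0/1 entry per binary column.

    discrete_choice: dict mapping column index -> selected category index
    discrete_sizes:  list with the number of categories of each discrete column
    binary_values:   list of 0/1 values, one per binary column
    """
    vec = []
    for col, size in enumerate(discrete_sizes):
        if col in discrete_choice:
            vec.extend(one_hot(discrete_choice[col], size))
        else:
            vec.extend([0] * size)  # this column is not conditioned on
    vec.extend(int(bool(b)) for b in binary_values)
    return vec


# Two discrete columns (3 and 2 categories) and two binary columns.
cond = compound_conditional_vector({0: 2}, [3, 2], [1, 0])
print(cond)  # [0, 0, 1, 0, 0, 1, 0]
```

In a class-conditioned GAN, a vector of this kind would typically be concatenated with the noise input of the generator; how RccGAN actually derives and uses it is described in the paper itself, not here.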
