FinGAN: Generative Adversarial Network for Analytical Customer Relationship Management in Banking and Insurance

Paper Title

FinGAN: Generative Adversarial Network for Analytical Customer Relationship Management in Banking and Insurance

Authors

Prateek Kate, Vadlamani Ravi, Akhilesh Gangwar

Abstract

Churn prediction in credit cards, fraud detection in insurance, and loan default prediction are important analytical customer relationship management (ACRM) problems. Because frauds, churns, and defaults occur infrequently, the datasets for these problems are naturally highly unbalanced. Consequently, all supervised machine learning classifiers tend to yield substantial false-positive rates when trained on such unbalanced datasets. We propose two methods of data balancing. In the first, we propose an oversampling method that generates synthetic samples of the minority class using a Generative Adversarial Network (GAN). We employ Vanilla GAN [1], Wasserstein GAN [2], and CTGAN [3] separately to oversample the minority class. To assess the efficacy of the proposed approach, we apply a host of machine learning classifiers, including Random Forest, Decision Tree, Support Vector Machine (SVM), and Logistic Regression, to the data balanced by the GANs. In the second, we introduce a hybrid method that uses undersampling and oversampling together: the synthetic minority class data oversampled by the GAN is combined with the majority class data undersampled by a one-class support vector machine (OCSVM) [4], and the resulting data is passed to the classifiers. Compared with the results of Farquad et al. [5] and Sundarkumar, Ravi, and Siddeshwar [6], our proposed methods achieve a higher area under the ROC curve (AUC) on all datasets.
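The hybrid scheme described in the abstract can be illustrated with a minimal Python sketch, assuming NumPy and scikit-learn are available. Several parts are assumptions rather than the authors' implementation: the abstract does not say how the OCSVM selects majority rows, so the sketch keeps the rows the fitted `OneClassSVM` retains as support vectors; the helper names `undersample_majority`, `hybrid_balance`, and `gaussian_stand_in` are hypothetical; and a simple Gaussian sampler stands in for a trained GAN generator (Vanilla GAN, Wasserstein GAN, or CTGAN in the paper) to keep the example short. The data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM


def undersample_majority(X_major, nu=0.3):
    """Assumed selection rule: keep the majority rows that the
    one-class SVM retains as support vectors."""
    ocsvm = OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(X_major)
    return X_major[ocsvm.support_]


def hybrid_balance(X_major, X_minor, sample_minority, nu=0.3):
    """Undersample the majority class, then top up the minority class
    with synthetic rows until both classes have the same size."""
    X_major_small = undersample_majority(X_major, nu=nu)
    n_needed = max(len(X_major_small) - len(X_minor), 0)
    X_synth = sample_minority(n_needed)
    X = np.vstack([X_major_small, X_minor, X_synth])
    y = np.concatenate([
        np.zeros(len(X_major_small), dtype=int),            # majority = 0
        np.ones(len(X_minor) + len(X_synth), dtype=int),    # minority = 1
    ])
    return X, y


rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(500, 4))  # synthetic majority class
X_minor = rng.normal(2.0, 1.0, size=(25, 4))   # synthetic minority class


def gaussian_stand_in(n):
    """Placeholder for a trained GAN generator, NOT a GAN: draws rows
    from a Gaussian fitted to the real minority samples."""
    mean, std = X_minor.mean(axis=0), X_minor.std(axis=0)
    return rng.normal(mean, std, size=(n, X_minor.shape[1]))


X, y = hybrid_balance(X_major, X_minor, gaussian_stand_in)
clf = LogisticRegression(max_iter=1000).fit(X, y)  # any classifier fits here
```

In practice, `gaussian_stand_in` would be replaced by the sampling call of a generator trained on the minority class only, and the classifier would be evaluated by AUC on untouched, still-unbalanced test data.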
