Paper title
Behavioral analysis of support vector machine classifier with Gaussian kernel and imbalanced data
Paper authors
Abstract
The parameters of support vector machines (SVMs), such as the penalty parameter and the kernel parameters, have a great impact on the classification accuracy and the complexity of the SVM model. Model selection in SVM therefore involves tuning these parameters. However, they are usually tuned and used as a black box, without an understanding of the mathematical background or internal details. In this paper, the behavior of the SVM classification model is analyzed as these parameters take different values with balanced and imbalanced data. The analysis includes visualizations, mathematical and geometrical interpretations, and illustrative numerical examples, with the aim of explaining the basics of the Gaussian and linear kernel functions in SVM. Based on this analysis, we propose a novel search algorithm. In this algorithm, we search for the optimal SVM parameters in two one-dimensional spaces instead of in one two-dimensional space, which reduces the computational time significantly. Moreover, in our algorithm the range of the kernel parameter can be anticipated from an analysis of the data. This further reduces the search space and hence the required computational time. Experiments were conducted to evaluate our search algorithm on a variety of balanced and imbalanced datasets. The results demonstrate that the proposed strategy is faster and more effective than other search strategies.
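The cost argument behind searching two one-dimensional spaces instead of one two-dimensional space can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. The function names, the starting value `c_init`, the order of the two sweeps, and `toy_score` (a stand-in for a cross-validated accuracy estimate) are all assumptions made for illustration.

```python
import math
from itertools import product


def toy_score(c, sigma):
    """Hypothetical stand-in for cross-validated accuracy.

    Peaks at C = 10, sigma = 1. A real score would train and
    validate an SVM for each (C, sigma) pair.
    """
    return -(math.log10(c) - 1.0) ** 2 - math.log10(sigma) ** 2


def grid_search_2d(score, c_values, sigma_values):
    """Exhaustive search: one evaluation per (C, sigma) pair."""
    best = max(product(c_values, sigma_values), key=lambda p: score(*p))
    return best, len(c_values) * len(sigma_values)


def two_stage_search_1d(score, c_values, sigma_values, c_init):
    """Assumed two-stage scheme: sweep sigma at a fixed C,
    then sweep C at the sigma chosen in the first stage."""
    best_sigma = max(sigma_values, key=lambda s: score(c_init, s))
    best_c = max(c_values, key=lambda c: score(c, best_sigma))
    return (best_c, best_sigma), len(sigma_values) + len(c_values)


# Seven candidate values per parameter, logarithmically spaced.
c_values = [10.0 ** k for k in range(-3, 4)]
sigma_values = [10.0 ** k for k in range(-3, 4)]

grid_best, grid_evals = grid_search_2d(toy_score, c_values, sigma_values)
staged_best, staged_evals = two_stage_search_1d(
    toy_score, c_values, sigma_values, c_init=1.0
)

print("2-D grid:      best", grid_best, "evaluations", grid_evals)
print("two 1-D sweeps: best", staged_best, "evaluations", staged_evals)
```

With n candidates per parameter, the grid needs n × n evaluations while the two sweeps need n + n. The two searches agree here only because `toy_score` is separable in C and sigma; when the parameters interact, a staged search may settle on a different pair than the full grid.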