The challenges of the classification for the largescale and high-dimensional datasets are: (1) It requires huge computational burden in the training phase and in the classification phase; (2) it needs large storage requirement to save many training data; and (3) it is difficult to determine decision rules in the high-dimensional data. Nonlinear support vector machine (SVM) is a popular classifier, and it performs well on a high-dimensional dataset. However,
it easily leads overfitting problem especially when the data are not evenly distributed. Recently, profile support vector machine (PSVM) is proposed to solve this problem.
Because local learning is superior to global learning, multiple linear SVM models are trained to get similar performance to a nonlinear SVM model. However, it is inefficient in the training phase. In this paper, we proposed a fast classification strategy for PSVM to speed up the
training time and the classification time. We first choose border samples near the decision boundary from training samples. Then, the reduced training samples are clustered
to several local subsets through MagKmeans algorithm. In the paper, we proposed a fast search method to find the optimal solution for MagKmeans algorithm. Each cluster is
used to learn multiple linear SVM models. Both artificial datasets and real datasets are used to evaluate the performance of the proposed method. In the experimental result,
the proposed method prevents overfitting and underfitting problems. Moreover, the proposed strategy is effective and efficient.
Comentarios