libsvm: Common Parameters and Usage Pitfalls

Shared by LinuxBoy on 2019-03-24 07:03:25


1 Common Parameters

Training: svm-train [options] training_set_file [model_file]

Prediction: svm-predict [options] test_file model_file output_file
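Both svm-train and svm-predict read data in libsvm's sparse text format: one sample per line, written as "<label> <index1>:<value1> <index2>:<value2> ...", with 1-based feature indices in ascending order and zero-valued features usually omitted. The sketch below shows how such lines could be produced and read back. to_libsvm_line and parse_libsvm_line are hypothetical helpers, not part of libsvm.

```python
def to_libsvm_line(label, features):
    """Format one dense sample as '<label> <index>:<value> ...'.

    Indices start at 1; zero values are skipped because the format is sparse.
    Values are written with up to 10 significant digits.
    """
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{format(value, '.10g')}")
    return " ".join(parts)


def parse_libsvm_line(line):
    """Parse one line back into (label, {index: value})."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features


if __name__ == "__main__":
    print(to_libsvm_line(1, [0.5, 0.0, 3.0]))
    print(parse_libsvm_line("1 1:0.5 3:3"))
```

Writing one such line per sample into a file gives a training_set_file or test_file that the two commands above can consume.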

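Automated script (in the tools directory of the libsvm distribution): python easy.py train_data test_data

It automatically selects the best parameters and scales (normalizes) the data.

The training set and the test set are scaled with the same scaling parameters.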
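Using "the same scaling parameters" can be sketched in a few lines of Python. This is not libsvm code: fit_ranges and apply_ranges are hypothetical helpers that imitate what svm-scale does with -s (save the per-feature ranges computed from the training file) and -r (restore those ranges for the test file), using svm-scale's default target range [-1, 1]. How a constant feature is handled here is an assumption of the sketch.

```python
def fit_ranges(rows):
    """Record the (min, max) of every feature, using the training rows only."""
    n_features = len(rows[0])
    return [(min(r[j] for r in rows), max(r[j] for r in rows))
            for j in range(n_features)]


def apply_ranges(rows, ranges, lower=-1.0, upper=1.0):
    """Map each feature linearly from its training [min, max] to [lower, upper].

    Test values outside the training range land outside [lower, upper];
    that is expected, and it keeps train and test on the same scale.
    """
    scaled = []
    for row in rows:
        out = []
        for x, (lo, hi) in zip(row, ranges):
            if hi == lo:
                out.append(0.0)  # constant feature carries no information (assumption)
            else:
                out.append(lower + (upper - lower) * (x - lo) / (hi - lo))
        scaled.append(out)
    return scaled


if __name__ == "__main__":
    train = [[0, 10], [10, 20]]
    test = [[5, 30]]
    ranges = fit_ranges(train)           # computed from the training set only
    print(apply_ranges(train, ranges))   # [[-1.0, -1.0], [1.0, 1.0]]
    print(apply_ranges(test, ranges))    # [[0.0, 3.0]]
```

The key design point is that fit_ranges never sees the test rows; recomputing the ranges from the test set would move the same raw value to a different scaled value.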

-c: the penalty parameter C

-g: the kernel parameter gamma

-v: the number of cross-validation folds
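The three options above are typically combined in a grid search: for every (C, gamma) pair, run svm-train with -v to get a cross-validation accuracy and keep the best pair. The sketch below only builds the command lines. grid_commands and log2_range are hypothetical helpers, and the default exponent ranges (log2 C from -5 to 15, log2 gamma from 3 to -15, step 2) are, as far as we know, the defaults of libsvm's tools/grid.py; treat them as a starting point, not a rule.

```python
def log2_range(begin, end, step):
    """Inclusive list of exponents from begin to end in the given step."""
    if step == 0:
        raise ValueError("step must be non-zero")
    values = []
    e = begin
    while (step > 0 and e <= end) or (step < 0 and e >= end):
        values.append(e)
        e += step
    return values


def grid_commands(train_file, folds=5, log2c=(-5, 15, 2), log2g=(3, -15, -2)):
    """One svm-train command (as an argv list) per (C, gamma) pair, in -v mode."""
    commands = []
    for ec in log2_range(*log2c):
        for eg in log2_range(*log2g):
            commands.append(["svm-train",
                             "-c", str(2.0 ** ec),
                             "-g", str(2.0 ** eg),
                             "-v", str(folds),
                             train_file])
    return commands


if __name__ == "__main__":
    cmds = grid_commands("train.scale")  # hypothetical file name
    print(len(cmds))   # 110 pairs: 11 values of C times 10 values of gamma
    print(cmds[0])
```

Each command could then be run, for example with subprocess.run(cmd, capture_output=True, text=True); in -v mode svm-train prints a line such as "Cross Validation Accuracy = ...%" instead of writing a model file, and the pair with the highest accuracy is then used for a final run without -v.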

-s svm_type : set type of SVM (default 0)

0 -- C-SVC

1 -- nu-SVC

2 -- one-class SVM

3 -- epsilon-SVR

4 -- nu-SVR

-t kernel_type : set type of kernel function (default 2)

0 -- linear: u'*v

1 -- polynomial: (gamma*u'*v + coef0)^degree

2 -- radial basis function: exp(-gamma*|u-v|^2)

3 -- sigmoid: tanh(gamma*u'*v + coef0)

-d degree : set degree in kernel function (default 3)

-g gamma : set gamma in kernel function (default 1/num_features)

-r coef0 : set coef0 in kernel function (default 0)

-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)

-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)

-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)

-m cachesize : set cache memory size in MB (default 100)

-e epsilon : set tolerance of termination criterion (default 0.001)

-h shrinking: whether to use the shrinking heuristics, 0 or 1 (default 1)

-b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)

-wi weight: set the parameter C of class i to weight*C, for C-SVC (default 1)

In older libsvm versions the -g default is written as 1/k, where k is the number of attributes (features) in the input data, i.e. the same quantity as num_features above.
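The four kernel formulas selected by -t, together with the -d, -g and -r defaults listed above, can be written out directly. This is a plain-Python sketch for illustration, not libsvm's implementation; the function name kernel is hypothetical, and it treats u and v as dense vectors so that num_features is len(u).

```python
import math


def dot(u, v):
    """Inner product u'*v of two dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def kernel(u, v, kernel_type=2, degree=3, gamma=None, coef0=0.0):
    """Evaluate the kernel chosen by -t, with svm-train's default -d, -g, -r."""
    if gamma is None:
        gamma = 1.0 / len(u)  # default gamma = 1/num_features
    if kernel_type == 0:      # linear: u'*v
        return dot(u, v)
    if kernel_type == 1:      # polynomial: (gamma*u'*v + coef0)^degree
        return (gamma * dot(u, v) + coef0) ** degree
    if kernel_type == 2:      # RBF: exp(-gamma*|u-v|^2)
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))
    if kernel_type == 3:      # sigmoid: tanh(gamma*u'*v + coef0)
        return math.tanh(gamma * dot(u, v) + coef0)
    raise ValueError("kernel_type must be 0, 1, 2 or 3")


if __name__ == "__main__":
    u, v = [1.0, 2.0], [3.0, 4.0]
    for t in range(4):
        print(t, kernel(u, v, kernel_type=t))
```

With two features the default gamma is 0.5, so for example the polynomial kernel of u and v above is (0.5 * 11) ** 3, and the RBF kernel of any vector with itself is 1.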

2 Common Pitfalls When Using libsvm

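(1) Simply scaling the training set and the test set to [0,1] each on its own (instead of applying the training set's scaling parameters to the test set) can lead to very poor results.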

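(2) If the samples have a very large number of features, there is no need to use the RBF kernel to map them into a higher-dimensional space.

a) With a very large number of features, a linear kernel already gives very good results, and only the parameter C has to be selected.

b) The RBF kernel does give results at least as good as the linear kernel, but only on the condition that the whole (C, gamma) parameter space is searched.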

(3) When the number of samples << the number of features:

a) A linear kernel is recommended; it can reach the same performance as RBF.

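(4) When both the number of samples and the number of features are very large, liblinear is recommended: it needs less time and memory and gives comparable accuracy.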

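(5) When the number of samples >> the number of features and a linear model is wanted, use liblinear with the -s 2 option (L2-regularized L2-loss SVM solved in the primal).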
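Rules of thumb (2) to (5) can be condensed into a small decision helper. suggest_setup is hypothetical, and its thresholds are assumptions made only for illustration: a factor of 10 stands in for "<<" and ">>", and 10,000 stands in for "very large". Neither value comes from libsvm or liblinear, so adjust them to your own data.

```python
def suggest_setup(n_samples, n_features, ratio=10, large=10_000):
    """Return a suggested tool/kernel following pitfalls (2)-(5).

    ratio and large are illustrative thresholds (assumptions), not library values.
    """
    if n_samples >= large and n_features >= large:
        return "liblinear"                                      # rule (4)
    if n_samples * ratio <= n_features:
        return "libsvm -t 0 (linear kernel)"                    # rule (3)
    if n_samples >= n_features * ratio:
        return "liblinear -s 2 (if a linear model is wanted)"   # rule (5)
    if n_features >= large:
        return "libsvm -t 0 (linear kernel, tune -c only)"      # rule (2)
    return "libsvm -t 2 (RBF, grid search over -c and -g)"


if __name__ == "__main__":
    for shape in [(50_000, 50_000), (100, 50_000), (1_000_000, 20), (1_000, 500)]:
        print(shape, "->", suggest_setup(*shape))
```

The checks run from the most specific case (both dimensions large) to the general one, so a data set that is both wide and tall goes to liblinear before the narrower rules are considered; the RBF kernel remains the fallback when none of the rules applies.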
