I am using libsvm, and I did a very simple experiment: training on 10k vectors and testing with only 22. I am using the linear kernel with cost parameter C=1. My problem is multiclass, so libsvm will use the one-versus-one approach to classify my data.
Libsvm uses SMO to find the separating hyperplane.
A friend of mine did the same experiment, but the SVM classifier he used was from the MATLAB Statistics Toolbox. He also used the e1071 package from R. Again, the kernel used was the linear kernel, the cost parameter C was equal to one, and the one-versus-one approach was used to classify the data both in MATLAB (where the one-versus-one approach was coded by my friend) and in the e1071 R package. Both the MATLAB Statistics Toolbox and e1071 from R use the SMO approach to find the separating hyperplane by default.
I also tried the latest LIBLINEAR library. Again, the same configuration was used.
Here is the code used. For libsvm:
./svm-scale -s train.range train.libsvm > train.scale
./svm-scale -r train.range test.libsvm > test.scale
./svm-train -t 0 -c 1 train.scale train.model
./svm-predict test.scale train.model test.predict
For liblinear:

./svm-scale -s train.range train.libsvm > train.scale
./svm-scale -r train.range test.libsvm > test.scale
./train train.scale train.model
./predict test.scale train.model test.predict
And the R code with e1071:

rm(list = ls())
cat("\014")
library(e1071)
cat("Training model\n")
Traindata = read.csv("train.csv", header=FALSE)
SVM_model = svm(Traindata[,2:ncol(Traindata)], Traindata[,1], kernel="linear", tolerance=0.1, type="C-classification")
print(SVM_model)
cat("Testing model\n")
Testdata = read.csv("test.csv", header=FALSE)
Preddata = predict(SVM_model, Testdata[,2:ncol(Testdata)])
ConfMat = table(pred=Preddata, true=Testdata[,1])
print(ConfMat)
# sum the diagonal of the confusion matrix (the correct predictions);
# equivalent to sum(diag(ConfMat))
accuracy = 0
for (i in 1:nrow(ConfMat)) {
  for (j in 1:ncol(ConfMat)) {
    if (i == j) {
      accuracy = accuracy + ConfMat[i, i]
    }
  }
}
accuracy = (accuracy / sum(ConfMat)) * 100
cat("Test vectors:", dim(Testdata), ", Accuracy =", accuracy, "%\n")
The resulting accuracies show some differences across the implementations.
So why are the predictions different? I mean, if all the SVMs use linear kernels, with the same cost parameter, and use the same approach for multiclass classification, shouldn't the results be the same?
First, let me address the R solution. From what I understand, the e1071 package is simply a wrapper around the libsvm library. Therefore, assuming you use the same settings and steps in both, you should be getting the same results.
I'm not a regular R user myself, but from what I can tell you are not performing data normalization in the R code (to scale the features into the [-1,1] range). As we know, SVMs are not scale invariant, so this omission should explain the difference from the other results.
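If you want to replicate svm-scale's default [-1,1] rescaling on the R side, a minimal sketch could look like this (variable names follow the question's code; the rescale helper is made up for this example, not part of e1071):

# map each feature column from its *training* range [lo, hi] to [-1, 1],
# mirroring what "svm-scale -s/-r" does on the command line
rescale <- function(X, lo, hi) {
  rng <- hi - lo
  rng[rng == 0] <- 1                     # constant columns: avoid division by zero
  2 * sweep(sweep(X, 2, lo, "-"), 2, rng, "/") - 1
}

trainX <- as.matrix(Traindata[, 2:ncol(Traindata)])
testX  <- as.matrix(Testdata[, 2:ncol(Testdata)])
lo <- apply(trainX, 2, min)
hi <- apply(trainX, 2, max)
trainX.scaled <- rescale(trainX, lo, hi)
testX.scaled  <- rescale(testX, lo, hi)  # reuse the training range, like "svm-scale -r"

(Note that svm() in e1071 also has a scale argument, but by default it standardizes to zero mean and unit variance, which is not the same as svm-scale's [-1,1] mapping.)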
MATLAB has its own SVM implementations in svmtrain and fitcsvm. These only support binary classification, so you will have to handle multi-class problems manually (see here for an example).
The documentation explains that MATLAB uses the standard SMO algorithm (actually one of three possible algorithms offered to solve the quadratic-programming optimization problem), and it lists a couple of books and papers at the bottom as references. In principle you should get predictions similar to libsvm's (assuming you replicate the parameters used and apply the same kind of preprocessing to the data).
Now as for libsvm vs. liblinear, you should know that the implementations differ a little bit in the formulation of the objective function. libsvm solves the following dual problem:

$$\min_{\alpha}\ \frac{1}{2}\alpha^{T} Q \alpha - e^{T}\alpha \quad \text{subject to}\quad y^{T}\alpha = 0,\ \ 0 \le \alpha_i \le C,\ i = 1,\dots,l$$

where $e$ is the vector of all ones and $Q_{ij} = y_i y_j K(x_i, x_j)$. On the other hand, the dual form solved by liblinear's L2-regularized L1-loss SVC solver is:

$$\min_{\alpha}\ \frac{1}{2}\alpha^{T} Q \alpha - e^{T}\alpha \quad \text{subject to}\quad 0 \le \alpha_i \le C,\ i = 1,\dots,l$$

with $Q_{ij} = y_i y_j x_i^{T} x_j$ (a linear kernel only, and no equality constraint $y^{T}\alpha = 0$, since liblinear does not solve for a bias term in the same way).
...not to mention that the algorithms are coded with different goals in mind: libsvm is written in a way that allows switching between different kernel functions, while liblinear is optimized for the linear case and has no concept of kernels at all. This is why libsvm is not easily applicable to large-scale problems (even with a linear kernel), and why it is often suggested to use liblinear when you have a large number of instances.
Furthermore, regarding multi-class problems with k classes: libsvm by default implements the one-against-one approach by constructing k*(k-1)/2 binary classifiers, while liblinear implements the one-vs-the-rest strategy by constructing k binary classifiers (it also offers an alternative method by Crammer and Singer for handling multi-class problems). I have previously shown how to perform one-vs-rest classification using libsvm (see here and here); a sketch in R follows.
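For illustration, here is a minimal one-vs-the-rest sketch in R with e1071 (ovr_train and ovr_predict are hypothetical helper names, not library functions; the settings mirror the question's linear kernel with cost 1):

library(e1071)

# train one binary "class vs rest" SVM per class
ovr_train <- function(X, y) {
  classes <- levels(factor(y))
  models <- lapply(classes, function(cls) {
    yy <- factor(ifelse(y == cls, cls, "rest"), levels = c(cls, "rest"))
    svm(X, yy, kernel = "linear", cost = 1, type = "C-classification")
  })
  list(models = models, classes = classes)
}

# predict the class whose classifier returns the largest decision value
ovr_predict <- function(fit, X) {
  dv <- sapply(seq_along(fit$models), function(i) {
    m <- fit$models[[i]]
    cls <- fit$classes[i]
    d <- attr(predict(m, X, decision.values = TRUE), "decision.values")
    # e1071 names the column "a/b" with positive values favoring "a";
    # flip the sign if needed so that positive always means cls
    if (!startsWith(colnames(d)[1], paste0(cls, "/"))) d <- -d
    as.numeric(d)
  })
  fit$classes[max.col(dv)]
}

# usage, reusing the scaled matrices from the earlier sketch:
# fit  <- ovr_train(trainX.scaled, Traindata[, 1])
# pred <- ovr_predict(fit, testX.scaled)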
You also have to make sure to match the parameters passed to each (as closely as possible):

- for libsvm, train with svm-train.exe -s 0 -t 0 (C-SVC with a linear kernel); for liblinear, select the L2R_L1LOSS_DUAL solver by calling train.exe -s 3 (dual form of the L2-regularized L1-loss support vector classifier)
- use -c 1 for both training functions (the cost parameter)
- note that the default stopping tolerance (the -e parameter) differs between the two libraries, with e=0.001 for libsvm and e=0.1 for liblinear
- liblinear does not fit a bias term unless you ask for one explicitly (train.exe -B 1)

Even then, I'm not certain you will get the exact same results in both, but predictions should be close enough...
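For concreteness, a matched pair of invocations might look like this (a sketch based on the file names in the question; picking libsvm's default e=0.001 for both and adding the bias column with -B 1 are assumptions on my part):

./svm-train -s 0 -t 0 -c 1 -e 0.001 train.scale train.model
./train -s 3 -c 1 -e 0.001 -B 1 train.scale train.model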
Other considerations include how the libraries handle categorical features. For instance, I know that libsvm converts a categorical feature with m possible values into m numeric 0-1 features encoded as binary indicator attributes (i.e. only one of them is one, the rest are zeros). I'm not sure what liblinear does with discrete features.
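If you need the same indicator encoding on the R side, model.matrix can produce it; a small illustration (toy data, not tied to the question's files):

# encode a factor with m levels as m numeric 0/1 indicator columns
# ("- 1" keeps all levels, so exactly one indicator is 1 per row)
f <- factor(c("red", "green", "blue", "green"))
indicators <- model.matrix(~ f - 1)
print(indicators)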
Another issue is whether a particular implementation is deterministic and always returns the same results when repeated on the same data with the same settings. I've read somewhere that liblinear internally generates random numbers during its work, but please don't take my word for it without actually checking the source code :)