After getting my TestLabel and TrainLabel, I trained an SVM with libsvm and got an accuracy of 97.4359% (c = 1 and g = 0.00375):
model = svmtrain(TrainLabel, TrainVec, '-c 1 -g 0.00375');
[predict_label, accuracy, dec_values] = svmpredict(TestLabel, TestVec, model);
Then I searched for the best c and g:
bestcv = 0;
for log2c = -1:3
    for log2g = -4:1
        cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
        cv = svmtrain(TrainLabel, TrainVec, cmd);
        if cv >= bestcv
            bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
        end
        fprintf('%g %g %g (best c=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);
    end
end
The best parameters found were c = 8 and g = 0.125.
I trained the model again with these parameters:
model = svmtrain(TrainLabel, TrainVec, '-c 8 -g 0.125');
[predict_label, accuracy, dec_values] = svmpredict(TestLabel, TestVec, model);
This time I got an accuracy of 82.0513%.
How is it possible for the accuracy to decrease? Shouldn't it increase? Or am I making a mistake somewhere?
Different parameter settings affect the prediction accuracy of an SVM model in different ways, and the size of the training sample also has an influence. Using a systematic method to determine the optimal SVM model, such as a cross-validated grid search, generally improves prediction accuracy considerably compared with picking parameters by hand.
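For example, one common refinement (a sketch of the kind of systematic search meant here, not code from the question) is to run a second, finer grid around the best coarse point. The 0.25 steps and the one-octave window below are illustrative assumptions.
% Finer grid around the coarse optimum (log2c = 3, log2g = -3 from the question);
% the 0.25 steps and the +/-1 window are illustrative choices, not from the question.
bestcv = 0;
for log2c = 2:0.25:4
    for log2g = -4:0.25:-2
        cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
        cv = svmtrain(TrainLabel, TrainVec, cmd);
        if cv >= bestcv
            bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
        end
    end
end
fprintf('Refined best: c=%g, g=%g, CV rate=%g\n', bestc, bestg, bestcv);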
If your model's accuracy on the test data is lower than your training or validation accuracy, it usually indicates a meaningful difference between the data you trained the model on and the test data you are evaluating it on.
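A quick way to check for such a mismatch (my own suggestion, just a sketch) is to compare simple per-feature statistics of TrainVec and TestVec from the question; a feature whose test mean sits many training standard deviations away from its training mean is a red flag.
% Compare per-feature means of the training and test matrices (one row per sample)
trainMu = mean(TrainVec, 1);
testMu  = mean(TestVec, 1);
trainSd = std(TrainVec, 0, 1);
% Standardized shift of each feature's mean between train and test
shift = abs(trainMu - testMu) ./ (trainSd + eps);
[worst, idx] = max(shift);
fprintf('Largest standardized mean shift: feature %d (%.2f training SDs)\n', idx, worst);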
Accuracy is computed by comparing the actual test-set labels with the predicted labels. A classification rate in the high nineties, like your first result, is generally considered very good. For further evaluation, you can also check the model's precision and recall.
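As a minimal sketch, assuming binary labels with the positive class coded as 1 (adapt for multi-class), precision and recall can be computed directly from predict_label and TestLabel:
% Precision and recall for the positive class (assumed to be labeled 1)
tp = sum(predict_label == 1 & TestLabel == 1);
fp = sum(predict_label == 1 & TestLabel ~= 1);
fn = sum(predict_label ~= 1 & TestLabel == 1);
precision = tp / (tp + fp);
recall    = tp / (tp + fn);
fprintf('Precision = %.4f, Recall = %.4f\n', precision, recall);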
The accuracies you were getting during parameter tuning are biased upwards, because the same data were used both to choose the parameters and to measure the accuracy. That is often fine for parameter tuning itself.
However, if you want those accuracies to be honest estimates of the true generalization error on your final test set, you have to add an additional layer of cross-validation (nested cross-validation) or another resampling scheme.
Here is a very clear paper that outlines the general issue (in the related context of feature selection): http://www.pnas.org/content/99/10/6562.abstract
EDIT:
I usually add cross-validation like this:
n = 95;      % total number of observations
nfold = 10;  % desired number of folds
% Set up CV folds: give every observation a fold index in 1..nfold
inds = repmat(1:nfold, 1, ceil(n / nfold));
inds = inds(randperm(n));  % repmat may overshoot n; indexing with randperm(n) keeps exactly n entries
% Loop over folds
for i = 1:nfold
    datapart = data(inds ~= i, :);  % everything except fold i
    % do some stuff
    % save results
end
% combine results
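To make that skeleton concrete for this question, here is a rough sketch of nested cross-validation using TrainLabel and TrainVec in place of the generic data above (so n would be size(TrainVec, 1)); the names acc, trIdx and teIdx are my own. The inner grid search tunes c and g on each outer-training portion, and the outer folds give a less biased estimate of test accuracy.
% Assumes inds and nfold from the skeleton above, with n = size(TrainVec, 1)
acc = zeros(nfold, 1);
for i = 1:nfold
    trIdx = (inds(:) ~= i);   % outer-training portion
    teIdx = (inds(:) == i);   % held-out outer fold
    % Inner tuning: libsvm's built-in 5-fold CV on the outer-training portion only
    bestcv = 0;
    for log2c = -1:3
        for log2g = -4:1
            cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
            cv = svmtrain(TrainLabel(trIdx), TrainVec(trIdx, :), cmd);
            if cv >= bestcv
                bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
            end
        end
    end
    % Retrain with the tuned parameters, then evaluate on the held-out fold
    model = svmtrain(TrainLabel(trIdx), TrainVec(trIdx, :), ...
        ['-c ', num2str(bestc), ' -g ', num2str(bestg)]);
    [~, a, ~] = svmpredict(TrainLabel(teIdx), TrainVec(teIdx, :), model);
    acc(i) = a(1);  % first entry of libsvm's accuracy vector is classification accuracy (%)
end
fprintf('Nested CV accuracy: %.2f%% +/- %.2f\n', mean(acc), std(acc));
With this setup the tuning never sees the fold it is later evaluated on, which is what removes the optimistic bias described above.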