Retraining after Cross Validation with libsvm

Tags: machine-learning, classification, matlab, svm, libsvm

I know that cross-validation is used for selecting good parameters. After finding them, I need to retrain on the whole data set without the -v option.

The problem is that when I train with the -v option, I only get the cross-validation accuracy (e.g. 85%). There is no model, and I can't see the values of C and gamma. In that case, how do I retrain?

By the way, I am applying 10-fold cross-validation, e.g.

optimization finished, #iter = 138
nu = 0.612233
obj = -90.291046, rho = -0.367013
nSV = 165, nBSV = 128
Total nSV = 165
Cross Validation Accuracy = 98.1273%

I need some help with this.

To get the best C and gamma, I use this code from the LIBSVM FAQ:

bestcv = 0;
for log2c = -6:10,
  for log2g = -6:3,
    cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    cv = svmtrain(TrainLabel,TrainVec, cmd);
    if (cv >= bestcv),
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
    fprintf('(best c=%g, g=%g, rate=%g)\n',bestc, bestg, bestcv);
  end
end

Another question: is the cross-validation accuracy reported with the -v option similar to the accuracy we get when we train without -v and use that model to predict?

Another question: cross-validation basically improves the accuracy of the model by avoiding overfitting, so it needs a model in place before it can improve it. Am I right? Also, if I have a different model, will the cross-validation accuracy be different?

One more question: what are the values of C and gamma behind the reported cross-validation accuracy?

The graph looks something like this: https://i.stack.imgur.com/2Tr3n.png

From it, C = 2 and gamma = 0.0078125. But when I retrain the model with these parameters, the accuracy is not the same 99.63%. What could be the reason? Thanks in advance.

asked Jan 28 '12 17:01

lakshmen

2 Answers

The -v option here is really meant as a way to avoid overfitting: instead of using the whole data set for training, it performs N-fold cross-validation, training on N-1 folds and testing on the remaining fold, one at a time, and then reports the average accuracy. It therefore returns only the cross-validation accuracy (assuming a classification problem; for regression it returns the mean squared error) as a scalar number, not an actual SVM model.
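To make that concrete, here is a rough hand-written equivalent of what -v does. It is only a sketch under some assumptions: it uses the heart_scale sample file that ships with LIBSVM, the values -c 1 -g 0.07 are arbitrary placeholders, and the folds are assigned purely at random (LIBSVM's own fold assignment may differ, so the number will not match -v exactly).

```matlab
%# Sketch: k-fold cross-validation written out by hand (roughly what -v k does)
[labels, data] = libsvmread('./heart_scale');   %# sample data shipped with LIBSVM

k = 5;                              %# number of folds
n = numel(labels);
fold = mod(randperm(n), k) + 1;     %# random fold id in 1..k for every sample

correct = 0;
for f = 1:k
    test_idx = (fold == f);         %# current fold is held out for testing
    %# train on the other k-1 folds (placeholder parameters, quiet mode)
    model = svmtrain(labels(~test_idx), data(~test_idx,:), '-c 1 -g 0.07 -q');
    %# predict the held-out fold and count the hits
    pred = svmpredict(labels(test_idx), data(test_idx,:), model, '-q');
    correct = correct + sum(pred == labels(test_idx));
end

%# pooled accuracy over all held-out predictions, in percent
cv_acc = 100 * correct / n;
fprintf('Manual %d-fold CV accuracy = %.4f%%\n', k, cv_acc);
```

Note that k models are trained and thrown away along the way. None of them is "the" model, which is why -v has nothing sensible to return except the accuracy.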

If you want to perform model selection, you have to implement a grid search using cross-validation (similar to the grid.py helper Python script) to find the best values of C and gamma.

This shouldn't be hard to implement: create a grid of values using MESHGRID, iterate over all pairs (C,gamma), training an SVM model with, say, 5-fold cross-validation for each pair, and choose the pair with the best CV accuracy.

Example:

%# read some training data
[labels,data] = libsvmread('./heart_scale');

%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);

%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
    cv_acc(i) = svmtrain(labels, data, ...
                    sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end

%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);

%# contour plot of parameter selection
contour(C, gamma, reshape(cv_acc,size(C))), colorbar
hold on
plot(C(idx), gamma(idx), 'rx')
text(C(idx), gamma(idx), sprintf('Acc = %.2f %%',cv_acc(idx)), ...
    'HorizontalAlignment','left', 'VerticalAlignment','top')
hold off
xlabel('log_2(C)'), ylabel('log_2(\gamma)'), title('Cross-Validation Accuracy')

%# now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%# ...

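[Figure: contour plot of cross-validation accuracy over log_2(C) and log_2(gamma), best pair marked: https://i.stack.imgur.com/KkxcZ.png]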
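For the retraining step itself, a minimal sketch could look like the following. The parameter values are just the ones quoted in the question, used as placeholders (substitute whatever your own grid search returns), and heart_scale is again the sample file shipped with LIBSVM.

```matlab
%# Sketch: retrain on the whole training set once C and gamma are chosen
[labels, data] = libsvmread('./heart_scale');

%# placeholder values taken from the question; use your own grid-search result
best_C = 2;
best_gamma = 0.0078125;
opts = sprintf('-c %g -g %g -q', best_C, best_gamma);

%# with -v: returns a scalar CV accuracy, no model
cv_acc = svmtrain(labels, data, [opts ' -v 5']);

%# without -v: returns a model struct trained on ALL the data
model = svmtrain(labels, data, opts);

%# accuracy on the same data the model was trained on (optimistic estimate)
[~, acc, ~] = svmpredict(labels, data, model);
train_acc = acc(1);     %# first element of the accuracy vector is accuracy in %

fprintf('CV accuracy = %.2f%%, training-set accuracy = %.2f%%\n', cv_acc, train_acc);
```

The two numbers generally will not match. The CV figure comes from models that never saw the fold they were tested on, while the second figure scores the final model on its own training data. That is most likely why the retrained accuracy differs from the value read off the plot.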

answered Oct 13 '22 15:10

Amro

If you use your entire dataset to determine your parameters and then train on that same dataset, you are going to overfit your data. Ideally, you would divide the dataset, do the parameter search on one portion (with CV), then use the other portion to train and test with CV. Will you get better-looking results if you use the whole dataset for both? Of course, but your model is unlikely to generalize well. If you want to determine the true performance of your model, you need to do parameter selection separately.
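One way this could look in code, as a sketch only: the 50/50 split, the grid ranges, and the variable names below are assumptions made for illustration, and heart_scale is the sample file shipped with LIBSVM.

```matlab
%# Sketch: select parameters on one half, estimate performance on the other
[labels, data] = libsvmread('./heart_scale');

n = numel(labels);
perm = randperm(n);                     %# random shuffle of sample indices
half = floor(n / 2);
search_idx = perm(1:half);              %# portion used for parameter search
eval_idx   = perm(half+1:end);          %# portion used for the final estimate

%# grid search with 5-fold CV on the search portion only
best_acc = -Inf;
for log2c = -5:2:15
    for log2g = -15:2:3
        opts = sprintf('-c %g -g %g -q', 2^log2c, 2^log2g);
        acc = svmtrain(labels(search_idx), data(search_idx,:), [opts ' -v 5']);
        if acc > best_acc
            best_acc = acc;
            best_opts = opts;
        end
    end
end

%# CV on the untouched portion with the parameters now fixed
est_acc = svmtrain(labels(eval_idx), data(eval_idx,:), [best_opts ' -v 5']);

fprintf('Search-portion CV = %.2f%%, evaluation-portion CV = %.2f%%\n', ...
        best_acc, est_acc);
```

The second number is usually the more honest one, because the data behind it played no part in choosing C and gamma.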

answered Oct 13 '22 17:10

karenu
