I want to do a 10-fold cross-validation in my one-against-all support vector machine classification in MATLAB.
I tried to somehow mix these two related answers:
But as I'm new to MATLAB and its syntax, I haven't managed to make it work so far.
On the other hand, I found only the following few lines about cross-validation in the LibSVM README files, and I couldn't find any related example there:
option -v randomly splits the data into n parts and calculates cross validation accuracy/mean squared error on them.
See libsvm FAQ for the meaning of outputs.
Could anyone provide me with an example of 10-fold cross-validation and one-against-all classification?
With this method we have one data set which we divide randomly into 10 parts. We use 9 of those parts for training and reserve one tenth for testing. We repeat this procedure 10 times, each time reserving a different tenth for testing.
Cross-validation is a technique for assessing how a statistical analysis generalises to an independent data set: several models are trained on subsets of the available input data and evaluated on the complementary subsets. When performing cross-validation, it is common to use 10 folds.
What is K-Fold? K-Fold is a validation technique in which we split the data into k subsets and repeat the holdout method k times, such that each of the k subsets is used once as the test set while the other k-1 subsets are used for training.
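For instance, here is a minimal sketch of generating such folds with the Statistics Toolbox's cvpartition (which also stratifies the folds by class); X and y are placeholders for the data matrix and label vector:
%# a minimal sketch: split the data into 10 stratified folds
%# (assumes Statistics Toolbox; X = data matrix, y = label vector)
cv = cvpartition(y, 'kfold', 10);
for i=1:10
    trainIdx = training(cv, i);   %# logical index of the 9 training parts
    testIdx  = test(cv, i);       %# logical index of the held-out tenth
    %# ... train on X(trainIdx,:), y(trainIdx); evaluate on the test part ...
end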
Mainly there are two reasons we do cross-validation:
- as a testing method that gives us a nearly unbiased estimate of the generalization power of our model (by avoiding overfitting)
- as a way of model selection (e.g., finding the best C and gamma parameters over the training data; see this post for an example)
For the first case, which is the one we are interested in here, the process involves training k models (one per fold), and then training one final model over the entire training set. We report the average accuracy over the k folds.
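As an aside, for the second case (parameter selection), a quick sketch of a grid search using LIBSVM's built-in -v option might look like the following; the ranges are illustrative, and data/labels are the variables used in the demo further below:
%# a sketch: grid-search C and gamma by 5-fold cross-validation
%# (with -v, libsvmtrain returns the cross-validation accuracy as a scalar)
[C,gamma] = meshgrid(2.^(-5:2:15), 2.^(-15:2:3));
cvAcc = zeros(numel(C),1);
for i=1:numel(C)
    cvAcc(i) = libsvmtrain(labels, data, ...
        sprintf('-c %g -g %g -v 5 -q', C(i), gamma(i)));
end
[~,best] = max(cvAcc);
fprintf('best C = %g, best gamma = %g\n', C(best), gamma(best));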
Now, since we are using a one-vs-all approach to handle the multi-class problem, each model consists of N support vector machines (one for each class).
The following are wrapper functions implementing the one-vs-all approach:
function mdl = libsvmtrain_ova(y, X, opts)
    if nargin < 3, opts = ''; end

    %# classes
    labels = unique(y);
    numLabels = numel(labels);

    %# train one-against-all models (one binary SVM per class,
    %# with probability estimates enabled via -b 1)
    models = cell(numLabels,1);
    for k=1:numLabels
        models{k} = libsvmtrain(double(y==labels(k)), X, strcat(opts,' -b 1 -q'));
    end
    mdl = struct('models',{models}, 'labels',labels);
end
function [pred,acc,prob] = libsvmpredict_ova(y, X, mdl)
    %# classes
    labels = mdl.labels;
    numLabels = numel(labels);

    %# get probability estimates of test instances using each 1-vs-all model
    prob = zeros(size(X,1), numLabels);
    for k=1:numLabels
        [~,~,p] = libsvmpredict(double(y==labels(k)), X, mdl.models{k}, '-b 1 -q');
        prob(:,k) = p(:, mdl.models{k}.Label==1);   %# probability of class==k
    end

    %# predict the class with the highest probability
    %# (assumes labels are 1:numLabels, as produced by grp2idx)
    [~,pred] = max(prob, [], 2);

    %# compute classification accuracy
    acc = mean(pred == y);
end
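Before wiring these into cross-validation, the two wrappers can be sanity-checked on a single holdout split; this is just a sketch (the 70/30 split is arbitrary, and data/labels are the variables from the demo further below):
%# a sketch: exercise the wrappers on a single 70/30 holdout split
cv = cvpartition(labels, 'holdout', 0.3);    %# Statistics toolbox
trainIdx = training(cv);  testIdx = test(cv);
mdl = libsvmtrain_ova(labels(trainIdx), data(trainIdx,:), '-s 0 -t 2 -c 1 -g 0.25');
[pred,acc] = libsvmpredict_ova(labels(testIdx), data(testIdx,:), mdl);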
And here are functions to support cross-validation:
function acc = libsvmcrossval_ova(y, X, opts, nfold, indices)
    if nargin < 3, opts = ''; end
    if nargin < 4, nfold = 10; end
    if nargin < 5, indices = crossvalidation(y, nfold); end

    %# N-fold cross-validation testing
    acc = zeros(nfold,1);
    for i=1:nfold
        testIdx = (indices == i); trainIdx = ~testIdx;
        mdl = libsvmtrain_ova(y(trainIdx), X(trainIdx,:), opts);
        [~,acc(i)] = libsvmpredict_ova(y(testIdx), X(testIdx,:), mdl);
    end
    acc = mean(acc);    %# average accuracy across folds
end
function indices = crossvalidation(y, nfold)
    %# stratified n-fold cross-validation
    %#indices = crossvalind('Kfold', y, nfold);   %# Bioinformatics toolbox
    cv = cvpartition(y, 'kfold',nfold);           %# Statistics toolbox
    indices = zeros(size(y));
    for i=1:nfold
        indices(cv.test(i)) = i;
    end
end
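To confirm that the folds really are stratified, one can cross-tabulate fold assignments against class labels (a quick check using crosstab from the Statistics Toolbox; rows are folds, columns are classes, so counts should be roughly equal down each column):
%# a quick check: class counts per fold
indices = crossvalidation(labels, 10);
counts = crosstab(indices, labels)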
Finally, here is a simple demo to illustrate the usage:
%# load dataset
S = load('fisheriris');
data = zscore(S.meas);
labels = grp2idx(S.species);

%# cross-validate using one-vs-all approach
opts = '-s 0 -t 2 -c 1 -g 0.25';   %# libsvm training options
nfold = 10;
acc = libsvmcrossval_ova(labels, data, opts, nfold);
fprintf('Cross Validation Accuracy = %.4f%%\n', 100*acc);

%# compute final model over the entire dataset
mdl = libsvmtrain_ova(labels, data, opts);
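The final model can then be applied to new observations with the predict wrapper; for instance, re-scoring the training set as a quick sanity check (this is training accuracy, so it will be optimistically biased relative to the cross-validation estimate):
%# sanity check: apply the final model back to the training data
[pred, trainAcc] = libsvmpredict_ova(labels, data, mdl);
fprintf('Training Accuracy = %.4f%%\n', 100*trainAcc);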
Compare that against the one-vs-one approach that libsvm uses by default:
acc = libsvmtrain(labels, data, sprintf('%s -v %d -q',opts,nfold));   %# -v returns the CV accuracy as a scalar
model = libsvmtrain(labels, data, strcat(opts,' -q'));                %# final one-vs-one model
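For completeness, predictions from that one-vs-one model come straight from LIBSVM's own predictor:
%# predict with the final one-vs-one model (labels/data as above);
%# the second output is a 3-vector whose first entry is the accuracy
[pred, acc] = libsvmpredict(labels, data, model, '-q');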
It may be confusing that one of the two linked questions is not about LIBSVM. You should adjust this answer and ignore the other.
You should select the folds, and do the rest exactly as in the linked question. Assume the data has been loaded into data and the labels into labels:
n = size(data,1);
numLabels = numel(unique(labels));   %# number of classes
ns = floor(n/10);                    %# fold size
%# note: this split is sequential; if the rows are ordered by class, shuffle them first
for fold=1:10
    %# the first and last folds handle the boundaries; the last fold
    %# absorbs the leftover samples when n is not divisible by 10
    if fold==1
        testindices  = ((fold-1)*ns+1):fold*ns;
        trainindices = fold*ns+1:n;
    elseif fold==10
        testindices  = ((fold-1)*ns+1):n;
        trainindices = 1:(fold-1)*ns;
    else
        testindices  = ((fold-1)*ns+1):fold*ns;
        trainindices = [1:(fold-1)*ns, fold*ns+1:n];
    end

    %# use testindices only for testing and trainindices only for training
    trainLabel = labels(trainindices);
    trainData  = data(trainindices,:);
    testLabel  = labels(testindices);
    testData   = data(testindices,:);

    %# train one-against-all models
    model = cell(numLabels,1);
    for k=1:numLabels
        model{k} = svmtrain(double(trainLabel==k), trainData, '-c 1 -g 0.2 -b 1');
    end

    %# get probability estimates of test instances using each model
    prob = zeros(size(testData,1),numLabels);
    for k=1:numLabels
        [~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');
        prob(:,k) = p(:,model{k}.Label==1);   %# probability of class==k
    end

    %# predict the class with the highest probability
    [~,pred] = max(prob,[],2);
    acc = sum(pred == testLabel) ./ numel(testLabel)   %# accuracy (displayed per fold)
    C = confusionmat(testLabel, pred)                  %# confusion matrix
end
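The loop above only displays each fold's accuracy; to report a single cross-validation figure, the per-fold values can be accumulated and averaged. A small sketch of the addition, assuming the loop body above:
%# a sketch: collect per-fold accuracy and average it after the loop
foldAcc = zeros(10,1);
for fold=1:10
    %# ... build testindices/trainindices, train, and predict as above ...
    foldAcc(fold) = sum(pred == testLabel) ./ numel(testLabel);
end
fprintf('Cross Validation Accuracy = %.4f%%\n', 100*mean(foldAcc));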