10 fold cross-validation in one-against-all SVM (using LibSVM)

2 Answers

There are two main reasons to do cross-validation:

as a testing method, which gives a nearly unbiased estimate of the generalization power of the model (accuracy is measured on held-out data, so overfitting is not rewarded)
as a way of model selection (e.g. finding the best C and gamma parameters over the training data; see this post for an example)
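
For the second use (model selection), a grid search over C and gamma might look like the sketch below. The handle cvAccuracy is a hypothetical stand-in for any routine that returns a cross-validation accuracy for a given (C, gamma) pair; here it is a synthetic surface so the snippet runs without LIBSVM. The grid ranges are illustrative assumptions, not recommended values.

```matlab
%# hypothetical stand-in for a real cross-validation routine:
%# a synthetic surface that peaks at C = 2^3, gamma = 2^-3
cvAccuracy = @(C, g) 1 - 0.01*((log2(C) - 3)^2 + (log2(g) + 3)^2);

%# exponentially spaced grid (illustrative ranges)
Cs     = 2.^(-1:2:7);
gammas = 2.^(-7:2:1);

%# exhaustive search: keep the pair with the highest CV accuracy
bestAcc = -Inf; bestC = NaN; bestGamma = NaN;
for C = Cs
    for g = gammas
        acc = cvAccuracy(C, g);
        if acc > bestAcc
            bestAcc = acc; bestC = C; bestGamma = g;
        end
    end
end
fprintf('best C = %g, best gamma = %g (CV accuracy %.4f)\n', bestC, bestGamma, bestAcc);
```

With the wrapper functions defined later in this answer, the stand-in could be replaced by something like @(C,g) libsvmcrossval_ova(labels, data, sprintf('-s 0 -t 2 -c %g -g %g', C, g), 10).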

For the first case, which is the one we are interested in, the process involves training k models, one for each fold, and then training one final model over the entire training set. We report the average accuracy over the k folds.

Since we are using the one-vs-all approach to handle the multi-class problem, each model consists of N binary support vector machines (one for each class).
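
As a small illustration of what each of those N machines is trained on, the sketch below builds the binary target vectors for a toy label vector (the values are made up for the example). Column k is what would be passed as the label argument when training the k-th SVM.

```matlab
%# toy label vector with three classes (illustrative values)
y = [1; 3; 2; 3; 1; 2];
labels = unique(y);
numLabels = numel(labels);

%# column k is the binary target for the k-th one-vs-all SVM:
%# 1 where the sample belongs to class labels(k), 0 otherwise
Y = zeros(numel(y), numLabels);
for k = 1:numLabels
    Y(:,k) = double(y == labels(k));
end
disp(Y)
```

Each row contains exactly one 1, since every sample is the positive example for exactly one of the N binary problems.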

The following are wrapper functions implementing the one-vs-all approach:

function mdl = libsvmtrain_ova(y, X, opts)
    if nargin < 3, opts = ''; end

    %# classes
    labels = unique(y);
    numLabels = numel(labels);

    %# train one-against-all models
    models = cell(numLabels,1);
    for k=1:numLabels
        models{k} = libsvmtrain(double(y==labels(k)), X, strcat(opts,' -b 1 -q'));
    end
    mdl = struct('models',{models}, 'labels',labels);
end

function [pred,acc,prob] = libsvmpredict_ova(y, X, mdl)
    %# classes
    labels = mdl.labels;
    numLabels = numel(labels);

    %# get probability estimates of test instances using each 1-vs-all model
    prob = zeros(size(X,1), numLabels);
    for k=1:numLabels
        [~,~,p] = libsvmpredict(double(y==labels(k)), X, mdl.models{k}, '-b 1 -q');
        prob(:,k) = p(:, mdl.models{k}.Label==1);
    end

    %# predict the class with the highest probability
    [~,pred] = max(prob, [], 2);
    pred = labels(pred);    %# map column index back to the class label
    %# compute classification accuracy
    acc = mean(pred == y);
end
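
One detail worth spelling out: max over the columns returns a column index, which coincides with the class label only when the classes are coded 1..N (as grp2idx produces in the demo further down). The following is a minimal sketch, with made-up probabilities and labels, of mapping the index back to an arbitrary label set before computing accuracy.

```matlab
%# class labels that are NOT 1..N (illustrative)
labels = [10; 20; 30];

%# toy probability estimates: one row per test instance, one column per class
prob = [0.7 0.2 0.1;
        0.1 0.3 0.6;
        0.2 0.5 0.3];

[~, idx] = max(prob, [], 2);   %# column index of the most probable class
pred = labels(idx);            %# map the index back to the class label

y = [10; 30; 30];              %# true labels (illustrative)
acc = mean(pred == y);         %# fraction of correct predictions
```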

And here are functions to support cross-validation:

function acc = libsvmcrossval_ova(y, X, opts, nfold, indices)
    if nargin < 3, opts = ''; end
    if nargin < 4, nfold = 10; end
    if nargin < 5, indices = crossvalidation(y, nfold); end

    %# N-fold cross-validation testing
    acc = zeros(nfold,1);
    for i=1:nfold
        testIdx = (indices == i); trainIdx = ~testIdx;
        mdl = libsvmtrain_ova(y(trainIdx), X(trainIdx,:), opts);
        [~,acc(i)] = libsvmpredict_ova(y(testIdx), X(testIdx,:), mdl);
    end
    acc = mean(acc);    %# average accuracy
end

function indices = crossvalidation(y, nfold)
    %# stratified n-fold cross-validation
    %#indices = crossvalind('Kfold', y, nfold);  %# Bioinformatics toolbox
    cv = cvpartition(y, 'kfold',nfold);          %# Statistics toolbox
    indices = zeros(size(y));
    for i=1:nfold
        indices(cv.test(i)) = i;
    end
end
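
If neither toolbox is available, stratified fold indices can be approximated by hand. The sketch below is one possible way to do it, not what cvpartition does internally: it shuffles the samples of each class and deals them round-robin into the folds. The toy labels are an assumption made for the example.

```matlab
%# toy labels: 3 classes with 10 samples each (illustrative)
y = [ones(10,1); 2*ones(10,1); 3*ones(10,1)];
nfold = 5;

indices = zeros(size(y));
classes = unique(y);
for c = 1:numel(classes)
    idx = find(y == classes(c));        %# samples of this class
    idx = idx(randperm(numel(idx)));    %# shuffle within the class
    %# deal the shuffled samples round-robin into folds 1..nfold
    indices(idx) = mod((0:numel(idx)-1)', nfold) + 1;
end
```

The resulting vector has the same meaning as the output of crossvalidation above: indices(i) is the fold in which sample i is used for testing.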

Finally, here is a simple demo to illustrate the usage:

%# load dataset
S = load('fisheriris');
data = zscore(S.meas);
labels = grp2idx(S.species);

%# cross-validate using one-vs-all approach
opts = '-s 0 -t 2 -c 1 -g 0.25';    %# libsvm training options
nfold = 10;
acc = libsvmcrossval_ova(labels, data, opts, nfold);
fprintf('Cross Validation Accuracy = %.4f%%\n', 100*mean(acc));

%# compute final model over the entire dataset
mdl = libsvmtrain_ova(labels, data, opts);

Compare that against the one-vs-one approach, which libsvm uses by default:

acc = libsvmtrain(labels, data, sprintf('%s -v %d -q',opts,nfold));
model = libsvmtrain(labels, data, strcat(opts,' -q'));


answered Oct 10 '22 21:10

Amro

It may be confusing you that one of the two questions you refer to is not about LIBSVM. You should try to adapt this answer and ignore the other.

You should select the folds, and then do the rest exactly as in the linked question. Assume the data has been loaded into data and the labels into labels:

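n = size(data,1);
ns = floor(n/10);                     %# samples per fold (the last fold takes the remainder)
numLabels = numel(unique(labels));    %# assumes classes are coded 1..numLabels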
for fold=1:10,
    if fold==1,
        testindices= ((fold-1)*ns+1):fold*ns;
        trainindices = fold*ns+1:n;
    else
        if fold==10,
            testindices= ((fold-1)*ns+1):n;
            trainindices = 1:(fold-1)*ns;
        else
            testindices= ((fold-1)*ns+1):fold*ns;
            trainindices = [1:(fold-1)*ns,fold*ns+1:n];
        end
    end
    % use testindices only for testing and trainindices only for training
    trainLabel = labels(trainindices);
    trainData = data(trainindices,:);
    testLabel = labels(testindices);
    testData = data(testindices,:);
    %# train one-against-all models
    model = cell(numLabels,1);
    for k=1:numLabels
        model{k} = svmtrain(double(trainLabel==k), trainData, '-c 1 -g 0.2 -b 1');
    end

    %# get probability estimates of test instances using each model
    prob = zeros(size(testData,1),numLabels);
    for k=1:numLabels
        [~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');
        prob(:,k) = p(:,model{k}.Label==1);    %# probability of class==k
    end

    %# predict the class with the highest probability
    [~,pred] = max(prob,[],2);
    acc = sum(pred == testLabel) ./ numel(testLabel)    %# accuracy
    C = confusionmat(testLabel, pred)                   %# confusion matrix
end
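
Note that the loop above takes contiguous blocks in the order the data is stored. If the rows are sorted by class (as in fisheriris), a test fold can end up containing a single class, so it is usually safer to shuffle first. The following is a minimal sketch of that idea; the sample count is made up, and the training and prediction steps are left as a placeholder comment.

```matlab
n = 23;                      %# illustrative sample count, not a multiple of 10
nfold = 10;
perm = randperm(n);          %# shuffle so sorted data does not give single-class folds
edges = round(linspace(0, n, nfold+1));   %# fold boundaries, sizes differ by at most 1

timesTested = zeros(n,1);    %# bookkeeping: how often each sample is in a test fold
for fold = 1:nfold
    testindices  = perm(edges(fold)+1 : edges(fold+1));
    trainindices = setdiff(perm, testindices);
    %# ... train on trainindices, predict on testindices, store acc(fold) ...
    timesTested(testindices) = timesTested(testindices) + 1;
end
```

Every sample lands in exactly one test fold. Storing the per-fold accuracy in a vector and taking its mean after the loop gives the overall cross-validation accuracy.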

answered Oct 10 '22 20:10

carlosdc


Tags:

machine-learning

classification

matlab

svm

libsvm

Zahra Ezati
