I have a three-class problem with unbalanced data (90%, 5%, 5%). Now I want to train a classifier using LIBSVM.
The problem is that the grid search (grid.py) tunes the gamma and Cost parameters for optimal overall accuracy, which means that 100% of the examples are classified as class 1, which is of course not what I want.
I've tried modifying the weight parameters (-w) without much success.
So what I want is to modify grid.py so that it optimizes Cost and gamma for precision and recall separated by class rather than for overall accuracy. Is there any way to do that? Or are there other scripts out there that can do something like this?
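One way to approach this is to swap out the accuracy computation in grid.py's cross-validation step for a per-class metric. The helpers below are a minimal sketch of such a scorer (the function names are mine, not part of grid.py); `min_recall` is one possible single-number objective that keeps the majority class from dominating the grid search:

```python
def per_class_precision_recall(y_true, y_pred, classes):
    """Return {class: (precision, recall)} for one set of predictions."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (precision, recall)
    return scores

def min_recall(y_true, y_pred, classes):
    # Worst per-class recall: predicting everything as the majority
    # class scores 0.0 here, unlike plain accuracy.
    pr = per_class_precision_recall(y_true, y_pred, classes)
    return min(recall for _, recall in pr.values())
```

Ranking (C, gamma) pairs by `min_recall` instead of accuracy would penalize exactly the degenerate "everything is class 1" solution described above.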
The -w parameter is what you need for unbalanced data. What have you tried so far?
If your classes are class 0 (90%), class 1 (5%), and class 2 (5%),
you should pass the following parameters to svm-train:
-w0 5 -w1 90 -w2 90
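Those weights are just the inverse class frequencies (5:90:90 is the same ratio as 1/90 : 1/5 : 1/5), and only the ratio between the `-wi` values matters relative to C. A small sketch, with a helper name of my own choosing, that derives such flags from the training labels:

```python
from collections import Counter
from functools import reduce
from math import gcd

def libsvm_weight_flags(labels):
    """Build -wi flags inversely proportional to class frequency.

    Returns the smallest integer weights with the right ratio,
    e.g. 1:18:18 for a 90/5/5 split (same ratio as 5:90:90).
    """
    counts = Counter(labels)
    # Least common multiple of the class counts, so lcm // count is an integer.
    lcm = reduce(lambda a, b: a * b // gcd(a, b), counts.values())
    return " ".join(f"-w{c} {lcm // n}" for c, n in sorted(counts.items()))
```

For the 90/5/5 split above this yields `-w0 1 -w1 18 -w2 18`, which is equivalent to `-w0 5 -w1 90 -w2 90`.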
If you want to try an alternative, one of the programs in the SVMlight family, http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html, directly optimizes the area under the ROC curve (equivalently, it minimizes the number of incorrectly ranked positive/negative pairs).
Optimizing the AUC directly may give better results than re-weighting training examples.
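For intuition, the AUC of a binary scorer equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, which is exactly the pairwise-ranking view such methods optimize. A minimal sketch of that definition:

```python
def auc(scores, labels):
    """AUC via the pairwise definition: the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = len(pos) * len(neg)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / pairs
```

Note that a classifier predicting only the majority class can still have a low AUC, which is why ranking-based objectives are more robust to class imbalance than accuracy.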