
How do you handle data imbalance in SVM?

Tags:

svm

If I am training an SVM on a large training set and the class variable is either True or False, would having very few True values compared to the number of False values in the training set affect the trained model or its results? Should the two classes be equally represented? If my training set doesn't have an equal distribution of True and False, how do I handle this so that training is as effective as possible?

London guy asked Jul 31 '12 08:07



1 Answer

It's fine to have imbalanced data, because an SVM can assign a greater penalty to misclassification errors on the less frequent class ("True" in your case) instead of weighting all errors equally, which tends to produce an undesirable classifier that assigns everything to the majority class. However, you'll probably get better results with balanced data. It all depends on your data, really.
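As a rough illustration of the per-class penalty idea, here is a minimal sketch that derives weights inversely proportional to class frequency. The function name `balanced_class_weights` and the 95/5 toy labels are made up for this example; the formula `n_samples / (n_classes * class_count)` is one common heuristic, not the only option.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight}, where weight = n_samples / (n_classes * class_count).

    Rare classes get a large weight, so errors on them cost more.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels: 95 False, 5 True.
labels = [False] * 95 + [True] * 5
weights = balanced_class_weights(labels)
print(weights)
```

If you happen to use scikit-learn, a dictionary like this can be passed as the `class_weight` argument of `sklearn.svm.SVC`, and `class_weight="balanced"` applies the same heuristic for you.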

You could skew the data artificially, for example by resampling, to make it more balanced. This paper is worth checking: http://pages.stern.nyu.edu/~fprovost/Papers/skew.PDF.
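One simple way to rebalance artificially is random oversampling of the minority class. The sketch below is only an illustration under assumptions: `oversample_minority`, the toy `rows`, and the fixed seed are hypothetical, and it is not necessarily the approach taken in the linked paper.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 95 False rows, 5 True rows.
rows = [[i] for i in range(100)]
labels = [False] * 95 + [True] * 5
bal_rows, bal_labels = oversample_minority(rows, labels)
print(bal_labels.count(True), bal_labels.count(False))
```

Resample only the training split; if duplicated rows leak into the test set, the evaluation will look better than it really is.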

TakeS answered Dec 31 '22 19:12
