If I am training an SVM on a large training set, and the class variable is either True or False, would having very few True values compared to the number of False values in the training set affect the trained model/results? Should the two classes be equally represented? If my training set doesn't have an equal distribution of True and False, how do I handle this so that training works as well as possible?
It's fine to have imbalanced data, because an SVM can assign a greater penalty to misclassification errors on the less frequent class (e.g. "True" in your case), rather than weighting all errors equally, which tends to produce the undesirable classifier that assigns everything to the majority class. That said, you may still get better results with balanced data. It really depends on your data.
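To make the penalty idea concrete, here is a minimal stdlib-only sketch of the common inverse-frequency heuristic for per-class weights, w_c = n_samples / (n_classes * n_c) (this is the same formula scikit-learn uses for `class_weight='balanced'`; the function name here is my own):

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    w_c = n_samples / (n_classes * n_c).
    The rarer the class, the larger its misclassification penalty."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}

# 90 False vs 10 True: errors on True cost 9x more than errors on False
labels = [False] * 90 + [True] * 10
weights = balanced_class_weights(labels)
print(weights)  # True gets weight 5.0, False gets ~0.556
```

In practice you would pass weights like these to your SVM library's per-class penalty option (e.g. `class_weight` in scikit-learn's `SVC`, or per-class `C` in LIBSVM) rather than compute them by hand.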
You could also resample the data artificially to get a more balanced training set. See this paper for a discussion: http://pages.stern.nyu.edu/~fprovost/Papers/skew.PDF.
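One simple way to rebalance, sketched below with the standard library only (the function name and seed are illustrative): randomly duplicate minority-class examples until both classes have the same count.

```python
import random

def oversample_minority(X, y, seed=0):
    """Randomly duplicate minority-class examples until the classes
    are balanced, then shuffle so duplicates are spread through the set."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label]
    neg = [i for i, label in enumerate(y) if not label]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Draw (with replacement) enough minority indices to close the gap
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]

X = [[i] for i in range(100)]
y = [True] * 5 + [False] * 95
Xb, yb = oversample_minority(X, y)
print(sum(yb), len(yb))  # 95 True out of 190 total: now balanced
```

Note that duplicating points does not add information, so it can encourage overfitting on the minority class; undersampling the majority class, or the cost-sensitive weighting described above, are common alternatives.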