
LibSVM turns all my training vectors into support vectors, why?

I am trying to use an SVM for news article classification.

I created a table whose rows are the features (the unique words found in the documents). For each article I build a vector over these features: if the article contains a word from the feature table, that position is set to 1; otherwise it is 0.

Example of a generated training sample:

1 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1 10:1 11:1 12:1 13:1 14:1 15:1 16:1 17:1 18:1 19:1 20:1 21:1 22:1 23:1 24:1 25:1 26:1 27:1 28:1 29:1 30:1

As this is the first document, all the features are present.
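For illustration, the encoding described above can be sketched in Python as follows. This is only a minimal sketch, not the code actually used with svm.Net: the helper names `build_vocabulary` and `to_libsvm_line` and the three toy documents are hypothetical. It writes each document in the sparse `label index:value` format shown in the sample line, where features with value 0 are simply left out.

```python
def build_vocabulary(documents):
    """Map each unique word to a 1-based feature index (order of first appearance)."""
    vocab = {}
    for doc in documents:
        for word in doc.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def to_libsvm_line(label, document, vocab):
    """Encode one document as 'label idx:1 idx:1 ...' with ascending indices.

    Only features that are present are written; absent (zero) features are omitted.
    """
    indices = sorted({vocab[w] for w in document.lower().split() if w in vocab})
    return " ".join([str(label)] + [f"{i}:1" for i in indices])


# Hypothetical toy corpus, labelled 1 (positive) or 0 (negative).
docs = ["stocks rally as markets open", "team wins final match", "markets fall"]
labels = [1, 0, 1]

vocab = build_vocabulary(docs)
lines = [to_libsvm_line(y, d, vocab) for y, d in zip(labels, docs)]
for line in lines:
    print(line)
```

As in the sample above, the first document contains every feature seen so far, so its line lists consecutive indices starting at 1.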

I am using 1 and 0 as class labels.

I am using svm.Net for classification.

I supplied 300 manually classified vectors as training data, and the generated model uses all of them as support vectors, which surely indicates overfitting.

The total number of features (unique words, i.e. the row count of the feature table in the DB) is 7610.

What could be the reason?

Because of this overfitting, my project is now in pretty bad shape: it classifies every available article as positive.

Is there any restriction on the class labels for binary classification in LibSVM?

I am using 0 and 1 instead of -1 and +1. Is that a problem?

Krishna Chaitanya M asked Apr 20 '11 13:04



1 Answer

You need to do some type of parameter search. Also, if the classes are unbalanced, the classifier might reach artificially high accuracy without learning much. This guide is good at teaching basic, practical things; you should probably read it.
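A rough Python sketch of both points, under stated assumptions: `score_fn` is a hypothetical callback expected to return cross-validated accuracy for a given `C` and `gamma` (RBF kernel assumed), `dummy_score` is only a stand-in so the sketch runs without an SVM library, and the 240/60 class split is invented for illustration. The exponentially spaced grids are a common convention, not something specific to svm.Net.

```python
from collections import Counter
from itertools import product


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def grid_search(score_fn, c_values, gamma_values):
    """Try every (C, gamma) pair and return (best_score, best_C, best_gamma).

    score_fn(C, gamma) is assumed to return a cross-validated accuracy.
    """
    best = None
    for c, gamma in product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


# Exponentially spaced candidate values.
c_grid = [2.0 ** e for e in range(-5, 16, 2)]       # 2^-5, 2^-3, ..., 2^15
gamma_grid = [2.0 ** e for e in range(-15, 4, 2)]   # 2^-15, 2^-13, ..., 2^3


def dummy_score(c, gamma):
    """Stand-in scorer (NOT a real SVM); replace with real cross-validation."""
    return 1.0 / (1.0 + abs(c - 8.0) + abs(gamma - 2.0 ** -7))


# Hypothetical unbalanced split of 300 training labels.
labels = [1] * 240 + [0] * 60
baseline = majority_baseline_accuracy(labels)

best_score, best_c, best_gamma = grid_search(dummy_score, c_grid, gamma_grid)
print(f"majority baseline: {baseline:.2f}")
print(f"best C={best_c}, gamma={best_gamma}, score={best_score:.3f}")
```

If the best cross-validated accuracy is no higher than the majority baseline, the model is probably just predicting the dominant class, which would match the symptom of every article being classified as positive.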

carlosdc answered Oct 21 '22 20:10