
File format for classification using SVM light

I am trying to build a classifier using SVM light that assigns a document to one of two classes. I have already trained and tested the classifier, and a model file is saved to disk. Now I want to use this model file to classify completely new documents. What should the input file format be? Could it be a plain text file (I don't think that would work), or a plain listing of the features present in the document without any class label or feature weights (in which case I would have to keep track of the feature indices used in the feature vector during training), or is it some other format?

ritesh asked Aug 20 '13 15:08



2 Answers

Training and testing files must have the same format. Each instance is one line of the following form:

<line> .=. <target> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float> 
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>

For example (copied from the SVM^light website):

-1 1:0.43 3:0.12 9284:0.2 # abcdef

You can consult the SVM^light website for more information.
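To make the grammar above concrete, here is a minimal Python sketch that splits one such line into its parts. The function name `parse_svmlight_line` is my own and is not part of SVM^light; the sketch assumes well-formed input and does no error handling.

```python
def parse_svmlight_line(line):
    """Split one SVM^light line into (target, features, info).

    Hypothetical helper for illustration; assumes the line is well formed.
    """
    # Everything after the first '#' is the free-form <info> string.
    body, _, info = line.partition("#")
    tokens = body.split()

    # The first token is the <target>: +1, -1, 0 or any float.
    target = float(tokens[0])

    # The remaining tokens are <feature>:<value> pairs.
    features = {}
    for token in tokens[1:]:
        name, _, value = token.partition(":")
        # <feature> is an integer index, or the literal string "qid".
        key = name if name == "qid" else int(name)
        features[key] = float(value)

    return target, features, info.strip()


target, features, info = parse_svmlight_line("-1 1:0.43 3:0.12 9284:0.2 # abcdef")
print(target, features, info)
```

Note that the format is sparse: the example line only lists features 1, 3 and 9284, and any feature that is not listed is treated as zero.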

Marc Claesen answered Sep 30 '22 01:09


The file format for making predictions is the same as the one used for training and testing, i.e.

<line> .=. <target> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float> 
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>

But when making predictions the target is unknown, so you have to use 0 as the target value. This is the only difference. I hope this helps someone.
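As a rough sketch of how that could look in practice, the Python snippet below formats an unlabeled document as a line with target 0. It assumes you kept the term-to-index mapping from training (called `vocabulary` here); that mapping, the example terms, and the function name `to_prediction_line` are all made up for illustration. Terms that were not seen during training are simply dropped, and the pairs are written in increasing index order, which SVM^light requires.

```python
def to_prediction_line(feature_values, vocabulary):
    """Format one unlabeled document as an SVM^light line with target 0.

    feature_values: dict mapping term -> feature value for the new document.
    vocabulary:     dict mapping term -> integer feature index from training
                    (hypothetical; must be the same mapping used for training).
    """
    indexed = {}
    for term, value in feature_values.items():
        index = vocabulary.get(term)
        # Skip terms unseen in training and zero values (the format is sparse).
        if index is not None and value != 0:
            indexed[index] = value

    # Feature/value pairs go in increasing order of feature index.
    pairs = " ".join(f"{i}:{indexed[i]}" for i in sorted(indexed))

    # The target is unknown at prediction time, so write 0.
    return f"0 {pairs}".rstrip()


vocabulary = {"cheap": 1, "meeting": 3, "offer": 9284}  # made-up example mapping
doc = {"offer": 0.2, "cheap": 0.43, "unseen": 0.9}      # "unseen" is not in the vocabulary
line = to_prediction_line(doc, vocabulary)
print(line)
```

Writing one such line per document to a file gives an input that can be passed to `svm_classify` together with the saved model file.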

Umbert answered Sep 30 '22 00:09