I am trying to build a classifier using SVMlight that classifies a document into one of two classes. I have already trained and tested the classifier, and a model file is saved to disk. Now I want to use this model file to classify completely new documents. What should the input file format be for this? Could it be a plain text file (I don't think that would work)? Could it be just a plain listing of the features present in the text file, without any class label or feature weights (in which case I would have to keep track of the indices of the features in the feature vector used during training)? Or is it some other format?
SVMlight format
SVMlight is an implementation of Support Vector Machines (SVMs) in C. Its author, Thorsten Joachims, designed a special input format to represent training and test data, and many other programs also read and write this format.
SVMlight is an implementation of Vapnik's Support Vector Machine [Vapnik, 1995] for the problem of pattern recognition and for the problem of regression. The optimization algorithm used in SVMlight is described in [Joachims, 1999a].
Training and test files must use the same format: each instance is written as one line of the following form:
<line> .=. <target> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float>
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>
For example (copied from the SVM^light website):
-1 1:0.43 3:0.12 9284:0.2 # abcdef
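To make the grammar above concrete, here is a minimal Python sketch of a parser for a single data line. The function name `parse_svmlight_line` is hypothetical (it is not part of SVMlight), and it assumes a non-empty data line rather than a comment-only line. The `qid` value is kept as the raw string to avoid guessing its type.

```python
from typing import Dict, Optional, Tuple


def parse_svmlight_line(line: str) -> Tuple[float, Optional[str], Dict[int, float], Optional[str]]:
    """Parse '<target> <feature>:<value> ... # <info>' into its parts.

    Returns (target, qid, features, info); qid and info are None when absent.
    """
    # Everything after the first '#' is the free-form <info> string.
    data, sep, info = line.partition("#")
    info_text = info.strip() if sep else None

    tokens = data.split()
    target = float(tokens[0])  # +1, -1, 0 or any float
    qid: Optional[str] = None
    features: Dict[int, float] = {}
    for tok in tokens[1:]:
        key, _, value = tok.partition(":")
        if key == "qid":
            qid = value  # kept as a raw string in this sketch
        else:
            features[int(key)] = float(value)
    return target, qid, features, info_text


target, qid, features, info = parse_svmlight_line("-1 1:0.43 3:0.12 9284:0.2 # abcdef")
print(target, features, info)
```

Applied to the example line, this yields target -1.0, the three feature/value pairs, and the info string "abcdef".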
You can consult the SVM^light website for more information.
The file format for making predictions is the same as the one for training and testing, i.e. the `<line>` grammar given above.
But when making predictions the target is unknown, so you put 0 as the target value (the SVMlight documentation notes that test examples may use 0 as the class label). This is the only difference. I hope this helps someone.
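The question's concern about feature indices is real: a new document must be encoded with the same token-to-index mapping used at training time. The sketch below shows one way to do that in Python. The helper names `build_vocabulary` and `to_svmlight_line` are hypothetical, the whitespace-split tokens and raw term counts as feature weights are assumptions (use whatever tokenization and weighting you trained with), and words never seen in training are dropped because the model has no weight for them.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_vocabulary(training_docs: Iterable[List[str]]) -> Dict[str, int]:
    """Assign each distinct training token a feature number starting at 1.

    Save this mapping with the model file; new documents must reuse it.
    """
    vocab: Dict[str, int] = {}
    for tokens in training_docs:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1  # feature numbers start at 1
    return vocab


def to_svmlight_line(tokens: List[str], vocab: Dict[str, int],
                     target: float = 0, info: str = "") -> str:
    """Encode one tokenized document as an SVMlight line.

    Assumption: weights are raw term counts. Unknown tokens are skipped.
    The default target of 0 marks an unlabeled document to be predicted.
    """
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    # Feature/value pairs are written in increasing feature-number order.
    pairs = " ".join(f"{idx}:{val:g}" for idx, val in sorted(counts.items()))
    line = f"{target:g} {pairs}".rstrip()
    if info:
        line += f" # {info}"
    return line


vocab = build_vocabulary([["cheap", "pills", "now"], ["meeting", "at", "noon"]])
print(to_svmlight_line(["cheap", "cheap", "noon", "unseen"], vocab, info="new-doc-1"))
```

Here "cheap" maps to feature 1 and "noon" to feature 6, so the new document becomes `0 1:2 6:1 # new-doc-1`. Writing one such line per new document gives a file that can be passed to `svm_classify` together with the saved model file.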