
libsvm model file format

According to this FAQ, the libsvm model file format should be straightforward, and it is when I call only svm-train. For example, the first SV for the a1a dataset is:

 1 3:1 11:1 14:1 19:1 39:1 42:1 55:1 64:1 67:1 73:1 75:1 76:1 80:1 83:1

On the other hand, if I use the easy.py script, my first SV ends up being:

 512 1:-1 2:-1 3:1 4:-1 5:-1 6:-1 7:-1 8:-1 9:-1 10:-1 11:1 13:-1 14:1 15:-1 16:-1 17:-1 18:-1 19:1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:1 40:-1 41:-1 42:1 43:-1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1 55:1 56:-1 57:-1 58:-1 59:-1 61:-1 62:-1 63:-1 64:1 65:-1 66:-1 67:1 68:-1 69:-1 70:-1 71:-1 72:-1 73:1 74:-1 75:1 76:1 77:-1 78:-1 79:-1 80:1 81:-1 82:-1 83:1 84:-1 85:-1 86:-1 87:-1 88:-1 90:-1 91:-1 92:-1 93:-1 94:-1 95:-1 97:-1 98:-1 99:-1 100:-1 101:-1 102:-1 103:-1 104:-1 105:-1 106:-1 107:-1 108:-1 109:-1 110:-1 112:-1 113:-1 114:-1 115:-1 117:-1 118:-1 119:-1 

which is an instance that doesn't exist at all in my training set! In fact if I do:

 $ grep "119:" a1a
 -1 1:1 6:1 18:1 22:1 36:1 42:1 49:1 66:1 67:1 73:1 74:1 76:1 80:1 119:1 
 -1 1:1 6:1 18:1 26:1 35:1 43:1 53:1 65:1 67:1 73:1 74:1 76:1 80:1 119:1 
 -1 2:1 6:1 15:1 19:1 39:1 42:1 55:1 62:1 67:1 72:1 74:1 76:1 78:1 119:1 
 -1 4:1 6:1 16:1 21:1 35:1 44:1 49:1 64:1 67:1 72:1 74:1 76:1 78:1 119:1 
 -1 2:1 6:1 14:1 30:1 35:1 42:1 49:1 65:1 67:1 72:1 74:1 76:1 78:1 119:1 
 -1 2:1 6:1 17:1 20:1 37:1 40:1 57:1 63:1 67:1 73:1 74:1 76:1 80:1 119:1 
 -1 5:1 6:1 18:1 22:1 36:1 40:1 54:1 61:1 67:1 72:1 75:1 76:1 80:1 119:1 
 -1 5:1 6:1 17:1 26:1 35:1 42:1 53:1 62:1 67:1 73:1 74:1 76:1 80:1 119:1 

There is no instance with 119:-1. Even if +1 and -1 were simply swapped, there is no instance with both 118:1 and 119:1 either (missing attributes are zeros).

If I apply this source code modification, I can clearly see that in the former case (only svm-train involved) the first SV is also the first instance. In the latter case (with the easy.py script), the output that would tell me which instance is the SV is swallowed by grid.py.

What's going on here?

Davide asked Mar 01 '23 01:03


1 Answer

I think the culprit is probably the call that easy.py makes to svm-scale, which scales each attribute to lie within [-1, 1]. As a result, the training examples passed to svm-train are not the same as the ones in your training file.
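To illustrate the effect, here is a minimal Python sketch of that kind of linear scaling applied to one sparse instance. It is not svm-scale's actual code: the helper names and the `feat_min` / `feat_max` dictionaries are hypothetical, the toy data is made up, and skipping constant attributes and omitting values that scale to exactly 0 are assumptions about how svm-scale behaves. With binary 0/1 attributes, every absent attribute (an implicit 0) becomes -1, which would explain the long run of `:-1` entries in the SV from the question.

```python
def parse_libsvm_line(line):
    """Split 'label idx:val idx:val ...' into (label, {idx: val})."""
    parts = line.split()
    features = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        features[int(idx)] = float(val)
    return parts[0], features


def scale_instance(features, feat_min, feat_max, lower=-1.0, upper=1.0):
    """Linearly map each attribute from [min, max] to [lower, upper].

    Attributes absent from the sparse row count as 0, so they are scaled too.
    """
    scaled = {}
    for idx in sorted(feat_min):
        lo, hi = feat_min[idx], feat_max[idx]
        if lo == hi:
            continue  # assumption: constant attributes are skipped entirely
        value = features.get(idx, 0.0)
        new = lower + (upper - lower) * (value - lo) / (hi - lo)
        if new != 0:
            scaled[idx] = new  # assumption: exact zeros stay implicit
    return scaled


# Toy data (hypothetical): attributes 1-3 are binary, attribute 4 is always 0.
label, feats = parse_libsvm_line("1 3:1")
feat_min = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
feat_max = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.0}
scaled = scale_instance(feats, feat_min, feat_max)
print(label, " ".join(f"{i}:{v:g}" for i, v in scaled.items()))
# prints: 1 1:-1 2:-1 3:1
```

Note that the attributes set to 1 in the scaled SV from the question (3, 11, 14, 19, 39, ...) are the same ones listed in the unscaled first SV, which is consistent with it being the same instance after scaling. To match SVs against training instances, compare them with the scaled file that easy.py writes rather than with the original a1a file.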

Stompchicken answered Mar 08 '23 15:03