Vowpal Wabbit Readable Model

I was using Vowpal Wabbit and was generating the classifier trained as a readable model.

My dataset had 22 features and the readable model gave as output:

Version 7.2.1
Min label:-50.000000
Max label:50.000000
bits:18
0 pairs:
0 triples:
rank:0
lda:0
0 ngram:
0 skip:
options:
:0
101143:0.035237
101144:0.033885
101145:0.013357
101146:-0.007537
101147:-0.039093
101148:-0.013357
101149:0.001748
116060:0.499471
157941:-0.037318
157942:0.008038
157943:-0.011337
196772:0.138384
196773:0.109454
196774:0.118985
196775:-0.022981
196776:-0.301487
196777:-0.118985
197006:-0.000514
197007:-0.000373
197008:-0.000288
197009:-0.004444
197010:-0.006072
197011:0.000270

Can somebody please explain how to interpret the last portion of the file (after options:)? I was using logistic regression, and I need to check how iterating over the training data updates my classifier, so that I can tell when it reaches convergence...

Thanks in advance :)


1 Answer

The values you see are the hash values and weights of all 22 of your features, plus one additional "Constant" feature (its hash value is 116060), in the resulting trained model.

The format is:

hash_value:weight
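
Since you mentioned wanting to track convergence, one option is to save a readable model after each pass and compare the weights. Here is a minimal Python sketch (not part of vw; "readable.model" is a hypothetical file name) that loads the hash:weight section into a dict:

weights = {}
in_weights = False
with open("readable.model") as f:
    for line in f:
        line = line.strip()
        if not in_weights:
            # everything up to and including the "options:" line is header
            in_weights = line.startswith("options:")
            continue
        if ":" not in line:
            continue
        key, _, value = line.rpartition(":")  # split on the last colon
        weights[key] = float(value)

print(weights.get("116060", 0.0))  # the "Constant" feature in this model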

In order to see your original feature names instead of the hash values, you may use one of two methods:

  • Use the utl/vw-varinfo utility (in the source tree) on your training set with the same options you used for training. Try utl/vw-varinfo for a help/usage message.
  • Use the relatively new --invert_hash readable.model option (see the sketch after this list).
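
For example (a sketch, not a recipe: train.vw, the readable_model.txt output name, and --loss_function logistic stand in for whatever options you actually trained with):

utl/vw-varinfo --loss_function logistic train.vw
vw -d train.vw --loss_function logistic --invert_hash readable_model.txt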

BTW: inverting the hash values back to the original feature names is not the default because of the large performance penalty. By default, vw applies a one-way hash to each feature string it sees; it doesn't maintain a hash-map between feature names and their hash values at all.
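To make the one-way direction concrete, here is a toy Python sketch of the idea (illustrative only: vw uses its own murmur-style hash, so the slots below won't match vw's):

import hashlib

BITS = 18  # matches the "bits:18" line in the model header above

def feature_slot(name, bits=BITS):
    # Toy stand-in for vw's one-way feature hash: any string maps to one of
    # 2**bits slots, and no reverse map from slot back to name is kept.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % (1 << bits)

print(feature_slot("some_feature"))  # always in [0, 262143] for 18 bits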

Edit:

Another little tidbit that may be of interest is the first entry after options:, which reads:

:0

It essentially means that any "other" feature (one which is not in the training set, and thus never hashed into the weight vector) defaults to a weight of 0.

This means it is redundant in vowpal-wabbit to train on features with values of zero, since zero is the default anyway: explicit :0 value features simply won't contribute anything to the model.

When you leave out a value in your training set, as in feature_name without a trailing :<value>, vowpal wabbit implicitly assumes it is a binary feature with a TRUE value. IOW: it defaults all value-less features to a value of one (:1) rather than a value of zero (:0).
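For example, these two (hypothetical) training lines are equivalent as far as vw is concerned; feature_a with no value defaults to :1, and feature_c:0 contributes nothing:

1 | feature_a feature_b:2.5 feature_c:0
1 | feature_a:1 feature_b:2.5

HTH.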
