
Vowpal Wabbit --invert_hash option produces empty output, but why?

Tags:

vowpalwabbit

I'm trying to get a Vowpal Wabbit model saved with inverted hashes. I have a valid model produced with the following:

vw --oaa 2 -b 24 -d mydata.vw --readable_model mymodel.readable

which produces a model file like this:

Version 7.7.0
Min label:-1.000000
Max label:1.000000
bits:24
0 pairs: 
0 triples: 
rank:0
lda:0
0 ngram: 
0 skip: 
options: --oaa 2
:0
66:0.016244
67:-0.016241
80:0.026017
81:-0.026020
84:0.015005
85:-0.015007
104:-0.053924
105:0.053905
112:-0.015402
113:0.015412
122:-0.025704
123:0.025704
...

(and so on for many thousands more features). However, to make the model more useful, I need to see the feature names. That seemed like a fairly obvious thing to do, so I ran

vw --oaa 2 -b 24 -d mydata.vw --invert_hash mymodel.inverted

and it produced a model file like this (no weights are produced):

Version 7.7.0
Min label:-1.000000
Max label:1.000000
bits:24
0 pairs: 
0 triples: 
rank:0
lda:0
0 ngram: 
0 skip: 
options: --oaa 2
:0

It feels like I've obviously done something wrong, but I think I'm using the options in the documented way:

--invert_hash is similar to --readable_model, but the model is output in a more human readable format with feature names followed by weights, instead of hash indexes and weights.

Does anyone see why my second command is failing to produce any output?

Ben Collins asked Jun 26 '14 17:06


1 Answer

This is caused by a bug in VW, which was fixed recently (on account of this question); see https://github.com/JohnLangford/vowpal_wabbit/issues/337.

By the way, it does not make sense to use --oaa 2. For binary classification (e.g. logistic regression), use --loss_function=logistic instead (and make sure your labels are 1 and -1). OAA makes sense only for N > 2 classes (and with --oaa it is recommended to use --loss_function=logistic as well).
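If your data file was prepared for --oaa 2, its labels are 1 and 2, so switching to --loss_function=logistic means relabeling one class as -1. A minimal sketch, assuming the label is the first whitespace-separated token on each line (followed by a space, as in "2 |ns feat") and choosing, arbitrarily, to map class 2 to -1. The function name to_binary_labels and the file names below are hypothetical.

```shell
# Hypothetical helper: rewrite OAA-style labels (1, 2) as the 1/-1 labels
# expected by --loss_function=logistic. Assumes the label is the first
# whitespace-separated token of each line; class 2 is mapped to -1 by choice,
# and all other lines (including label 1) are printed unchanged.
to_binary_labels() {
  # Only the leading "2" is replaced, so the rest of the line (importance
  # weight, tag, namespaces, features) keeps its original spacing.
  awk '$1 == "2" { sub(/^[ \t]*2/, "-1") } { print }' "$@"
}
```

It reads the files given as arguments, or standard input if there are none, and writes to standard output, for example:

to_binary_labels mydata.vw > mydata.binary.vw
vw -d mydata.binary.vw --loss_function=logistic -f model.binary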

Also note that learning with --invert_hash is much slower (and, of course, requires more memory). The recommended way to create an inverted-hash model, especially with multiple passes, is to train a usual binary model and then convert it to an inverted-hash model with a single test-only pass (-t) over the training data:

vw -d mytrain.data -c --passes 4 --oaa 3 -f model.binary
vw -d mytrain.data -t -i model.binary --invert_hash model.humanreadable
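To check whether the conversion actually wrote any weights (the symptom in the question was a file that stops right after the header), you can count the lines that follow the header. This is a rough sketch that assumes the 7.7.0 layout shown in the question, where the header ends with a line that is exactly ":0"; other VW versions may lay the header out differently. The function name count_weight_lines is hypothetical.

```shell
# Hypothetical helper: count the non-empty lines after the header of a VW
# text model dump (--readable_model or --invert_hash output). Assumes the
# header ends with a line that is exactly ":0", as in the 7.7.0 files above.
count_weight_lines() {
  # "seen" becomes 1 on the ":0" line; only later non-blank lines are counted.
  awk 'seen && NF { n++ } $0 == ":0" { seen = 1 } END { print n + 0 }' "$1"
}
```

For example, count_weight_lines model.humanreadable printing 0 means the file contains only the header, as in the second output shown in the question.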
Martin Popel answered Oct 25 '22 08:10