Can Vowpal Wabbit handle a data size of ~90 GB?

We have extracted features from search engine query log data, and the feature file (in Vowpal Wabbit's input format) amounts to 90.5 GB. The reason for this huge size is necessary redundancy in our feature construction. Vowpal Wabbit claims to be able to handle TBs of data in a matter of a few hours. In addition, VW uses feature hashing, which should take almost no RAM. However, when we run logistic regression with VW on our data, it uses up all of the RAM within a few minutes and then stalls. This is the command we use:

vw -d train_output --power_t 1 --cache_file train.cache -f data.model \
   --compressed --loss_function logistic --adaptive --invariant \
   --l2 0.8e-8 --invert_hash train.model

train_output is the input file we want to train VW on, and train.model is the model we expect to obtain after training.

Any help is welcome!

Satarupa Guha asked Dec 15 '25 01:12

1 Answer

I've found the --invert_hash option to be extremely costly; try running without that option. You can also try turning on the --l1 regularization option to reduce the number of coefficients in the model.
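For reference, a sketch of the same command without --invert_hash and with L1 regularization turned on (the --l1 value below is only a placeholder to illustrate the flag, not a tuned recommendation):

vw -d train_output --power_t 1 --cache_file train.cache -f data.model \
   --compressed --loss_function logistic --adaptive --invariant \
   --l2 0.8e-8 --l1 1e-7

The binary model is still written via -f data.model; dropping --invert_hash just means VW no longer has to keep a reverse mapping from hashed indices back to feature names, which is what tends to blow up memory on wide data.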

How many features do you have in your model? How many features per row are there?
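If it helps to answer that, here is a rough sketch for estimating the average number of features per row, assuming plain-text VW-format input where feature tokens follow the first '|' (pipe the file through zcat first if train_output is gzip-compressed):

awk '{ c = 0; seen = 0;
       for (i = 1; i <= NF; i++) {
         if ($i ~ /^\|/) { seen = 1; continue }   # namespace marker, e.g. "|ns" or bare "|"
         if (seen) c++                            # count feature tokens after the first "|"
       }
       total += c; rows++ }
     END { if (rows) printf "avg features per row: %.1f\n", total / rows }' train_output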

Zach answered Dec 16 '25 21:12

