Has anyone managed to run an ordinary least squares regression in Vowpal Wabbit? I'm trying to confirm that it will return the same answer as the exact solution, i.e. when choosing a to minimize ||y - Xa||_2^2 + ||Ra||_2^2 (where R is the regularization matrix) I want to get the analytic answer
a = (X^T X + R^T R)^(-1) X^T y
Doing this type of regression takes about five lines of numpy in Python.
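For reference, a minimal numpy sketch of that closed-form computation (the data here is made up purely for illustration):

import numpy as np

# Closed-form solution: a = (X^T X + R^T R)^(-1) X^T y
X = np.array([[3.4, -1.2, 4.0],
              [1.0,  0.5, -2.0],
              [0.3,  2.2,  1.1],
              [2.1, -0.4,  0.9]])
y = np.array([1.4, 0.2, -0.7, 0.5])
R = 0.0 * np.eye(X.shape[1])  # no regularization here, to match VW's default

a = np.linalg.solve(X.T @ X + R.T @ R, X.T @ y)
print(a)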
The VW documentation suggests that it can do this (presumably with the "squared" loss function), but so far I've been unable to get it to come even close to matching the Python results. Because squared is the default loss function, I'm simply calling:
$ vw-varinfo input.txt
where input.txt has lines like
1.4 | 0:3.4 1:-1.2 2:4.0 .... etc
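(For concreteness, a file in this format could be generated from numpy arrays along these lines; write_vw is a hypothetical helper, not part of VW:

import numpy as np

def write_vw(X, y, path):
    # One example per line: "<label> | <index>:<value> ...",
    # using column indices as feature names, as in the sample line above.
    with open(path, "w") as f:
        for label, row in zip(y, X):
            feats = " ".join(f"{j}:{v}" for j, v in enumerate(row))
            f.write(f"{label} | {feats}\n")

)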
Do I need some other parameters in the VW call? I'm unable to grok the (rather minimal) documentation.
I think you should use this syntax (Vowpal Wabbit version 7.3.1):
vw -d input.txt -f linear_model -c --passes 50 --holdout_off --loss_function squared --invert_hash model_readable.txt
This syntax instructs VW to read your input.txt file, write a model and a cache file to disk (the cache is necessary for multi-pass convergence), and fit a regression using the squared loss function. It will also write the model coefficients in human-readable form to a file called model_readable.txt.
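If you then want those coefficients back in Python, here is a rough sketch of reading that file (assuming that, after a short header, each weight line looks like feature_name:hash_index:weight):

weights = {}
with open("model_readable.txt") as f:
    for line in f:
        parts = line.strip().rsplit(":", 2)
        if len(parts) == 3:
            name, _idx, value = parts
            try:
                weights[name] = float(value)
            except ValueError:
                pass  # header lines may also contain colons; skip them
print(weights)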
The --holdout_off option is a recent addition that suppresses the automatic out-of-sample loss computation (if you are using an earlier version, remove it).
Basically, a regression fit by stochastic gradient descent will give you a vector of coefficients close to the exact solution only when no regularization is applied and the number of passes is high (I would suggest 50 or more). Randomly shuffling the input file rows also helps the algorithm converge, as in the sketch below.
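A minimal sketch of that shuffling step (the output filename is my own choice):

import random

# Shuffle the example order so SGD does not see the rows in the
# same sequence on every pass.
with open("input.txt") as f:
    lines = f.readlines()
random.shuffle(lines)
with open("input_shuffled.txt", "w") as f:
    f.writelines(lines)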