I'm trying Vowpal Wabbit and am in the process of figuring out the file formats required for training and testing. I've been following the tutorial from https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial and see that the following is the training data format:
0 | price:.23 sqft:.25 age:.05 2006
1 2 'second_house | price:.18 sqft:.15 age:.35 1976
0 1 0.5 'third_house | price:.53 sqft:.32 age:.87 1924
For the testing data, I don't have the labels or any outputs, but just the features. How would I go about writing that out? I've tried just including the features like so:
price:.23 sqft:.25 age:.05 2006
price:.18 sqft:.15 age:.35 1976
price:.53 sqft:.32 age:.87 1924
But, that gives me exceptions as it's not the proper format. I have also tried the following and all give me just 0's as results:
| price:.23 sqft:.25 age:.05 2006
| price:.18 sqft:.15 age:.35 1976
| price:.53 sqft:.32 age:.87 1924
0 0 0 | price:.23 sqft:.25 age:.05 2006
0 0 0 | price:.18 sqft:.15 age:.35 1976
0 0 0 | price:.53 sqft:.32 age:.87 1924
Does anyone know the format I should be aiming for, knowing only the features? Thanks for the help.
The bar symbol (|) must also be present in the examples used for prediction:
| price:.23 sqft:.25 age:.05 2006
| price:.18 sqft:.15 age:.35 1976
| price:.53 sqft:.32 age:.87 1924
If you don't include the correct labels, vw cannot compute the test loss, of course.
To get the predictions, use vw -d test_set.vw -t -p predictions.txt. Note that the training set in the tutorial (with only three examples) is too small to train any reasonable model.
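If the test features live in your own code rather than in a file, here is a minimal sketch of how the unlabeled lines could be generated. The helper name to_vw_line and the dict-based input are my own assumptions for illustration, not part of vw. It only builds the text format shown above: an optional label, then the bar, then the features.

```python
def to_vw_line(features, bare_features=(), label=None):
    """Format one example as a VW plain-text line.

    features:      dict of name -> numeric value, written as name:value
    bare_features: names written without a value (like 2006 in the tutorial)
    label:         optional; omit it for test examples with unknown labels

    Hypothetical helper for illustration only; it is not part of vw.
    """
    parts = [f"{name}:{value}" for name, value in features.items()]
    parts += [str(name) for name in bare_features]
    line = "| " + " ".join(parts)
    if label is not None:
        # Labeled (training) examples put the label before the bar.
        line = f"{label} {line}"
    return line


test_rows = [
    ({"price": 0.23, "sqft": 0.25, "age": 0.05}, ["2006"]),
    ({"price": 0.18, "sqft": 0.15, "age": 0.35}, ["1976"]),
    ({"price": 0.53, "sqft": 0.32, "age": 0.87}, ["1924"]),
]

# One unlabeled line per test example, ready to be written to test_set.vw.
test_lines = [to_vw_line(feats, bare) for feats, bare in test_rows]
```

Write test_lines to test_set.vw, one per line, and run the vw command above. If you trained and saved a model earlier (for example with -f model.vw, a file name I am assuming here), you would normally also load it with -i model.vw when predicting. Without a trained model, getting only 0's back would not be surprising.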