Vowpal Wabbit training and testing data formats

I'm trying Vowpal Wabbit and am in the process of figuring out the file formats required for training and testing. I've been following the tutorial from https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial and see that the following is the training data format:

0 | price:.23 sqft:.25 age:.05 2006
1 2 'second_house | price:.18 sqft:.15 age:.35 1976
0 1 0.5 'third_house | price:.53 sqft:.32 age:.87 1924

For the testing data, I don't have the labels or any outputs, but just the features. How would I go about writing that out? I've tried just including the features like so:

price:.23 sqft:.25 age:.05 2006
price:.18 sqft:.15 age:.35 1976
price:.53 sqft:.32 age:.87 1924

But that throws exceptions, since it's not the proper format. I have also tried the following two formats, and both give me just 0's as results:

| price:.23 sqft:.25 age:.05 2006
| price:.18 sqft:.15 age:.35 1976
| price:.53 sqft:.32 age:.87 1924

0 0 0 | price:.23 sqft:.25 age:.05 2006
0 0 0 | price:.18 sqft:.15 age:.35 1976
0 0 0 | price:.53 sqft:.32 age:.87 1924

Does anyone know the format I should be aiming for, knowing only the features? Thanks for the help.

intl asked Nov 15 '14 00:11


1 Answer

The bar symbol (|) must also be present in the format for predictions:

| price:.23 sqft:.25 age:.05 2006
| price:.18 sqft:.15 age:.35 1976
| price:.53 sqft:.32 age:.87 1924
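
If the features live in some other structure first, a small helper can write them out in this form. The following is only a minimal sketch: the function name to_vw_line and the dict layout are made up for illustration and are not part of vw. It assumes a feature with no explicit value (like 2006 above) is stored with None and written as a bare name.

```python
def to_vw_line(features, label=None):
    """Format one example in VW's plain-text input format.

    features: dict mapping feature name -> numeric value, or None for a
              bare feature written without a ":value" suffix.
    label:    leave as None for unlabeled test data.
    (Hypothetical helper for illustration; not part of vw itself.)
    """
    feats = " ".join(
        name if value is None else f"{name}:{value}"
        for name, value in features.items()
    )
    # The bar separates the (possibly empty) label section from the features.
    if label is None:
        return f"| {feats}"
    return f"{label} | {feats}"


house = {"price": 0.23, "sqft": 0.25, "age": 0.05, "2006": None}
print(to_vw_line(house))           # | price:0.23 sqft:0.25 age:0.05 2006
print(to_vw_line(house, label=0))  # 0 | price:0.23 sqft:0.25 age:0.05 2006
```

The same helper covers both files: pass a label when writing the training set and omit it when writing the test set.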

If you don't include the correct labels, vw of course cannot compute the test loss. To get the predictions, use vw -d test_set.vw -t -p predictions.txt (-t means test only, with no learning; -p writes the predictions to the given file). Note that the training set in the tutorial, with only three examples, is too small to train any reasonable model.
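
For the whole round trip, the prediction command is normally paired with a training run that saves a model (-f) and a test run that loads it (-i). The command above does not show -i, so the sketch below assumes a model was saved beforehand. The file names train_set.vw and model.vw and the helper vw_commands are placeholders for illustration. Without a loaded model vw starts from zero weights, which is one plausible reason the attempts in the question produced only 0's.

```python
import shlex


def vw_commands(train_path, test_path, model_path, predictions_path):
    """Build (but do not run) the two vw invocations as argument lists.

    Hypothetical helper; all paths are placeholders chosen by the caller.
    """
    # Train on labeled data and save the learned model with -f.
    train = ["vw", "-d", train_path, "-f", model_path]
    # Test only (-t), load the saved model (-i), write predictions (-p).
    predict = ["vw", "-d", test_path, "-t", "-i", model_path,
               "-p", predictions_path]
    return train, predict


train_cmd, predict_cmd = vw_commands(
    "train_set.vw", "test_set.vw", "model.vw", "predictions.txt"
)
print(shlex.join(train_cmd))    # vw -d train_set.vw -f model.vw
print(shlex.join(predict_cmd))  # vw -d test_set.vw -t -i model.vw -p predictions.txt

# To actually execute them (requires vw on the PATH):
# import subprocess
# subprocess.run(train_cmd, check=True)
# subprocess.run(predict_cmd, check=True)
```

Each line of predictions.txt then holds the prediction for the corresponding line of the test file.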

Martin Popel answered Nov 15 '22 09:11