I have a couple of questions about the output from a simple run of VW. I have read around the internet and the wiki but am still unsure about a few basic things.
I ran the following on the boston housing data:
vw -d housing.vm --progress 1
where the housing.vm file is set up as (partially):
and output is (partially):
Question 1:
1) Is it correct to think about the average loss column as the following steps:
a) Predict zero, so the first average loss is the squared error of the first example (with a prediction of zero)
b) Build a model on example 1 and predict example 2. Average the two squared losses so far
c) Build a model on examples 1-2 and predict example 3. Average the three squared losses so far
d) ...
Repeat this until the end of the data is reached (assuming a single pass)
2) What is the current features column? It appears to be the number of non-zero features plus an intercept. The example suggests that a feature is not counted if its value is zero; is that true? For instance, the second record has a value of zero for 'ZN'. Does VW really treat that numeric feature as missing?
Your statements are basically correct. By default, VW does online learning, so in step c it takes the current model (weights) and updates it with the current example (rather than learning from all the previous examples again).
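A minimal Python sketch of this "predict first, then update" loop (often called progressive validation) is below. It assumes a plain linear model trained with ordinary squared-loss SGD and a fixed learning rate; VW's default update rule is more elaborate (adaptive, normalized updates), so the numbers will not match VW's output. The function name, the `learning_rate` parameter and the `"constant"` key are hypothetical, chosen only for illustration.

```python
def progressive_average_loss(examples, learning_rate=0.1):
    """Return the running average squared loss after each example.

    examples: list of (features, label) pairs, with features as {name: value}.
    Hypothetical sketch: plain SGD, not VW's actual update rule.
    """
    weights = {}  # sparse weight vector, implicitly all zeros at the start
    total_loss = 0.0
    averages = []
    for i, (features, label) in enumerate(examples, start=1):
        # Add an intercept feature, analogous to VW's automatic constant.
        features = {**features, "constant": 1.0}
        # 1) Predict with the current weights (all zeros for the first example).
        prediction = sum(weights.get(n, 0.0) * v for n, v in features.items())
        # 2) Record this example's squared loss before learning from it.
        total_loss += (prediction - label) ** 2
        averages.append(total_loss / i)
        # 3) One SGD step on this example only; past examples are not revisited.
        #    The factor 2 from d/dp (p - y)^2 is folded into the learning rate.
        error = prediction - label
        for name, value in features.items():
            weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return averages


print(progressive_average_loss([({"x": 1.0}, 2.0), ({"x": 1.0}, 2.0)]))
```

For these two toy examples the first average is 4.0 (prediction 0, label 2) and the second is 3.28, because the single update after example 1 moves the prediction for example 2 to 0.4.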
As you supposed, the current features column is the number of (non-zero) features for the current example. The intercept feature is included automatically unless you specify --noconstant.
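As a rough illustration of how that count is formed, here is a small Python sketch. The function name and the `add_constant` parameter are hypothetical; `add_constant=False` merely mimics the effect of --noconstant, and the feature values are made up for illustration.

```python
def count_current_features(features, add_constant=True):
    """Count the features of one example that can affect the model.

    Zero-valued features are skipped, and one intercept (constant) feature
    is added unless add_constant is False (analogous to --noconstant).
    Hypothetical sketch, not VW's parser.
    """
    nonzero = sum(1 for value in features.values() if value != 0)
    return nonzero + (1 if add_constant else 0)


example = {"CRIM": 0.5, "ZN": 0.0, "INDUS": 7.0}  # illustrative values only
print(count_current_features(example))                      # ZN not counted
print(count_current_features(example, add_constant=False))  # no intercept
```

Here the first call counts CRIM, INDUS and the constant (3), and the second counts only CRIM and INDUS (2).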
There is no difference between a missing feature and a feature with a zero value. Both mean that you won't update the corresponding weight.
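For a linear model trained with gradient steps, this follows from the gradient: the update to weight w_i is proportional to the feature value x_i, and a zero x_i also contributes nothing to the prediction. The Python sketch below assumes plain squared-loss SGD (not VW's exact update rule); `sgd_step` and the starting weights are hypothetical. It shows that an example with ZN = 0 and the same example without ZN produce identical weights.

```python
def sgd_step(weights, features, label, learning_rate=0.1):
    """One squared-loss SGD step on a sparse linear model; returns new weights."""
    new_weights = dict(weights)
    prediction = sum(weights.get(n, 0.0) * v for n, v in features.items())
    error = prediction - label
    for name, value in features.items():
        # The change to this weight is scaled by the feature value,
        # so a zero-valued feature leaves its weight exactly where it was.
        new_weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return new_weights


start = {"ZN": 0.3, "RM": 0.1}  # hypothetical current weights
with_zero = sgd_step(start, {"ZN": 0.0, "RM": 6.0}, label=24.0)
missing = sgd_step(start, {"RM": 6.0}, label=24.0)
print(with_zero == missing)  # True: ZN keeps its weight of 0.3 in both cases
```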