
In Vowpal Wabbit, what is the difference between a namespace and feature?

When carrying out analysis in R or Python, we work only with feature names and their values. Vowpal Wabbit also has namespaces.

I am unable to understand: a. what is meant by a namespace; b. how it differs from a feature; c. when it is used and when it is not (that is, can we avoid using it?); d. how it is used.

I would be grateful for one or two examples. Sorry for so many questions.

Ashok K Harnal asked Jan 08 '23 21:01


1 Answer

In Vowpal Wabbit, name-spaces are used to conveniently generate interaction features on the fly at run time, without the need to predeclare them.

A simple example, without a name-space, is:

1 | a:2 b:3

where 1 is the label, and a, b are regular input features.

Note that there's a space after the |.

Contrast the above with using two name spaces x and y (note no space between the | separator and the name-spaces):

1 |x a:2 |y b:3

This example is essentially equivalent (except for feature hash locations) to the first example. It still has two features with the same values as the original example. The difference is that now with these name-spaces, we can cross features by passing options to vw. For example:

vw -q xy

will generate additional features on-the-fly by crossing all features in name-space x with all features in name-space y. The names of the auto-generated features will be the concatenation of the names from the two name-spaces and the values will be the products of their respective values. In this particular case, it would be as if our data-set had one additional feature: ab:6 (*)
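As a rough illustration of what -q xy amounts to conceptually, here is a small Python sketch that crosses two name-spaces held as plain dicts. The function name cross_namespaces is hypothetical, and this is not how vw works internally (vw hashes feature names rather than building strings); it only mirrors the name concatenation and value product described above.

```python
def cross_namespaces(ns_a, ns_b):
    """Return one interaction feature per (feature in ns_a, feature in ns_b) pair.

    Illustration only: vw hashes names instead of building strings.
    """
    return {
        name_a + name_b: value_a * value_b
        for name_a, value_a in ns_a.items()
        for name_b, value_b in ns_b.items()
    }


# The example "1 |x a:2 |y b:3" from above, written as plain dicts.
x = {"a": 2.0}
y = {"b": 3.0}
print(cross_namespaces(x, y))  # {'ab': 6.0}
```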

Obviously, this is a very simple example. Imagine that you have an example with 3 features in a name-space:

1 |x a:2 b:3 c:5

By adding -q xx to vw you could automatically generate 6 additional interaction features: aa, ab, ac, bb, bc, cc on the fly. And if you had 3 name-spaces, say: x, y, z, you could cross any (or any wanted subset) of them: -q xx -q xy -q xz -q yz -q yy -q zz on the command-line to get all possible interactions between the separate sets of features.
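To see where the count of 6 comes from: crossing a name-space with itself yields every unordered pair of features, including each feature paired with itself. A quick Python check using only the standard library (it enumerates names, it does not call vw):

```python
from itertools import combinations_with_replacement

features = ["a", "b", "c"]  # the features in name-space x
pairs = ["".join(p) for p in combinations_with_replacement(features, 2)]
print(pairs)  # ['aa', 'ab', 'ac', 'bb', 'bc', 'cc']

# The same enumeration lists the -q arguments for name-spaces x, y, z.
namespaces = ["x", "y", "z"]
flags = ["-q " + "".join(p) for p in combinations_with_replacement(namespaces, 2)]
print(" ".join(flags))  # -q xx -q xy -q xz -q yy -q yz -q zz
```

The order of the flags differs from the command line above, but the set of pairs is the same.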

That's all there is to it. It is a powerful feature allowing you to experiment and add interaction features on the fly.

There are several options which accept (1st letters of) name-spaces as arguments, among them:

-q
--cubic
--ignore
--keep
--redefine (very new)
--lrq

Check out the vw command line arguments wiki for more details.
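To illustrate the first-letter matching, here is a Python sketch of what --ignore and --keep amount to conceptually: selecting name-spaces by their first character. The helper select_namespaces is hypothetical and not part of vw, and it assumes the example has already been split into a dict of name-spaces.

```python
def select_namespaces(example, ignore="", keep=""):
    """Filter a {namespace: features} dict by the first letter of each namespace.

    Mimics the idea behind vw's --ignore / --keep; this is not vw's own code.
    """
    ignore_letters = set(ignore)
    keep_letters = set(keep)
    selected = {}
    for name, features in example.items():
        first = name[:1]
        if keep_letters and first not in keep_letters:
            continue  # keep list given: drop every name-space not listed
        if first in ignore_letters:
            continue  # ignore list: drop the listed name-spaces
        selected[name] = features
    return selected


example = {"x": {"a": 2.0}, "y": {"b": 3.0}}
print(select_namespaces(example, ignore="y"))  # {'x': {'a': 2.0}}
print(select_namespaces(example, keep="y"))    # {'y': {'b': 3.0}}
```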

(*) In practice, the feature names will have the name-spaces prepended to them, with a ^ separator in between, so the actual hashed string would be x^a^y^b:6 rather than ab:6. You may verify this by using the --audit option, but this is just a detail.
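To make the prefixing concrete, here is a simplified Python sketch that parses the text format and builds name-space-qualified names with the ^ separator. Both parse_vw_line and qualified_features are hypothetical helpers; the parser ignores tags, importance weights and name-space scaling values, so treat it as an illustration rather than a stand-in for vw's own parser.

```python
def parse_vw_line(line):
    """Split one example into (label, {namespace: {feature: value}})."""
    head, *sections = line.split("|")
    namespaces = {}
    for section in sections:
        tokens = section.split()
        # A name-space name must touch the '|'; a leading space means none.
        if tokens and not section[0].isspace():
            name, tokens = tokens[0], tokens[1:]
        else:
            name = ""
        features = namespaces.setdefault(name, {})
        for token in tokens:
            feature, _, value = token.partition(":")
            features[feature] = float(value) if value else 1.0
    return head.strip(), namespaces


def qualified_features(namespaces):
    """Prefix each feature with its name-space and '^' when it has one."""
    return {
        (f"{ns}^{feature}" if ns else feature): value
        for ns, features in namespaces.items()
        for feature, value in features.items()
    }


label, namespaces = parse_vw_line("1 |x a:2 |y b:3")
print(qualified_features(namespaces))  # {'x^a': 2.0, 'y^b': 3.0}
```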

arielf answered Feb 05 '23 22:02