Contextual Bandit using Vowpal Wabbit

Question

In Vowpal Wabbit's contextual bandit format, one of the inputs is the probability of choosing an arm/action, but how do we find that probability? Isn't finding that probability a big task in itself?

matanster · Accepted Answer

Supplying the probability means you are feeding in actions that were taken historically, e.g. from a log, rather than running the real online scenario. This is useful because (at least some of) Vowpal Wabbit's contextual bandit models can be bootstrapped from historical data: a contextual bandit policy learnt from historical data can outperform one that learns online from scratch. That option is only available if you have historical data relevant to your own online scenario.
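As a minimal sketch of what "feeding in historical data" looks like, assume each logged record stores the chosen action, its observed cost, the probability the logging policy assigned to that action, and the context features. The `LoggedInteraction` class and `to_vw_cb_line` helper below are hypothetical names for illustration; the output follows the `action:cost:probability | features` text layout used in VW's contextual bandit examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoggedInteraction:
    """One hypothetical row from a historical interaction log."""
    action: int          # 1-based action id, as VW's --cb labels use
    cost: float          # observed cost (lower is better)
    probability: float   # probability with which the logging policy chose `action`
    features: List[str]  # context features for this example


def to_vw_cb_line(row: LoggedInteraction) -> str:
    """Format one logged interaction as `action:cost:probability | features`."""
    if row.action < 1:
        raise ValueError("VW contextual bandit actions are numbered from 1")
    if not 0.0 < row.probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    label = f"{row.action}:{row.cost:g}:{row.probability:g}"
    return f"{label} | {' '.join(row.features)}"


log = [
    LoggedInteraction(1, 2.0, 0.4, ["a", "c"]),
    LoggedInteraction(3, 0.5, 0.2, ["b", "d"]),
]
lines = [to_vw_cb_line(r) for r in log]
print("\n".join(lines))
```

Written to a file, lines like these can be passed to VW in contextual bandit mode with `--cb <number of actions>`.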

The wiki page has recently been edited to better reflect that this format generalizes to this case.

Another (contrived) use case for including probabilities might be acting against multiple environments. In any event, to the best of my understanding, the probability here can be interpreted as a mere frequency.
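One way to read "a mere frequency": if a log records which action was taken in each context but not the probability, the probability can be estimated as how often each action was chosen for that context. The sketch below assumes contexts are discrete, hashable keys and that the logging behavior depended only on the recorded context; `empirical_propensities` is a hypothetical helper, not part of VW.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def empirical_propensities(
    log: List[Tuple[Hashable, int]],
) -> Dict[Tuple[Hashable, int], float]:
    """Estimate P(action | context) as a frequency over (context, action) pairs."""
    # How many times each context appears in the log.
    context_counts = Counter(context for context, _ in log)
    # How many times each (context, action) pair appears.
    pair_counts = Counter(log)
    return {
        (context, action): count / context_counts[context]
        for (context, action), count in pair_counts.items()
    }


log = [("x", 1), ("x", 1), ("x", 2), ("y", 3)]
props = empirical_propensities(log)
print(props)
```

Only actions that actually appear in the log get an estimate; an action never chosen in a context implicitly has frequency 0 and cannot serve as a logged probability.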

As such, my understanding is that you do not have to supply the probability part of the input when you are not feeding in historical interaction data. Just skip it, as in the example here.
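If you later want to reuse your own online interactions as historical data, the probability does not have to be estimated: it is whatever your exploration strategy assigned to the action it picked. The sketch below uses epsilon-greedy purely as an illustrative strategy (an assumption, not a description of how VW explores internally); `epsilon_greedy_choice` is a hypothetical helper that returns the chosen action together with its probability.

```python
import random
from typing import Tuple


def epsilon_greedy_choice(
    best_action: int, num_actions: int, epsilon: float, rng: random.Random
) -> Tuple[int, float]:
    """Pick an action epsilon-greedily and return it with the probability it had.

    Actions are numbered 1..num_actions to match VW's --cb labels.
    """
    if not 1 <= best_action <= num_actions:
        raise ValueError("best_action must be in 1..num_actions")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    # Explore with probability epsilon, uniformly over all actions (greedy one included).
    if rng.random() < epsilon:
        action = rng.randint(1, num_actions)
    else:
        action = best_action
    # Total probability the chosen action had under this policy.
    explore_share = epsilon / num_actions
    prob = (1.0 - epsilon) + explore_share if action == best_action else explore_share
    return action, prob


rng = random.Random(0)
action, prob = epsilon_greedy_choice(best_action=2, num_actions=4, epsilon=0.2, rng=rng)
print(action, prob)
```

Once the cost of the chosen action is observed, the returned pair supplies the `action` and `probability` parts of an `action:cost:probability` label.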

Tags: vowpalwabbit

kunal
