Contextual Bandit using Vowpal Wabbit

Tags:

vowpalwabbit

In the contextual bandit input format, one of the inputs is the probability of choosing an arm/action. How do we find that probability? Isn't finding that probability a big task in itself?

kunal asked May 25 '15 11:05


1 Answer

Supplying the probability means you are feeding in actions that were taken historically, e.g. from a log, rather than running the real online scenario. This is useful because (at least some of) Vowpal Wabbit's contextual bandit models can be bootstrapped from historical data. That is, a contextual bandit policy learnt over historical data can outperform one that learns online from scratch, which is something you can do only if you have historical data relevant to your online scenario.
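As a rough sketch of what feeding in a logged interaction could look like: the documented contextual bandit input line has the shape `action:cost:probability | features`. The helper name `to_vw_cb_line` and the feature names below are made up for illustration, and the sketch assumes your log already records the probability with which the logging policy chose each action.

```python
def to_vw_cb_line(action, cost, probability, features):
    """Format one logged interaction as a contextual bandit input line.

    action      -- positive integer id of the arm that was chosen
    cost        -- observed cost for that arm (lower is better)
    probability -- probability with which the logging policy chose the arm
    features    -- iterable of feature-name strings describing the context
    """
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return f"{action}:{cost}:{probability} | {' '.join(features)}"


# Hypothetical log entry: arm 1 was shown with probability 0.4 and cost 2.
line = to_vw_cb_line(1, 2, 0.4, ["a", "c"])
print(line)
```

Each such line describes one past decision, so a file of them is the kind of historical data the policy can be bootstrapped from.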

The Wiki page has recently been edited to better reflect that this format generalizes to this case.

Another (contrived) use case for including probabilities might be that you are acting against multiple environments. In any event, to the best of my understanding, the probability here can be interpreted as a mere frequency.
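If you take the "mere frequency" reading literally, one way to get a number is to count how often each action appears in your log. This is only a sketch under that assumption: the function name `empirical_action_frequencies` is hypothetical, and it ignores the context entirely, so it is a crude stand-in if the logging policy actually chose actions based on context.

```python
from collections import Counter


def empirical_action_frequencies(logged_actions):
    """Estimate each action's probability as its share of the logged actions."""
    if not logged_actions:
        raise ValueError("need at least one logged action")
    counts = Counter(logged_actions)
    total = len(logged_actions)
    return {action: count / total for action, count in counts.items()}


# Hypothetical log: arm 1 was chosen three times, arm 2 once.
freqs = empirical_action_frequencies([1, 1, 2, 1])
print(freqs)
```

The resulting values can then be used as the probability field of the corresponding logged examples.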

As such, my understanding is that you do not have to supply the probability part in your input when you are not feeding in historical interaction data. Just skip it as in the example here.

matanster answered Sep 25 '22 00:09