In this case, one of the inputs is the probability of choosing an arm/action, but how do we find that probability? Isn't finding that probability a big task in itself?
Supplying the probability means you are feeding in actions that were taken historically, e.g. from a log, rather than running the real online scenario. This is useful because (at least some of) Vowpal Wabbit's Contextual Bandits models can be bootstrapped from historical data. That is, a Contextual Bandits policy learnt over historical data can outperform one that learns online from scratch; you can do this only if you have historical data relevant to your online scenario.
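For reference, Vowpal Wabbit's `--cb` text input labels each logged example as `action:cost:probability`, followed by `|` and the context features (e.g. `1:2:0.4 | a c`). The sketch below turns a hypothetical historical log into lines of that shape; `LoggedInteraction`, `to_vw_cb_line` and `log_to_vw` are illustrative names, not part of VW.

```python
from typing import Iterable, List, NamedTuple


class LoggedInteraction(NamedTuple):
    """One row of a hypothetical historical log (field layout is an assumption)."""
    action: int          # 1-based id of the action the logging policy chose
    cost: float          # observed cost for that action (lower is better)
    probability: float   # probability the logging policy assigned to that action
    features: List[str]  # context features, e.g. ["a", "c"]


def to_vw_cb_line(row: LoggedInteraction) -> str:
    """Format one logged interaction as an `action:cost:probability | features` line."""
    if not 0.0 < row.probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return f"{row.action}:{row.cost}:{row.probability} | {' '.join(row.features)}"


def log_to_vw(rows: Iterable[LoggedInteraction]) -> List[str]:
    """Convert a whole log into VW-style contextual bandit lines."""
    return [to_vw_cb_line(r) for r in rows]


if __name__ == "__main__":
    log = [
        LoggedInteraction(1, 2.0, 0.4, ["a", "c"]),
        LoggedInteraction(3, 0.5, 0.2, ["b", "d"]),
    ]
    for line in log_to_vw(log):
        print(line)  # e.g. "1:2.0:0.4 | a c"
```

Rows whose probability was never recorded cannot be converted this way, which is why this form fits replayed historical data rather than a fresh online run.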
The Wiki page was recently edited to better reflect that this format generalizes to this case.
Another (contrived) use case for including probabilities might be acting against multiple environments. In any event, to the best of my understanding, the probability here can be interpreted as a mere frequency.
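If the probability is read as a frequency, one crude way to obtain it from historical data is to count how often each action was taken for a given context. This is a minimal sketch assuming discrete, repeating contexts (keyed here by their feature string) and a hypothetical log of `(context, action)` pairs; `empirical_action_probabilities` is an illustrative name. With continuous or rarely repeating contexts, probabilities recorded by the logging policy at decision time are a more reliable source.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Hashable, Iterable, Tuple


def empirical_action_probabilities(
    log: Iterable[Tuple[Hashable, int]],
) -> Dict[Hashable, Dict[int, float]]:
    """Estimate P(action | context) as the relative frequency of each action per context."""
    counts: DefaultDict[Hashable, Counter] = defaultdict(Counter)
    for context, action in log:
        counts[context][action] += 1

    probs: Dict[Hashable, Dict[int, float]] = {}
    for context, counter in counts.items():
        total = sum(counter.values())
        # Each action's share of all actions logged for this context.
        probs[context] = {a: n / total for a, n in counter.items()}
    return probs


if __name__ == "__main__":
    log = [("a c", 1), ("a c", 1), ("a c", 2), ("a c", 1), ("b d", 3)]
    probs = empirical_action_probabilities(log)
    print(probs["a c"])  # {1: 0.75, 2: 0.25}
    print(probs["b d"])  # {3: 1.0}
```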
As such, my understanding is that you do not have to supply the probability part of your input when you are not feeding in historical interaction data. Just skip it, as in the example here.