
Multi-Armed Bandit Analysis for Price Optimization

I recently read a blog post titled "Bandits Know the Best Product Price" (http://pkghosh.wordpress.com/2013/08/25/bandits-know-the-best-product-price/), which outlines how to use multi-armed bandit analysis for price optimization.

There is also a lot of discussion on whether multi-armed bandit analysis is better than A/B testing (e.g. "20 lines of code that will beat A/B testing every time": http://stevehanov.ca/blog/index.php?id=132 versus "Why multi-armed bandit algorithm is not 'better' than A/B testing": http://visualwebsiteoptimizer.com/split-testing-blog/multi-armed-bandit-algorithm/).

I am aware that there is an R package called "bandit" that can be used for such an analysis.

Does someone have a toy example - comparable to the one in the blog post - that shows how to apply this method in R, in the context of price optimization?

Thanks for your help.

asked Mar 04 '14 by majom


People also ask

When would you use a multi-armed bandit?

When the item being tested changes significantly enough to invalidate the results of an A/B test over time, multi-armed bandits provide an alternative to repeatedly retesting by continuously exploring. Targeting is another example of a long-term use of bandit algorithms.

What is bandit optimization?

Bandit optimization allocates traffic more efficiently among these discrete choices by sequentially updating the allocation of traffic based on each candidate's performance so far.

What is MAB testing?

MAB is a type of A/B testing that uses machine learning to learn from data gathered during the test and dynamically increase the visitor allocation in favor of better-performing variations. In practice, this means that poorly performing variations receive less and less traffic over time.

Is multi-armed bandit reinforcement learning?

The multi-armed bandit is a classic reinforcement learning problem in which a player faces k slot machines ("bandits"), each with a different reward distribution, and tries to maximise their cumulative reward over repeated trials.
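To make that framing concrete, here is a minimal epsilon-greedy simulation of a three-armed Bernoulli bandit in base R. The success probabilities, exploration rate, and horizon are made-up illustrative values, not taken from any of the linked posts.

    # Minimal epsilon-greedy simulation of a 3-armed Bernoulli bandit.
    # All numbers below are illustrative only.
    set.seed(42)
    true_p  <- c(0.04, 0.06, 0.05)   # unknown "reward distribution" of each arm
    eps     <- 0.1                   # probability of exploring a random arm
    pulls   <- numeric(3)            # how often each arm has been played
    rewards <- numeric(3)            # total reward collected per arm

    for (t in 1:5000) {
      if (runif(1) < eps || all(pulls == 0)) {
        arm <- sample(3, 1)                          # explore
      } else {
        arm <- which.max(rewards / pmax(pulls, 1))   # exploit current best estimate
      }
      r <- rbinom(1, 1, true_p[arm])                 # draw a reward from the chosen arm
      pulls[arm]   <- pulls[arm] + 1
      rewards[arm] <- rewards[arm] + r
    }

    round(rewards / pmax(pulls, 1), 3)   # empirical mean reward per arm
    pulls                                # most plays should end up on the best arm

Over time the empirical means converge and the loop plays the best arm most of the time, which is exactly the explore/exploit trade-off the answers above describe.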


2 Answers

I have been working on a project about bandit algorithms recently. Broadly speaking, the performance of a bandit algorithm depends greatly on the data set, and these algorithms are well suited to continuous testing on churning data. So what you need to do is test and tune your model on your own test data.

To understand bandits better, you can read the book Bandit Algorithms for Website Optimization: http://shop.oreilly.com/product/0636920027393.do. It explains the basic bandit algorithms quite well and implements them in Python. You can find the accompanying code on GitHub: https://github.com/johnmyleswhite/BanditsBook. Note, however, that the book does not cover contextual bandits.

For R, I am not as sure, but a quick search online turned up a bandit implementation in R: https://github.com/lotze/bandit
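I don't have the example from the blog post, but a rough, self-contained sketch of Thompson sampling for price optimization in base R could look something like the following. Every number in it is invented for illustration, and it does not use the package linked above (whose API I have not checked): each candidate price is an arm, each visitor either buys or not, and the reward is the price paid.

    # Toy Thompson-sampling sketch for price optimization, base R only.
    # Arms = candidate prices; reward = the price if the visitor buys, 0 otherwise.
    # Candidate prices and "true" conversion rates are made up for illustration.
    set.seed(123)
    prices  <- c(9.99, 14.99, 19.99, 24.99)
    true_cr <- c(0.20, 0.15, 0.09, 0.05)     # unknown conversion rate at each price

    k     <- length(prices)
    alpha <- rep(1, k)   # Beta(1, 1) prior on each conversion rate: successes + 1
    beta  <- rep(1, k)   # failures + 1

    for (visitor in 1:10000) {
      # Sample a plausible conversion rate for every price from its posterior,
      # then show this visitor the price with the highest sampled expected revenue.
      sampled_cr <- rbeta(k, alpha, beta)
      arm        <- which.max(prices * sampled_cr)

      bought     <- rbinom(1, 1, true_cr[arm])   # simulate the visitor's decision
      alpha[arm] <- alpha[arm] + bought
      beta[arm]  <- beta[arm] + (1 - bought)
    }

    # Posterior mean conversion rate and implied expected revenue per price
    post_cr <- alpha / (alpha + beta)
    data.frame(price          = prices,
               visitors       = alpha + beta - 2,
               est_conversion = round(post_cr, 3),
               est_revenue    = round(prices * post_cr, 2))

With enough visitors, traffic concentrates on the price with the highest expected revenue per visitor rather than simply the highest conversion rate, which is the point of using bandits for pricing instead of a plain conversion-rate A/B test.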

Hope this helps.

answered Oct 20 '22 by user3026143


My cautious explorations of this topic might be of use to you: http://codeandmath.wordpress.com/2014/04/05/type-i-error-in-bandits/

answered Oct 20 '22 by Markus Loecher