Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for optimal expected amount in a profit/loss game

I came upon the following question recently,

"You have a box which has G green and B blue coins. Pick a random coin, G gives a profit of +1 and blue a loss of -1. If you play optimally what is the expected profit."

I was thinking of using a brute force algorithm where I consider all possibilities of combinations of green and blue coins but I'm sure there must be a better solution for this (range of B and G was from 0 to 5000). Also what does playing optimally mean? Does it mean that if i pick all blue coins then I would continue playing till all green coins are also picked? If so then this means I shouldn't consider all possibilities of green and blue coins?

like image 253
user2980096 Avatar asked Oct 18 '22 05:10

user2980096


1 Answers

The "obvious" answer is to play whenever there's more green coins than blue coins. In fact, this is wrong. For example, if there's 999 green coins and 1000 blue coins, here's a strategy that takes an expected profit:

Take 2 coins
If GG -- stop with a profit of 2
if BG or GB -- stop with a profit of 0
if BB -- take all the remaining coins for a profit of -1

Since the first and last possibilities both occur with near 25% probability, your overall expectation is approximately 0.25*2 - 0.25*1 = 0.25

This is just a simple strategy in one extreme example that shows that the problem is not as simple as it first seems.

In general, the expectations with g green coins and b blue coins is given by a recurrence relation:

E(g, 0) = g
E(0, b) = 0
E(g, b) = max(0, g(E(g-1, b) + 1)/(b+g) + b(E(g, b-1) - 1)/(b+g))

The max in the final row occurs because if it's -EV to play, then you're better stopping.

These recurrence relations can be solved using dynamic programming in O(gb) time.

from fractions import Fraction as F

def gb(G, B):
    E = [[F(0, 1)] * (B+1) for _ in xrange(G+1)]
    for g in xrange(G+1):
        E[g][0] = F(g, 1)
    for b in xrange(1, B+1):
        for g in xrange(1, G+1):
            E[g][b] = max(0, (g * (E[g-1][b]+1) + b * (E[g][b-1]-1)) * F(1, (b+g)))
    for row in E:
        for v in row:
            print '%5.2f' % v,
        print
    print
    return E[G][B]

print gb(8, 10)

Output:

 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 1.00  0.50  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 2.00  1.33  0.67  0.20  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 3.00  2.25  1.50  0.85  0.34  0.00  0.00  0.00  0.00  0.00  0.00
 4.00  3.20  2.40  1.66  1.00  0.44  0.07  0.00  0.00  0.00  0.00
 5.00  4.17  3.33  2.54  1.79  1.12  0.55  0.15  0.00  0.00  0.00
 6.00  5.14  4.29  3.45  2.66  1.91  1.23  0.66  0.23  0.00  0.00
 7.00  6.12  5.25  4.39  3.56  2.76  2.01  1.34  0.75  0.30  0.00
 8.00  7.11  6.22  5.35  4.49  3.66  2.86  2.11  1.43  0.84  0.36

7793/21879

From this you can see that the expectation is positive to play with 8 green and 10 blue coins (EV=7793/21879 ~= 0.36), and you even have positive expectation with 2 green and 3 blue coins (EV=0.2)

like image 166
Paul Hankin Avatar answered Oct 20 '22 22:10

Paul Hankin