Algorithm for optimal expected amount in a profit/loss game

Question

I came upon the following question recently,

"You have a box which has G green and B blue coins. Pick a random coin, G gives a profit of +1 and blue a loss of -1. If you play optimally what is the expected profit."

I was thinking of using a brute force algorithm where I consider all possibilities of combinations of green and blue coins but I'm sure there must be a better solution for this (range of B and G was from 0 to 5000). Also what does playing optimally mean? Does it mean that if i pick all blue coins then I would continue playing till all green coins are also picked? If so then this means I shouldn't consider all possibilities of green and blue coins?

Paul Hankin · Accepted Answer

The "obvious" answer is to play whenever there's more green coins than blue coins. In fact, this is wrong. For example, if there's 999 green coins and 1000 blue coins, here's a strategy that takes an expected profit:

Take 2 coins
If GG -- stop with a profit of 2
if BG or GB -- stop with a profit of 0
if BB -- take all the remaining coins for a profit of -1

Since the first and last possibilities both occur with near 25% probability, your overall expectation is approximately 0.25*2 - 0.25*1 = 0.25

This is just a simple strategy in one extreme example that shows that the problem is not as simple as it first seems.

In general, the expectations with g green coins and b blue coins is given by a recurrence relation:

E(g, 0) = g
E(0, b) = 0
E(g, b) = max(0, g(E(g-1, b) + 1)/(b+g) + b(E(g, b-1) - 1)/(b+g))

The max in the final row occurs because if it's -EV to play, then you're better stopping.

These recurrence relations can be solved using dynamic programming in O(gb) time.

from fractions import Fraction as F

def gb(G, B):
    E = [[F(0, 1)] * (B+1) for _ in xrange(G+1)]
    for g in xrange(G+1):
        E[g][0] = F(g, 1)
    for b in xrange(1, B+1):
        for g in xrange(1, G+1):
            E[g][b] = max(0, (g * (E[g-1][b]+1) + b * (E[g][b-1]-1)) * F(1, (b+g)))
    for row in E:
        for v in row:
            print '%5.2f' % v,
        print
    print
    return E[G][B]

print gb(8, 10)

Output:

 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 1.00  0.50  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 2.00  1.33  0.67  0.20  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 3.00  2.25  1.50  0.85  0.34  0.00  0.00  0.00  0.00  0.00  0.00
 4.00  3.20  2.40  1.66  1.00  0.44  0.07  0.00  0.00  0.00  0.00
 5.00  4.17  3.33  2.54  1.79  1.12  0.55  0.15  0.00  0.00  0.00
 6.00  5.14  4.29  3.45  2.66  1.91  1.23  0.66  0.23  0.00  0.00
 7.00  6.12  5.25  4.39  3.56  2.76  2.01  1.34  0.75  0.30  0.00
 8.00  7.11  6.22  5.35  4.49  3.66  2.86  2.11  1.43  0.84  0.36

7793/21879

From this you can see that the expectation is positive to play with 8 green and 10 blue coins (EV=7793/21879 ~= 0.36), and you even have positive expectation with 2 green and 3 blue coins (EV=0.2)

Algorithm for optimal expected amount in a profit/loss game

Tags:

algorithm

probability

user2980096

1 Answers

Paul Hankin

Recent Activity

Donate For Us

Algorithm for optimal expected amount in a profit/loss game

Tags:

algorithm

probability

user2980096

1 Answers

Paul Hankin

Related questions

Recent Activity

Donate For Us