Basically, the problem simulates the following:
There is an urn with 50 green balls and 50 red balls.
I am allowed to pick balls from the urn, without replacement, with the following rules: For every red ball picked, I lose a dollar, for every green ball picked, I gain a dollar.
I can stop picking whenever I want. Worst case scenario is I pick all 100, and net 0.
The question is to come up with an optimal stopping strategy, and create a program to compute the expected value of the strategy.
My strategy is to continue picking balls, while the expected value of picking another ball is positive.
That is, the stopping rule is DYNAMIC.
In Latex, here's the recursive formula in an image:
http://i.stack.imgur.com/fnzYk.jpg
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
double ExpectedValue(double, double);
double max(double, double);
main() {
double g = 50;
double r = 50;
double EV = ExpectedValue(g, r);
printf ("%f\n\n", EV);
system("PAUSE");
}
double ExpectedValue(double g, double r){
double p = (g / (g + r));
double q = 1 - p;
if (g == 0)
return r;
if (r == 0)
return 0;
double E_gr = max ((p * ExpectedValue (g - 1, r)) + (q * ExpectedValue (g, r - 1)), (r - g));
return E_gr;
}
double max(double a, double b){
if (a > b)
return a;
else return b;
}
I let it run for 30 minutes, and it was still working. For small values of g and r, a solution is computed very quickly. What am I doing wrong?
Any help is much appreciated!
Your algorithm is fine, but you are wasting information. For a certain pair (g, r)
you calculate it's ExpectedValue and then you throw that information away. Often with recursion algorithms remembering previously calculated values can speed it up a LOT.
The following code runs in the blink of an eye. For example for g = r = 5000
it calculates 36.900218
in 1 sec. It remembers previous calculations of ExpectedValue(g, r)
to prevent unnecessary recursion and recalculation.
#include <stdio.h>
#include <stdlib.h>
double ExpectedValue(int g, int r, double ***expectedvalues);
inline double max(double, double);
int main(int argc, char *argv[]) {
int g = 50;
int r = 50;
int i, j;
double **expectedvalues = malloc(sizeof(double*) * (g+1));
// initialise
for (i = 0; i < (g+1); i++) {
expectedvalues[i] = malloc(sizeof(double) * (r+1));
for (j = 0; j < (r+1); j++) {
expectedvalues[i][j] = -1.0;
}
}
double EV = ExpectedValue(g, r, &expectedvalues);
printf("%f\n\n", EV);
// free memory
for (i = 0; i < (g+1); i++) free(expectedvalues[i]);
free(expectedvalues);
return 0;
}
double ExpectedValue(int g, int r, double ***expectedvalues) {
if (g == 0) return r;
if (r == 0) return 0;
// did we calculate this before? If yes, then return that value
if ((*expectedvalues)[g][r] != -1.0) return (*expectedvalues)[g][r];
double p = (double) g / (g + r);
double E_gr = max(p * ExpectedValue(g-1, r, expectedvalues) + (1.0-p) * ExpectedValue(g, r-1, expectedvalues), (double) (r-g));
// store value for later lookup
(*expectedvalues)[g][r] = E_gr;
return E_gr;
}
double max(double a, double b) {
if (a > b) return a;
else return b;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With