I've checked the examples on the Boost website, but they are not what I'm looking for.
To put it simply, I want to see if a number on a die is favored, using 600 rolls, so the expected number of appearances of each number (1 through 6) is 100.
I want to use the chi-square distribution to check whether the die is fair.
How would I do this?
Suppose e[i] and o[i] are arrays holding the expected and observed count of rolls for each of the 6 possibilities. In your case, e[i] is 100 for each bin, and o[i] is the number of times i was rolled in your 600 trials.
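If the rolls are simulated rather than taken from a physical die, the o[i] array could be filled along these lines. This is only a sketch: tally_rolls and its seed parameter are made-up names for illustration, and it assumes the standard <random> generator stands in for the die.

```cpp
#include <array>
#include <cstddef>
#include <random>

// Roll a simulated six-sided die `rolls` times and count each face.
// counts[0] holds the number of 1s rolled, counts[5] the number of 6s.
// (Hypothetical helper, not part of Boost.)
std::array<int, 6> tally_rolls(std::size_t rolls, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> face(1, 6);  // inclusive range 1..6
    std::array<int, 6> counts{};                    // zero-initialised
    for (std::size_t r = 0; r < rolls; ++r) {
        ++counts[face(gen) - 1];
    }
    return counts;
}
```

Calling tally_rolls(600, some_seed) gives six counts that always add up to 600. With a real die you would fill the same array by hand from your recorded rolls.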
You then calculate the chi-squared statistic by summing (e[i]-o[i])^2/e[i] over the 6 bins. Let's say your o[i] array came out with 105, 95, 102, 98, 98, and 102 counts after doing your 600 trials.
chi2 = 5^2/100 + 5^2/100 + 2^2/100 + 2^2/100 + 2^2/100 + 2^2/100 = 0.660
You have five degrees of freedom (number of bins minus 1). So you're going to have a declaration like
boost::math::chi_squared mydist(5);
to create the Boost object representing your chi-square distribution (the class comes from the header <boost/math/distributions/chi_squared.hpp>).
At this point you would use the cdf accessor function (cumulative distribution function) from the Boost library to look up the cumulative probability corresponding to a chi-squared score of 0.660 with five degrees of freedom.
double p = boost::math::cdf(mydist, 0.660);
You should get something close to 0.015. That is the lower-tail probability, so there is a (1 - 0.015) = 98.5% probability of observing a chi-squared score at least as large as 0.660, if one assumes the null hypothesis (that the die is fair) holds. So for this set of data, the null hypothesis cannot be rejected at any reasonable significance level. Boost can also return the upper tail directly via cdf(complement(mydist, 0.660)). (Disclaimer: untested code! But if I understand the Boost documentation correctly, this is how it should work.)