Let's say I roll a 6-sided die 60 times and I get 16, 5, 9, 7, 6, 17 rolls for the numbers 1 through 6, respectively. The numbers 1 and 6 are showing up too much and there's only about a 1.8% chance of that being random. If I use Statistics::ChiSquare, it prints out:
There's a >1% chance, and a <5% chance, that this data is random.
So not only is it a bad interface (I can't get those numbers back directly), but the rounding error is significant.
What's worse, what if I'm rolling 2 six-sided dice? The odds of getting any particular sum are:

Sum   Frequency   Relative Frequency
2     1           1/36
3     2           2/36
4     3           3/36
5     4           4/36
6     5           5/36
7     6           6/36
8     5           5/36
9     4           4/36
10    3           3/36
11    2           2/36
12    1           1/36
Statistics::ChiSquare used to have a chisquare_nonuniform() function, but it was removed.
So the numbers are rounded poorly and I can't use it for a non-uniform distribution. Given a list of actual frequencies and a list of expected frequencies, what's the best way of calculating the chi-square test in Perl? The various modules I'm finding on the CPAN aren't helping me, so I'm guessing I missed something obvious.
Implementing this yourself is so simple that I wouldn't want to upload Yet Another Statistics Module just for this.
    use strict;
    use warnings;
    use feature qw< say >;    # say needs Perl 5.10+, as does the // operator

    use Carp qw< croak >;
    use List::Util qw< sum >;
    use Statistics::Distributions qw< chisqrprob >;

    # Takes named arguments "observed" and "expected" (array refs of counts)
    # and returns the upper-tail probability of the chi-squared statistic.
    sub chi_squared_test {
        my %args = @_;
        my $observed = delete $args{observed} // croak q(Argument "observed" required);
        my $expected = delete $args{expected} // croak q(Argument "expected" required);
        @$observed == @$expected or croak q(Input arrays must have same length);

        # Sum of (observed - expected)^2 / expected over all categories
        my $chi_squared = sum map {
            ($observed->[$_] - $expected->[$_])**2 / $expected->[$_];
        } 0 .. $#$observed;

        my $degrees_of_freedom = @$observed - 1;
        my $probability = chisqrprob($degrees_of_freedom, $chi_squared);
        return $probability;
    }

    say chi_squared_test(
        observed => [16, 5, 9, 7, 6, 17],
        expected => [(10) x 6],
    );
Output: 0.018360
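The same approach should carry over to the two-dice case from the question, because the expected counts do not have to be uniform. Here is a minimal sketch that uses only core modules. The helper names expected_two_dice and chi_squared_statistic are made up for illustration, and the observed counts are invented sample data, not real rolls. It scales the relative frequencies from the table to expected counts and computes the statistic. To get a probability, pass the statistic and the degrees of freedom to chisqrprob as in the function above.

```perl
use strict;
use warnings;
use List::Util qw< sum >;

# Ways to roll each sum from 2 to 12 with two six-sided dice (out of 36).
my @ways = (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1);

# Scale the relative frequencies to expected counts for a number of rolls.
sub expected_two_dice {
    my ($rolls) = @_;
    return [ map { $rolls * $_ / 36 } @ways ];
}

# Pearson's chi-squared statistic for observed vs. expected counts.
sub chi_squared_statistic {
    my ($observed, $expected) = @_;
    return sum map {
        ($observed->[$_] - $expected->[$_])**2 / $expected->[$_]
    } 0 .. $#$observed;
}

# Hypothetical observed counts for sums 2..12 over 360 rolls (made-up data).
my @observed = (8, 22, 28, 41, 52, 58, 49, 43, 30, 19, 10);
my $expected = expected_two_dice(sum @observed);

printf "chi-squared = %.3f, degrees of freedom = %d\n",
    chi_squared_statistic(\@observed, $expected), @observed - 1;
```

With the chi_squared_test function above you could skip the separate statistic helper and call it directly with observed => \@observed and expected => expected_two_dice(360).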