
A better chi-square test for Perl?

Let's say I roll a 6-sided die 60 times and get 16, 5, 9, 7, 6, and 17 rolls for the numbers 1 through 6, respectively. The numbers 1 and 6 are showing up too often: a fair die would produce a spread at least this uneven only about 1.8% of the time. If I use Statistics::ChiSquare, it prints out:

There's a >1% chance, and a <5% chance, that this data is random.

So not only is it a bad interface (I can't get the statistic or the probability back directly), but the rounding is far too coarse: all it tells me is that the probability falls somewhere between 1% and 5%.

What's worse, what if I'm rolling two six-sided dice? The probabilities of each sum are:

Sum  Combinations  Relative frequency
2    1             1/36
3    2             2/36
4    3             3/36
5    4             4/36
6    5             5/36
7    6             6/36
8    5             5/36
9    4             4/36
10   3             3/36
11   2             2/36
12   1             1/36
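For the two-dice case, the expected counts do not have to be typed in by hand. The following is a minimal Perl sketch, assuming two fair six-sided dice; the helper names `two_dice_combinations` and `two_dice_expected_counts` are hypothetical. It enumerates all 36 outcomes to rebuild the table above and scales the combination counts to expected counts for a given number of rolls:

```perl
use strict;
use warnings;

# Hypothetical helper: count how many of the 36 equally likely
# (first die, second die) outcomes produce each sum from 2 to 12.
sub two_dice_combinations {
    my %ways;
    for my $first (1 .. 6) {
        for my $second (1 .. 6) {
            $ways{ $first + $second }++;
        }
    }
    return map { $ways{$_} } 2 .. 12;
}

# Hypothetical helper: scale those combination counts to the number of
# times each sum is expected to appear in $rolls throws of two fair dice.
sub two_dice_expected_counts {
    my ($rolls) = @_;
    return map { $rolls * $_ / 36 } two_dice_combinations();
}

my @expected = two_dice_expected_counts(360);
print "@expected\n";    # prints: 10 20 30 40 50 60 50 40 30 20 10
```

An array built this way can serve as the list of expected frequencies in a chi-square test with 11 categories, and therefore 10 degrees of freedom.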

Statistics::ChiSquare used to have a chisquare_nonuniform() function, but it was removed.

So the numbers are rounded poorly and I can't use it for a non-uniform distribution. Given a list of observed frequencies and a list of expected frequencies, what's the best way to calculate the chi-square test in Perl? The various modules I'm finding on the CPAN aren't helping me, so I'm guessing I missed something obvious.

asked Jan 18 '14 13:01 by Ovid


1 Answer

Implementing this yourself is so simple that I wouldn't want to upload Yet Another Statistics Module just for this.

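use strict;
use warnings;
use feature qw< say >;

use Carp qw< croak >;
use List::Util qw< sum >;
use Statistics::Distributions qw< chisqrprob >;

# Returns the probability of a chi-squared statistic at least as large as
# the one computed from the observed and expected frequencies.
sub chi_squared_test {
  my %args = @_;
  my $observed = delete $args{observed} // croak q(Argument "observed" required);
  my $expected = delete $args{expected} // croak q(Argument "expected" required);
  @$observed == @$expected or croak q(Input arrays must have same length);

  # Sum of (observed - expected)^2 / expected over all categories.
  my $chi_squared = sum map {
    ($observed->[$_] - $expected->[$_])**2 / $expected->[$_];
  } 0 .. $#$observed;
  # The counts are constrained to sum to the total, so k - 1 degrees of freedom.
  my $degrees_of_freedom = @$observed - 1;
  # chisqrprob returns the upper-tail probability for the given statistic.
  my $probability = chisqrprob($degrees_of_freedom, $chi_squared);
  return $probability;
}

say chi_squared_test
  observed => [16, 5, 9, 7, 6, 17],
  expected => [(10) x 6];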

Output: 0.018360
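If the CPAN dependency is unwelcome, the tail probability can also be computed in core Perl. The following is a minimal sketch, not the answer's code: `gamma_of_half_integer`, `chi_squared_upper_tail` and `chi_squared_p_value` are hypothetical names, and the sketch swaps Statistics::Distributions for the power series of the regularized lower incomplete gamma function P, relying on the identity that the chi-square upper tail with k degrees of freedom at x equals 1 - P(k/2, x/2).

```perl
use strict;
use warnings;
use List::Util qw< sum >;

# Hypothetical helper: Gamma(z) for a positive integer or half-integer z
# (the only values needed, since z = df/2), via Gamma(z + 1) = z * Gamma(z),
# starting from Gamma(1) = 1 or Gamma(1/2) = sqrt(pi).
sub gamma_of_half_integer {
    my ($z) = @_;
    my $is_integer = $z == int $z;
    my $gamma = $is_integer ? 1 : sqrt(4 * atan2(1, 1));
    for (my $k = $is_integer ? 1 : 0.5; $k < $z; $k++) {
        $gamma *= $k;
    }
    return $gamma;
}

# Hypothetical helper: P(X >= $x) for a chi-square variable with $df degrees
# of freedom, computed as 1 - P(df/2, x/2) using the power series
# P(s, h) = h^s e^-h / Gamma(s) * sum_n h^n / (s (s+1) ... (s+n)).
sub chi_squared_upper_tail {
    my ($df, $x) = @_;
    return 1 if $x <= 0;
    my $s = $df / 2;
    my $h = $x / 2;
    my $term = 1 / $s;
    my $series = $term;
    for my $n (1 .. 10_000) {
        $term *= $h / ($s + $n);
        $series += $term;
        last if $term < 1e-16 * $series;
    }
    my $lower = $series * exp($s * log($h) - $h) / gamma_of_half_integer($s);
    return 1 - $lower;
}

# Hypothetical wrapper: chi-square statistic with k - 1 degrees of freedom,
# then its upper-tail probability.
sub chi_squared_p_value {
    my ($observed, $expected) = @_;
    die "Input arrays must have same length\n" unless @$observed == @$expected;
    my $statistic = sum map {
        ($observed->[$_] - $expected->[$_])**2 / $expected->[$_]
    } 0 .. $#$observed;
    return chi_squared_upper_tail(scalar(@$observed) - 1, $statistic);
}

printf "%.6f\n", chi_squared_p_value([16, 5, 9, 7, 6, 17], [(10) x 6]);
```

Computing 1 - P loses relative precision when the probability is very small (very large statistics), and the series can overflow for extremely large ones; for those cases a continued-fraction evaluation of the upper tail, or the module used above, is the safer choice.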

answered Oct 20 '22 03:10 by amon