Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Matlab test of independence

For 1,000,000 observations, I observed a discrete event, X, 3 times for the control group and 10 times for the test group.

I need to preform a Chi square test of independence in Matlab. This is how you would do it in r:

m <- rbind(c(3, 1000000-3), c(10, 1000000-10))
#      [,1]   [,2] 
# [1,]    3 999997
# [2,]   10 999990
chisq.test(m)

The r function returns chi-squared = 2.7692, df = 1, p-value = 0.0961.

What Matlab function should I use or create to do this?

like image 317
Elpezmuerto Avatar asked Jul 28 '10 18:07

Elpezmuerto


1 Answers

Here is my own implementation that I use:

function [hNull pValue X2] = ChiSquareTest(o, alpha)
    %#  CHISQUARETEST  Pearson's Chi-Square test of independence
    %#
    %#    @param o          The Contignecy Table of the joint frequencies
    %#                      of the two events (attributes)
    %#    @param alpha      Significance level for the test
    %#    @return hNull     hNull = 1: null hypothesis accepted (independent)
    %#                      hNull = 0: null hypothesis rejected (dependent)
    %#    @return pValue    The p-value of the test (the prob of obtaining
    %#                      the observed frequencies under hNull)
    %#    @return X2        The value for the chi square statistic
    %#

    %# o:     observed frequency
    %# e:     expected frequency
    %# dof:   degree of freedom

    [r c] = size(o);
    dof = (r-1)*(c-1);

    %# e = (count(A=ai)*count(B=bi)) / N
    e = sum(o,2)*sum(o,1) / sum(o(:));

    %# [ sum_r [ sum_c ((o_ij-e_ij)^2/e_ij) ] ]
    X2 = sum(sum( (o-e).^2 ./ e ));

    %# p-value needed to reject hNull at the significance level with dof
    pValue = 1 - chi2cdf(X2, dof);
    hNull = (pValue > alpha);

    %# X2 value needed to reject hNull at the significance level with dof
    %#X2table = chi2inv(1-alpha, dof);
    %#hNull = (X2table > X2);

end

And an example to illustrate:

t = [3 999997 ; 10 999990]
[hNull pVal X2] = ChiSquareTest(t, 0.05)

hNull =
     1
pVal =
     0.052203
X2 =
       3.7693

Note that the results are different from yours because chisq.test performs a correction by default, according to ?chisq.test

correct: a logical indicating whether to apply continuity correction when computing the test statistic for 2x2 tables: one half is subtracted from all |O - E| differences.


Alternatively if you have the actual observations of the two events in question, you can use the CROSSTAB function which computes the contingency table and return the Chi2 and p-value measures:

X = randi([1 2],[1000 1]);
Y = randi([1 2],[1000 1]);
[t X2 pVal] = crosstab(X,Y)

t =
   229   247
   257   267
X2 =
     0.087581
pVal =
      0.76728

the equivalent in R would be:

chisq.test(X, Y, correct = FALSE)

Note: Both (MATLAB) approaches above require the Statistics Toolbox

like image 121
Amro Avatar answered Sep 28 '22 03:09

Amro