For 1,000,000 observations, I observed a discrete event, X, 3 times for the control group and 10 times for the test group.
I need to preform a Chi square test of independence in Matlab. This is how you would do it in r:
m <- rbind(c(3, 1000000-3), c(10, 1000000-10))
# [,1] [,2]
# [1,] 3 999997
# [2,] 10 999990
chisq.test(m)
The r function returns chi-squared = 2.7692, df = 1, p-value = 0.0961.
What Matlab function should I use or create to do this?
Here is my own implementation that I use:
function [hNull pValue X2] = ChiSquareTest(o, alpha)
%# CHISQUARETEST Pearson's Chi-Square test of independence
%#
%# @param o The Contignecy Table of the joint frequencies
%# of the two events (attributes)
%# @param alpha Significance level for the test
%# @return hNull hNull = 1: null hypothesis accepted (independent)
%# hNull = 0: null hypothesis rejected (dependent)
%# @return pValue The p-value of the test (the prob of obtaining
%# the observed frequencies under hNull)
%# @return X2 The value for the chi square statistic
%#
%# o: observed frequency
%# e: expected frequency
%# dof: degree of freedom
[r c] = size(o);
dof = (r-1)*(c-1);
%# e = (count(A=ai)*count(B=bi)) / N
e = sum(o,2)*sum(o,1) / sum(o(:));
%# [ sum_r [ sum_c ((o_ij-e_ij)^2/e_ij) ] ]
X2 = sum(sum( (o-e).^2 ./ e ));
%# p-value needed to reject hNull at the significance level with dof
pValue = 1 - chi2cdf(X2, dof);
hNull = (pValue > alpha);
%# X2 value needed to reject hNull at the significance level with dof
%#X2table = chi2inv(1-alpha, dof);
%#hNull = (X2table > X2);
end
And an example to illustrate:
t = [3 999997 ; 10 999990]
[hNull pVal X2] = ChiSquareTest(t, 0.05)
hNull =
1
pVal =
0.052203
X2 =
3.7693
Note that the results are different from yours because chisq.test
performs a correction by default, according to ?chisq.test
correct: a logical indicating whether to apply continuity correction when computing the test statistic for 2x2 tables: one half is subtracted from all |O - E| differences.
Alternatively if you have the actual observations of the two events in question, you can use the CROSSTAB function which computes the contingency table and return the Chi2 and p-value measures:
X = randi([1 2],[1000 1]);
Y = randi([1 2],[1000 1]);
[t X2 pVal] = crosstab(X,Y)
t =
229 247
257 267
X2 =
0.087581
pVal =
0.76728
the equivalent in R would be:
chisq.test(X, Y, correct = FALSE)
Note: Both (MATLAB) approaches above require the Statistics Toolbox
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With