
Python: Chi 2 test produces wrong results (chi2_contingency)

I am trying to calculate the chi-square statistic in Python from a contingency table. Here is an example of the observed counts:

+--------+------+------+
|        | Cat1 | Cat2 |
+--------+------+------+
| Group1 |   80 |  120 |
| Group2 |  420 |  380 |
+--------+------+------+

The expected values are:

+--------+------+------+
|        | Cat1 | Cat2 |
+--------+------+------+
| Group1 |  100 |  100 |
| Group2 |  400 |  400 |
+--------+------+------+

If I calculate the chi-square value by hand I get 10. With Python, however, I get 9.506. I use the following code:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
import scipy

# Some fake data.
n = 5  # Number of samples.
d = 3  # Dimensionality.
c = 2  # Number of categories.
data = np.random.randint(c, size=(n, d))
data = pd.DataFrame(data, columns=['CAT1', 'CAT2', 'CAT3'])

# Contingency table.
contingency = pd.crosstab(data['CAT1'], data['CAT2'])

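# Overwrite the placeholder counts with the real ones.
# A single .iloc[row, col] indexer is used because chained .iloc[0][0] may assign to a copy.
contingency.iloc[0, 0] = 80
contingency.iloc[0, 1] = 120
contingency.iloc[1, 0] = 420
contingency.iloc[1, 1] = 380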

# Chi-square test of independence.
chi, p, dof, expected = chi2_contingency(contingency)

Strangely, the function gives me the correct expected values, but the chi-square statistic and p-value are off. What am I doing wrong here?

Thanks

p.s.

I am aware that the way I create the initial table in pandas is pretty lame, but I am not an expert on how to create these nested tables in pandas.
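
For what it's worth, the table can also be built directly from the counts, without the random placeholder data. This is only a minimal sketch: the row and column labels are taken from the example table above, and the variable names are illustrative.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Observed counts from the example table; labels are for readability only.
contingency = pd.DataFrame(
    [[80, 120], [420, 380]],
    index=["Group1", "Group2"],
    columns=["Cat1", "Cat2"],
)

# Same call as in the question, with default settings.
chi, p, dof, expected = chi2_contingency(contingency)
```

Building the table this way also avoids the case where the random data happens to produce a crosstab that is not 2x2.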

valenzio asked Mar 08 '23 10:03


1 Answer

From the documentation:

correction : bool, optional
If True, and the degrees of freedom is 1, apply Yates’ correction for continuity.
The effect of the correction is to adjust each observed value by 0.5 towards
the corresponding expected value.

Your table is 2x2, so the degrees of freedom is 1, and correction is True by default; that is where 9.506 comes from. If you set correction to False, you'll get 10.

>>> chi2_contingency(contingency, correction=False)
(10.0, 0.001565402258002549, 1, array([[ 100.,  100.],
       [ 400.,  400.]]))
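
To see where both numbers come from, the two statistics can be reproduced by hand. The following is a minimal sketch, not scipy's actual implementation: it assumes every |observed - expected| difference is larger than 0.5 (true here, where each is 20), so the correction simply shrinks each difference by 0.5. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[80, 120], [420, 380]], dtype=float)

# Expected counts under independence: row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Plain Pearson chi-square: sum of (O - E)^2 / E.
chi2_plain = ((observed - expected) ** 2 / expected).sum()

# With Yates' continuity correction: sum of (|O - E| - 0.5)^2 / E.
chi2_yates = ((np.abs(observed - expected) - 0.5) ** 2 / expected).sum()

# Compare against scipy with and without the correction.
scipy_default = chi2_contingency(observed)[0]
scipy_uncorrected = chi2_contingency(observed, correction=False)[0]

print(chi2_plain, chi2_yates)
```

The uncorrected value matches the hand calculation of 10, and the corrected value (about 9.506) matches what chi2_contingency returns by default.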
Andrey Lukyanenko answered Mar 24 '23 19:03