
Chi-Squared test in Python

Tags:

python

r

scipy

I've used the following code in R to determine how well observed values (20, 20, 0 and 0 for example) fit expected values/ratios (25% for each of the four cases, for example):

> chisq.test(c(20,20,0,0), p=c(0.25, 0.25, 0.25, 0.25))

	Chi-squared test for given probabilities

data:  c(20, 20, 0, 0)
X-squared = 40, df = 3, p-value = 1.066e-08

How can I replicate this in Python? I've tried using the chisquare function from scipy but the results I obtained were very different; I'm not sure if this is even the correct function to use. I've searched through the scipy documentation, but it's quite daunting as it runs to 1000+ pages; the numpy documentation is almost 50% more than that.

SabreWolfy asked Feb 17 '12 14:02



2 Answers

scipy.stats.chisquare expects observed and expected absolute frequencies, not ratios. To get the result you want, scale the ratios by the total number of observations:

>>> import numpy as np
>>> from scipy.stats import chisquare
>>> observed = np.array([20., 20., 0., 0.])
>>> expected = np.array([.25, .25, .25, .25]) * np.sum(observed)
>>> chisquare(observed, expected)
(40.0, 1.065509033425585e-08)

If the expected frequencies are uniform across the categories, as they are here, you can skip computing them, because chisquare assumes equal expected frequencies by default:

>>> chisquare(observed)
(40.0, 1.065509033425585e-08)

The first returned value is the χ² statistic, the second the p-value of the test.
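To see where those two numbers come from, here is a minimal sketch that computes the statistic by hand and compares it with chisquare. It assumes the usual Pearson formula (sum of (O - E)² / E) with degrees of freedom equal to the number of categories minus one; the variable names are only illustrative.

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([20.0, 20.0, 0.0, 0.0])
ratios = np.array([0.25, 0.25, 0.25, 0.25])

# Convert the expected ratios to absolute frequencies, as chisquare requires.
expected = ratios * observed.sum()

# Pearson statistic: sum over categories of (O - E)^2 / E.
statistic = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: number of categories minus one
# (assumes no parameters were estimated from the data).
dof = len(observed) - 1

# p-value: upper-tail probability of the chi-squared distribution.
p_value = chi2.sf(statistic, dof)

# Same test through the library function, for comparison.
result = chisquare(observed, f_exp=expected)

print(statistic, p_value)
print(result[0], result[1])
```

Both print lines should show the same statistic and p-value, matching the R output in the question (X-squared = 40, df = 3).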

Fred Foo answered Sep 28 '22 03:09


Just wanted to point out that while the answer above is syntactically correct, a chi-squared test is not appropriate for your example, because some of your observed frequencies (the two zeros) are too small for the test to be accurate.

"This test is invalid when the observed or expected frequencies in each category are too small. A typical rule is that all of the observed and expected frequencies should be at least 5." see: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html#scipy.stats.chisquare
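As a rough sketch, the rule of thumb quoted above can be checked before running the test. The helper name small_frequency_categories and the default threshold of 5 are assumptions made for illustration; this is not a SciPy function.

```python
import numpy as np

def small_frequency_categories(observed, expected, minimum=5):
    """Return indices of categories whose observed or expected
    frequency is below `minimum` (hypothetical helper)."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    too_small = (observed < minimum) | (expected < minimum)
    return np.flatnonzero(too_small).tolist()

observed = [20, 20, 0, 0]
expected = [10, 10, 10, 10]  # 25% of 40 observations per category

flagged = small_frequency_categories(observed, expected)
print(flagged)  # [2, 3]
```

For the data in the question, the last two categories are flagged because their observed counts are 0.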

emaxwell answered Sep 28 '22 02:09