Chi-Squared test in Python

Tags: python, r, scipy

I've used the following code in R to determine how well observed values (for example 20, 20, 0 and 0) fit expected values/ratios (for example 25% for each of the four cases):

    > chisq.test(c(20,20,0,0), p=c(0.25, 0.25, 0.25, 0.25))

        Chi-squared test for given probabilities

    data:  c(20, 20, 0, 0)
    X-squared = 40, df = 3, p-value = 1.066e-08

How can I replicate this in Python? I tried the chisquare function from scipy, but the results I obtained were very different, and I'm not sure it is even the correct function to use. I've searched through the scipy documentation, but it's quite daunting as it runs to 1000+ pages; the numpy documentation is almost 50% longer than that.

537

asked Feb 17 '12 14:02

SabreWolfy

2 Answers

scipy.stats.chisquare expects observed and expected absolute frequencies (counts), not ratios. Multiply the expected ratios by the total number of observations to obtain what you want:

    >>> import numpy as np
    >>> from scipy.stats import chisquare
    >>> observed = np.array([20., 20., 0., 0.])
    >>> expected = np.array([.25, .25, .25, .25]) * np.sum(observed)
    >>> chisquare(observed, expected)
    (40.0, 1.065509033425585e-08)

If the expected values are uniformly distributed over the classes, you can leave out the expected values altogether, since chisquare assumes equal frequencies by default:

    >>> chisquare(observed)
    (40.0, 1.065509033425585e-08)

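The first returned value is the χ² statistic, the second the p-value of the test. Recent SciPy versions print the result as a named object with statistic and pvalue fields instead of a plain tuple, but the values are the same.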
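To make the conversion from ratios to counts explicit, here is a minimal sketch that wraps it in a helper. The name chisq_gof and its argument names are made up for illustration; only numpy and scipy.stats.chisquare are real library calls. The sketch also recomputes the statistic by hand as the sum of (observed - expected)² / expected, which should agree with what chisquare reports.

```python
import numpy as np
from scipy.stats import chisquare


def chisq_gof(observed, probs):
    """Goodness-of-fit test from observed counts and expected proportions.

    Hypothetical helper: mirrors R's chisq.test(x, p=...) by scaling the
    proportions up to expected counts before calling scipy.
    """
    observed = np.asarray(observed, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # chisquare wants expected *counts*, so scale proportions by the total.
    expected = probs * observed.sum()
    statistic, pvalue = chisquare(observed, f_exp=expected)
    return float(statistic), float(pvalue), expected


observed = [20, 20, 0, 0]
probs = [0.25, 0.25, 0.25, 0.25]
statistic, pvalue, expected = chisq_gof(observed, probs)

# Manual check of the statistic: sum((O - E)^2 / E) over the categories.
manual = float(np.sum((np.asarray(observed) - expected) ** 2 / expected))

print(statistic, manual)  # both 40.0 for this data
print(pvalue)             # about 1.07e-08, in line with the R output
```

Each category has an expected count of 10, so every term contributes (±10)² / 10 = 10, and the four terms add up to 40.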

157

answered Sep 28 '22 03:09

Fred Foo

Just wanted to point out that while the answer above is syntactically correct, a chi-squared test is not appropriate for your example: some of your observed frequencies (the two zeros) are too small for the chi-squared approximation to be accurate.

"This test is invalid when the observed or expected frequencies in each category are too small. A typical rule is that all of the observed and expected frequencies should be at least 5." see: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html#scipy.stats.chisquare

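A quick way to apply that rule of thumb before running the test is sketched below. The helper name small_count_categories and its minimum parameter are assumptions for illustration, with the default of 5 taken from the guideline quoted above; this is not part of SciPy's API.

```python
def small_count_categories(observed, expected, minimum=5):
    """Return indices of categories whose observed or expected count is below `minimum`.

    Hypothetical helper; the threshold of 5 follows the quoted rule of thumb.
    """
    return [
        i
        for i, (obs, exp) in enumerate(zip(observed, expected))
        if obs < minimum or exp < minimum
    ]


observed = [20, 20, 0, 0]
total = sum(observed)
expected = [0.25 * total] * 4  # 10 expected in each category

flagged = small_count_categories(observed, expected)
print(flagged)  # [2, 3]: the two categories with an observed count of 0
```

An empty list would mean every category meets the guideline; here the last two categories are flagged because their observed counts are 0.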
answered Sep 28 '22 02:09

emaxwell
