I am quite new to Python as well as statistics. I'm trying to apply the chi-squared test to determine whether previous success affects the level of change of a person (percentage-wise, this does seem to be the case, but I wanted to see whether my results were statistically significant).
My question is: did I do this correctly? My results say the p-value is 0.0, which means that there is a significant relationship between my variables (which is what I want, of course, but 0 seems a little too perfect for a p-value, so I'm wondering whether I made a mistake in the code).
Here's what I did:
import numpy as np
import pandas as pd
import scipy.stats as stats
d = {'Previously Successful' : pd.Series([129.3, 182.7, 312], index=['Yes - changed strategy', 'No', 'col_totals']),
'Previously Unsuccessful' : pd.Series([260.17, 711.83, 972], index=['Yes - changed strategy', 'No', 'col_totals']),
'row_totals' : pd.Series([(129.3+260.17), (182.7+711.83), (312+972)], index=['Yes - changed strategy', 'No', 'col_totals'])}
total_summarized = pd.DataFrame(d)
observed = total_summarized.iloc[0:2, 0:2]  # .ix was removed in pandas 1.0; .iloc selects by position
Output: Observed
expected = np.outer(total_summarized["row_totals"][0:2],
total_summarized.loc["col_totals"].iloc[0:2])/1000
expected = pd.DataFrame(expected)
expected.columns = ["Previously Successful","Previously Unsuccessful"]
expected.index = ["Yes - changed strategy","No"]
chi_squared_stat = (((observed-expected)**2)/expected).sum().sum()
print(chi_squared_stat)
crit = stats.chi2.ppf(q = 0.95, # Find the critical value for 95% confidence
df = 8)
print("Critical value")
print(crit)
p_value = 1 - stats.chi2.cdf(x=chi_squared_stat, # Find the p-value
df=8)
print("P value")
print(p_value)
stats.chi2_contingency(observed= observed)
Output Statistics
A few corrections:
- The expected array is not correct. You must divide by observed.sum().sum(), which is 1284, not 1000.
- chi_squared_stat does not include a continuity correction. (But it isn't necessarily wrong to not use it; that's a judgment call for the statistician.)
All the calculations that you perform (expected matrix, statistics, degrees of freedom, p-value) are computed by chi2_contingency:
In [65]: observed
Out[65]:
                        Previously Successful  Previously Unsuccessful
Yes - changed strategy                  129.3                   260.17
No                                      182.7                   711.83
In [66]: from scipy.stats import chi2_contingency
In [67]: chi2, p, dof, expected = chi2_contingency(observed)
In [68]: chi2
Out[68]: 23.383138325890453
In [69]: p
Out[69]: 1.3273696199438626e-06
In [70]: dof
Out[70]: 1
In [71]: expected
Out[71]:
array([[ 94.63757009, 294.83242991],
[ 217.36242991, 677.16757009]])
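For comparison, here is a minimal sketch of the same calculation done by hand, assuming the observed counts are placed in a plain NumPy array (the variable names are only illustrative). It divides by the grand total of the table, uses (rows - 1) * (columns - 1) degrees of freedom, which is 1 for a 2x2 table, and applies no continuity correction, so it should agree with chi2_contingency(observed, correction=False) rather than with the default output above. It also uses stats.chi2.sf instead of 1 - cdf, because the subtraction can round a very small p-value to exactly 0.0.

```python
import numpy as np
from scipy import stats

# Observed 2x2 table from the question (rows: changed strategy yes/no).
observed = np.array([[129.3, 260.17],
                     [182.7, 711.83]])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()  # 1284, the correct divisor

# Expected counts under independence: (row total * column total) / grand total.
expected = np.outer(row_totals, col_totals) / grand_total

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Pearson chi-squared statistic without a continuity correction.
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Survival function = 1 - cdf, computed without the lossy subtraction.
p_value = stats.chi2.sf(chi2_stat, dof)

print(expected)
print(dof, chi2_stat, p_value)
```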
By default, chi2_contingency uses a continuity correction when the contingency table is 2x2. If you prefer to not use the correction, you can disable it with the argument correction=False:
In [73]: chi2, p, dof, expected = chi2_contingency(observed, correction=False)
In [74]: chi2
Out[74]: 24.072616672232893
In [75]: p
Out[75]: 9.2770200776879643e-07
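To see where the difference between the two statistics comes from, the following sketch applies the continuity correction by hand, assuming it is the usual Yates form in which each |observed - expected| is reduced by 0.5 before squaring. That assumption holds for this table because every cell differs from its expected count by far more than 0.5; the variable names are only illustrative.

```python
import numpy as np
from scipy import stats

observed = np.array([[129.3, 260.17],
                     [182.7, 711.83]])

# Expected counts under independence.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Absolute deviations of observed from expected counts.
abs_diff = np.abs(observed - expected)

# Uncorrected Pearson statistic.
plain_stat = (abs_diff ** 2 / expected).sum()

# Yates-corrected statistic: shrink each deviation by 0.5 before squaring.
yates_stat = ((abs_diff - 0.5) ** 2 / expected).sum()

# p-values with 1 degree of freedom (2x2 table).
plain_p = stats.chi2.sf(plain_stat, 1)
yates_p = stats.chi2.sf(yates_stat, 1)

print(plain_stat, plain_p)
print(yates_stat, yates_p)
```

The corrected statistic is slightly smaller, so its p-value is slightly larger; with a sample this size, both lead to the same conclusion.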