I am looking for a test in Python that does this:
> survivors <- matrix(c(1781,1443,135,47), ncol=2)
> colnames(survivors) <- c('survived','died')
> rownames(survivors) <- c('no seat belt','seat belt')
> survivors
             survived died
no seat belt     1781  135
seat belt        1443   47
> prop.test(survivors)

    2-sample test for equality of proportions with continuity correction

data:  survivors
X-squared = 24.3328, df = 1, p-value = 8.105e-07
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.05400606 -0.02382527
sample estimates:
   prop 1    prop 2
0.9295407 0.9684564
I am mostly interested in the p-value calculation.
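For the p-value itself, R's X-squared here is a Pearson chi-squared statistic with Yates' continuity correction, and the p-value is the upper tail of a chi-squared distribution with 1 degree of freedom. A minimal sketch that recomputes both by hand with NumPy and SciPy, assuming the correction works like SciPy's chi2_contingency (each |O - E| is shrunk by 0.5, but never below zero):

```python
import numpy as np
from scipy.stats import chi2

# Observed 2x2 table: rows = no seat belt / seat belt, cols = survived / died
observed = np.array([[1781, 135],
                     [1443, 47]], dtype=float)

# Expected counts under H0 (same survival proportion in both rows)
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Yates' continuity correction: shrink each |O - E| by 0.5 (never below 0)
abs_diff = np.abs(observed - expected)
corrected = np.maximum(abs_diff - 0.5, 0.0)

x_squared = np.sum(corrected**2 / expected)
# Upper-tail probability; df = (rows - 1) * (cols - 1) = 1
p_value = chi2.sf(x_squared, df=1)

print(f"X-squared = {x_squared:.4f}, p-value = {p_value:.4g}")
```

The printed values should agree with the X-squared and p-value lines of the R output.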
The example is taken from here.
In R's prop.test() function, the continuity correction is controlled by the correct argument, whose default value is TRUE. (This option must be set to FALSE to make the test mathematically equivalent to the uncorrected z-test of a proportion.)
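On the SciPy side, the closest analogue of R's correct argument is the correction parameter of scipy.stats.chi2_contingency (default True; SciPy only applies it when the table has 1 degree of freedom). A short sketch comparing both settings on the seat-belt table:

```python
import numpy as np
from scipy.stats import chi2_contingency

survivors = np.array([[1781, 135],
                      [1443, 47]])

# correction=True (default): Yates' continuity correction, like R's correct=TRUE
stat_yates, p_yates, dof, _ = chi2_contingency(survivors, correction=True)

# correction=False: plain Pearson chi-squared, like R's correct=FALSE
stat_plain, p_plain, _, _ = chi2_contingency(survivors, correction=False)

print(f"with correction:    X-squared = {stat_yates:.4f}, p = {p_yates:.4g}")
print(f"without correction: X-squared = {stat_plain:.4f}, p = {p_plain:.4g}")
```

The uncorrected statistic comes out larger and its p-value smaller, because the correction shrinks every |O - E| toward zero before squaring.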
In Python, import the proportion module of statsmodels (statsmodels.stats.proportion) and call its proportions_ztest() function, passing the counts and the numbers of observations as parameters, to get a Z-test for proportions. The proportions_ztest() function tests for proportions based on the normal (z) test.
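A minimal sketch of that approach for the two-sample case, assuming statsmodels is installed: pass the survivor counts as count and the group sizes as nobs. Since proportions_ztest applies no continuity correction, its z statistic squared should equal the uncorrected chi-squared statistic and the two p-values should agree; neither will exactly match R's default (corrected) output.

```python
import numpy as np
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportions_ztest

survived = np.array([1781, 1443])           # successes per group
totals = np.array([1781 + 135, 1443 + 47])  # trials per group

# Two-sided test of H0: p1 == p2 (pooled proportion in the standard error)
z_stat, p_z = proportions_ztest(count=survived, nobs=totals)

# Same data as a full 2x2 table, without Yates' correction
table = np.column_stack([survived, totals - survived])
chi2_stat, p_chi2, _, _ = chi2_contingency(table, correction=False)

print(f"z = {z_stat:.4f}, z^2 = {z_stat**2:.4f}, chi-squared = {chi2_stat:.4f}")
print(f"p (z-test) = {p_z:.4g}, p (chi-squared) = {p_chi2:.4g}")
```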
prop.test can be used for testing the null hypothesis that the proportions (probabilities of success) in several groups are the same, or that they equal certain given values.
I think I got it:
In [11]: from scipy import stats

In [12]: import numpy as np

In [13]: survivors = np.array([[1781,135], [1443, 47]])

In [14]: stats.chi2_contingency(survivors)
Out[14]:
(24.332761232771361,      # X-squared
 8.1048817984512269e-07,  # p-value
 1,                       # degrees of freedom
 array([[ 1813.61832061,   102.38167939],
        [ 1410.38167939,    79.61832061]]))  # expected counts
Adding to @Akavall's answer: if you don't explicitly have the "failure" counts (the number of deaths in your example), R's prop.test lets you specify just the total number of trials. For example, prop.test(c(1781, 1443), c(1781+135, 1443+47)) gives the same results as the contingency table you built.
SciPy's chi2_contingency explicitly asks for the failure counts and the complete contingency table. If you don't explicitly have the failure counts and just want to check whether the proportions of successes out of the totals are equal for two samples, you can hack around SciPy's function with:
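import numpy as np
from scipy.stats import chi2_contingency

total1 = 1781 + 135  # total trials, group 1 (no seat belt)
total2 = 1443 + 47   # total trials, group 2 (seat belt)
# failures = totals minus successes in each row
survivors = np.array([[1781, total1 - 1781], [1443, total2 - 1443]])
chi2_contingency(survivors)
# Result: (24.332761232771361, 8.1048817984512269e-07, 1, array([[ 1813.61832061, 102.38167939], [ 1410.38167939, 79.61832061]]))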
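The same trick can be wrapped in a small helper that mimics the prop.test(x, n) calling convention: it derives the failure counts as totals minus successes, builds the k x 2 table, and so also handles more than two groups. The name prop_test is hypothetical (it is not part of SciPy); this is a sketch, not a full port of R's function (no confidence interval, no given-proportion test):

```python
import numpy as np
from scipy.stats import chi2_contingency


def prop_test(successes, totals, correct=True):
    """Hypothetical helper: test H0 that all groups share the same success
    proportion, given success counts and total trials per group.
    Returns (statistic, p_value, estimated proportions)."""
    successes = np.asarray(successes)
    totals = np.asarray(totals)
    if successes.shape != totals.shape or successes.size < 2:
        raise ValueError("need matching counts for at least two groups")
    if np.any(successes > totals):
        raise ValueError("successes cannot exceed totals")
    # k x 2 table of successes and failures, as chi2_contingency expects
    table = np.column_stack([successes, totals - successes])
    # SciPy applies Yates' correction only when the table has 1 degree of
    # freedom, i.e. exactly two groups
    stat, p_value, _, _ = chi2_contingency(table, correction=correct)
    return stat, p_value, successes / totals


stat, p_value, estimates = prop_test([1781, 1443], [1781 + 135, 1443 + 47])
print(f"X-squared = {stat:.4f}, p-value = {p_value:.4g}")
print("sample estimates:", np.round(estimates, 7))
```

Restricting the correction to the two-group case should line up with R, which as far as I know also skips the continuity correction when comparing more than two groups.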
Took me some time to figure this one out. Hope it helps someone.