 

Python Proportion test similar to prop.test in R


I am looking for a test in Python that does this:

```r
> survivors <- matrix(c(1781,1443,135,47), ncol=2)
> colnames(survivors) <- c('survived','died')
> rownames(survivors) <- c('no seat belt','seat belt')
> survivors
             survived died
no seat belt     1781  135
seat belt        1443   47
> prop.test(survivors)

    2-sample test for equality of proportions with continuity correction

data:  survivors
X-squared = 24.3328, df = 1, p-value = 8.105e-07
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.05400606 -0.02382527
sample estimates:
   prop 1    prop 2
0.9295407 0.9684564
```

I am mostly interested in p-value calculation.

The example is taken from here.

asked Oct 28 '14 17:10 by Akavall


People also ask

Which function is used to test for proportions in R?

In R, use the prop.test() function. Its correct argument (the continuity correction) defaults to TRUE; it must be set to FALSE to make the test mathematically equivalent to the uncorrected z-test of a proportion.

How do you do a proportion test in Python?

Import the statsmodels.stats.proportion module and call its proportions_ztest() function, passing the success counts (count) and the numbers of trials (nobs), to get a one-sample or two-sample proportion z-test. proportions_ztest() tests proportions based on the normal (z) test.
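As a sketch of what a two-sample proportion z-test computes, here is a pooled version hand-rolled with only the standard library rather than a call into statsmodels; `two_proportion_ztest` is a hypothetical helper name. It also illustrates the remark about `correct = FALSE`: for two groups, z squared equals the uncorrected chi-square statistic.

```python
import math


def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-sample z-test for H0: p1 == p2 (two-sided, no continuity correction).

    Hypothetical helper for illustration; not the statsmodels implementation.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both samples share one proportion, estimated from the pooled counts
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Seat-belt data from the question: survivors out of the total in each group
z, p = two_proportion_ztest(1781, 1781 + 135, 1443, 1443 + 47)
print(z, z ** 2, p)
```

On the seat-belt data z² comes out near 25.10 rather than the 24.33 that prop.test reports, because this z-test applies no continuity correction.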

What does prop.test do in R?

prop.test can be used to test the null hypothesis that the proportions (probabilities of success) in several groups are the same, or that they equal certain given values.
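For k groups, the equal-proportions test can be run as a chi-square test on the k x 2 table of successes and failures, with k - 1 degrees of freedom. The sketch below illustrates that with the standard library only; `k_sample_prop_test` and `chi2_sf` are hypothetical helpers, and `chi2_sf` relies on the closed forms of the chi-square upper tail that exist for integer degrees of freedom. It never applies a continuity correction (R only uses one for at most two groups), so for the two seat-belt groups it corresponds to `prop.test(..., correct = FALSE)` rather than the default.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X > x) for a chi-square variable with integer dof >= 1."""
    half = x / 2
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: the 1-dof tail erfc(sqrt(x/2)) plus a finite correction series
    total = math.erfc(math.sqrt(half))
    for i in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def k_sample_prop_test(successes, trials):
    """Chi-square test that k groups share one success proportion (no continuity correction)."""
    pooled = sum(successes) / sum(trials)
    statistic = 0.0
    for s, n in zip(successes, trials):
        # Compare observed successes and failures with their expected counts under H0
        for observed, expected in ((s, n * pooled), (n - s, n * (1 - pooled))):
            statistic += (observed - expected) ** 2 / expected
    dof = len(successes) - 1
    return statistic, dof, chi2_sf(statistic, dof)


stat, dof, p = k_sample_prop_test([1781, 1443], [1781 + 135, 1443 + 47])
print(stat, dof, p)
```

Passing more groups simply adds rows to the table and raises the degrees of freedom.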


2 Answers

I think I got it:

```python
from scipy import stats
import numpy as np

survivors = np.array([[1781, 135], [1443, 47]])
stats.chi2_contingency(survivors)
# Out:
# (24.332761232771361,       # x-squared
#  8.1048817984512269e-07,   # p-value
#  1,                        # degrees of freedom
#  array([[ 1813.61832061,   102.38167939],
#         [ 1410.38167939,    79.61832061]]))   # expected counts
```
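chi2_contingency applies the Yates continuity correction by default when the table has one degree of freedom (its `correction` parameter), which is why these numbers match R's "with continuity correction" output. To show where they come from, here is a dependency-free sketch that recomputes the same tuple; `chi2_2x2` is a hypothetical helper, and it only handles 2x2 tables, where the chi-square tail with 1 degree of freedom has the closed form erfc(sqrt(x/2)).

```python
import math


def chi2_2x2(table, correction=True):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value, dof, expected), the same order as chi2_contingency.
    """
    (a, b), (c, d) = table
    row = [a + b, c + d]
    col = [a + c, b + d]
    n = a + b + c + d
    # Expected count under independence: row total * column total / N
    expected = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            diff = abs(table[i][j] - expected[i][j])
            if correction:
                # Yates: move each observed count 0.5 toward its expected count
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected[i][j]
    # With 1 dof, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, 1, expected


stat, p, dof, expected = chi2_2x2([[1781, 135], [1443, 47]])
print(stat, p, dof)
print(expected)
```

The printed values should agree with the Out tuple from chi2_contingency above to floating-point precision.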
answered Oct 07 '22 14:10 by Akavall


Adding to @Akavall's answer: if you don't explicitly have the "failure" counts (the number of deaths in your example), R's prop.test lets you specify just the total number of trials. For example, prop.test(c(1781, 1443), c(1781+135, 1443+47)) gives the same results as the contingency table you built.
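To mirror that successes-and-totals call in Python without building the table by hand, here is a stdlib-only sketch; `prop_test_2` is a hypothetical helper, not a SciPy or statsmodels function. It uses the 2x2 shortcut formula N(|ad - bc| - N/2)² / (product of the row and column totals) for the Yates-corrected statistic, and a continuity-corrected Wald interval for p1 - p2. It is written to reproduce the numbers R printed in the question; edge cases, such as how R caps the correction when the two proportions are nearly equal, may not match R exactly.

```python
import math
from statistics import NormalDist


def prop_test_2(x, n, correct=True, conf_level=0.95):
    """Two-group test of equal proportions from successes x and trials n (hypothetical helper)."""
    (x1, x2), (n1, n2) = x, n
    f1, f2 = n1 - x1, n2 - x2          # failure counts derived from the totals
    total = n1 + n2
    # 2x2 shortcut: N * (|ad - bc| - N/2)^2 / (row totals * column totals)
    gap = abs(x1 * f2 - f1 * x2)
    if correct:
        gap = max(gap - total / 2, 0.0)   # Yates continuity correction
    statistic = total * gap ** 2 / (n1 * n2 * (x1 + x2) * (f1 + f2))
    p_value = math.erfc(math.sqrt(statistic / 2))   # chi-square upper tail, 1 dof
    # Wald interval for p1 - p2, widened by the continuity correction
    p1, p2 = x1 / n1, x2 / n2
    delta = p1 - p2
    inv_n = 1 / n1 + 1 / n2
    yates = min(0.5, abs(delta) / inv_n) if correct else 0.0
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    width = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) + yates * inv_n
    ci = (max(delta - width, -1.0), min(delta + width, 1.0))
    return statistic, p_value, ci, (p1, p2)


stat, p, ci, est = prop_test_2((1781, 1443), (1781 + 135, 1443 + 47))
print(stat, p, ci, est)
```

With the seat-belt counts this should give a statistic near 24.3328, a p-value near 8.105e-07, an interval near (-0.0540, -0.0238), and estimates near 0.9295 and 0.9685, matching R's output.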

SciPy's chi2_contingency, by contrast, needs the complete contingency table, including the failure counts. If you don't have the failure counts and just want to check whether the proportions of successes out of the totals are equal for two samples, you can work around this by computing the failures from the totals:

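```python
import numpy as np
from scipy.stats import chi2_contingency

total1, total2 = 1781 + 135, 1443 + 47   # trials per sample
survivors = np.array([[1781, total1 - 1781], [1443, total2 - 1443]])   # failures = total - successes
chi2_contingency(survivors)
# Result: (24.332761232771361, 8.1048817984512269e-07, 1, array([[ 1813.61832061,   102.38167939],
#            [ 1410.38167939,    79.61832061]]))
```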

Took me some time to figure this one out. Hope it helps someone.

answered Oct 07 '22 15:10 by Ali