
A Better Way to Calculate Odds Ratio in Pandas

Tags: python, pandas

I have a dataframe counts1 which looks like:

Factor            w-statin  wo-statin
AgeGroups Cancer                     
0-5       No           108       6575
          Yes            0        223
11-15     No             5       3669
          Yes            1        143
16-20     No            28       6174
          Yes            1        395
21-25     No            80       8173
          Yes            2        624
26-30     No           110       9143
          Yes            2        968
30-35     No           171       9046
          Yes            5       1225
35-40     No           338       8883
          Yes           21       1475

I wanted to calculate the odds ratio (w-statin/wo-statin). I did it the old-fashioned way, as I would on paper:

counts1['sumwwoStatin'] = counts1['w-statin'] + counts1['wo-statin']

counts1['oddRatio'] = (counts1['w-statin'] / counts1['sumwwoStatin']) / (counts1['wo-statin'] / counts1['sumwwoStatin'])

Is there a better way to calculate odds ratios, relative risk, contingency tables, and chi-square tests in pandas, just like in R? Any suggestions are appreciated. By the way, I forgot to mention what my CSV looks like:

    Frequency Cancer     Factor AgeGroups
0         223    Yes  wo-statin       0-5
1         112    Yes  wo-statin      6-10
2         143    Yes  wo-statin     11-15
3         395    Yes  wo-statin     16-20
4         624    Yes  wo-statin     21-25
5         968    Yes  wo-statin     26-30
6        1225    Yes  wo-statin     30-35
7        1475    Yes  wo-statin     35-40
8        2533    Yes  wo-statin     41-45
9        4268    Yes  wo-statin     46-50
10       5631    Yes  wo-statin     52-55
11       6656    Yes  wo-statin     56-60
12       7166    Yes  wo-statin     61-65
13       8573    Yes  wo-statin     66-70
14       8218    Yes  wo-statin     71-75
15       4614    Yes  wo-statin     76-80
16       1869    Yes  wo-statin     81-85
17        699    Yes  wo-statin     86-90
18        157    Yes  wo-statin     91-95
19         31    Yes  wo-statin    96-100
20          5    Yes  wo-statin      >100
21        108     No   w-statin       0-5
22          6     No   w-statin      6-10
23          5     No   w-statin     11-15
24         28     No   w-statin     16-20
25         80     No   w-statin     21-25
26        110     No   w-statin     26-30
27        171     No   w-statin     30-35
28        338     No   w-statin     35-40
29        782     No   w-statin     41-45
..
Acerace.py asked Apr 06 '17 17:04





2 Answers

AFAIK, pandas does not provide statistical tests; it only covers basic descriptive statistics such as mean, variance, and correlation.

However, you can rely on scipy for this requirement. You'll find most of what you need there. For instance, to calculate the odds ratio:

import scipy.stats as stats

# df is the counts table from the question (counts1), indexed by AgeGroups and Cancer
table = df.groupby(level="Cancer").sum().values
print(table)

>>> array([[  840, 51663],
           [   32,  5053]])

oddsratio, pvalue = stats.fisher_exact(table)
print("OddsR: ", oddsratio, "p-Value:", pvalue)

>>> OddsR:  2.56743220487 p-Value: 2.72418938361e-09

See here and here for more.
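
The question also asks about relative risk and chi-square tests. As a minimal sketch, assuming the same 2x2 table printed above (rows are Cancer No/Yes, columns are w-statin/wo-statin), the odds ratio and relative risk can be computed by hand and the chi-square test taken from scipy.stats.chi2_contingency. The variable names are illustrative only:

```python
import numpy as np
from scipy import stats

# 2x2 table from the answer above: rows = Cancer (No, Yes),
# columns = Factor (w-statin, wo-statin).
table = np.array([[840, 51663],
                  [32, 5053]])

(no_w, no_wo), (yes_w, yes_wo) = table

# Sample odds ratio as the cross-product ratio; this is the same quantity
# that fisher_exact reports for this table.
odds_ratio = (no_w * yes_wo) / (no_wo * yes_w)

# Relative risk of cancer: risk among statin users over risk among non-users.
risk_w = yes_w / (no_w + yes_w)
risk_wo = yes_wo / (no_wo + yes_wo)
relative_risk = risk_w / risk_wo

# Chi-square test of independence (Yates' continuity correction is applied
# by default for 2x2 tables).
chi2, p_value, dof, expected = stats.chi2_contingency(table)

print("odds ratio:", odds_ratio)
print("relative risk:", relative_risk)
print("chi2:", chi2, "p-value:", p_value, "dof:", dof)
```

Note that the orientation of the table matters: swapping rows or columns inverts the odds ratio, so it is worth checking which cell is which before interpreting the number.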

pansen answered Sep 18 '22 05:09



statsmodels can be used to estimate the odds ratio if both variables are already columns of the same pandas data frame (Table.from_data cross-tabulates the first two columns of raw, one-row-per-observation data):

import statsmodels.api as sm

table = sm.stats.Table.from_data(df[['w-statin','wo-statin']])
rslt = table.test_nominal_association()
print(table.local_oddsratios)
print(rslt.pvalue)
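
When the data is already aggregated into counts, as in the question's CSV, a per-age-group odds ratio can be sketched with pandas alone. The example below is only an illustration: it assumes the long format shown in the question (Frequency, Cancer, Factor, AgeGroups), uses two age groups with counts copied from the counts1 table, and the names long_df, counts, and odds_ratio are made up for this sketch:

```python
import pandas as pd

# A few rows in the same long format as the question's CSV; the counts are
# copied from the counts1 table shown in the question.
long_df = pd.DataFrame({
    "Frequency": [1225, 9046, 5, 171, 1475, 8883, 21, 338],
    "Cancer":    ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "Factor":    ["wo-statin", "wo-statin", "w-statin", "w-statin"] * 2,
    "AgeGroups": ["30-35"] * 4 + ["35-40"] * 4,
})

# One row per age group, one column per (Factor, Cancer) combination.
counts = long_df.pivot_table(index="AgeGroups", columns=["Factor", "Cancer"],
                             values="Frequency", aggfunc="sum", fill_value=0)

# Odds of cancer within each exposure group, then their ratio.
odds_w = counts[("w-statin", "Yes")] / counts[("w-statin", "No")]
odds_wo = counts[("wo-statin", "Yes")] / counts[("wo-statin", "No")]
odds_ratio = (odds_w / odds_wo).rename("oddsRatio")

print(odds_ratio)
```

Age groups with a zero cell (for example 0-5, where no statin user has cancer) give an odds ratio of 0 or a division by zero, so they need separate handling before the result is interpreted.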
Julio Cárdenas-Rodríguez answered Sep 22 '22 05:09
