Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a pythonic way to do a contingency table in Pandas?

Tags:

Given a dataframe that looks like this:

            A   B       2005-09-06  5  -2   2005-09-07 -1   3   2005-09-08  4   5  2005-09-09 -8   2 2005-09-10 -2  -5 2005-09-11 -7   9  2005-09-12  2   8   2005-09-13  6  -5   2005-09-14  6  -5   

Is there a pythonic way to create a 2x2 matrix like this:

    1  0  1  a  b  0  c  d 

Where:

a = number of obs where the corresponding elements of column A and B are both positive.

b = number of obs where the corresponding elements of column A are positive and negative in column B.

c = number of obs where the corresponding elements of column A are negative and positive in column B.

d = number of obs where the corresponding elements of column A and B are both negative.

For this example the output would be:

    1  0  1  2  3  0  3  1 

Thanks

like image 388
hernanavella Avatar asked Apr 27 '15 16:04

hernanavella


People also ask

How do you make a contingency table in pandas?

pandas.crosstab(index, columns) where: index: name of variable to display in the rows of the contingency table. columns: name of variable to display in the columns of the contingency table.

How do you construct a contingency table for data?

Creating a basic contingency table. To create a contingency table of the data in the var1 column cross-classified with the data in the var2 column, choose the Stat > Tables > Contingency > With Data menu option. Select var1 as the Row variable, choose var2 as the Column variable, and click Compute!.


1 Answers

Probably easiest to just use the pandas function crosstab. Borrowing from Dyno Fu above:

import pandas as pd from StringIO import StringIO table = """dt          A   B 2005-09-06  5  -2 2005-09-07 -1   3 2005-09-08  4   5 2005-09-09 -8   2 2005-09-10 -2  -5 2005-09-11 -7   9 2005-09-12  2   8 2005-09-13  6  -5 2005-09-14  6  -5 """ sio = StringIO(table) df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt']) df.set_index("dt", inplace=True)  pd.crosstab(df.A > 0, df.B > 0) 

Output:

B      False  True  A                   False      1      3 True       3      2  [2 rows x 2 columns] 

Also the table is usable if you want to do a Fisher exact test with scipy.stats etc:

from scipy.stats import fisher_exact tab = pd.crosstab(df.A > 0, df.B > 0) fisher_exact(tab) 
like image 65
Tyr Wiesner-Hanks Avatar answered Oct 28 '22 22:10

Tyr Wiesner-Hanks