Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create contingency table Pandas with counts and percentages

Is there a better way to create a contingency table in pandas with pd.crosstab() or pd.pivot_table() to generate counts and percentages.

Current solution

cat=['A','B','B','A','B','B','A','A','B','B']
target = [True,False,False,False,True,True,False,True,True,True]

import pandas as pd
df=pd.DataFrame({'cat' :cat,'target':target})

using crosstab

totals=pd.crosstab(df['cat'],df['target'],margins=True).reset_index()
percentages = pd.crosstab(df['cat'],
   df['target']).apply(lambda row: row/row.sum(),axis=1).reset_index()

and a merge

summaryTable=pd.merge(totals,percentages,on="cat")
summaryTable.columns=['cat','#False',
    '#True','All','percentTrue','percentFalse']

output

+---+-----+--------+-------+-----+-------------+--------------+
|   | cat | #False | #True | All | percentTrue | percentFalse |
+---+-----+--------+-------+-----+-------------+--------------+
| 0 | A   |      2 |     2 |   4 | 0.500000    | 0.500000     |
| 1 | B   |      2 |     4 |   6 | 0.333333    | 0.666667     |
+---+-----+--------+-------+-----+-------------+--------------+
like image 681
iboboboru Avatar asked Mar 16 '16 18:03

iboboboru


People also ask

Can you use percentages in a contingency table?

Row and Column Percentages in Contingency TablesUse percentages to adjust for unequal group sizes. Percentages are relative frequencies. Learn more about Relative Frequencies and their Distributions.

How do you get percentages on crosstab pandas?

In order to do so, we'll first convert the crosstab contents to a Python string data type. We do that by using the astype DataFrame method. We then append the % character using a simple lambda function.

How do you construct a contingency table for data?

To create a contingency table of the data in the var1 column cross-classified with the data in the var2 column, choose the Stat > Tables > Contingency > With Data menu option. Select var1 as the Row variable, choose var2 as the Column variable, and click Compute!.


1 Answers

you can do the following:

In [131]: s = df.groupby('cat').agg({'target': ['sum', 'count']}).reset_index(level=0)

In [132]: s.columns
Out[132]:
MultiIndex(levels=[['target', 'cat'], ['sum', 'count', '']],
           labels=[[1, 0, 0], [2, 0, 1]])

Let's bring order to column names:

In [133]: s.columns = [col[1] if col[1] else col[0] for col in s.columns.tolist()]

In [134]: s
Out[134]:
  cat  sum  count
0   A  2.0      4
1   B  4.0      6

In [135]: s['pctTrue'] = s['sum']/s['count']

In [136]: s['pctFalse'] = 1 - s.pctTrue

In [137]: s
Out[137]:
  cat  sum  count   pctTrue  pctFalse
0   A  2.0      4  0.500000  0.500000
1   B  4.0      6  0.666667  0.333333
like image 120
MaxU - stop WAR against UA Avatar answered Oct 26 '22 23:10

MaxU - stop WAR against UA