Is there a better way to create a contingency table in pandas with pd.crosstab() or pd.pivot_table() to generate counts and percentages.
Current solution
cat=['A','B','B','A','B','B','A','A','B','B']
target = [True,False,False,False,True,True,False,True,True,True]
import pandas as pd
df=pd.DataFrame({'cat' :cat,'target':target})
using crosstab
totals=pd.crosstab(df['cat'],df['target'],margins=True).reset_index()
percentages = pd.crosstab(df['cat'],
df['target']).apply(lambda row: row/row.sum(),axis=1).reset_index()
and a merge
summaryTable=pd.merge(totals,percentages,on="cat")
summaryTable.columns=['cat','#False',
'#True','All','percentTrue','percentFalse']
output
+---+-----+--------+-------+-----+-------------+--------------+
| | cat | #False | #True | All | percentTrue | percentFalse |
+---+-----+--------+-------+-----+-------------+--------------+
| 0 | A | 2 | 2 | 4 | 0.500000 | 0.500000 |
| 1 | B | 2 | 4 | 6 | 0.333333 | 0.666667 |
+---+-----+--------+-------+-----+-------------+--------------+
Row and Column Percentages in Contingency TablesUse percentages to adjust for unequal group sizes. Percentages are relative frequencies. Learn more about Relative Frequencies and their Distributions.
In order to do so, we'll first convert the crosstab contents to a Python string data type. We do that by using the astype DataFrame method. We then append the % character using a simple lambda function.
To create a contingency table of the data in the var1 column cross-classified with the data in the var2 column, choose the Stat > Tables > Contingency > With Data menu option. Select var1 as the Row variable, choose var2 as the Column variable, and click Compute!.
you can do the following:
In [131]: s = df.groupby('cat').agg({'target': ['sum', 'count']}).reset_index(level=0)
In [132]: s.columns
Out[132]:
MultiIndex(levels=[['target', 'cat'], ['sum', 'count', '']],
labels=[[1, 0, 0], [2, 0, 1]])
Let's bring order to column names:
In [133]: s.columns = [col[1] if col[1] else col[0] for col in s.columns.tolist()]
In [134]: s
Out[134]:
cat sum count
0 A 2.0 4
1 B 4.0 6
In [135]: s['pctTrue'] = s['sum']/s['count']
In [136]: s['pctFalse'] = 1 - s.pctTrue
In [137]: s
Out[137]:
cat sum count pctTrue pctFalse
0 A 2.0 4 0.500000 0.500000
1 B 4.0 6 0.666667 0.333333
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With