I have a dataframe as follows:
d = {
'id': [1, 2, 3, 4, 5],
'is_overdue': [True, False, True, True, False],
'org': ['A81001', 'A81002', 'A81001', 'A81002', 'A81003']
}
df = pd.DataFrame(data=d)
Now I want to work out for each organisation, what percentage of rows are overdue, and what percentage are not.
I know how to group by organisation and overdue status:
df.groupby(['org', 'is_overdue']).agg('count')
But how do I get the proportion by organisation? I want to end up with something like this:
org is_overdue not_overdue proportion_overdue
A81001 2 0 100
A81002 1 1 50
A81003 0 1 0
You could use DataFrame.apply
.
First group by the organizations and count the number of overdue/non-overdue. Then calculate the percentage.
df_overdue = df.groupby(['org']).apply(lambda dft: pd.Series({'is_overdue': dft.is_overdue.sum(), 'not_overdue': (~dft.is_overdue).sum()}))
df_overdue['proportion_overdue'] = df_overdue['is_overdue'] / (df_overdue['not_overdue'] + df_overdue['is_overdue'])
print(df_overdue)
outputs
is_overdue not_overdue proportion_overdue
org
A81001 2 0 1.0
A81002 1 1 0.5
A81003 0 1 0.0
You could use pd.crosstab
to create a frequency table -- i.e. to count the number of is_overdue
s for each org
.
import pandas as pd
d = {
'id': [1, 2, 3, 4, 5],
'is_overdue': [True, False, True, True, False],
'org': ['A81001', 'A81002', 'A81001', 'A81002', 'A81003']
}
df = pd.DataFrame(data=d)
result = pd.crosstab(index=df['org'], columns=df['is_overdue'], margins=True)
result = result.rename(columns={True:'is_overdue', False:'not overdue'})
result['proportion'] = result['is_overdue']/result['All']*100
print(result)
yields
is_overdue not overdue is_overdue All proportion
org
A81001 0 2 2 100.0
A81002 1 1 2 50.0
A81003 1 0 1 0.0
All 2 3 5 60.0
There are more efficient ways to do this but since you were trying to use aggregate functions initially, this is the way to solve your problem using aggregate functions:
df.is_overdue = df.is_overdue.map({True: 1, False: 0})
df.groupby(['org'])['is_overdue'].agg({'total_count':'count', 'is_overdue': 'sum'}).reset_index()
Now you can just calculate not_overdue and proportion_overdue very easily.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With