I have a dataframe as follows:
| people | statusName |
| -------------------- | ----------- |
| [Steve] | To Do |
| [Jill, John] | To Do |
| [Jill, John] | To Do |
| [Jill, John] | Completed |
| [Amanda, John] | To Do |
| [Meryll, Jill, John] | To Do |
| [Meryll, Jill, John] | In Progress |
| [Meryll, Bill] | Completed |
| [John, Tim] | To Do |
| [John, Tim] | To Do |
| [John, Tim] | Assigned |
| [John, Tom] | In Progress |
So the first column is a list type. I want to sort them according to the different statusName for each person. So the desired dataframe is as follows:
| people | Total | To Do | In Progress | Completed | Stopped |
|--------|-------|-------|-------------|-----------|---------|
| Steve | 1 | 1 | 0 | 0 | 0 |
| Jill | 5 | 3 | 1 | 1 | 0 |
| John | 6 | 4 | 1 | 1 | 0 |
| Amanda | 1 | 1 | 0 | 0 | 0 |
| Meryll | 3 | 1 | 1 | 1 | 0 |
| Bill | 1 | 0 | 0 | 1 | 0 |
| Tim | 3 | 2 | 0 | 0 | 1 |
| Tom | 1 | 0 | 1 | 0 | 0 |
So basically what I want is how crosstab function works when the people column is a string and not a list type of different people names.
How can I achieve the same using dataframe? Or whichever method is applicable in this case?
Dataframe:
df = pd.DataFrame({'people': {0: ['Steve'],
1: ['Jill', 'John'],
2: ['Jill', 'John'],
3: ['Jill', 'John'],
4: ['Amanda', 'John'],
5: ['Meryll', 'Jill', 'John'],
6: ['Meryll', 'Jill', 'John'],
7: ['Meryll', 'Bill'],
8: ['John', 'Tim'],
9: ['John', 'Tim'],
10: ['John', 'Tim'],
11: ['John', 'Tom']},
'statusName': {0: 'To Do',
1: 'To Do',
2: 'To Do',
3: 'Completed',
4: 'To Do',
5: 'To Do',
6: 'In Progress',
7: 'Completed',
8: 'To Do',
9: 'To Do',
10: 'Assigned',
11: 'In Progress'}})
You can explode people column and crosstab: First, change 'people' column to list, as below:
df['people']=df['people'].str.replace('[', '').str.replace(']', '')
df['people']=df['people'].str.split(',')
Now apply the explode and crosstab function:
df2=df.explode('people')
res = pd.crosstab(df2['people'], df2['statusName'])
res['Total']=res['Assigned'] + res['Completed'] + res['In Progress'] + res['To Do']
res=res[['Total', 'Assigned', 'Completed', 'In Progress', 'To Do']]
res.columns.name = None
res.reset_index(inplace=True)
>>> print(res)
people Total Assigned Completed In Progress To Do
0 Amanda 1 0 0 0 1
1 Bill 1 0 1 0 0
2 Jill 5 0 1 1 3
3 John 10 1 1 2 6
4 Meryll 3 0 1 1 1
5 Steve 1 0 0 0 1
6 Tim 3 1 0 0 2
7 Tom 1 0 0 1 0
Use, explode and get_dummies, then groupby and sum:
df_out = (
pd.get_dummies(df.explode("people").set_index("people"), prefix="", prefix_sep="")
.groupby(level=0)
.sum()
)
df_out["Total"] = df_out.sum(axis=1)
df_out.reset_index()
Output:
people Assigned Completed In Progress To Do Total
0 Amanda 0 0 0 1 1
1 Bill 0 1 0 0 1
2 Jill 0 1 1 3 5
3 John 1 1 2 6 10
4 Meryll 0 1 1 1 3
5 Steve 0 0 0 1 1
6 Tim 1 0 0 2 3
7 Tom 0 0 1 0 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With