Sort by one column and count values of another column of a dataframe based on a column containing list objects in Pandas

Question

I have a dataframe as follows:

| people               | statusName  |
| -------------------- | ----------- |
| [Steve]              | To Do       |
| [Jill, John]         | To Do       |
| [Jill, John]         | To Do       |
| [Jill, John]         | Completed   |
| [Amanda, John]       | To Do       |
| [Meryll, Jill, John] | To Do       |
| [Meryll, Jill, John] | In Progress |
| [Meryll, Bill]       | Completed   |
| [John, Tim]          | To Do       |
| [John, Tim]          | To Do       |
| [John, Tim]          | Assigned    |
| [John, Tom]          | In Progress |

So the first column is a list type. I want to sort them according to the different statusName for each person. So the desired dataframe is as follows:


| people | Total | To Do | In Progress | Completed | Stopped |
|--------|-------|-------|-------------|-----------|---------|
| Steve  |     1 |     1 |           0 |         0 |       0 |
| Jill   |     5 |     3 |           1 |         1 |       0 |
| John   |     6 |     4 |           1 |         1 |       0 |
| Amanda |     1 |     1 |           0 |         0 |       0 |
| Meryll |     3 |     1 |           1 |         1 |       0 |
| Bill   |     1 |     0 |           0 |         1 |       0 |
| Tim    |     3 |     2 |           0 |         0 |       1 |
| Tom    |     1 |     0 |           1 |         0 |       0 |

So basically what I want is how crosstab function works when the people column is a string and not a list type of different people names.

How can I achieve the same using dataframe? Or whichever method is applicable in this case?

Dataframe:

df = pd.DataFrame({'people': {0: ['Steve'],
  1: ['Jill', 'John'],
  2: ['Jill', 'John'],
  3: ['Jill', 'John'],
  4: ['Amanda', 'John'],
  5: ['Meryll', 'Jill', 'John'],
  6: ['Meryll', 'Jill', 'John'],
  7: ['Meryll', 'Bill'],
  8: ['John', 'Tim'],
  9: ['John', 'Tim'],
  10: ['John', 'Tim'],
  11: ['John', 'Tom']},
 'statusName': {0: 'To Do',
  1: 'To Do',
  2: 'To Do',
  3: 'Completed',
  4: 'To Do',
  5: 'To Do',
  6: 'In Progress',
  7: 'Completed',
  8: 'To Do',
  9: 'To Do',
  10: 'Assigned',
  11: 'In Progress'}})

IoaTzimas · Accepted Answer

You can explode people column and crosstab: First, change 'people' column to list, as below:

df['people']=df['people'].str.replace('[', '').str.replace(']', '')
df['people']=df['people'].str.split(',')

Now apply the explode and crosstab function:

df2=df.explode('people')
res = pd.crosstab(df2['people'], df2['statusName'])
res['Total']=res['Assigned'] + res['Completed'] + res['In Progress'] + res['To Do']
    
res=res[['Total', 'Assigned', 'Completed', 'In Progress', 'To Do']]
res.columns.name = None
res.reset_index(inplace=True)

>>> print(res)
   people  Total  Assigned  Completed  In Progress  To Do
0  Amanda      1         0          0            0      1
1    Bill      1         0          1            0      0
2    Jill      5         0          1            1      3
3    John     10         1          1            2      6
4  Meryll      3         0          1            1      1
5   Steve      1         0          0            0      1
6     Tim      3         1          0            0      2
7     Tom      1         0          0            1      0

Scott Boston · Answer

Use, explode and get_dummies, then groupby and sum:

df_out = (
    pd.get_dummies(df.explode("people").set_index("people"), prefix="", prefix_sep="")
    .groupby(level=0)
    .sum()
)
df_out["Total"] = df_out.sum(axis=1)
df_out.reset_index()

Output:

   people  Assigned  Completed  In Progress  To Do  Total
0  Amanda         0          0            0      1      1
1    Bill         0          1            0      0      1
2    Jill         0          1            1      3      5
3    John         1          1            2      6     10
4  Meryll         0          1            1      1      3
5   Steve         0          0            0      1      1
6     Tim         1          0            0      2      3
7     Tom         0          0            1      0      1

Sort by one column and count values of another column of a dataframe based on a column containing list objects in Pandas

Tags:

python

pandas

dataframe

user001

2 Answers

IoaTzimas

Scott Boston

Recent Activity

Donate For Us

Sort by one column and count values of another column of a dataframe based on a column containing list objects in Pandas

Tags:

python

pandas

dataframe

user001

2 Answers

IoaTzimas

Scott Boston

Related questions

Recent Activity

Donate For Us