Pandas- pivoting column into (conditional) aggregated string

Question

Lets say I have the following data set, turned into a dataframe:

data = [
    ['Job 1', datetime.date(2019, 6, 9), 'Jim', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Bill', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Tom', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 10), 'Bill', None],
    ['Job 2', datetime.date(2019,6,10), 'Tom', 'Tom']
]
df = pd.DataFrame(data, columns=['Job', 'Date', 'Employee', 'Manager'])

This yields a dataframe that looks like:

     Job        Date Employee Manager
0  Job 1  2019-06-09      Jim     Tom
1  Job 1  2019-06-09     Bill     Tom
2  Job 1  2019-06-09      Tom     Tom
3  Job 1  2019-06-10     Bill    None
4  Job 2  2019-06-10      Tom     Tom

What I am trying to generate is a pivot on each unique Job/Date combo, with a column for Manager, and a column for a string with comma separated, non-manager employees. A couple of things to assume:

All employee names are unique (I'll actually be using unique employee ids rather than names), and Managers are also "employees", so there will never be a case with an employee and a manager sharing the same name/id, but being different individuals.
A work crew can have a manager, or not (see row with id 3, for an example without)
A manager will always also be listed as an employee (see row with id 2 or 4)
A job could have a manager, with no additional employees (see row id 4)

I'd like the resulting dataframe to look like:

     Job        Date  Manager     Employees
0  Job 1  2019-06-09      Tom     Jim, Bill
1  Job 1  2019-06-10     None          Bill
2  Job 2  2019-06-10      Tom          None

Which leads to my questions:

Is there a way to do a ','.join like aggregation in a pandas pivot?
Is there a way to make this aggregation conditional (exclude the name/id in the manager column)

I suspect 1) is possible, and 2) might be more difficult. If 2) is a no, I can get around it in other ways later in my code.

user3483203 · Accepted Answer

The tricky part here is removing the Manager from the Employee column.

u = df.melt(['Job', 'Date'])
f = u[~u.duplicated(['Job', 'Date', 'value'], keep='last')].astype(str)

f.pivot_table(
    index=['Job', 'Date'],
    columns='variable', values='value',
    aggfunc=','.join
).rename_axis(None, axis=1)

                  Employee Manager
Job   Date
Job 1 2019-06-09  Jim,Bill     Tom
      2019-06-10      Bill    None
Job 2 2019-06-10       NaN     Tom

ALollz · Answer

Group to aggregate, then fix the Employees by removing the Manager and setting to None where appropriate. Since the employees are unique, sets will work nicely here to remove the Manager.

s = df.groupby(['Job', 'Date']).agg({'Manager': 'first', 'Employee': lambda x: set(x)})
s['Employee'] = [', '.join(x.difference({y})) for x,y in zip(s.Employee, s.Manager)]
s['Employee'] = s.Employee.replace({'': None})

                 Manager   Employee
Job   Date                         
Job 1 2019-06-09     Tom  Jim, Bill
      2019-06-10    None       Bill
Job 2 2019-06-10     Tom       None

Pandas- pivoting column into (conditional) aggregated string

Tags:

python

pandas

aggregation

pivot-table

MarkD

2 Answers

user3483203

ALollz

Recent Activity

Donate For Us

Pandas- pivoting column into (conditional) aggregated string

Tags:

python

pandas

aggregation

pivot-table

MarkD

2 Answers

user3483203

ALollz

Related questions

Recent Activity

Donate For Us