I have the following dataframe, which records the start and end times of different jobs at different intervals. A small part of the dataframe is shown below.
Dataframe(df):
result | job | time
START | JOB0 | 1357
START | JOB2 | 2405
END | JOB2 | 2379
START | JOB3 | 4010
END | JOB0 | 5209
END | JOB3 | 6578
START | JOB0 | 6000
END | JOB0 | 6100
(Note: the original dataframe has 5 jobs, JOB0 to JOB4.)
I want to turn the values of the result column (START and END) into individual columns of the dataframe.
Required Dataframe(df2)
job | START | END
JOB0 | 1357 | 5209
JOB2 | 2405 | 2379
JOB3 | 4010 | 6578
JOB0 | 6000 | 6100
Code
I tried implementing this with pivot_table, but it returns aggregated values, which is not what I need.
df2 = df.pivot_table('time', 'job','result')
Code Output
result | END | START
job
JOB0 | 5.000589e+08 | 5.000636e+08
JOB1 | 4.999141e+08 | 4.999188e+08
JOB2 | 5.001668e+08 | 5.001715e+08
JOB3 | 4.995190e+08 | 4.995187e+08
JOB4 | 5.003238e+08 | 5.003236e+08
How can I attain the required dataframe?
You have duplicate jobs (JOB0 has two different start and end times), so pivoting on job alone does not give a unique index. Pivot on the cumulative count (groupby.cumcount) as well so that every index entry is unique. You can tidy up the result afterwards by dropping the cumulative count level.
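# Number the repeated (job, result) pairs: 0 for a job's first START/END, 1 for its second, ...
df['idx'] = df.groupby(['job', 'result']).cumcount()
# Pivot on (job, idx), restore run order, then drop the helper level
(df.pivot(index=['job', 'idx'], columns='result', values='time')
   .sort_index(level=1)
   .droplevel(1)
   .reset_index()
   .rename_axis(None, axis=1)[['job', 'START', 'END']])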
[out]
job START END
0 JOB0 1357 5209
1 JOB2 2405 2379
2 JOB3 4010 6578
3 JOB0 6000 6100
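For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch built from the eight sample rows in the question. It assumes the full dataframe has the same three columns. The names `averaged` and `occurrence` are only illustrative, and `assign` is used instead of adding an `idx` column in place, which is a stylistic choice rather than a requirement. Passing a list to `index` in `pivot` needs a reasonably recent pandas (1.1 or later, as far as I know).

```python
import pandas as pd

# Sample rows copied from the question; the real dataframe is assumed
# to have the same columns (result, job, time).
df = pd.DataFrame({
    "result": ["START", "START", "END", "START", "END", "END", "START", "END"],
    "job":    ["JOB0", "JOB2", "JOB2", "JOB3", "JOB0", "JOB3", "JOB0", "JOB0"],
    "time":   [1357, 2405, 2379, 4010, 5209, 6578, 6000, 6100],
})

# pivot_table aggregates duplicate (job, result) entries with the mean by
# default, which is why the two JOB0 runs collapse into one averaged row.
averaged = df.pivot_table(values="time", index="job", columns="result")

# Occurrence counter per (job, result): a job's first START/END gets 0,
# its second gets 1, and so on.
occurrence = df.groupby(["job", "result"]).cumcount()

df2 = (
    df.assign(idx=occurrence)                      # helper key, df itself is not modified
      .pivot(index=["job", "idx"], columns="result", values="time")
      .sort_index(level="idx")                     # first runs of every job, then second runs
      .droplevel("idx")
      .reset_index()
      .rename_axis(None, axis=1)[["job", "START", "END"]]
)
print(df2)
```

Note that this pairs the n-th START of a job with its n-th END. If a job has a START without a matching END (or the reverse), the missing cell becomes NaN and that column is converted to float.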