I got following dataframe in Python:
d = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
'col1': ['normal', 'well', 'normal', 'normal', 'well', 'normal'],
'col2': ['bad', 'normal','normal', 'normal', 'normal', 'bad']})
I would like to rollup by id but keep in columns strings that are other than 'normal' or 'normal' if there is nothing else ('well' or 'bad'). Something like following:
result = pd.DataFrame({'id': [1, 2, 3],
'col1': ['well', 'well', 'normal'],
'col2': ['bad', 'normal', 'bad']})
I was thinking about sorting and then using groupby and .first but not sure how to get desired levels on the top in each column.
Use Categorical to define order
cats = ['well', 'bad', 'normal']
d = d.assign(
col1=pd.Categorical(d.col1, cats, ordered=True),
col2=pd.Categorical(d.col2, cats, ordered=True)
)
d.groupby('id', as_index=False).min()
id col1 col2
0 1 well bad
1 2 well normal
2 3 normal bad
Use replace by NaNs first if no NaNs values before GroupBy.first:
d = d.replace('normal', np.nan).groupby('id').first().fillna('normal')
#alternative solution
d = d.mask(d == 'normal').groupby('id').first().fillna('normal')
print (d)
col1 col2
id
1 well bad
2 well normal
3 normal bad
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With