Rollup of multiple string columns by id - python

Question

I have the following DataFrame in Python:

import pandas as pd

d = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                  'col1': ['normal', 'well', 'normal', 'normal', 'well', 'normal'],
                  'col2': ['bad', 'normal', 'normal', 'normal', 'normal', 'bad']})

I would like to roll up by id, keeping in each column any value other than 'normal' ('well' or 'bad') and falling back to 'normal' only when the group contains nothing else. Something like the following:

result = pd.DataFrame({'id': [1, 2, 3],
                'col1': ['well', 'well', 'normal'],
                'col2': ['bad', 'normal', 'bad']})

I was thinking about sorting and then using groupby and .first, but I am not sure how to get the desired levels to the top of each column.

piRSquared · Accepted Answer

Use an ordered Categorical to define the priority, then take the minimum per group:

cats = ['well', 'bad', 'normal']
d = d.assign(
    col1=pd.Categorical(d.col1, cats, ordered=True),
    col2=pd.Categorical(d.col2, cats, ordered=True)
)

d.groupby('id', as_index=False).min()

   id    col1    col2
0   1    well     bad
1   2    well  normal
2   3  normal     bad
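
For completeness, here is a self-contained sketch of the same idea that leaves the original frame untouched and converts the result back to plain strings. The names value_cols, ordered and rolled are only illustrative, and the order in cats is an assumption about which value should win when a group holds both 'well' and 'bad'.

```python
import pandas as pd

d = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                  'col1': ['normal', 'well', 'normal', 'normal', 'well', 'normal'],
                  'col2': ['bad', 'normal', 'normal', 'normal', 'normal', 'bad']})

# Priority order: earlier entries win; 'normal' is last so it is only a fallback.
cats = ['well', 'bad', 'normal']
value_cols = ['col1', 'col2']  # illustrative name for the columns to roll up

# Work on a copy so the original string columns are preserved.
ordered = d.copy()
for col in value_cols:
    ordered[col] = pd.Categorical(ordered[col], categories=cats, ordered=True)

# min() on an ordered categorical returns the highest-priority value per group.
rolled = ordered.groupby('id', as_index=False).min()

# Convert the categorical columns back to plain strings.
rolled = rolled.astype({col: str for col in value_cols})
print(rolled)
```

If a value outside cats can appear, pd.Categorical turns it into NaN, so the category list should cover every expected value.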

jezrael · Answer

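Replace 'normal' with NaN first (this assumes the data contains no NaN values beforehand), then use GroupBy.first, which skips NaN, and fill the remaining gaps with 'normal':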

import numpy as np

out = d.replace('normal', np.nan).groupby('id').first().fillna('normal')
# alternative solution
out = d.mask(d == 'normal').groupby('id').first().fillna('normal')

print(out)
      col1    col2
id                
1     well     bad
2     well  normal
3   normal     bad
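
A rough, self-contained sketch of this approach that also keeps id as a regular column is shown below. The names value_cols, masked and rolled are only illustrative. Note that first() returns the first non-'normal' value in row order, so if a group can hold both 'well' and 'bad' in the same column, the winner depends on row order rather than on a fixed priority.

```python
import pandas as pd

d = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                  'col1': ['normal', 'well', 'normal', 'normal', 'well', 'normal'],
                  'col2': ['bad', 'normal', 'normal', 'normal', 'normal', 'bad']})

value_cols = ['col1', 'col2']  # illustrative name for the columns to roll up

# Keep only non-'normal' values; every 'normal' becomes NaN.
masked = d[value_cols].where(d[value_cols] != 'normal')

# first() skips NaN within each group; groups with nothing left get 'normal' back.
rolled = (masked.groupby(d['id'])
                .first()
                .fillna('normal')
                .reset_index())
print(rolled)
```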

Tags:

python

pandas

John Snow
