Aggregating using arbitrary precedence in pandas

Question

Given the dataframe

  Column1 Column2  Column3
0       a     foo        1
1       a     bar        2
2       b     baz       12
3       b     foo        4
4       c     bar        6
5       c     foo        3
6       c     baz        7
7       d     foo        9
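To reproduce the example, the frame above can be built as follows. This is a minimal sketch; the variable name df is an assumption, chosen to match the name the answer below uses.

```python
import pandas as pd

# sample data from the question (Column2 holds the values ranked by precedence)
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "b", "c", "c", "c", "d"],
    "Column2": ["foo", "bar", "baz", "foo", "bar", "foo", "baz", "foo"],
    "Column3": [1, 2, 12, 4, 6, 3, 7, 9],
})
print(df)
```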

I'd like to group by Column1, using an arbitrary order of precedence over the values in Column2 to decide which Column3 value to keep for each group.

For example, if the order of precedence is:

baz
bar
foo

then I would expect the output to show as

         Column3
Column1         
a              2
b             12
c              7
d              9

with the "a" group keeping the "bar" value because it has no "baz", the "b" group keeping the "baz" value, and so on.

What's the most elegant way to do this? Right now I'm chaining a series of apply lambdas to work through each item, but it feels sloppy.

EDIT: What if the precedence spans multiple columns?

For example:

  Column1 Column2 Column3  Column4
0       a     foo    john        1
1       a     bar     jim        2
2       b     baz    jack       12
3       b     foo     jim        4
4       c     bar    john        6
5       c     foo    john        3
6       c     baz    jack        7
7       d     foo    jack        9
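The multi-column frame can be rebuilt the same way. Again a sketch, assuming the name df; here the precedence values can appear in either Column2 or Column3, and Column4 holds the value to keep.

```python
import pandas as pd

# sample data from the EDIT: precedence now spans Column2 and Column3
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "b", "c", "c", "c", "d"],
    "Column2": ["foo", "bar", "baz", "foo", "bar", "foo", "baz", "foo"],
    "Column3": ["john", "jim", "jack", "jim", "john", "john", "jack", "jack"],
    "Column4": [1, 2, 12, 4, 6, 3, 7, 9],
})
print(df)
```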

If the order of precedence across both Column2 and Column3 is:

jim
baz
foo

then I would expect the output to show as

            Col  Column4
Column1                 
a           jim        2
b           jim        4
c           baz        7
d           foo        9

anky · Accepted Answer

You can map each Column2 value to its precedence rank, then use groupby + transform("min") to keep the row(s) holding the best rank in each group:

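import pandas as pd

order = ['baz', 'bar', 'foo']
# rank of each value: lower rank = higher precedence
d = {v: k for k, v in enumerate(order)}
out = df.assign(k=df['Column2'].map(d))

# keep the row(s) whose rank equals the group's minimum rank
print(df[out['k'].eq(out.groupby("Column1")['k'].transform("min"))])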

  Column1 Column2  Column3
1       a     bar        2
2       b     baz       12
6       c     baz        7
7       d     foo        9
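A variation on the same rank idea, sketched here rather than taken from the answer: sort the rows by rank and keep the first row per group with drop_duplicates. Unlike the transform filter, this returns exactly one row per group even when ranks tie, and set_index yields the Column1-indexed shape the question asked for. Rows whose value is not in order are dropped first, which matches the answer's behavior of never selecting them. The names rank and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": ["a", "a", "b", "b", "c", "c", "c", "d"],
    "Column2": ["foo", "bar", "baz", "foo", "bar", "foo", "baz", "foo"],
    "Column3": [1, 2, 12, 4, 6, 3, 7, 9],
})

order = ["baz", "bar", "foo"]
rank = {v: i for i, v in enumerate(order)}  # lower rank = higher precedence

result = (
    df.assign(rank=df["Column2"].map(rank))  # NaN for values not in `order`
      .dropna(subset=["rank"])               # ignore values with no precedence
      .sort_values("rank", kind="stable")    # best-ranked rows first
      .drop_duplicates("Column1")            # keep the first (best) row per group
      .sort_values("Column1")
      .set_index("Column1")[["Column3"]]
)
print(result)
```

The stable sort means that, among tied ranks, the row that appears first in the original frame wins.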

EDIT: for multiple columns, using the same logic as above, here is one way:

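import pandas as pd

order = ['jim', 'baz', 'foo']
# rank of each value: lower rank = higher precedence
d = {v: i for i, v in enumerate(order)}

# best (lowest) rank found in Column2 or Column3 for each row; unranked values become NaN
s = df[['Column2', 'Column3']].apply(lambda col: col.map(d)).min(axis=1)

# keep the rows holding their group's best rank, then translate the rank back to its label
best = s[s.eq(s.groupby(df['Column1']).transform("min"))]
out = best.astype(int).map(dict(enumerate(order))).rename("Col")

df.loc[out.index, ["Column1", "Column4"]].join(out)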

  Column1  Column4  Col
1       a        2  jim
3       b        4  jim
6       c        7  baz
7       d        9  foo
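The sort-and-deduplicate variation also extends to several columns. This is a sketch under the same assumptions (the names rank, best and result are illustrative): take each row's best rank across Column2 and Column3, keep the best row per group, and map the rank back to its label.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": ["a", "a", "b", "b", "c", "c", "c", "d"],
    "Column2": ["foo", "bar", "baz", "foo", "bar", "foo", "baz", "foo"],
    "Column3": ["john", "jim", "jack", "jim", "john", "john", "jack", "jack"],
    "Column4": [1, 2, 12, 4, 6, 3, 7, 9],
})

order = ["jim", "baz", "foo"]
rank = {v: i for i, v in enumerate(order)}  # lower rank = higher precedence
labels = dict(enumerate(order))             # rank -> value, for translating back

# best (lowest) rank found in either column; NaN if neither value is ranked
best = df[["Column2", "Column3"]].apply(lambda col: col.map(rank)).min(axis=1)

result = (
    df.assign(best=best)
      .dropna(subset=["best"])                # rows with no ranked value can't win
      .sort_values("best", kind="stable")     # best-ranked rows first
      .drop_duplicates("Column1")             # one row per group
      .assign(Col=lambda t: t["best"].astype(int).map(labels))
      .sort_values("Column1")
      .set_index("Column1")[["Col", "Column4"]]
)
print(result)
```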

Tags:

python

pandas

pandas-groupby

bcalc
