Given the dataframe
  Column1 Column2  Column3
0       a     foo        1
1       a     bar        2
2       b     baz       12
3       b     foo        4
4       c     bar        6
5       c     foo        3
6       c     baz        7
7       d     foo        9
I'd like to group by Column1, using an arbitrary order of precedence on Column2 to decide which Column3 value to keep.
For example, if the order of precedence is baz, bar, foo (highest first),
then I would expect the output to show as
         Column3
Column1         
a              2
b             12
c              7
d              9
with the "a" group keeping the "bar" value because there is no "baz" for the "a" group, "b" group keeping the "baz" value, and so on.
What's the most elegant way to do this? Right now I'm chaining a series of apply lambdas to work through each item, but it feels sloppy.
EDIT: What if the precedence goes across multiple columns?
For example:
  Column1 Column2 Column3  Column4
0       a     foo    john        1
1       a     bar     jim        2
2       b     baz    jack       12
3       b     foo     jim        4
4       c     bar    john        6
5       c     foo    john        3
6       c     baz    jack        7
7       d     foo    jack        9
If the order of precedence across both Column2 and Column3 is jim, baz, foo (highest first),
then I would expect the output to show as
        Column2  Column3
Column1                 
a           jim        2
b           jim        4
c           baz        7
d           foo        9
You can map each Column2 value to its rank in the precedence list, then use groupby + transform("min") to keep the best-ranked row in each group:
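order = ['baz', 'bar', 'foo']  # highest precedence first
d = {v: k for k, v in enumerate(order)}  # value -> rank (0 is best)
out = df.assign(k=df['Column2'].map(d))
# keep rows whose rank equals the best (lowest) rank in their Column1 group
print(df[out['k'].eq(out.groupby("Column1")['k'].transform("min"))])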
  Column1 Column2  Column3
1       a     bar        2
2       b     baz       12
6       c     baz        7
7       d     foo        9
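For anyone who wants to run this end to end, here is a self-contained sketch of a different route to the same result: an ordered categorical plus sort_values and drop_duplicates instead of map and transform. The DataFrame is rebuilt from the question's first table, and the `_prec` column name is just a hypothetical helper, not something from the question.

```python
import pandas as pd

# Rebuild the first table from the question.
df = pd.DataFrame({
    "Column1": list("aabbcccd"),
    "Column2": ["foo", "bar", "baz", "foo", "bar", "foo", "baz", "foo"],
    "Column3": [1, 2, 12, 4, 6, 3, 7, 9],
})

order = ["baz", "bar", "foo"]  # highest precedence first

# An ordered categorical sorts by position in `order`, not alphabetically.
# "_prec" is a hypothetical helper column name used only for sorting.
prec = pd.Categorical(df["Column2"], categories=order, ordered=True)

result = (
    df.assign(_prec=prec)
      .sort_values(["Column1", "_prec"])
      .drop_duplicates("Column1")        # first row per group = best precedence
      .set_index("Column1")[["Column3"]]
)
print(result)
```

This returns one row per group, indexed by Column1, which is the shape the question asked for. Note that a Column2 value missing from `order` becomes NaN in the categorical and sorts last within its group.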
EDIT: For multiple columns, the same logic applies. Here is one way:
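import pandas as pd

order = ['jim', 'baz', 'foo']  # highest precedence first
d = {v: k for k, v in enumerate(order)}  # value -> rank (0 is best)
# rank both columns; values not in `order` become NaN, then take the best rank per row
s = df[['Column2', 'Column3']].replace(d).apply(pd.to_numeric, errors='coerce').min(axis=1)
# keep rows holding their group's best rank, then turn the rank back into its value
out = (s[s.eq(s.groupby(df['Column1']).transform("min"))]
       .replace(dict(enumerate(order))).rename("Col"))
print(df.loc[out.index, ["Column1", "Column4"]].join(out))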
  Column1  Column4  Col
1       a        2  jim
3       b        4  jim
6       c        7  baz
7       d        9  foo
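If you'd rather get exactly one row per group even when two rows tie on rank, a variant with idxmin may help. This is only a sketch: it rebuilds the second table from the question, and it assumes every Column1 group has at least one value that appears in `order` (an all-NaN group would need separate handling).

```python
import pandas as pd

# Rebuild the second table from the question.
df = pd.DataFrame({
    "Column1": list("aabbcccd"),
    "Column2": ["foo", "bar", "baz", "foo", "bar", "foo", "baz", "foo"],
    "Column3": ["john", "jim", "jack", "jim", "john", "john", "jack", "jack"],
    "Column4": [1, 2, 12, 4, 6, 3, 7, 9],
})

order = ["jim", "baz", "foo"]  # highest precedence first
rank = {value: i for i, value in enumerate(order)}

# Rank each candidate column; values not in `order` become NaN.
ranks = df[["Column2", "Column3"]].apply(lambda col: col.map(rank))
best = ranks.min(axis=1)  # best (lowest) rank per row, NaN ignored

# Row label of the best-ranked row in each Column1 group.
winners = best.groupby(df["Column1"]).idxmin()

# Translate the winning rank back into the value it stands for.
result = df.loc[winners, ["Column1", "Column4"]].assign(
    Col=[order[int(r)] for r in best.loc[winners]]
)
print(result)
```

Unlike the transform("min") filter, idxmin returns only the first best-ranked row per group, so ties do not produce duplicate rows.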