Given the dataframe:
  Column1 Column2  Column3
0       a     foo        1
1       a     bar        2
2       b     baz       12
3       b     foo        4
4       c     bar        6
5       c     foo        3
6       c     baz        7
7       d     foo        9
I'd like to group by Column1, using an arbitrary order of precedence over the Column2 values to decide which Column3 value to keep.
For example, if the order of precedence is:
baz
bar
foo
then I would expect the output to show as
         Column3
Column1
a              2
b             12
c              7
d              9
with the "a" group keeping the "bar" value because there is no "baz" for the "a" group, "b" group keeping the "baz" value, and so on.
What's the most elegant way to do this? Right now I'm applying a series of apply lambdas to work through each item, but it feels sloppy.
EDIT: What if the precedence goes across multiple columns?
For example:
  Column1 Column2 Column3  Column4
0       a     foo    john        1
1       a     bar     jim        2
2       b     baz    jack       12
3       b     foo     jim        4
4       c     bar    john        6
5       c     foo    john        3
6       c     baz    jack        7
7       d     foo    jack        9
If the order of precedence across both Column2 and Column3 is:
jim
baz
foo
then I would expect the output to show as
        Column2  Column3
Column1
a           jim        2
b           jim        4
c           baz        7
d           foo        9
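You can map each Column2 value to its position in the precedence order with map, then use groupby + transform to keep the rows that hold the best (lowest) position in each group: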
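order = ['baz', 'bar', 'foo']  # highest precedence first
d = {v: k for k, v in enumerate(order)}  # value -> rank (0 = best)
out = df.assign(k=df['Column2'].map(d))
# keep the rows whose rank equals the lowest rank in their Column1 group
print(df[out['k'].eq(out.groupby("Column1")['k'].transform("min"))])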
  Column1 Column2  Column3
1       a     bar        2
2       b     baz       12
6       c     baz        7
7       d     foo        9
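As an alternative to the map + transform logic above, here is a minimal, self-contained sketch that expresses the precedence as an ordered categorical and keeps the first row per group after sorting. The helper column name `_rank` is made up for illustration, and the sketch assumes one surviving row per group is wanted (ties are not kept, unlike the transform("min") filter).

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Column1": list("aabbcccd"),
    "Column2": ["foo", "bar", "baz", "foo", "bar", "foo", "baz", "foo"],
    "Column3": [1, 2, 12, 4, 6, 3, 7, 9],
})

order = ["baz", "bar", "foo"]  # highest precedence first

# An ordered categorical sorts by position in `order`, not alphabetically.
# Values missing from `order` become NaN and sort last.
rank = pd.Categorical(df["Column2"], categories=order, ordered=True)

result = (
    df.assign(_rank=rank)                  # `_rank` is a hypothetical helper column
      .sort_values(["Column1", "_rank"])   # best-ranked row first within each group
      .drop_duplicates("Column1")          # keep only that first row per group
      .drop(columns="_rank")
      .set_index("Column1")["Column3"]
)
print(result)
```

This returns a Series indexed by Column1, which matches the shape of the expected output in the question.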
EDIT: for multiple columns, the same logic applies. Here is one way:
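import pandas as pd

order = ['jim', 'baz', 'foo']  # highest precedence first
d = {i: e for e, i in enumerate(order)}  # value -> rank (0 = best)
# rank each cell (values not in `order` become NaN), then take the best rank per row
s = df[['Column2', 'Column3']].replace(d).apply(pd.to_numeric, errors='coerce').min(axis=1)
# keep rows holding their group's best rank and map the rank back to its value
out = (s[s.eq(s.groupby(df['Column1']).transform("min"))]
       .replace(dict(enumerate(order))).rename("Col"))
df.loc[out.index, ["Column1", "Column4"]].join(out)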
  Column1  Column4  Col
1       a        2  jim
3       b        4  jim
6       c        7  baz
7       d        9  foo
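For reference, the multi-column logic can also be written as a self-contained sketch that uses map and idxmin instead of replace. The names `rank`, `ranks`, `best` and `rows` are made up for illustration, and the sketch assumes every Column1 group contains at least one value from `order` (behaviour for a group with no match is not handled here).

```python
import pandas as pd

# Rebuild the four-column example frame from the question.
df = pd.DataFrame({
    "Column1": list("aabbcccd"),
    "Column2": ["foo", "bar", "baz", "foo", "bar", "foo", "baz", "foo"],
    "Column3": ["john", "jim", "jack", "jim", "john", "john", "jack", "jack"],
    "Column4": [1, 2, 12, 4, 6, 3, 7, 9],
})

order = ["jim", "baz", "foo"]  # highest precedence first
rank = {value: i for i, value in enumerate(order)}

# Rank every cell of the key columns; values not in `order` become NaN.
ranks = df[["Column2", "Column3"]].apply(lambda col: col.map(rank))
best = ranks.min(axis=1)  # best (lowest) rank found in each row

# Index label of the best-ranked row within each Column1 group.
rows = best.groupby(df["Column1"]).idxmin().tolist()

# Pull those rows and translate the winning rank back to its value.
result = df.loc[rows, ["Column1", "Column4"]].assign(
    Col=[order[int(r)] for r in best.loc[rows]]
)
print(result)
```

Unlike the transform("min") filter, idxmin keeps only the first row per group when several rows share the best rank.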