
Horizontal lookup with sorting in a pandas DataFrame

I have created this pandas dataframe:

import pandas as pd

d = {'Char1': [-3, 2, 0], 'Char2': [0, 1, 2], 'Char3': [-1, 0, -1]}
df = pd.DataFrame(data=d)
print(df)

which looks like this:

   Char1  Char2  Char3
0     -3      0     -1
1      2      1      0
2      0      2     -1

I need to create two additional fields:

  • Factor1
  • Factor2

This is how Factor1 and Factor2 should be populated across each record:

  • Factor1 should contain the name of the column with the lowest value in each record;
  • Factor2 should contain the name of the column with the second lowest value (again, in each record).

So, the resulting dataset should look like this:

   Char1  Char2  Char3 Factor1 Factor2
0     -3      0     -1   Char1   Char3
1      2      1      0   Char3   Char2
2      0      2     -1   Char3   Char1

So, let's take a look at the first record:

  • what is the lowest value? It's -3
  • what is the name of the column that -3 corresponds to? It's Char1 -> "Char1" is then assigned to Factor1
  • what is the second lowest value? It's -1
  • what is the name of the column that -1 corresponds to? It's Char3 -> "Char3" is then assigned to Factor2

And so on.

How can I do this in Python/Pandas?

Giampaolo Levorato asked Nov 24 '25 20:11



2 Answers

An efficient method is to use the underlying numpy array with argsort:

import numpy as np

df[['Factor1', 'Factor2']] = df.columns.to_numpy()[np.argsort(df.to_numpy())[:, :2]]

output:

   Char1  Char2  Char3 Factor1 Factor2
0     -3      0     -1   Char1   Char3
1      2      1      0   Char3   Char2
2      0      2     -1   Char3   Char1
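
The one-liner above packs several steps together. As a rough sketch (the intermediate names `values`, `order`, `two_lowest` and `names` are introduced here only for illustration and are not part of the answer), it can be unpacked like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Char1': [-3, 2, 0], 'Char2': [0, 1, 2], 'Char3': [-1, 0, -1]})

values = df.to_numpy()                      # plain 2D array of the cell values
order = np.argsort(values, axis=1)          # per row: column positions, lowest value first
two_lowest = order[:, :2]                   # keep the positions of the two lowest values
names = df.columns.to_numpy()[two_lowest]   # integer-array indexing maps positions to names

print(order)
print(names)
```

For the sample data, `order` is `[[0, 2, 1], [2, 1, 0], [2, 0, 1]]`, and `names` holds the Factor1/Factor2 values shown in the output above.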

generalization to N columns:

import numpy as np

N = 2

order = np.argsort(df.to_numpy())[:, :N]
df[[f'Factor{i+1}' for i in range(N)]] = df.columns.to_numpy()[order]

example for N=3:

   Char1  Char2  Char3 Factor1 Factor2 Factor3
0     -3      0     -1   Char1   Char3   Char2
1      2      1      0   Char3   Char2   Char1
2      0      2     -1   Char3   Char1   Char2
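
Two practical points are worth keeping in mind with the generalization above: it sorts every column of `df`, so it expects a frame that holds only the value columns, and NumPy's default sort does not promise any particular order for tied values. The following is a minimal sketch of one way to handle both, assuming ties should resolve left to right; the helper name `add_factor_columns` and its `value_cols` parameter are hypothetical and not part of the answer.

```python
import numpy as np
import pandas as pd

def add_factor_columns(df, value_cols, n=2):
    """Hypothetical helper: append Factor1..Factor<n> with the names of the
    n lowest-valued columns (among value_cols) for each row."""
    values = df[value_cols].to_numpy()
    # kind='stable' keeps tied values in their original left-to-right order
    order = np.argsort(values, axis=1, kind='stable')[:, :n]
    out = df.copy()
    out[[f'Factor{i + 1}' for i in range(n)]] = np.asarray(value_cols, dtype=object)[order]
    return out

value_cols = ['Char1', 'Char2', 'Char3']

df = pd.DataFrame({'Char1': [-3, 2, 0], 'Char2': [0, 1, 2], 'Char3': [-1, 0, -1]})
result = add_factor_columns(df, value_cols, n=2)
print(result)

# A row with a tie: Char1 and Char2 are both 1, so Char1 is ranked before Char2
tied = pd.DataFrame({'Char1': [1], 'Char2': [1], 'Char3': [0]})
tied_result = add_factor_columns(tied, value_cols, n=3)
print(tied_result)
```

Restricting the sort to `value_cols` also means the helper can be called on a frame that already carries other, non-numeric columns.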
mozway answered Nov 27 '25 03:11



You can use idxmin for the lowest value; to get the second lowest, mask the row minimum and call idxmin again:

out = df.assign(**{'factor1': df.idxmin(axis=1),
                   'factor2': df.mask(df.eq(df.min(axis=1), axis=0)).idxmin(axis=1)})
print(out)

output:

   Char1  Char2  Char3 factor1 factor2
0     -3      0     -1   Char1   Char3
1      2      1      0   Char3   Char2
2      0      2     -1   Char3   Char1
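
One behaviour of the masking approach to be aware of: `mask` hides every cell equal to the row minimum, so when two columns tie for the lowest value, the second `idxmin` skips both of them. Whether that is what you want depends on how ties should be treated, which the question does not specify. A small sketch with made-up data (the `tied` frame below is an illustration, not from the question):

```python
import pandas as pd

# Char1 and Char2 tie for the lowest value in this single row
tied = pd.DataFrame({'Char1': [0], 'Char2': [0], 'Char3': [5]})

row_min = tied.min(axis=1)
factor1 = tied.idxmin(axis=1)                   # first column holding the minimum
masked = tied.mask(tied.eq(row_min, axis=0))    # both tied cells become NaN
factor2 = masked.idxmin(axis=1)                 # NaN is skipped, so Char3 is chosen

print(factor1.iloc[0], factor2.iloc[0])
```

Here factor1 is Char1 and factor2 is Char3, whereas a sort-based ranking would place Char2 second.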
BENY answered Nov 27 '25 04:11



