Sort dataframe based on minimum value of two columns

Tags: python, pandas

Let's assume I have the following dataframe:

import pandas as pd
d = {'col1': [1, 2,3,4], 'col2': [4, 2, 1, 3], 'col3': [1,0,1,1], 'outcome': [1,0,1,0]}
df = pd.DataFrame(data=d)

I want this dataframe sorted by the row-wise minimum of col1 and col2. The resulting index order should be 2, 0, 1, 3.

I tried this with df.sort_values(by=['col2', 'col1']), but that sorts by col2 first and uses col1 only to break ties. Is there any way to order by the minimum of the two columns?

Tox asked Jan 24 '26 14:01

2 Answers

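Using numpy.lexsort:

import numpy as np

# Sort the values within each row, then order rows by (minimum, maximum);
# lexsort treats the last key as the primary one, hence the column reversal.
order = np.lexsort(np.sort(df[['col1', 'col2']])[:, ::-1].T)

out = df.iloc[order]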

Output:

   col1  col2  col3  outcome
2     3     1     1        1
0     1     4     1        1
1     2     2     0        0
3     4     3     1        0
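
For readers who want to see what the one-liner does, here is a step-by-step sketch of the same idea. The intermediate names (row_sorted, keys) are only illustrative and not part of the original answer; the dataframe is the one from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [4, 2, 1, 3],
                   'col3': [1, 0, 1, 1], 'outcome': [1, 0, 1, 0]})

# Sort the values inside each row: column 0 now holds the row minimum,
# column 1 the row maximum.
row_sorted = np.sort(df[['col1', 'col2']].to_numpy(), axis=1)

# np.lexsort uses the LAST key as the primary sort key, so reverse the
# columns and transpose: the keys become (row maximum, row minimum).
keys = row_sorted[:, ::-1].T

# Positional order of the rows: by minimum first, ties broken by maximum.
order = np.lexsort(keys)
out = df.iloc[order]
```

Rows 0 and 2 both have a minimum of 1; row 2 comes first because its other value (3) is smaller than row 0's (4).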

Note that you can easily handle any number of columns:

df.iloc[np.lexsort(np.sort(df[['col1', 'col2', 'col3']])[:, ::-1].T)]

   col1  col2  col3  outcome
1     2     2     0        0
2     3     1     1        1
0     1     4     1        1
3     4     3     1        0

mozway answered Jan 27 '26 04:01

One way (not the most efficient, since apply calls a Python function once per row):

idx = df[['col2', 'col1']].apply(lambda x: tuple(sorted(x)), axis=1).sort_values().index

Output:

>>> df.loc[idx]
   col1  col2  col3  outcome
2     3     1     1        1
0     1     4     1        1
1     2     2     0        0
3     4     3     1        0

>>> idx
Int64Index([2, 0, 1, 3], dtype='int64')
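
If the per-row apply is a concern, a possible alternative for the two-column case is to compute the row-wise minimum and maximum as temporary columns and sort on those. This is only a sketch: the helper column names _lo and _hi are made up for illustration and are not from either answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [4, 2, 1, 3],
                   'col3': [1, 0, 1, 1], 'outcome': [1, 0, 1, 0]})

cols = ['col1', 'col2']

# Temporary helper columns (hypothetical names): row-wise min and max.
helpers = df.assign(_lo=df[cols].min(axis=1), _hi=df[cols].max(axis=1))

# Sort by the minimum first and break ties with the maximum,
# then reuse the resulting index on the original dataframe.
idx = helpers.sort_values(['_lo', '_hi']).index
result = df.loc[idx]
```

With exactly two columns, the pair (min, max) is the same as the sorted tuple, so this yields the same index order as above. For three or more columns it would ignore the middle values.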

Corralien answered Jan 27 '26 04:01