Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sort a pandas dataframe by a column that has both numbers and strings?

I have a dataframe that looks like this

         col0         col1  col2   col4
         1    '1ZE7999'  865545   20    20
         2    'R022428'  865584  297     0
         3    34         865665  296     0 
         4    56         865700  297     0
         5    100        865628  292     5

I want to sort it by 'col0', first the numerical values, then the strings, the way that Excel sorts

       col0         col1  col2   col4
  3    34         865665  296     0 
  4    56         865700  297     0
  5    100        865628  292     5
  1    '1ZE7999'  865545   20    20
  2    'R022428'  865584  297     0

I used

df.sort_values(by='col1', ascending=True)

But that does not sort it that way, it sorts it from 0-9 then a-z

      col0         col1  col2   col4
 1    '1ZE7999'  865545   20    20
 5    100        865628  292     5
 3    34         865665  296     0 
 4    56         865700  297     0
 2    'R022428'  865584  297     0
like image 656
Miguel Manjarrez Avatar asked Oct 17 '25 12:10

Miguel Manjarrez


2 Answers

pd.to_numeric + sort_values + loc -

df.loc[pd.to_numeric(df.col0, errors='coerce').sort_values().index]

        col0    col1  col2  col4
3         34  865665   296     0
4         56  865700   297     0
5        100  865628   292     5
1  '1ZE7999'  865545    20    20
2  'R022428'  865584   297     0

Details

pd.to_numeric coerces non-integral values to NaN -

i = pd.to_numeric(df.col0, errors='coerce')
i

1      NaN
2      NaN
3     34.0
4     56.0
5    100.0
Name: col0, dtype: float64

sort_values sorts the column, ignoring NaNs.

j = i.sort_values()
j

3     34.0
4     56.0
5    100.0
1      NaN
2      NaN
Name: col0, dtype: float64

Observe the index. All you need to do is use the index to reindex the dataframe. Either loc or reindex will do it.

df.loc[j.index]

        col0    col1  col2  col4
3         34  865665   296     0
4         56  865700   297     0
5        100  865628   292     5
1  '1ZE7999'  865545    20    20
2  'R022428'  865584   297     0
df.reindex(index=j.index)

        col0    col1  col2  col4
3         34  865665   296     0
4         56  865700   297     0
5        100  865628   292     5
1  '1ZE7999'  865545    20    20
2  'R022428'  865584   297     0

If you need to reset the index, that's easily done.

df.loc[j.index].reset_index(drop=True)

        col0    col1  col2  col4
0         34  865665   296     0
1         56  865700   297     0
2        100  865628   292     5
3  '1ZE7999'  865545    20    20
4  'R022428'  865584   297     0
like image 57
cs95 Avatar answered Oct 20 '25 02:10

cs95


By using natsort

from natsort import natsorted

df.set_index('col0').reindex(natsorted(df.col0.tolist(), key=lambda y: y.lower())).reset_index()
Out[736]: 
        col0    col1  col2  col4
0         34  865665   296     0
1         56  865700   297     0
2        100  865628   292     5
3  '1ZE7999'  865545    20    20
4  'R022428'  865584   297     0
like image 42
BENY Avatar answered Oct 20 '25 02:10

BENY



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!