Sort a Pandas Dataframe by Multiple Columns Using Key Argument

Question

I have a dataframe a pandas dataframe with the following columns:

df = pd.DataFrame([
    ['A2', 2],
    ['B1', 1],
    ['A1', 2],
    ['A2', 1],
    ['B1', 2],
    ['A1', 1]], 
  columns=['one','two'])

Which I am hoping to sort primarily by column 'two', then by column 'one'. For the secondary sort, I would like to use a custom sorting rule that will sort column 'one' by the alphabetic character [A-Z] and then the trailing numeric number [0-100]. So, the outcome of the sort would be:

one two
 A1   1
 B1   1
 A2   1
 A1   2
 B1   2
 A2   2

I have sorted a list of strings similar to column 'one' before using a sorting rule like so:

def custom_sort(value):
    return (value[0], int(value[1:]))

my_list.sort(key=custom_sort)

If I try to apply this rule via a pandas sort, I run into a number of issues including:

The pandas DataFrame.sort_values() function accepts a key for sorting like the sort() function, but the key function should be vectorized (per the pandas documentation). If I try to apply the sorting key to only column 'one', I get the error "TypeError: cannot convert the series to <class 'int'>"
When you use the pandas DataFrame.sort_values() method, it applies the sort key to all columns you pass in. This will not work since I want to sort first by the column 'two' using a native numerical sort.

How would I go about sorting the DataFrame as mentioned above?

Alexander · Accepted Answer

You can split column one into its constituent parts, add them as columns to the dataframe and then sort on them with column two. Finally, remove the temporary columns.

>>> (df.assign(lhs=df['one'].str[0], rhs=df['one'].str[1:].astype(int))
       .sort_values(['two', 'rhs', 'lhs'])
       .drop(columns=['lhs', 'rhs']))
  one  two
5  A1    1
1  B1    1
3  A2    1
2  A1    2
4  B1    2
0  A2    2

Sort a Pandas Dataframe by Multiple Columns Using Key Argument

Tags:

python

sorting

pandas

dataframe

user11058068

1 Answers

Alexander

Recent Activity

Donate For Us

Sort a Pandas Dataframe by Multiple Columns Using Key Argument

Tags:

python

sorting

pandas

dataframe

user11058068

1 Answers

Alexander

Related questions

Recent Activity

Donate For Us