I have a pandas DataFrame with the following columns:
import pandas as pd

df = pd.DataFrame([
    ['A2', 2],
    ['B1', 1],
    ['A1', 2],
    ['A2', 1],
    ['B1', 2],
    ['A1', 1]],
    columns=['one', 'two'])
I am hoping to sort this primarily by column 'two', then by column 'one'. For the secondary sort, I would like to use a custom sorting rule that sorts column 'one' by its alphabetic character [A-Z] and then by its trailing number [0-100]. The outcome of the sort would be:
one  two
 A1    1
 B1    1
 A2    1
 A1    2
 B1    2
 A2    2
I have previously sorted a list of strings similar to column 'one' using a sorting rule like so:
def custom_sort(value):
    return (value[0], int(value[1:]))

my_list.sort(key=custom_sort)
If I try to apply this rule via a pandas sort, I run into a number of issues including:
1. The DataFrame.sort_values() function accepts a key for sorting like the sort() function, but the key function should be vectorized (per the pandas documentation). If I try to apply the sorting key to only column 'one', I get the error "TypeError: cannot convert the series to <class 'int'>".
2. With the DataFrame.sort_values() method, the sort key is applied to all columns you pass in. This will not work since I want to sort first by column 'two' using a native numerical sort.

How would I go about sorting the DataFrame as described above?
You can split column 'one' into its constituent parts, add them to the DataFrame as temporary columns, and then sort on them together with column 'two'. Finally, drop the temporary columns.
>>> (df.assign(lhs=df['one'].str[0], rhs=df['one'].str[1:].astype(int))
...    .sort_values(['two', 'rhs', 'lhs'])
...    .drop(columns=['lhs', 'rhs']))
  one  two
5  A1    1
1  B1    1
3  A2    1
2  A1    2
4  B1    2
0  A2    2
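If you would rather avoid temporary columns, one possible alternative is to pass a vectorized key to sort_values() that only transforms column 'one'. This is a minimal sketch, assuming pandas 1.1 or later (where the key parameter is available) and assuming the key receives each sort column as a Series named after that column. The helper name sort_key is hypothetical, and custom_sort is reordered here to (number, letter) so that it reproduces the output shown above.

```python
import pandas as pd

df = pd.DataFrame(
    [['A2', 2], ['B1', 1], ['A1', 2], ['A2', 1], ['B1', 2], ['A1', 1]],
    columns=['one', 'two'],
)


def custom_sort(value):
    # Scalar rule matching the output above: numeric suffix first, then letter.
    return (int(value[1:]), value[0])


def sort_key(column):
    # Hypothetical helper: sort_values() calls the key once per column in `by`,
    # passing that column as a Series, so dispatch on the Series name.
    if column.name != 'one':
        return column  # leave 'two' to its native numeric ordering
    # Order the distinct labels with the scalar rule, then map every label
    # to its position so the key returns plain integers of the same length.
    ordered = sorted(column.unique(), key=custom_sort)
    position = {label: i for i, label in enumerate(ordered)}
    return column.map(position)


result = df.sort_values(['two', 'one'], key=sort_key)
print(result)
```

Because the scalar rule runs only on the distinct labels, the original list-style key function can be reused without hitting the "cannot convert the series" error, while column 'two' is still sorted numerically.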