Running this code:
df = pd.DataFrame(['ADc','Abc','AEc'],columns = ['Test'],index=[0,1,2])
df.sort(columns=['Test'],axis=0, ascending=False,inplace=True)
Returns a dataframe column ordered as: [Abc, AEc, ADc].
ADc should be before AEc, what's going on?
By default, Python's sort() method orders strings by character code (ASCIIbetical order) rather than actual alphabetical order. This means every uppercase letter comes before every lowercase letter. Passing key=str.lower causes sort() to compare the list items as if they were lowercase, without actually changing the values in the list.
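To make the character-code ordering concrete, here is a small sketch in plain Python (no pandas involved). The variable names are just for illustration; it reuses the three strings from the question.

```python
words = ['ADc', 'Abc', 'AEc']

# Code points of the second letters: uppercase 'D' and 'E' are smaller than lowercase 'b'.
code_points = {ch: ord(ch) for ch in 'DEb'}

# The default sort compares code points, so 'ADc' and 'AEc' land before 'Abc'.
default_order = sorted(words)

# key=str.lower compares lowercased copies; the original strings are left unchanged.
case_insensitive = sorted(words, key=str.lower)

print(code_points)       # {'D': 68, 'E': 69, 'b': 98}
print(default_order)     # ['ADc', 'AEc', 'Abc']
print(case_insensitive)  # ['Abc', 'ADc', 'AEc']
```

Reversing the default order gives ['Abc', 'AEc', 'ADc'], which is exactly what the question observed.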
Applying the capitalize() function: when str.capitalize() is applied to a DataFrame column named Day, the first letter of every day name is capitalized.
You can sort a DataFrame by row or column value as well as by row or column index. Both rows and columns have indices, which are numerical representations of where the data is in your DataFrame. You can retrieve data from specific rows or columns using the DataFrame's index locations.
I don't think that's a pandas bug. It seems to be just the way Python's sorting works with mixed-case strings: the comparison is case sensitive.
Because when you do:
In [1]: l1 = ['ADc','Abc','AEc']
In [2]: l1.sort(reverse=True)
In [3]: l1
Out[3]: ['Abc', 'AEc', 'ADc']
So, since apparently one cannot control the sorting algorithm using the pandas sort method, just use a lower cased version of that column for the sorting and drop it later on:
In [4]: df = pd.DataFrame(['ADc','Abc','AEc'], columns=['Test'], index=[0,1,2])
In [5]: df['test'] = df['Test'].str.lower()
In [6]: df.sort(columns=['test'], axis=0, ascending=True, inplace=True)
In [7]: df.drop('test', axis=1, inplace=True)
In [8]: df
Out[8]:
  Test
1  Abc
0  ADc
2  AEc
Note: If you want the column sorted alphabetically, the ascending argument must be set to True.
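For anyone on a recent pandas, where DataFrame.sort is no longer available, the same helper-column idea can be sketched with sort_values. This is only a rough equivalent of the session above, not a tested drop-in for every pandas version; the helper column name test is kept from the original.

```python
import pandas as pd

df = pd.DataFrame(['ADc', 'Abc', 'AEc'], columns=['Test'], index=[0, 1, 2])

# Temporary lowercase copy of the column, used only as the sort key.
df['test'] = df['Test'].str.lower()

# sort_values takes the place of the older df.sort(columns=...) call.
df = df.sort_values(by='test', ascending=True)

# Drop the helper so only the original column remains.
df = df.drop(columns='test')

print(df)
```

The original index labels travel with the rows, so the result keeps the index order 1, 0, 2 as in Out[8].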
EDIT:
As DSM suggested, to avoid creating a new helper column, you can do:
df = df.loc[df["Test"].str.lower().order().index]
UPDATE:
As pointed out by weatherfrog, for newer versions of pandas the correct method is .sort_values(). So the above one-liner becomes:
df = df.loc[df["Test"].str.lower().sort_values().index]
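As a further option, and assuming pandas 1.1 or later (where sort_values accepts a key argument), the lowercasing can be passed directly to the sort instead of going through the index. A minimal sketch, with sorted_df as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame(['ADc', 'Abc', 'AEc'], columns=['Test'], index=[0, 1, 2])

# key is called with the whole column as a Series and should return
# a Series of the same length; here it returns the lowercased values.
sorted_df = df.sort_values(by='Test', key=lambda col: col.str.lower())

print(sorted_df)
```

The values in Test are left untouched; only the comparison uses the lowercased copy.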