pandas sort with capital letters

Tags: sorting, pandas

Running this code:

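import pandas as pd

df = pd.DataFrame(['ADc', 'Abc', 'AEc'], columns=['Test'], index=[0, 1, 2])
# originally df.sort(columns=['Test'], ...); newer pandas removed DataFrame.sort
df.sort_values(by=['Test'], axis=0, ascending=False, inplace=True)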

This returns a DataFrame whose column is ordered as [Abc, AEc, ADc]. ADc should come before AEc; what's going on?

asked by tpoh on Apr 27 '15 at 14:04



1 Answer

I don't think that's a pandas bug. It's just how Python sorts mixed-case strings: the comparison is case sensitive and goes by character code, so every uppercase ASCII letter sorts before every lowercase one.

A plain Python list behaves the same way:

In [1]: l1 = ['ADc','Abc','AEc']
In [2]: l1.sort(reverse=True)
In [3]: l1
Out[3]: ['Abc', 'AEc', 'ADc']
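To see why, here is a minimal sketch in plain Python (no pandas) that prints the code points deciding the comparison and then applies the usual fix of passing key=str.lower, so the comparison runs on lower-cased copies while the original strings are kept:

```python
# Python compares strings character by character using Unicode code points,
# so every uppercase ASCII letter (65-90) sorts before every lowercase one (97-122).
words = ['ADc', 'Abc', 'AEc']

# All three start with 'A', so the second character decides: 'D' < 'E' < 'b'.
print([(w[1], ord(w[1])) for w in words])

# Default, case-sensitive sort, descending as in the question.
case_sensitive = sorted(words, reverse=True)
print(case_sensitive)  # ['Abc', 'AEc', 'ADc']

# Case-insensitive sort: compare lower-cased copies, return the original strings.
case_insensitive = sorted(words, key=str.lower)
print(case_insensitive)  # ['Abc', 'ADc', 'AEc']
```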

Since the old pandas sort method gives you no control over how values are compared, sort on a lower-cased copy of the column and drop that helper column afterwards:

In [4]: df = pd.DataFrame(['ADc','Abc','AEc'], columns=['Test'], index=[0,1,2])
In [5]: df['test'] = df['Test'].str.lower()
In [6]: df.sort_values(by=['test'], axis=0, ascending=True, inplace=True)
In [7]: df.drop('test', axis=1, inplace=True)
In [8]: df
Out[8]:
  Test
1  Abc
0  ADc
2  AEc

Note: if you want the column in alphabetical order (A to Z), ascending must be set to True; ascending=False gives reverse alphabetical order.

EDIT:

As DSM suggested, to avoid creating a new helper column, you can do:

df = df.loc[df["Test"].str.lower().order().index]

UPDATE:

As weatherfrog pointed out, newer versions of pandas replace Series.order() with .sort_values(), so the above one-liner becomes:

df = df.loc[df["Test"].str.lower().sort_values().index]
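In pandas 1.1.0 and later, sort_values also accepts a key argument, which avoids both the helper column and the index lookup. A minimal sketch, assuming pandas >= 1.1: the key function receives each sort column as a Series and must return a Series of the same length, and it only affects the comparison, so the stored values keep their original case.

```python
import pandas as pd

df = pd.DataFrame(['ADc', 'Abc', 'AEc'], columns=['Test'], index=[0, 1, 2])

# Lower-case the strings for comparison only; the values in df are untouched.
asc = df.sort_values(by='Test', key=lambda s: s.str.lower())
print(asc)

# Same case-insensitive comparison, reverse alphabetical order.
desc = df.sort_values(by='Test', ascending=False, key=lambda s: s.str.lower())
print(desc)
```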
answered by paulo.filip3 on Sep 26 '22 at 03:09