 

Pandas sort_values() issue: integers sorted wrongly when the key parameter is applied

Tags:

python

pandas

Hi everyone, I have looked everywhere for this issue but I cannot figure out a solution. I'd be glad if you could help me.

So, basically I have this dataset:

import pandas as pd

df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})

df.dtypes
col1    object
col2     int64
dtype: object

and I want to sort the first column in a custom order and the second column in ascending order. The final result should be the following:

    col1    col2
4    www    1996
5    www    2021
3    kkk    1000
0    xxx    1994
1    xxx    2013
2    xxx    2020

So, in order to accomplish that I do this:

d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0} # to customize order 

df.sort_values(by = ['col1' , 'col2'], key = lambda x: x.map(d))

but I end up with this:

    col1    col2
4    www    1996
5    www    2021
3    kkk    1000
0    xxx    2020
1    xxx    1994
2    xxx    2013

If I only do:

df.sort_values(by = ['col1' , 'col2'])

    col1    col2
3    kkk    1000
4    www    1996
5    www    2021
1    xxx    1994
2    xxx    2013
0    xxx    2020

Here col2 is sorted correctly. I really don't know why I am having this issue. Has anyone experienced something similar? Thanks

asked Apr 23 '21 at 10:04 by fecke9296




1 Answer

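The key function is applied to each column in by independently, so col2 also goes through x.map(d). None of the years is a key of d, so every col2 value becomes NaN, the years no longer affect the order, and the xxx rows simply keep their original row order. A possible trick is to extend the dictionary with the values of col2 mapped to themselves: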

d = {'xxx': 4, 'zzz': 1, 'yyy': 5, 'kkk': 2, 'jjj': 3, 'www': 0}  # custom order for col1
d = {**d, **dict(zip(df.col2, df.col2))}  # add every year of col2, mapped to itself

df = df.sort_values(by=['col1', 'col2'], key=lambda x: x.map(d))
print(df)
  col1  col2
4  www  1996
5  www  2021
3  kkk  1000
1  xxx  1994
2  xxx  2013
0  xxx  2020
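A quick way to check the explanation above (a sketch, not part of the original answer) is to call the key on col2 by hand and to rerun the original sort. The name d_expanded is only an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})
d = {'xxx': 4, 'zzz': 1, 'yyy': 5, 'kkk': 2, 'jjj': 3, 'www': 0}

# The key also receives col2; no year is a key of d, so map() yields NaN everywhere.
col2_keys = df['col2'].map(d)
print(col2_keys.isna().all())

# With all-NaN keys for col2, the xxx rows tie and keep their original order (0, 1, 2).
wrong = df.sort_values(by=['col1', 'col2'], key=lambda x: x.map(d))
print(list(wrong.index))

# After the expansion each year maps to itself, so col2 keeps its real values under the key.
d_expanded = {**d, **dict(zip(df.col2, df.col2))}
print((df['col2'].map(d_expanded) == df['col2']).all())
```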

Or a solution with get, which returns the value itself instead of NaN when there is no match in d:

df = df.sort_values(by=['col1', 'col2'], key=lambda x: x.map(lambda y: d.get(y, y)))
print(df)
  col1  col2
4  www  1996
5  www  2021
3  kkk  1000
1  xxx  1994
2  xxx  2013
0  xxx  2020
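Another option, not taken from the original answer, is a key that maps only col1 and hands col2 back untouched. This sketch assumes that pandas passes each column in by to the key as a Series carrying that column's name, which is worth checking on your pandas version; custom_key is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})
d = {'xxx': 4, 'zzz': 1, 'yyy': 5, 'kkk': 2, 'jjj': 3, 'www': 0}

def custom_key(s):
    # Assumption: each column in `by` reaches the key as a Series named after that column.
    if s.name == 'col1':
        return s.map(d)  # custom rank for col1
    return s             # col2 unchanged, so it sorts numerically

out = df.sort_values(by=['col1', 'col2'], key=custom_key)
print(out)
```

Unlike the dictionary expansion, this does not require d to know anything about the values in col2.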

Or a solution with a helper column, which maps only col1 and then sorts by the mapped values and col2 without any key:

d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0} 

df = df.assign(new=df['col1'].map(d)).sort_values(by=['new', 'col2']).drop('new', axis=1)
print(df)
  col1  col2
4  www  1996
5  www  2021
3  kkk  1000
1  xxx  1994
2  xxx  2013
0  xxx  2020
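A different technique, again not from the original answer, is to store col1 as an ordered pandas Categorical whose categories follow the ranks in d; sort_values then orders col1 by category position with no key at all. The name order is illustrative, and converting col1 back to object at the end is optional.

```python
import pandas as pd

df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})
d = {'xxx': 4, 'zzz': 1, 'yyy': 5, 'kkk': 2, 'jjj': 3, 'www': 0}

# Categories listed from lowest to highest rank in d: www, zzz, kkk, jjj, xxx, yyy
order = pd.CategoricalDtype(categories=sorted(d, key=d.get), ordered=True)

out = (df.astype({'col1': order})           # col1 now sorts by category position
         .sort_values(by=['col1', 'col2'])
         .astype({'col1': object}))         # back to plain strings
print(out)
```

If the same custom order is needed in several places, the categorical dtype can simply be kept on the column instead of converting it back.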
answered Sep 27 '22 at 22:09 by jezrael