Hi everyone, I have looked everywhere for a solution to this issue but cannot figure one out. I'd be glad if you could help me.
Basically, I have this dataset:
import pandas as pd
df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})
df.dtypes
col1 object
col2 int64
dtype: object
I want to sort the first column in a custom order and the second column in ascending order. The final result should be the following:
col1 col2
4 www 1996
5 www 2021
3 kkk 1000
1 xxx 1994
2 xxx 2013
0 xxx 2020
So, in order to accomplish that I do this:
d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0} # to customize order
df.sort_values(by = ['col1' , 'col2'], key = lambda x: x.map(d))
but I end up with this:
col1 col2
4 www 1996
5 www 2021
3 kkk 1000
0 xxx 2020
1 xxx 1994
2 xxx 2013
If I only do:
df.sort_values(by = ['col1' , 'col2'])
col1 col2
3 kkk 1000
4 www 1996
5 www 2021
1 xxx 1994
2 xxx 2013
0 xxx 2020
then col2 is ordered correctly. I really don't know why I am having this issue. Has anyone experienced something similar? Thanks
To sort a DataFrame by the values in a single column, use .sort_values(). By default, this returns a new DataFrame sorted in ascending order.
Pandas Series.sort_index(): the sort_index() function sorts a Series by its index labels. It returns a new Series sorted by label if the inplace argument is False; otherwise it updates the original Series and returns None.
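A minimal sketch of both behaviours, using a small made-up Series (the values and the names by_label and result are only illustrative):

```python
import pandas as pd

s = pd.Series(["b", "c", "a"], index=[2, 3, 1])

# Default (inplace=False): a new Series is returned, s itself is unchanged
by_label = s.sort_index()
print(by_label.index.tolist())

# inplace=True: s is reordered and the call returns None
result = s.sort_index(inplace=True)
print(result)
print(s.tolist())
```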
To check whether the index of a DataFrame is sorted in ascending order, use the is_monotonic_increasing property of the index. Similarly, to check for descending order, use the is_monotonic_decreasing property.
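As a rough illustration, the check could look like this on a small made-up DataFrame whose index starts out unsorted (the data and the name sorted_frame are assumptions for the example):

```python
import pandas as pd

frame = pd.DataFrame({"col2": [2020, 1994, 2013]}, index=[2, 0, 1])

# The index 2, 0, 1 is neither increasing nor decreasing
print(frame.index.is_monotonic_increasing)
print(frame.index.is_monotonic_decreasing)

# After sort_index() the index is 0, 1, 2, so it is increasing
sorted_frame = frame.sort_index()
print(sorted_frame.index.is_monotonic_increasing)
```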
To change the order of the index of a Series, use the pandas Series.reindex() method.
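A short sketch of reindex() on a made-up Series; note that labels not present in the original index come back as NaN (the names reordered and extended are only illustrative):

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Put the existing labels into a new order
reordered = s.reindex(["c", "a", "b"])
print(reordered.tolist())

# A label that does not exist in s ("z") is filled with NaN
extended = s.reindex(["c", "a", "z"])
print(extended.isna().tolist())
```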
Let’s dive into how to sort a Pandas DataFrame using the .sort_values() method. The most important parameter of .sort_values() is by=, as it tells Pandas which column(s) to sort by. It takes either a single column name as a string or several column names as a list of strings.
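A minimal sketch of both forms of by=, reusing the DataFrame from the question (the names by_one and by_two are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["xxx", "xxx", "xxx", "kkk", "www", "www"],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})

# A single column name as a string
by_one = df.sort_values(by="col2")

# A list of column names: sort by col1 first, then by col2 within ties
by_two = df.sort_values(by=["col1", "col2"])

print(by_one.index.tolist())
print(by_two.index.tolist())
```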
By default, Pandas sorts any missing values to the last position. To change this behavior, pass the na_position='first' argument, for example when sorting a Name column with missing values placed first. To apply the change in sort order in place, pass inplace=True.
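The original example data is not shown here, so the following sketch uses a made-up DataFrame with a Name column (the rows and the names people, nan_last and nan_first are assumptions):

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Nik", None, "Kate"], "Age": [30, 25, 41]})

# Default: missing values are placed last
nan_last = people.sort_values(by="Name")

# na_position="first": missing values are placed first
nan_first = people.sort_values(by="Name", na_position="first")

# inplace=True reorders people itself and returns None
returned = people.sort_values(by="Name", inplace=True)

print(nan_last.index.tolist())
print(nan_first.index.tolist())
print(returned)
```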
Series.sort_values() sorts a Series in ascending or descending order by some criterion. axis: axis to direct sorting; the value ‘index’ is accepted for compatibility with DataFrame.sort_values. ascending: if True, sort values in ascending order, otherwise descending. inplace: if True, perform the operation in place.
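For illustration, a small sketch of the ascending parameter on a made-up Series (the names ascending_s and descending_s are only illustrative):

```python
import pandas as pd

s = pd.Series([2020, 1994, 2013])

# ascending=True is the default
ascending_s = s.sort_values()

# ascending=False sorts from largest to smallest
descending_s = s.sort_values(ascending=False)

print(ascending_s.tolist())
print(descending_s.tolist())
```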
sort_values() differs from Python's built-in sorted() function, which cannot sort a DataFrame and cannot select a particular column to sort by. Signature: DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
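A possible trick is to expand the dictionary with the values from col2. The key function is applied to each column in by independently, so with the original dictionary every col2 value maps to NaN and the col2 order is left unchanged: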
d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0} # to customize order
d = {**d, **dict(zip(df.col2, df.col2))}
df = df.sort_values(by = ['col1' , 'col2'], key = lambda x: x.map(d))
print (df)
col1 col2
4 www 1996
5 www 2021
3 kkk 1000
1 xxx 1994
2 xxx 2013
0 xxx 2020
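Why the extra dictionary entries matter can be seen in a small sketch. It assumes, as the pandas documentation for key describes, that the function is applied to each column in by separately (the name mapped_col2 is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["xxx", "xxx", "xxx", "kkk", "www", "www"],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})
d = {"xxx": 4, "zzz": 1, "yyy": 5, "kkk": 2, "jjj": 3, "www": 0}

# This is what the key function produces for col2 with the original dict:
# none of the years is a key of d, so every value becomes NaN
mapped_col2 = df["col2"].map(d)
print(mapped_col2.isna().all())

# With all col2 keys equal (NaN), ties within col1 keep their original order
attempt = df.sort_values(by=["col1", "col2"], key=lambda x: x.map(d))
print(attempt.index.tolist())
```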
Or a solution with get, which returns the original value instead of NaN when there is no match:
df = df.sort_values(by = ['col1' , 'col2'], key = lambda x: x.map(lambda y: d.get(y, y)))
print (df)
col1 col2
4 www 1996
5 www 2021
3 kkk 1000
1 xxx 1994
2 xxx 2013
0 xxx 2020
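A related variant, not part of the answer above, is to map only col1 and pass col2 through untouched. This sketch assumes the Series handed to key carries the column name in its name attribute, which is worth verifying on your pandas version (the helper name custom_key is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["xxx", "xxx", "xxx", "kkk", "www", "www"],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})
d = {"xxx": 4, "zzz": 1, "yyy": 5, "kkk": 2, "jjj": 3, "www": 0}

def custom_key(column):
    # Assumption: column.name is the column label passed in by=
    if column.name == "col1":
        return column.map(d)   # custom rank for col1
    return column              # col2 is sorted by its own values

result = df.sort_values(by=["col1", "col2"], key=custom_key)
print(result)
```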
Solution with helper column:
d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0}
df = df.assign(new=df['col1'].map(d)).sort_values(by=['new','col2']).drop('new', axis=1)
print (df)
col1 col2
4 www 1996
5 www 2021
3 kkk 1000
1 xxx 1994
2 xxx 2013
0 xxx 2020
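Another option, again not from the answer above, is to turn col1 into an ordered categorical so that no key function is needed. In this sketch the category order is derived from d, and col1 ends up with a category dtype in the result (the names order and cat_sorted are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["xxx", "xxx", "xxx", "kkk", "www", "www"],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})
d = {"xxx": 4, "zzz": 1, "yyy": 5, "kkk": 2, "jjj": 3, "www": 0}

# Labels ordered by their rank in d: www, zzz, kkk, jjj, xxx, yyy
order = sorted(d, key=d.get)

# An ordered categorical sorts by category position, not alphabetically
cat_sorted = (
    df.assign(col1=pd.Categorical(df["col1"], categories=order, ordered=True))
      .sort_values(by=["col1", "col2"])
)
print(cat_sorted)
```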