Pandas sort_values() issue: wrong integer sorting when the key parameter is applied

Tags:

python

pandas

Hi everyone, I have looked everywhere for this issue but I cannot figure out a solution. I'd be glad if you could help me.

So, basically I have this dataset:

import pandas as pd

df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})

df.dtypes
col1    object
col2     int64
dtype: object

I want to order the first column with a custom order and the second column in ascending order. The final result should be the following:

    col1    col2
4    www    1996
5    www    2021
3    kkk    1000
0    xxx    1994
1    xxx    2013
2    xxx    2020

To accomplish that, I do this:

d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0} # to customize order 

df.sort_values(by = ['col1' , 'col2'], key = lambda x: x.map(d))

but I end up with this:

    col1    col2
4    www    1996
5    www    2021
3    kkk    1000
0    xxx    2020
1    xxx    1994
2    xxx    2013

If I only do:

df.sort_values(by = ['col1' , 'col2'])

    col1    col2
3    kkk    1000
4    www    1996
5    www    2021
1    xxx    1994
2    xxx    2013
0    xxx    2020

Here col2 is ordered correctly. I really don't know why I am having this issue. Has anyone experienced something similar? Thanks


asked Apr 23 '21 10:04

fecke9296

1 Answer
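A quick way to see what goes wrong is to apply the key function to each column in by by hand, which is what sort_values does internally. This is a minimal diagnostic sketch reusing the question's df and d; the names mapped_col1 and mapped_col2 are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})
d = {'xxx': 4, 'zzz': 1, 'yyy': 5, 'kkk': 2, 'jjj': 3, 'www': 0}

# sort_values applies `key` to every column listed in `by`, one at a time
mapped_col1 = df['col1'].map(d)  # custom ranks: 4, 4, 4, 2, 0, 0
mapped_col2 = df['col2'].map(d)  # the years are not keys of d, so all NaN

print(mapped_col1.tolist())
print(mapped_col2.tolist())
```

Because every mapped col2 value is NaN, the rows inside each col1 group are tied on the second sort key, which is consistent with the xxx rows keeping their original order (2020, 1994, 2013) in the output from the question.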

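The key function is applied to each column in by independently, so col2 is also mapped through d; its years are not keys of d, so it becomes all NaN and no longer affects the order. A possible trick is to expand the dictionary with the values from col2, mapping each year to itself: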

d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0} # to customize order 
d = {**d, **dict(zip(df.col2, df.col2))}

df = df.sort_values(by=['col1', 'col2'], key=lambda x: x.map(d))
print(df)
  col1  col2
4  www  1996
5  www  2021
3  kkk  1000
1  xxx  1994
2  xxx  2013
0  xxx  2020

Or a solution with dict.get, which returns the value itself instead of NaN when there is no match:

df = df.sort_values(by=['col1', 'col2'], key=lambda x: x.map(lambda y: d.get(y, y)))
print(df)
  col1  col2
4  www  1996
5  www  2021
3  kkk  1000
1  xxx  1994
2  xxx  2013
0  xxx  2020
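Another way to leave col2 untouched is to make the key function check which column it receives. This sketch assumes that, when by lists several columns, pandas passes each one to key as a Series named after that column (current pandas versions do this as far as I know); custom_key is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})
d = {'xxx': 4, 'zzz': 1, 'yyy': 5, 'kkk': 2, 'jjj': 3, 'www': 0}

def custom_key(s):
    # hypothetical helper: map only col1 through the custom order,
    # return every other column (here col2) unchanged
    return s.map(d) if s.name == 'col1' else s

out = df.sort_values(by=['col1', 'col2'], key=custom_key)
print(out)
```

Unlike the dictionary-expansion trick, this does not rely on the col2 values never colliding with keys of d.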

Or a solution with a helper column:

d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0} 

df = df.assign(new=df['col1'].map(d)).sort_values(by=['new','col2']).drop('new', axis=1)
print(df)
  col1  col2
4  www  1996
5  www  2021
3  kkk  1000
1  xxx  1994
2  xxx  2013
0  xxx  2020
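A further option, not part of the original answer, is to turn col1 into an ordered pandas Categorical whose category order follows d, and then sort normally without any key. This is a sketch under the assumption that every value in col1 appears in d (values missing from the categories would become NaN); the names order and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
                   "col2": [2020, 1994, 2013, 1000, 1996, 2021]})
d = {'xxx': 4, 'zzz': 1, 'yyy': 5, 'kkk': 2, 'jjj': 3, 'www': 0}

# category labels ordered by their rank in d: www, zzz, kkk, jjj, xxx, yyy
order = sorted(d, key=d.get)

# an ordered Categorical sorts by category position, not alphabetically
out = (df.assign(col1=pd.Categorical(df['col1'], categories=order, ordered=True))
         .sort_values(['col1', 'col2']))
print(out)
```

The original df is left unchanged because assign returns a new frame; keep the categorical column instead if the custom order is needed repeatedly.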

answered Sep 27 '22 22:09

jezrael
