How to sort pandas DataFrame with a key?

Question

I'm looking for a way to sort a pandas DataFrame with a custom key. pd.DataFrame.sort_values doesn't accept a key function. I could convert the column to a list and pass a key to sorted, but that would be slow. The other approach seems to involve a categorical index, but I don't have a fixed number of rows, so I don't know whether a categorical index is applicable.

Here is an example of the kind of data I want to sort:

Input DataFrame:

     clouds  fluff
0    {[}      1
1    >>>      2
2     {1      3
3    123      4
4  AAsda      5
5    aad      6

Output DataFrame:

     clouds  fluff
0    >>>      2
1    {[}      1
2     {1      3
3    123      4
4    aad      6
5  AAsda      5

The sorting rule, in order of priority:

First, special characters (sorted by ASCII value among themselves)

Next, numbers

Next, lowercase letters (lexicographical)

Next, uppercase letters (lexicographical)

In plain Python I'd do it like this:

from functools import cmp_to_key

def ks(a, b):
    # Not exactly this, but similar
    if a.isupper():
        return -1
    else:
        return 1

Example call:

sorted(['aa', 'AA', 'dd', 'DD'], key=cmp_to_key(ks))

Output:

['DD', 'AA', 'aa', 'dd']

How would you do it with Pandas?

Vasantha Ganesh · Accepted Answer

As of pandas 1.1.0, pandas.DataFrame.sort_values accepts an argument key with type callable.

So in this case we would use:

df.sort_values(by='clouds', key=kf)

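where kf is the key function. It is vectorized: it receives the column as a Series and must return a Series of the same shape, whose values are then used to determine the order. It is applied to each column in by independently.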
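
One possible kf for the rule in the question is sketched below. This is an assumption for illustration, not part of the original answer: each character is prefixed with a class digit (0 for special characters, 1 for digits, 2 for lowercase, 3 for uppercase), so that ordinary string comparison then follows the priority order character by character. The names char_class and priority_key are hypothetical.

```python
import pandas as pd


def char_class(ch):
    """Return a one-character rank for the class a character belongs to."""
    if ch.isdigit():
        return "1"
    if ch.islower():
        return "2"
    if ch.isupper():
        return "3"
    return "0"  # anything else is treated as a special character


def priority_key(series):
    """Key for sort_values: takes a Series and returns a Series of the same length."""
    # Prefix every character with its class rank, e.g. 'aA' -> '2a3A'.
    return series.map(lambda s: "".join(char_class(c) + c for c in s))


df = pd.DataFrame({
    "clouds": ["{[}", ">>>", "{1", "123", "AAsda", "aad"],
    "fluff": [1, 2, 3, 4, 5, 6],
})

# ignore_index=True renumbers the rows 0..n-1, as in the desired output.
result = df.sort_values(by="clouds", key=priority_key, ignore_index=True)
print(result)
```

The key still runs a Python function once per value, so it is not faster than a list-based sort per element, but it keeps the DataFrame intact and lets pandas handle the reordering of the other columns.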

wuiover · Answer

With pandas 1.2.0, I did this:

import pandas as pd

df = pd.DataFrame(['aa', 'dd', 'DD', 'AA'], columns=["data"])

# This is the sorting rule
rule = {
    "DD": 1,
    "AA": 10,
    "aa": 20,
    "dd": 30,
    }


def particular_sort(series):
    """
    Must return one Series
    """
    return series.apply(lambda x: rule.get(x, 1000))


new_df = df.sort_values(by=["data"], key=particular_sort)
print(new_df)  # DD, AA, aa, dd

Of course, you can also write it as a one-liner, though it may be harder to read:

new_df = df.sort_values(by=["data"], key=lambda x: x.apply(lambda y: rule.get(y, 1000)))
print(new_df)  # DD, AA, aa, dd
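
A variant of the same idea, as a sketch: Series.map accepts a dict directly, so the lookup can be written without apply. Values missing from the dict become NaN, and fillna then replaces them with the default rank. The extra value 'zz' is added here only to show the fallback and is not part of the original example.

```python
import pandas as pd

# Same sorting rule as above: smaller rank sorts first.
rule = {"DD": 1, "AA": 10, "aa": 20, "dd": 30}

df = pd.DataFrame({"data": ["aa", "dd", "zz", "DD", "AA"]})

# map() looks each value up in the dict; unknown values become NaN
# and are given rank 1000 so that they sort after every known value.
new_df = df.sort_values(by="data", key=lambda s: s.map(rule).fillna(1000))
print(new_df)
```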

Tags: python, pandas
