Pandas: most efficient way to apply complex function over entire data frame

Question

I have a df which needs to be grouped, filtered, and modified, with a custom function applied. My 'normal' approach is too slow and not the most elegant one!

[name]  [cnt]   [num]    [place]  [y]

AAAA     12    20182.0     5.0   1.75
BBBB     12    20182.0     7.0   2.00
AAAA     10    20381.0    10.0   9.25
BBBB     10    20381.0    12.0  18.75
EEEE     12    21335.0     1.0   0.00
RRRR     12    21335.0     8.0   3.00
CCCC     12    21335.0     9.0   3.50

I need to group the df on [num], e.g.:

[name]  [cnt]   [num]    [place]  [y]

AAAA     12    20182.0     5.0   1.75
BBBB     12    20182.0     7.0   2.00

For each of those groups I need to do three tasks:

I. Filter out all rows inside a group that have the same [y] value as another row in that group. Groups can consist of up to 6 rows.

II. Create all possible ordered pairs (permutations of length two) of the [place] values: (5,7) and (7,5)

III. Apply custom function to every subset:

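def func(p1, p2, cnt=12):
    # p1, p2: first and second [place] of the tuple; cnt: the group's [cnt] value
    diff_p = p2 - p1
    if diff_p > 0:
        return 2 / (diff_p * p2)
    else:
        # diff_p == 0 would divide by zero; places within a group are assumed distinct
        return p1 / (diff_p * cnt)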

Here p1 is the first place of the tuple, p2 is the second place of the tuple, and 12 is the value from the [cnt] column. For the example group this gives:

[name]  [cnt]   [num]    [place]  [y]  [desired]

AAAA     12    20182.0     5.0   1.75   0.1428571429
BBBB     12    20182.0     7.0   2.00  -0.2916666667

AAAA's [desired] column holds the mean of the custom function's results over all tuples in which AAAA's place value is the first element. In this example there is only one such tuple.

(But as mentioned, groups can consist of up to 6 values, which creates multiple tuples where AAAA's place is the first value.)
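To make the "mean over all tuples that start with a given place" rule concrete, here is a small plain-Python sketch. The helper name mean_by_first_place is made up for illustration, func takes [cnt] as an explicit argument instead of the literal 12, and keying the result by place value assumes that places within a group are distinct.

```python
from itertools import permutations
from statistics import mean


def func(p1, p2, cnt):
    # Same formula as in the question, with [cnt] passed in explicitly.
    diff_p = p2 - p1
    if diff_p > 0:
        return 2 / (diff_p * p2)
    return p1 / (diff_p * cnt)


def mean_by_first_place(places, cnt):
    """Hypothetical helper: map each place to the mean of func over all
    ordered pairs in which that place is the first element."""
    results = {}
    for p1, p2 in permutations(places, 2):
        results.setdefault(p1, []).append(func(p1, p2, cnt))
    return {p1: mean(vals) for p1, vals in results.items()}


# Example group num == 20182: one tuple per row, (5, 7) and (7, 5).
example = mean_by_first_place([5.0, 7.0], cnt=12)
print(example)

# Group num == 21335 has three rows, so each place starts two tuples.
three = mean_by_first_place([1.0, 8.0, 9.0], cnt=12)
print(three)
```

For the two-row group this reproduces the [desired] values in the table above; for the three-row group each value is the mean of two function results.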

My current approach is to do a

df.groupby('num').apply(...)

The function passed to apply does:

.drop_duplicates('y',keep=False)

list(itertools.permutations(df_grp.place.values, 2))

apply the custom function

.mean()

It becomes really slow after a while, since the first df is itself the output of another .groupby().apply() call.

stovfl · Accepted Answer

Try GroupBy.aggregate(func, *args, **kwargs) to aggregate your three tasks.
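As a different technique from GroupBy.aggregate, the three tasks can also be sketched without a per-group Python function: drop the duplicates for the whole frame at once, build the tuples with a self-merge on [num], and evaluate the formula with np.where. This is only a sketch and has not been benchmarked here. The column names row, row_other, place_other and res are made up, and it assumes that places within a group are distinct and non-zero, because np.where evaluates both branches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["AAAA", "BBBB", "AAAA", "BBBB", "EEEE", "RRRR", "CCCC"],
    "cnt": [12, 12, 10, 10, 12, 12, 12],
    "num": [20182.0, 20182.0, 20381.0, 20381.0, 21335.0, 21335.0, 21335.0],
    "place": [5.0, 7.0, 10.0, 12.0, 1.0, 8.0, 9.0],
    "y": [1.75, 2.00, 9.25, 18.75, 0.00, 3.00, 3.50],
})

# Step I: drop every row whose y is shared with another row of the same num.
# The original row label is kept in a (hypothetical) "row" column.
kept = df[~df.duplicated(["num", "y"], keep=False)]
kept = kept.rename_axis("row").reset_index()

# Step II: a self-merge on num yields every ordered pair within a group;
# pairs of a row with itself are removed afterwards.
pairs = kept.merge(kept[["row", "num", "place"]], on="num",
                   suffixes=("", "_other"))
pairs = pairs[pairs["row"] != pairs["row_other"]]

# Step III: the custom function, evaluated for all tuples at once.
p1 = pairs["place"].to_numpy()
p2 = pairs["place_other"].to_numpy()
cnt = pairs["cnt"].to_numpy()
diff = p2 - p1
pairs = pairs.assign(res=np.where(diff > 0, 2 / (diff * p2), p1 / (diff * cnt)))

# Mean per first row of the tuple; rows dropped in step I end up as NaN.
df["desired"] = pairs.groupby("row")["res"].mean()
print(df)
```

The self-merge produces n * n rows for a group of n rows, so with groups of at most 6 rows the intermediate frame stays small relative to the input.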

Tags: performance, python, algorithm, pandas, dataframe

jumboRumbo
