
Pandas Groupby with Lambda and Algorithm

Given this data frame:

import pandas as pd
import jenkspy
f = pd.DataFrame({'BreakGroup':['A','A','A','A','A','A','B','B','B','B','B'],
                 'Final':[1,2,3,4,5,6,10,20,30,40,50]})
   BreakGroup  Final
0           A      1
1           A      2
2           A      3
3           A      4
4           A      5
5           A      6
6           B     10
7           B     20
8           B     30
9           B     40
10          B     50

I'd like to use jenkspy to compute natural breaks for 4 groups (classes) within each "BreakGroup", and then identify the class to which each value in "Final" belongs.

I started out by doing this:

jenks=lambda x: jenkspy.jenks_breaks(f['Final'].tolist(),nb_class=4)
f['Group']=f.groupby(['BreakGroup'])['BreakGroup'].transform(jenks)

...which results in:

BreakGroup
A    [1.0, 10.0, 20.0, 30.0, 50.0]
B    [1.0, 10.0, 20.0, 30.0, 50.0]
Name: BreakGroup, dtype: object

The first problem here, as you may well have surmised, is that it applies the lambda function to the whole column of "Final" scores instead of just those belonging to each group in the groupby. The second problem is that I need a column designating the correct group (class) membership for each row, presumably by using transform instead of apply.

I then tried this:

jenks=lambda x: jenkspy.jenks_breaks(f['Final'].loc[f['BreakGroup']==x].tolist(),nb_class=4)
f['Group']=f.groupby(['BreakGroup'])['BreakGroup'].transform(jenks)

...but was promptly beaten back into submission:

ValueError: Can only compare identically-labeled Series objects

Update:

Here is the desired result. The "Result" column contains the upper limit of the group for the respective value from "Final" per group "BreakGroup":

   BreakGroup  Final  Result
0           A      1       2
1           A      2       3
2           A      3       4
3           A      4       4
4           A      5       6
5           A      6       6
6           B     10      20
7           B     20      30
8           B     30      40
9           B     40      50
10          B     50      50

Thanks in advance!

My slightly modified application, based on the accepted solution:

f.sort_values('BreakGroup', inplace=True)
f.reset_index(drop=True, inplace=True)
jenks = lambda x: jenkspy.jenks_breaks(x['Final'].tolist(), nb_class=4)
g = f.set_index('BreakGroup')
g['Groups'] = f.groupby(['BreakGroup']).apply(jenks)
g.reset_index(inplace=True)
groups = lambda x: [gp for gp in x['Groups']]
# 'Final' value should be > lower and <= upper
upper = lambda x: [gp for gp in x['Groups'] if gp >= x['Final']][0]  # or gp == max(x['Groups'])
lower = lambda x: [gp for gp in x['Groups'] if gp < x['Final'] or gp == min(x['Groups'])][-1]
GroupIndex = lambda x: [x['Groups'].index(gp) for gp in x['Groups'] if gp < x['Final'] or gp == min(x['Groups'])][-1]
f['Groups'] = g.apply(groups, axis=1)
f['Upper'] = g.apply(upper, axis=1)
f['Lower'] = g.apply(lower, axis=1)
f['Group'] = g.apply(GroupIndex, axis=1)
f['Group'] = f['Group'] + 1

This returns:

  1. The list of group boundaries

  2. The upper boundary pertinent to the value for "Final"

  3. The lower boundary pertinent to the value for "Final"

  4. The group to which the value for "Final" will belong based on logic noted in comments.
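
For a single value, the "> lower and <= upper" rule from the comment can also be sketched with the standard-library bisect module instead of list comprehensions. This is only an illustration, not part of the code above: the helper name locate is hypothetical, and the breaks for group A are copied from the accepted answer's output rather than recomputed with jenkspy.

```python
from bisect import bisect_left

def locate(value, breaks):
    """Return (lower, upper, group) such that lower < value <= upper.

    `breaks` must be sorted ascending. The smallest value sits exactly on
    the first break, so it is placed in group 1 (an assumption of this sketch).
    """
    if not breaks[0] <= value <= breaks[-1]:
        raise ValueError(f"{value} lies outside {breaks[0]}..{breaks[-1]}")
    # bisect_left returns the first index i with breaks[i] >= value;
    # clamping to 1 moves the group minimum into the first class.
    i = max(bisect_left(breaks, value), 1)
    return breaks[i - 1], breaks[i], i

# Breaks for group A, copied from the accepted answer's output.
breaks_a = [1.0, 2.0, 3.0, 4.0, 6.0]
located = [locate(v, breaks_a) for v in [1, 2, 3, 4, 5, 6]]
```

One difference from the lambdas above: for the group minimum (Final = 1), this sketch reports the upper boundary of the first class (2.0), whereas the upper lambda returns the minimum break itself (1.0). Lower and Group agree for every value in group A.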

Dance Party2 asked Dec 06 '25 10:12


1 Answer

Your jenks lambda never uses its argument x: it always reads the whole f['Final'] column, so it returns the same breaks regardless of which group apply or transform passes in. Changing the definition of jenks to

jenks = lambda x: jenkspy.jenks_breaks(x['Final'].tolist(),nb_class=4)

gives

In [315]: f.groupby(['BreakGroup']).apply(jenks)
Out[315]: 
BreakGroup
A         [1.0, 2.0, 3.0, 4.0, 6.0]
B    [10.0, 20.0, 30.0, 40.0, 50.0]
dtype: object
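
The same effect can be reproduced without jenkspy. In the sketch below, span is a hypothetical stand-in for jenkspy.jenks_breaks that just reports (min, max), and the names ignores_x and uses_x are illustrative only; the point is that a lambda which ignores its argument gives identical output for every group.

```python
import pandas as pd

f = pd.DataFrame({'BreakGroup': list('AAAAAABBBBB'),
                  'Final': [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50]})

def span(values):
    """Stand-in for jenkspy.jenks_breaks: report the (min, max) of a list."""
    return (min(values), max(values))

# Ignores x and always reads the whole column, like the original lambda.
ignores_x = lambda x: span(f['Final'].tolist())
# Uses x, i.e. only the values of the group handed in by groupby.
uses_x = lambda x: span(x.tolist())

by_group = f.groupby('BreakGroup')['Final']
same_everywhere = by_group.apply(ignores_x)  # identical result for A and B
per_group = by_group.apply(uses_x)           # one result per group
```

Here the groupby selects the 'Final' column first, so x is a Series rather than the sub-frame used in the jenks definition above; the argument-handling issue is the same either way.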

Continuing from this redefinition,

g = f.set_index('BreakGroup')
g['Groups'] = f.groupby(['BreakGroup']).apply(jenks)
g.reset_index(inplace=True)
group = lambda x: [gp for gp in x['Groups'] if gp > x['Final'] or gp == max(x['Groups'])][0]
f['Result'] = g.apply(group, axis=1)

gives

In [323]: f
Out[323]: 
   BreakGroup  Final  Result
0           A      1     2.0
1           A      2     3.0
2           A      3     4.0
3           A      4     6.0
4           A      5     6.0
5           A      6     6.0
6           B     10    20.0
7           B     20    30.0
8           B     30    40.0
9           B     40    50.0
10          B     50    50.0
EFT answered Dec 07 '25 23:12


