This is similar to the following question, but I want to take it one step further: pandas groupby apply on multiple columns to generate a new column
I have this dataframe:
      Group  Value  Part     Ratio
    0     A   6373    10  0.637300
    1     A   2512    10  0.251200
    2     A    603    10  0.060300
    3     A    512    10  0.051200
    4     B   5200    20  0.472727
    5     B   4800    20  0.436364
    6     B    501    20  0.045545
    7     B    499    20  0.045364
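And this function, which uses both the 'Ratio' and 'Part' columns, that I'd like to apply to each 'Group':
    import numpy as np

    def allocation(df, ratio, part):
        # Total number of units to hand out within this group
        k = df[part].max()
        # Split each share into its fractional and whole parts
        frac, results = np.array(np.modf(k * df[ratio]))
        # Units still unassigned after taking the whole parts
        remainder = int(k - results.sum())
        # Give the leftover units to the rows with the largest fractional parts
        indices = np.argsort(frac)[::-1]
        results[indices[0:remainder]] += 1
        return results.astype(int)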
Notice that the difference between my function and the function shown in the question I referred to at the top is that my function returns an array of values for the whole group instead of a single value. I tried the following:
    data.groupby('Group', group_keys=False).apply(allocation, ratio='Ratio', part='Part')
    Out[67]:
    Group
    A    [6, 2, 1, 1]
    B    [9, 9, 1, 1]
    dtype: object
These numbers are correct. However, I need the output to be a series that I can assign back into the original dataframe, so that it would look something like this:
      Group  Value  Part     Ratio  Allocate
    0     A   6373    10  0.637300         6
    1     A   2512    10  0.251200         2
    2     A    603    10  0.060300         1
    3     A    512    10  0.051200         1
    4     B   5200    20  0.472727         9
    5     B   4800    20  0.436364         9
    6     B    501    20  0.045545         1
    7     B    499    20  0.045364         1
How would I go about doing this? Is using apply the correct approach?
To do this the pandas way, have the allocation function return a DataFrame (or Series) instead of a bare array:
    def allocation(df, ratio, part):
        k = df[part].max()
        frac, results = np.array(np.modf(k * df[ratio]))
        remainder = int(k - results.sum())
        indices = np.argsort(frac)[::-1]
        results[indices[0:remainder]] += 1
        # Return a copy with the new column rather than mutating the group in place
        return df.assign(Allocate=results.astype(int))
Then groupby.apply gives you what you want directly:
    In [61]: df.groupby('Group', group_keys=False).apply(allocation, ratio='Ratio', part='Part')
    Out[61]:
      Group  Value  Part   Ratio  Allocate
    0     A   6373    10  0.6373         6
    1     A   2512    10  0.2512         2
    2     A    603    10  0.0603         1
    3     A    512    10  0.0512         1
    4     B   5200    20  0.4727         9
    5     B   4800    20  0.4364         9
    6     B    501    20  0.0455         1
    7     B    499    20  0.0454         1
This works even if the original DataFrame is not sorted by 'Group'. Try it on:
    df2 = pd.concat([df.iloc[:2], df.iloc[6:], df.iloc[2:6]])
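For anyone who wants to run that check, here is a self-contained sketch. It rebuilds the example frame from the question with literal values; the names out and out2 are my own, and I use assign instead of in-place column assignment, which is my assumption about what behaves best on recent pandas. Newer pandas releases may warn about, or exclude, the grouping column inside apply, so the comparison only looks at 'Allocate'.

```python
import numpy as np
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "Group": list("AAAABBBB"),
    "Value": [6373, 2512, 603, 512, 5200, 4800, 501, 499],
    "Part": [10] * 4 + [20] * 4,
    "Ratio": [0.6373, 0.2512, 0.0603, 0.0512,
              0.472727, 0.436364, 0.045545, 0.045364],
})

def allocation(group, ratio, part):
    k = group[part].max()
    frac, results = np.array(np.modf(k * group[ratio]))
    remainder = int(k - results.sum())
    indices = np.argsort(frac)[::-1]
    results[indices[:remainder]] += 1
    # Return the group with the new column attached (a copy, not a mutation)
    return group.assign(Allocate=results.astype(int))

# Same rows, but with group B split around part of group A
df2 = pd.concat([df.iloc[:2], df.iloc[6:], df.iloc[2:6]])

out = df.groupby("Group", group_keys=False).apply(
    allocation, ratio="Ratio", part="Part")
out2 = df2.groupby("Group", group_keys=False).apply(
    allocation, ratio="Ratio", part="Part")

# Compare by index label so the row order of the result does not matter
same = out2["Allocate"].sort_index().equals(out["Allocate"].sort_index())
print(same)
```

Because each returned group keeps its original index labels, every row gets its own allocation regardless of where it sits in the frame.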
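This usually happens when apply is used with a user-defined function that returns an array per group. You can fix it by flattening the per-group arrays with np.concatenate and assigning the result back. Note that the arrays come back in group order and are assigned by position, so this relies on the rows of df already being ordered by 'Group':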
    s = df.groupby('Group', group_keys=False).apply(allocation, ratio='Ratio', part='Part').values
    df['Allocate'] = np.concatenate(s)
    df
    Out[71]:
      Group  Value  Part     Ratio  Allocate
    0     A   6373    10  0.637300         6
    1     A   2512    10  0.251200         2
    2     A    603    10  0.060300         1
    3     A    512    10  0.051200         1
    4     B   5200    20  0.472727         9
    5     B   4800    20  0.436364         9
    6     B    501    20  0.045545         1
    7     B    499    20  0.045364         1