Passing columns as arguments to pandas groupby apply function

Question

Let's say I have the following dataframe:

import numpy as np
import pandas as pd

a = np.random.rand(10)
b = np.random.rand(10)*10
c = np.random.rand(10)*100
groups = np.array([1,1,2,2,2,2,3,3,4,4])
df = pd.DataFrame({"a":a,"b":b,"c":c,"groups":groups})

I simply want to group df by groups and apply the following function to two columns (a and b) of each group:

def my_fun(x,y):
    tmp =  np.sum((x*y))/np.sum(y)
    return tmp

What I tried is:

df.groupby("groups").apply(my_fun,("a","b"))

But that does not work and raises this error:

ValueError: Unable to coerce to Series, the length must be 4: given 2

The final output should be a single number for each group. I can work around the problem with loops, but I think there should be a better approach.

Thanks

Quang Hoang · Accepted Answer

Without changing your function, you want to do:

df.groupby("groups").apply(lambda d: my_fun(d["a"],d["b"]))

Output:

groups
1    0.603284
2    0.183289
3    0.828273
4    0.361103
dtype: float64
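To see why the original call fails and why the lambda works, here is a minimal sketch. It assumes a fixed seed (so the numbers will differ from the output above) and uses a hypothetical helper, inspect_args, that only records what apply hands over. apply passes each group's DataFrame as the first argument and forwards the tuple ("a","b") unchanged as the second, so my_fun ends up multiplying a DataFrame by a two-element tuple. With all four columns in the group, that is most likely where "the length must be 4: given 2" comes from.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption, not in the question
df = pd.DataFrame({
    "a": rng.random(10),
    "b": rng.random(10) * 10,
    "c": rng.random(10) * 100,
    "groups": [1, 1, 2, 2, 2, 2, 3, 3, 4, 4],
})

def my_fun(x, y):
    return np.sum(x * y) / np.sum(y)

seen = []

def inspect_args(group, extra):
    # Hypothetical helper: record what apply passes in, then return a dummy value.
    seen.append((type(group).__name__, extra))
    return 0.0

# The first argument is the group's DataFrame; the tuple arrives as-is.
df.groupby("groups")[["a", "b"]].apply(inspect_args, ("a", "b"))

# The lambda unpacks the two columns before calling the unchanged my_fun.
result = df.groupby("groups")[["a", "b"]].apply(lambda d: my_fun(d["a"], d["b"]))

# Cross-check against an explicit loop over the groups.
expected = {key: my_fun(sub["a"], sub["b"]) for key, sub in df.groupby("groups")}
```

Selecting [["a", "b"]] before apply is a choice made for this sketch; it keeps the grouping column out of each group and should behave the same across recent pandas versions.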

That said, you can rewrite your function so that it takes a dataframe as its first positional argument; apply then forwards any extra arguments to it:

def myfunc(data, val_col, weight_col):
    return np.sum(data[val_col]*data[weight_col])/np.sum(data[weight_col])

df.groupby('groups').apply(myfunc, 'a', 'b')
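A self-contained sketch of the same idea, again assuming a fixed seed and a column selection before apply (neither is in the answer above). It shows that the extra arguments can be forwarded by position or by keyword, and compares the result with a plain vectorized computation that avoids apply altogether.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption for reproducibility
df = pd.DataFrame({
    "a": rng.random(10),
    "b": rng.random(10) * 10,
    "c": rng.random(10) * 100,
    "groups": [1, 1, 2, 2, 2, 2, 3, 3, 4, 4],
})

def myfunc(data, val_col, weight_col):
    # Weighted mean of val_col, using weight_col as the weights.
    return np.sum(data[val_col] * data[weight_col]) / np.sum(data[weight_col])

grouped = df.groupby("groups")[["a", "b"]]

# Extra positional arguments after the function are forwarded to it.
by_position = grouped.apply(myfunc, "a", "b")

# Keyword arguments are forwarded as well.
by_keyword = grouped.apply(myfunc, val_col="a", weight_col="b")

# Vectorized alternative: sum of a*b per group divided by sum of b per group.
vectorized = (df["a"] * df["b"]).groupby(df["groups"]).sum() / df.groupby("groups")["b"].sum()
```

All three results should hold the same weighted mean per group; the vectorized form may be worth considering on large frames, since it does not call a Python function once per group.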

Tags: python, pandas, pandas-groupby
