Pandas groupby custom function to each series

Question

I am having a hard time applying a custom function to each group produced by a groupby in Pandas.

My custom function takes a series of numbers, computes the differences between consecutive elements, and returns the mean of those differences. Here is the code:

def mean_gap(a):
    b = []
    for i in range(0, len(a)-1):
        b.append((a[i+1]-a[i]))
    return np.mean(b)

So if a = [1, 3, 7], mean_gap(a) gives me ((3-1)+(7-3))/2 = 3.0.

 Dataframe:
   one two
    a  1
    a  3
    a  7
    b  8
    b  9

Desired result:

 Dataframe:
   one two
    a  3
    b  1

df.groupby(['one'])['two'].???

I am new to pandas. I read that groupby passes values one row at a time rather than the full series, so I have not been able to use a lambda after groupby. Please help!

ayhan · Accepted Answer

With a custom function, you can do:

df.groupby('one')['two'].agg(lambda x: x.diff().mean())
one
a    3
b    1
Name: two, dtype: int64

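Depending on the pandas version, the values may instead be shown as floats (3.0 and 1.0, dtype: float64). To get a DataFrame back, reset the index: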

df.groupby('one')['two'].agg(lambda x: x.diff().mean()).reset_index(name='two')


    one  two
0   a    3
1   b    1
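For reference, here is a minimal self-contained sketch of the approach above. The DataFrame construction is an assumption on my part (the question only shows the table), and it uses the default integer index.

```python
import pandas as pd

# Sample data from the question (construction assumed, default 0..4 index)
df = pd.DataFrame({"one": ["a", "a", "a", "b", "b"],
                   "two": [1, 3, 7, 8, 9]})

# Each group arrives in the lambda as a full Series.
# diff() leaves NaN in the first position; mean() skips NaN by default.
result = (
    df.groupby("one")["two"]
      .agg(lambda s: s.diff().mean())
      .reset_index(name="two")
)
print(result)
```

Note that the lambda receives the whole group at once, not one row at a time, which is why diff() can be called on it.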

An alternative would be:

df.groupby('one')['two'].diff().groupby(df['one']).mean()
one
a    3.0
b    1.0
Name: two, dtype: float64
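To see why this alternative works, it may help to look at the intermediate step. The sketch below assumes the same sample data as the question; the names gaps and means are only for illustration.

```python
import pandas as pd

# Sample data from the question (construction assumed, default 0..4 index)
df = pd.DataFrame({"one": ["a", "a", "a", "b", "b"],
                   "two": [1, 3, 7, 8, 9]})

# Step 1: differences within each group, aligned to the original rows.
# The first row of every group has no predecessor, so it becomes NaN.
gaps = df.groupby("one")["two"].diff()

# Step 2: group the differences by the same key and average them.
# mean() ignores the NaN placeholders.
means = gaps.groupby(df["one"]).mean()
print(gaps)
print(means)
```

The diffs never cross a group boundary, so the 7 -> 8 step between groups a and b does not leak into either mean.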

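Your approach would also have worked with the following change:

import numpy as np

def mean_gap(a):
    b = []
    # Convert to a plain array so that a[i] is positional, not a label lookup
    a = np.asarray(a)
    for i in range(0, len(a)-1):
        b.append((a[i+1]-a[i]))
    return np.mean(b)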

df.groupby('one')['two'].agg(mean_gap)
one
a    3
b    1
Name: two, dtype: int64

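a = np.asarray(a) is necessary because each group is passed to the function as a Series that keeps its original index labels, and a[i] on such a Series looks up by label rather than by position. With the default index, group b has labels 3 and 4, so a[0] does not exist and you would get KeyErrors in b.append((a[i+1]-a[i])).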
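A small sketch of that failure and one possible fix, assuming the default integer index. The group_b Series is built by hand here to mimic what groupby would pass in, and using np.diff instead of the explicit loop is my substitution, not part of the original function.

```python
import numpy as np
import pandas as pd

# Mimics group "b" as groupby would pass it: values 8, 9 with labels 3, 4
group_b = pd.Series([8, 9], index=[3, 4])

label_lookup_failed = False
try:
    group_b[0]          # label-based lookup: there is no label 0
except KeyError:
    label_lookup_failed = True

def mean_gap(values):
    # Work on a plain array so indexing is purely positional
    arr = np.asarray(values)
    return np.diff(arr).mean()

df = pd.DataFrame({"one": ["a", "a", "a", "b", "b"],
                   "two": [1, 3, 7, 8, 9]})
result = df.groupby("one")["two"].agg(mean_gap)
print(result)
```

Using a.iloc[i] inside the original loop would be another way to force positional access. A group with a single row has no gaps, so this version returns NaN for it (NumPy may also emit a warning).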

Tags: python, pandas, group-by, numpy

Naresh Ambati
