
Groupby Apply Custom Function Pandas

I'm trying to apply a custom function in pandas similar to the groupby and mutate functionality in dplyr.

Say I have a pandas DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'category1':['a','a','a', 'b', 'b','b'],
  'category2':['a', 'b', 'a', 'b', 'a', 'b'],
  'var1':np.random.randint(0,100,6),
  'var2':np.random.randint(0,100,6)}
)

df
  category1 category2  var1  var2
0         a         a    23    59
1         a         b    54    20
2         a         a    48    62
3         b         b    45    76
4         b         a    60    26
5         b         b    13    70

I want to apply a function to each group that returns as many elements as the group has rows:

def myfunc(s):
  return [np.mean(s)] * len(s)

to get this result:

df
  category1 category2  var1  var2   var3
0         a         a    23    59   35.5
1         a         b    54    20   54
2         a         a    48    62   35.5
3         b         b    45    76   29
4         b         a    60    26   60
5         b         b    13    70   29

I was thinking of something along the lines of:

df['var3'] = df.groupby(['category1', 'category2'], group_keys=False).apply(lambda x: myfunc(x.var1))

but haven't been able to get the index to match.

In R with dplyr this would be

df <- df %>%
  group_by(category1, category2) %>%
  mutate(
    var3 = myfunc(var1)
  )

So I was able to solve it by using a custom function like:

def myfunc_data(data):
  data['var3'] = myfunc(data.var1)
  return data

and

df = df.groupby(['category1', 'category2']).apply(myfunc_data)

but I guess I was still wondering if there's a way to do it without defining this custom function.

asked Apr 12 '19 04:04 by jtanman



1 Answer

Use GroupBy.transform, which returns a Series the same size as the original DataFrame, so the result can be assigned directly to a new column:

np.random.seed(123)

df = pd.DataFrame({'category1':['a','a','a', 'b', 'b','b'],
  'category2':['a', 'b', 'a', 'b', 'a', 'b'],
  'var1':np.random.randint(0,100,6),
  'var2':np.random.randint(0,100,6)}
)

df['var3'] = df.groupby(['category1', 'category2'])['var1'].transform(myfunc)
print(df)
  category1 category2  var1  var2  var3
0         a         a    66    86    82
1         a         b    92    97    92
2         a         a    98    96    82
3         b         b    17    47    37
4         b         a    83    73    83
5         b         b    57    32    37
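
For context on why the apply attempt in the question does not line up: on a grouped column, apply returns one entry per group, indexed by the group keys, whereas transform returns one value per row, indexed like the original DataFrame. A minimal sketch, assuming the question's sample values hard-coded in place of the random ones so the numbers match the expected table in the question:

```python
import numpy as np
import pandas as pd

def myfunc(s):
    # Same helper as in the question: repeat the group mean once per row.
    return [np.mean(s)] * len(s)

# Sample values copied from the question's printed table (not random).
df = pd.DataFrame({
    'category1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'category2': ['a', 'b', 'a', 'b', 'a', 'b'],
    'var1': [23, 54, 48, 45, 60, 13],
    'var2': [59, 20, 62, 76, 26, 70],
})

keys = ['category1', 'category2']

# apply: one list per group, indexed by (category1, category2).
per_group = df.groupby(keys)['var1'].apply(myfunc)

# transform: one value per row, indexed like df.
per_row = df.groupby(keys)['var1'].transform(myfunc)

print(len(per_group), len(per_row))      # 4 groups vs. 6 rows
print(per_row.index.equals(df.index))    # True, so it can be assigned as a column
```

Because per_group has four entries under a different index, assigning it to df['var3'] cannot align with the six rows, while per_row can be assigned directly.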

Alternative with lambda function:

df['var3'] = (df.groupby(['category1', 'category2'])['var1']
                .transform(lambda s: [np.mean(s)] * len(s)))
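
If the function is just a group mean, the list-building helper may not be needed: transform also accepts the name of a built-in aggregation and broadcasts each group's result back to its rows. A sketch under that assumption, again with the question's sample values hard-coded, written as an assign chain so it reads a little like the dplyr mutate version:

```python
import pandas as pd

# Sample values copied from the question's printed table (not random).
df = pd.DataFrame({
    'category1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'category2': ['a', 'b', 'a', 'b', 'a', 'b'],
    'var1': [23, 54, 48, 45, 60, 13],
    'var2': [59, 20, 62, 76, 26, 70],
})

# assign returns a new DataFrame; 'mean' is broadcast to every row of its group.
result = df.assign(
    var3=lambda d: d.groupby(['category1', 'category2'])['var1'].transform('mean')
)
print(result)
```

assign leaves df itself unchanged. A lambda that returns a scalar, such as lambda s: s.mean(), should be broadcast in the same way.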

answered Sep 29 '22 10:09 by jezrael