I would like to use df.groupby() in combination with apply() to apply a function to each row per group.
I normally use the following code, which usually works (note that this is without groupby()):
df.apply(myFunction, args=(arg1,))
With groupby(), I tried the following:
df.groupby('columnName').apply(myFunction, args=(arg1,))
However, I get the following error:
TypeError: myFunction() got an unexpected keyword argument 'args'
Hence, my question is: how can I use groupby() and apply() with a function that needs arguments?
The groupby() function splits the data into groups based on some criteria. pandas objects can be split on any of their axes. Abstractly, grouping provides a mapping of labels to group names. The sort parameter controls whether the group keys are sorted.
In the "Hello, World!" of pandas GroupBy, you call .groupby() and pass the name of the column that you want to group on, which is "state". Then you use ["last_name"] to specify the column on which you want to perform the actual aggregation. You can pass much more than a single column name to .groupby() as the first argument.
How do you group by multiple columns in a pandas DataFrame and compute multiple aggregations? groupby() accepts a list of column names to group by, and the aggregate functions can then apply one or several aggregations at the same time.
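As a minimal sketch of grouping by several columns and computing several aggregations at once (the sales frame and its column names are made up for illustration and are not from the question):

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "product": ["x", "y", "x", "x", "y"],
    "units": [3, 5, 2, 4, 1],
})

# Group by two columns, then compute two aggregations on the "units" column.
summary = sales.groupby(["region", "product"])["units"].agg(["sum", "mean"])
print(summary)
```

The result is indexed by (region, product) pairs, with one column per aggregation.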
GroupBy objects are returned by groupby calls: pandas.DataFrame.groupby(), pandas.Series.groupby(), etc. A GroupBy object can be iterated over group by group, and it exposes .groups (a dict {group name -> group labels}) and .indices (a dict {group name -> group indices}).
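A short sketch of what those three views look like in practice. The frame, its string index labels and the column names are assumptions chosen so that labels and integer positions are easy to tell apart:

```python
import pandas as pd

# Hypothetical frame with a non-default index so labels differ from positions.
df = pd.DataFrame({"key": ["x", "y", "x"], "value": [1, 2, 3]},
                  index=["r0", "r1", "r2"])
grouped = df.groupby("key")

# Iterating yields (group name, sub-DataFrame) pairs.
names = [name for name, _ in grouped]

# .groups maps each group name to the index labels of its rows.
labels_x = list(grouped.groups["x"])

# .indices maps each group name to the integer positions of its rows.
positions_x = grouped.indices["x"].tolist()

print(names, labels_x, positions_x)
```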
Usually when using the .apply() method, one passes a function that takes exactly one argument:
def somefunction(group):
    group['ColumnC'] = group['ColumnC']**2
    return group
df.groupby(['ColumnA', 'ColumnB']).apply(somefunction)
Here somefunction is applied to each group, which is then returned.
DataFrame.groupby() accepts a number of parameters that control how it operates.
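pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does. GroupBy.apply has the signature apply(func, *args, **kwargs) and forwards any extra arguments to func, so args=(arg1,) reaches myFunction as a keyword argument named args.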
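To make the cause of the error concrete, here is a small sketch that reproduces it. The frame and the function scale_max are hypothetical stand-ins for the asker's df and myFunction:

```python
import pandas as pd

# Hypothetical data and function, standing in for df and myFunction.
df = pd.DataFrame({"a": [0, 0, 1], "b": [1, 2, 3]})


def scale_max(group, n):
    return group.max() * n


grouped = df.groupby("a")[["b"]]

message = None
try:
    # GroupBy.apply forwards keyword arguments to the function itself,
    # so this ends up calling scale_max(group, args=(10,)).
    grouped.apply(scale_max, args=(10,))
except TypeError as exc:
    message = str(exc)

print(message)
```

The keyword never gets unpacked the way DataFrame.apply would unpack it, which is why the function complains about an unexpected keyword argument.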
So try this:
df.groupby('columnName').apply(lambda x: myFunction(x, arg1))
or as suggested by @Zero:
df.groupby('columnName').apply(myFunction, arg1)
Demo:
In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))

In [83]: df
Out[83]:
   a  b  c
0  0  3  1
1  0  3  4
2  3  0  4
3  4  2  3
4  3  4  1

In [84]: def f(ser, n):
    ...:     return ser.max() * n
    ...:

In [85]: df.apply(f, args=(10,))
Out[85]:
a    40
b    40
c    40
dtype: int64
When using GroupBy.apply, you can pass either named arguments:
In [86]: df.groupby('a').apply(f, n=10)
Out[86]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30
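or positional arguments (here (10) is just the integer 10 in parentheses, not a tuple; GroupBy.apply forwards it to f as n):
In [87]: df.groupby('a').apply(f, (10))
Out[87]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30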
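Putting the options side by side, here is a self-contained sketch with the demo values written out explicitly. The name scaled_max is a hypothetical stand-in for myFunction, and functools.partial is an extra alternative that was not part of the original answer. Selecting the value columns before apply is also an assumption on my part: it keeps the grouping column out of each group, so the result has no a column, unlike the demo output above.

```python
import functools

import pandas as pd

# Same values as the demo frame, written out so the result is reproducible.
df = pd.DataFrame({"a": [0, 0, 3, 4, 3],
                   "b": [3, 3, 0, 2, 4],
                   "c": [1, 4, 4, 3, 1]})


def scaled_max(group, n):
    """Hypothetical stand-in for myFunction: column-wise max times n."""
    return group.max() * n


# Only the value columns are passed to the function.
grouped = df.groupby("a")[["b", "c"]]

by_keyword = grouped.apply(scaled_max, n=10)             # keyword argument
by_position = grouped.apply(scaled_max, 10)              # positional argument
by_lambda = grouped.apply(lambda g: scaled_max(g, 10))   # wrap in a lambda
by_partial = grouped.apply(functools.partial(scaled_max, n=10))  # bind first

print(by_keyword)
```

All four calls should produce the same frame. The lambda and partial forms have the advantage of working identically with DataFrame.apply and GroupBy.apply.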