In a Coursera video about pandas groupby (in the Introduction to Data Science in Python course), the following example is given:
df.groupby('Category').apply(lambda df,a,b: sum(df[a] * df[b]), 'Weight (oz.)', 'Quantity')
Here df is a DataFrame, and the lambda computes, for each group, the sum of the element-wise product of two columns. If I understand correctly, the groupby object (returned by groupby) that apply is called on behaves like a sequence of tuples, each consisting of the group key and the part of the DataFrame that belongs to that group.
What I don't understand is the way that the lambda is used:
There are three arguments specified (lambda df,a,b), but only two are explicitly passed ('Weight (oz.)' and 'Quantity'). How does the interpreter know that arguments 'a' and 'b' are the ones specified as arguments and df is used 'as-is'?
I have looked at the docs but could not find a definitive answer for such a specific example. I am thinking this has something to do with df being in scope, but I cannot find information to support and detail that thought.
The apply method itself passes each "group" of the groupby object as the first argument to the function. So it knows to associate 'Weight (oz.)' and 'Quantity' with a and b based on position (i.e. they are the 2nd and 3rd arguments, if you count the group as the first).
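To make the positional matching concrete, here is a rough sketch in plain Python, with no pandas involved. The names toy_apply, groups and weighted are made up for illustration, and this is not how pandas implements apply internally; it only mimics the calling convention described above.

```python
def toy_apply(groups, func, *args, **kwargs):
    """Hypothetical stand-in for GroupBy.apply: call func once per group.

    The group itself is always the first positional argument; whatever
    extra arguments the caller supplied are forwarded after it.
    """
    return {key: func(group, *args, **kwargs) for key, group in groups.items()}


# Each "group" here is just a dict of columns (an assumption for the sketch).
groups = {
    "a": {"w": [1, 2], "q": [3, 4]},
    "b": {"w": [5], "q": [6]},
}

# Same shape as the lambda in the question: (group, a, b).
weighted = lambda grp, a, b: sum(x * y for x, y in zip(grp[a], grp[b]))

result = toy_apply(groups, weighted, "w", "q")
print(result)  # {'a': 11, 'b': 30}
```

Only "w" and "q" are passed by the caller, yet the lambda receives three arguments, because toy_apply supplies the group itself.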
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,11,(10,3)), columns = ['num1','num2','num3'])
df['category'] = ['a','a','a','b','b','b','b','c','c','c']
df = df[['category','num1','num2','num3']]
df
  category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4
3        b    10     9     1
4        b     4     7     6
5        b     0     5     2
6        b     7     7     5
7        c     2     2     1
8        c     4     3     2
9        c     1     4     6
gb = df.groupby('category')
The implicit argument is each "group", in this case the rows belonging to each category:
gb.apply(lambda grp: grp.sum())
Here "grp" is the first argument to the lambda function. Notice that I don't have to pass anything for it: it is automatically taken to be each group of the groupby object.
         category  num1  num2  num3
category
a             aaa    14    13     8
b            bbbb    21    28    14
c             ccc     7     9     9
So apply goes through each of these groups and performs a sum operation:
print(gb.groups)
{'a': Int64Index([0, 1, 2], dtype='int64'), 'b': Int64Index([3, 4, 5, 6], dtype='int64'), 'c': Int64Index([7, 8, 9], dtype='int64')}
print('1st GROUP:\n', df.loc[gb.groups['a']])
1st GROUP:
  category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4
print('SUM of 1st group:\n', df.loc[gb.groups['a']].sum())
SUM of 1st group:
category    aaa
num1         14
num2         13
num3          8
dtype: object
Notice how this is the same as the first row of our previous operation.
So apply is implicitly passing each group to the function as its first argument.
From the docs:
GroupBy.apply(func, *args, **kwargs)
args, kwargs : tuple and dict
Optional positional and keyword arguments to pass to func
Additional arguments passed in "*args" are passed on after the implicit group argument.
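Since the docs mention **kwargs as well, the same extra arguments can also be given by name. Below is a small sketch of that. The frame hard-codes the num1 and num2 values printed earlier (the original uses random data), and weighted_sum is just an illustrative name. Selecting the two numeric columns before apply is my own choice here, so that the grouping column is not handed to the function regardless of pandas version.

```python
import pandas as pd

# Values copied from the example frame shown above (hard-coded, not random).
df = pd.DataFrame({
    "category": list("aaabbbbccc"),
    "num1": [2, 5, 7, 10, 4, 0, 7, 2, 4, 1],
    "num2": [5, 5, 3, 9, 7, 5, 7, 2, 3, 4],
})


def weighted_sum(grp, a, b):
    # grp is supplied by apply; a and b come from the caller.
    return (grp[a] * grp[b]).sum()


gb = df.groupby("category")[["num1", "num2"]]

by_position = gb.apply(weighted_sum, "num1", "num2")      # via *args
by_keyword = gb.apply(weighted_sum, a="num1", b="num2")   # via **kwargs

print(by_position.equals(by_keyword))  # True
```

Either way, the group fills the first parameter and the extras fill the rest.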
So, using your code:
gb.apply(lambda df,a,b: sum(df[a] * df[b]), 'num1', 'num2')
category
a     56
b    167
c     20
dtype: int64
Here 'num1' and 'num2' are passed as additional arguments to each call of the lambda function.
So apply goes through each of the groups and performs your lambda operation:
# copy and paste your lambda function
fun = lambda df,a,b: sum(df[a] * df[b])
print(gb.groups)
{'a': Int64Index([0, 1, 2], dtype='int64'), 'b': Int64Index([3, 4, 5, 6], dtype='int64'), 'c': Int64Index([7, 8, 9], dtype='int64')}
print('1st GROUP:\n', df.loc[gb.groups['a']])
1st GROUP:
  category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4
print('Output of 1st group for function "fun":\n',
      fun(df.loc[gb.groups['a']], 'num1', 'num2'))
Output of 1st group for function "fun":
56
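The check above covers only group 'a'. As a sketch, the same thing can be done for every group by iterating over the groupby object, which yields (key, sub-DataFrame) pairs, and calling the lambda by hand. The data is again hard-coded from the printed table, and manual and applied are illustrative names.

```python
import pandas as pd

# Values copied from the example frame shown above (hard-coded, not random).
df = pd.DataFrame({
    "category": list("aaabbbbccc"),
    "num1": [2, 5, 7, 10, 4, 0, 7, 2, 4, 1],
    "num2": [5, 5, 3, 9, 7, 5, 7, 2, 3, 4],
})

fun = lambda df, a, b: sum(df[a] * df[b])

gb = df.groupby("category")

# Do by hand what apply does: one call per group, group passed first.
manual = {name: int(fun(grp, "num1", "num2")) for name, grp in gb}
print(manual)  # {'a': 56, 'b': 167, 'c': 20}

# Let apply do it; the numeric columns are selected first so the
# grouping column is not passed along to the lambda.
applied = gb[["num1", "num2"]].apply(fun, "num1", "num2")
```

The dictionary built by hand holds the same values as the Series returned by apply.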