Python Pandas groupby apply lambda arguments

In a Coursera video about Python Pandas groupby (in the Introduction to Data Science in Python course), the following example is given:

df.groupby('Category').apply(lambda df,a,b: sum(df[a] * df[b]), 'Weight (oz.)', 'Quantity')

Where df is a DataFrame, and the lambda is applied to calculate the sum of the element-wise product of two columns. If I understand correctly, the groupby object (returned by groupby) that the apply function is called on can be iterated as a sequence of tuples, each consisting of the key that was grouped by and the part of the DataFrame that belongs to that specific group.

What I don't understand is the way that the lambda is used:

There are three arguments specified (lambda df,a,b), but only two are explicitly passed ('Weight (oz.)' and 'Quantity'). How does the interpreter know that arguments 'a' and 'b' are the ones specified as arguments and df is used 'as-is'?

I have looked at the docs but could not find a definitive answer for such a specific example. I am thinking this has something to do with df being in scope, but I cannot find information to support and detail that thought.

800

asked Nov 29 '17 11:11

g_uint

1 Answer

The apply method itself passes each "group" of the groupby object as the first argument to the function. So it knows to associate 'Weight (oz.)' and 'Quantity' with a and b based on position (i.e. they are the 2nd and 3rd arguments if you count the group as the first argument).

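import numpy as np
import pandas as pd

# Random integers 0-10; the table below is the output of one particular run.
df = pd.DataFrame(np.random.randint(0,11,(10,3)), columns = ['num1','num2','num3'])
df['category'] = ['a','a','a','b','b','b','b','c','c','c']
df = df[['category','num1','num2','num3']]
df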

  category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4
3        b    10     9     1
4        b     4     7     6
5        b     0     5     2
6        b     7     7     5
7        c     2     2     1
8        c     4     3     2
9        c     1     4     6

gb = df.groupby('category')

The implicit argument is each "group", or in this case each category:

gb.apply(lambda grp: grp.sum())

Here "grp" is the first argument to the lambda function. Notice that I don't have to specify anything for it: it is automatically taken to be each group of the groupby object. The result:

         category  num1  num2  num3
category                           
a             aaa    14    13     8
b            bbbb    21    28    14
c             ccc     7     9     9

So apply goes through each of these groups and performs a sum operation:

print(gb.groups)
{'a': Int64Index([0, 1, 2], dtype='int64'), 'b': Int64Index([3, 4, 5, 6], dtype='int64'), 'c': Int64Index([7, 8, 9], dtype='int64')}

print('1st GROUP:\n', df.loc[gb.groups['a']])
1st GROUP:
  category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4

print('SUM of 1st group:\n', df.loc[gb.groups['a']].sum())

SUM of 1st group:
category    aaa
num1         14
num2         13
num3          8
dtype: object

Notice how this is the same as the first row of our previous operation.

So apply is implicitly passing each group to the function as its first argument.
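
To make the mechanism concrete, here is a rough sketch of the calling convention only. It is not how pandas implements apply internally: `apply_sketch` is a made-up helper, and the DataFrame is typed in from the example table above so that the numbers are fixed rather than random.

```python
import pandas as pd

# Fixed copy of the example table shown earlier (instead of random data).
df = pd.DataFrame({
    "category": list("aaabbbbccc"),
    "num1": [2, 5, 7, 10, 4, 0, 7, 2, 4, 1],
    "num2": [5, 5, 3, 9, 7, 5, 7, 2, 3, 4],
    "num3": [2, 2, 4, 1, 6, 2, 5, 1, 2, 6],
})


def apply_sketch(grouped, func, *args, **kwargs):
    """Hypothetical stand-in for GroupBy.apply (calling convention only)."""
    results = {}
    # Iterating a GroupBy yields (group key, sub-DataFrame) pairs.
    for key, group in grouped:
        # The group always goes first; the caller's extra arguments follow it.
        results[key] = func(group, *args, **kwargs)
    return pd.Series(results)


totals = apply_sketch(
    df.groupby("category"),
    lambda grp, a, b: (grp[a] * grp[b]).sum(),
    "num1",
    "num2",
)
print(totals.to_dict())
```

For this table the totals are 56, 167 and 20 for categories a, b and c.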

From the docs:

GroupBy.apply(func, *args, **kwargs)

args, kwargs : tuple and dict
    Optional positional and keyword arguments to pass to func.

Additional arguments passed in "*args" get passed after the implicit group argument.
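
Since the signature also accepts **kwargs, the extra arguments can be given by keyword as well. A small sketch, assuming the same table as in the example above, typed in by hand. `weighted_total` is my own name for the function, and selecting the two numeric columns before apply is my choice, made only so the grouping column is not part of what gets passed to the function (how grouping columns are handled differs between pandas versions).

```python
import pandas as pd

# Fixed copy of the example table (instead of random data).
df = pd.DataFrame({
    "category": list("aaabbbbccc"),
    "num1": [2, 5, 7, 10, 4, 0, 7, 2, 4, 1],
    "num2": [5, 5, 3, 9, 7, 5, 7, 2, 3, 4],
    "num3": [2, 2, 4, 1, 6, 2, 5, 1, 2, 6],
})


def weighted_total(group, a, b):
    """Sum of the element-wise product of columns a and b within one group."""
    return (group[a] * group[b]).sum()


gb = df.groupby("category")[["num1", "num2"]]

# Extra arguments by position: they land after the implicit group argument.
by_position = gb.apply(weighted_total, "num1", "num2")

# Extra arguments by keyword: matched to the parameters named a and b.
by_keyword = gb.apply(weighted_total, a="num1", b="num2")

print(by_position.equals(by_keyword))
```

Both calls give the same Series, because in each case the group fills the first parameter and the two column names fill a and b.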

So, using your code:

gb.apply(lambda df,a,b: sum(df[a] * df[b]), 'num1', 'num2')

category
a     56
b    167
c     20
dtype: int64

Here 'num1' and 'num2' are passed as additional arguments to each call of the lambda function.

So apply goes through each of these groups and performs your lambda operation:

# copy and paste your lambda function
fun = lambda df,a,b: sum(df[a] * df[b])

print(gb.groups)
{'a': Int64Index([0, 1, 2], dtype='int64'), 'b': Int64Index([3, 4, 5, 6], dtype='int64'), 'c': Int64Index([7, 8, 9], dtype='int64')}

print('1st GROUP:\n', df.loc[gb.groups['a']])

1st GROUP:
   category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4

print('Output of 1st group for function "fun":\n',
      fun(df.loc[gb.groups['a']], 'num1', 'num2'))

Output of 1st group for function "fun":
56
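
On the guess in the question that this has something to do with df being in scope: as far as I can tell, it does not. The df inside the lambda is just a parameter name that happens to shadow the outer variable. A quick check, assuming the same table as in the example, typed in by hand (the `probe` function and the `seen_sizes` list are made up for illustration, and the column selection before apply is my own addition):

```python
import pandas as pd

# Fixed copy of the example table (instead of random data).
df = pd.DataFrame({
    "category": list("aaabbbbccc"),
    "num1": [2, 5, 7, 10, 4, 0, 7, 2, 4, 1],
    "num2": [5, 5, 3, 9, 7, 5, 7, 2, 3, 4],
    "num3": [2, 2, 4, 1, 6, 2, 5, 1, 2, 6],
})

seen_sizes = []


def probe(anything, a, b):
    # The first parameter can have any name; record how many rows it holds.
    seen_sizes.append(len(anything))
    return (anything[a] * anything[b]).sum()


result = df.groupby("category")[["num1", "num2"]].apply(probe, "num1", "num2")

print(result.to_dict())
print(len(df), seen_sizes)
```

Every recorded size is smaller than len(df), so the function only ever receives one group at a time and never the whole outer DataFrame, whatever its first parameter is called.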

179

answered Sep 27 '22 22:09

RSHAP

Tags: python, pandas, lambda, pandas-groupby