 

pandas: GroupBy .pipe() vs .apply()

In the example from the pandas documentation for the new .pipe() method on GroupBy objects, calling .apply() with the same lambda would return the same result.

```python
import numpy as np
import pandas as pd

n = 1000
df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
                   'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n),
                   'Revenue': (np.random.random(n)*50+10).round(2),
                   'Quantity': np.random.randint(1, 10, size=n)})

(df.groupby(['Store', 'Product'])
   .pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
   .unstack().round(2))
```

Output shown in the documentation (the data is random, so the exact values will differ from run to run):

```
Product  Product_1  Product_2  Product_3
Store
Store_1       6.93       6.82       7.15
Store_2       6.69       6.64       6.77
```
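To check that premise on data small enough to verify by hand, here is a sketch with a made-up four-row frame (the `S1`/`P1` labels and the numbers are invented for illustration). The `apply` call selects the two value columns first, on the assumption that recent pandas versions warn when the grouping columns are handed to the callable.

```python
import pandas as pd

# Hypothetical toy data, small enough to check by hand
df = pd.DataFrame({'Store':    ['S1', 'S1', 'S2', 'S2'],
                   'Product':  ['P1', 'P2', 'P1', 'P1'],
                   'Revenue':  [10.0, 20.0, 30.0, 40.0],
                   'Quantity': [2, 5, 5, 5]})
gb = df.groupby(['Store', 'Product'])

# pipe: the lambda receives the GroupBy once and divides two per-group sums
via_pipe = gb.pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())

# apply: the lambda receives each group's DataFrame and returns one scalar
via_apply = gb[['Revenue', 'Quantity']].apply(
    lambda grp: grp.Revenue.sum() / grp.Quantity.sum())

print(via_pipe.tolist())   # [5.0, 4.0, 7.0]
print(via_apply.tolist())  # [5.0, 4.0, 7.0]
```

Both routes agree because this calculation only needs each group's own rows. The `apply` route calls the Python lambda once per group, so it is usually the slower of the two when there are many groups.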

I can see how the pipe functionality differs from apply for DataFrame objects, but not for GroupBy objects. Does anyone have an explanation or examples of what can be done with pipe but not with apply for a GroupBy?

asked Nov 10 '17 15:11 by foglerit





1 Answer

What pipe does is let you pass a callable with the guarantee that the object that called pipe is itself the object passed to that callable.
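A minimal sketch of that contract, using the same two-group frame the answer sets up below: `gb.pipe(f)` is simply `f(gb)`, and extra arguments are forwarded, so `gb.pipe(f, x)` is `f(gb, x)`. The helper name `column_total` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('XXXXYYYYYY'), B=range(10)))
gb = df.groupby('A')

# pipe hands the GroupBy object itself to the callable
received = gb.pipe(lambda g: g)
print(received is gb)  # True

# Hypothetical helper: extra arguments after the callable are forwarded to it
def column_total(g, col):
    return g[col].sum()

print(gb.pipe(column_total, 'B').equals(column_total(gb, 'B')))  # True
```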

With apply, we assume that the object that calls apply has subcomponents, each of which is passed in turn to the callable given to apply. In a groupby context, the subcomponents are the slices of the DataFrame that called groupby, and each slice is itself a DataFrame. The same holds for a Series groupby, where each slice is a Series.
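A small sketch showing what apply actually hands the callable in a Series groupby: one slice per group, each a Series. The names `length_of_piece` and `piece_types` are hypothetical, used only to record what the callable sees.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('XXXXYYYYYY'), B=range(10)))

piece_types = set()

def length_of_piece(piece):
    # apply passes the callable one group's slice at a time
    piece_types.add(type(piece).__name__)
    return len(piece)

lengths = df.groupby('A').B.apply(length_of_piece)
print(lengths.to_dict())  # one entry per group: X has 4 rows, Y has 6
print(piece_types)        # {'Series'}
```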

The main difference in what you can do in a groupby context is that, with pipe, the callable has the entire scope of the groupby object available. With apply, the callable only knows about the local slice.
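As a sketch of what "the entire scope" means in practice: inside a piped callable, group-level metadata such as `ngroups`, `groups`, `size()` and `get_group()` is all reachable, none of which a single slice passed by apply carries. The function `describe_groups` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('XXXXYYYYYY'), B=range(10)))

def describe_groups(g):
    # g is the whole SeriesGroupBy, so it can see every group at once
    return {
        'ngroups': g.ngroups,                          # how many groups exist
        'keys': sorted(g.groups),                      # every group label
        'sizes': g.size().to_dict(),                   # rows per group
        'first_of_X': int(g.get_group('X').iloc[0]),   # reach into another group
    }

info = df.groupby('A').B.pipe(describe_groups)
print(info)
```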

Setup
Consider df

```python
import pandas as pd

df = pd.DataFrame(dict(
    A=list('XXXXYYYYYY'),
    B=range(10)
))
```

```
   A  B
0  X  0
1  X  1
2  X  2
3  X  3
4  Y  4
5  Y  5
6  Y  6
7  Y  7
8  Y  8
9  Y  9
```

Example 1
Make the entire 'B' column sum to 1 while each sub-group sums to the same amount. This requires that the calculation be aware of how many groups exist. This is something we can't do with apply because apply wouldn't know how many groups exist.

```python
s = df.groupby('A').B.pipe(lambda g: df.B / g.transform('sum') / g.ngroups)
s
```

```
0    0.000000
1    0.083333
2    0.166667
3    0.250000
4    0.051282
5    0.064103
6    0.076923
7    0.089744
8    0.102564
9    0.115385
Name: B, dtype: float64
```

Note:

```python
s.sum()
```

```
0.99999999999999989
```

And:

```python
s.groupby(df.A).sum()
```

```
A
X    0.5
Y    0.5
Name: B, dtype: float64
```
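A sketch of Example 1 packaged as a reusable function (the name `equal_share_normalize` is hypothetical). It receives the grouped Series as an extra pipe argument instead of reaching for the global `df`, and a three-group frame shows the per-group share adapting to `ngroups`.

```python
import pandas as pd

def equal_share_normalize(g, values):
    # Hypothetical helper. g: a SeriesGroupBy built from `values`.
    # Each row is divided by its group total, then by the number of groups,
    # so every group sums to 1/ngroups and the whole column sums to 1.
    return values / g.transform('sum') / g.ngroups

df = pd.DataFrame(dict(A=list('XXXXYYYYYY'), B=range(10)))
s = df.groupby('A').B.pipe(equal_share_normalize, df.B)
print(round(s.sum(), 12))  # 1.0
print(s.groupby(df.A).sum().round(12).to_dict())

# Three groups: each group's share becomes 1/3
df3 = pd.DataFrame(dict(A=list('XXYYZZ'), B=[1, 2, 3, 4, 5, 6]))
s3 = df3.groupby('A').B.pipe(equal_share_normalize, df3.B)
print(s3.groupby(df3.A).sum().round(6).to_dict())
```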

Example 2
Subtract the mean of one group from the values of another. Again, this can't be done with apply because apply doesn't know about other groups.

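The two pieces are joined with pd.concat; Series.append, which this example originally used, was removed in pandas 2.0.

```python
df.groupby('A').B.pipe(
    lambda g: pd.concat([
        g.get_group('X') - g.get_group('Y').mean(),
        g.get_group('Y') - g.get_group('X').mean(),
    ])
)
```

```
0   -6.5
1   -5.5
2   -4.5
3   -3.5
4    2.5
5    3.5
6    4.5
7    5.5
8    6.5
9    7.5
Name: B, dtype: float64
```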
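With more than two groups, "the other group" becomes "all other groups". One possible generalization, offered as a sketch and not part of the original answer: subtract from each value the mean of every row outside its own group, computed from group totals inside pipe. With the two-group frame it reproduces the output above. It assumes at least two groups (otherwise the denominator is zero), and `minus_other_groups_mean` is a hypothetical name.

```python
import pandas as pd

def minus_other_groups_mean(g, values):
    # Hypothetical helper. g: a SeriesGroupBy built from `values`.
    group_sum = g.transform('sum')      # each row gets its own group's total
    group_count = g.transform('count')  # ...and its own group's row count
    total_sum = g.sum().sum()           # grand total over every group
    total_count = g.count().sum()       # grand row count over every group
    # Mean of all rows outside each row's group (needs at least two groups)
    other_mean = (total_sum - group_sum) / (total_count - group_count)
    return values - other_mean

df = pd.DataFrame(dict(A=list('XXXXYYYYYY'), B=range(10)))
result = df.groupby('A').B.pipe(minus_other_groups_mean, df.B)
print(result.tolist())
# [-6.5, -5.5, -4.5, -3.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
```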
answered Sep 18 '22 04:09 by piRSquared
