
What is the difference between pandas agg and apply function?

I can't figure out the difference between the pandas .aggregate and .apply functions.
Take the following as an example: I load a dataset, do a groupby, define a simple function, and use either .agg or .apply.

As you can see, the print statements inside my function produce the same output with .agg and .apply, but the returned result is different. Why is that?

    import pandas as pd

    iris = pd.read_csv('iris.csv')
    by_species = iris.groupby('Species')

    def f(x):
        # Show what the function receives, then return a constant.
        print(type(x))
        print(x.head(3))
        return 1

Using apply:

    by_species.apply(f)
    #<class 'pandas.core.frame.DataFrame'>
    #   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
    #0           5.1          3.5           1.4          0.2  setosa
    #1           4.9          3.0           1.4          0.2  setosa
    #2           4.7          3.2           1.3          0.2  setosa
    #<class 'pandas.core.frame.DataFrame'>
    #   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
    #0           5.1          3.5           1.4          0.2  setosa
    #1           4.9          3.0           1.4          0.2  setosa
    #2           4.7          3.2           1.3          0.2  setosa
    #<class 'pandas.core.frame.DataFrame'>
    #    Sepal.Length  Sepal.Width  Petal.Length  Petal.Width     Species
    #50           7.0          3.2           4.7          1.4  versicolor
    #51           6.4          3.2           4.5          1.5  versicolor
    #52           6.9          3.1           4.9          1.5  versicolor
    #<class 'pandas.core.frame.DataFrame'>
    #     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
    #100           6.3          3.3           6.0          2.5  virginica
    #101           5.8          2.7           5.1          1.9  virginica
    #102           7.1          3.0           5.9          2.1  virginica
    #Out[33]:
    #Species
    #setosa        1
    #versicolor    1
    #virginica     1
    #dtype: int64

Using agg:

    by_species.agg(f)
    #<class 'pandas.core.frame.DataFrame'>
    #   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
    #0           5.1          3.5           1.4          0.2  setosa
    #1           4.9          3.0           1.4          0.2  setosa
    #2           4.7          3.2           1.3          0.2  setosa
    #<class 'pandas.core.frame.DataFrame'>
    #    Sepal.Length  Sepal.Width  Petal.Length  Petal.Width     Species
    #50           7.0          3.2           4.7          1.4  versicolor
    #51           6.4          3.2           4.5          1.5  versicolor
    #52           6.9          3.1           4.9          1.5  versicolor
    #<class 'pandas.core.frame.DataFrame'>
    #     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
    #100           6.3          3.3           6.0          2.5  virginica
    #101           5.8          2.7           5.1          1.9  virginica
    #102           7.1          3.0           5.9          2.1  virginica
    #Out[34]:
    #           Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
    #Species
    #setosa                 1            1             1            1
    #versicolor             1            1             1            1
    #virginica              1            1             1            1
David D asked Feb 17 '14 11:02



People also ask

What does AGG function do in Pandas?

agg() applies a function, or a list of functions, to a Series or to each column of a DataFrame. When given a list of functions, agg() returns one result per function.
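As a minimal sketch of the single-function versus list-of-functions behavior, assuming a small made-up Series (the values are illustrative, not from the question's dataset):

```python
import pandas as pd

# Hypothetical data, used only to illustrate the two call styles.
s = pd.Series([1, 2, 3, 4])

single = s.agg("sum")            # one function name -> a single scalar
several = s.agg(["sum", "min"])  # list of functions -> a Series indexed by function name

print(single)
print(several)
```

With a list, the result is labeled by function name, so individual values can be looked up as several["sum"] or several["min"].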

What does AGG mean in Pandas?

agg is an alias for aggregate, and the pandas documentation recommends using the alias. Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See "Mutating with User Defined Function (UDF) methods" in the pandas documentation for more details.

What is the use of apply function in Pandas?

The apply() method applies a function along one of the axes of a DataFrame. With the default, axis=0, the function receives each column in turn; with axis=1 it receives each row.
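A short sketch of the axis argument, assuming a tiny made-up DataFrame; the column names and the range function are illustrative only:

```python
import pandas as pd

# Hypothetical two-column frame for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

def value_range(values):
    # Works on any Series: a column when axis=0, a row when axis=1.
    return values.max() - values.min()

per_column = df.apply(value_range)          # default axis=0: one result per column
per_row = df.apply(value_range, axis=1)     # axis=1: one result per row

print(per_column)
print(per_row)
```

The result of per_column is indexed by column label, while per_row is indexed by the original row index.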

What is the difference between apply and Applymap in Pandas?

In pandas, map(), applymap() and apply() all apply a function to the data in a DataFrame or Series. map() is a method of Series, applymap() is a method of DataFrame, and apply() is defined on both DataFrame and Series.
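The three methods can be compared side by side in a rough sketch like the one below. The data is made up, and the getattr fallback is an assumption to cope with version differences: newer pandas releases expose the element-wise DataFrame method as DataFrame.map, while older ones only have DataFrame.applymap.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Series.map: element-wise over a single Series.
doubled_series = df["a"].map(lambda v: v * 2)

# Element-wise over a whole DataFrame; use DataFrame.map when it exists,
# otherwise fall back to DataFrame.applymap.
elementwise = getattr(df, "map", None) or df.applymap
doubled_frame = elementwise(lambda v: v * 2)

# DataFrame.apply: the function receives a whole column at a time.
column_sums = df.apply(lambda col: col.sum())

print(doubled_series)
print(doubled_frame)
print(column_sums)
```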


1 Answer

apply applies the function to each group (your Species), passing the whole group to the function at once. Your function returns 1, so you end up with one value for each of the three groups.

agg aggregates each column (feature) for each group, so you end up with one value per column per group.
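The difference in result shape can be reproduced with a small sketch. The frame below is a hypothetical stand-in for iris.csv, and the numeric columns are selected explicitly so that only they are passed to f:

```python
import pandas as pd

# Hypothetical stand-in for the iris data: one grouping column, two numeric columns.
df = pd.DataFrame({
    "Species": ["setosa", "setosa", "versicolor", "versicolor"],
    "Sepal.Length": [5.1, 4.9, 7.0, 6.4],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.2],
})

def f(x):
    # Ignore the input and return a constant, as in the question.
    return 1

grouped = df.groupby("Species")[["Sepal.Length", "Sepal.Width"]]

applied = grouped.apply(f)    # f receives each group as a DataFrame -> one value per group
aggregated = grouped.agg(f)   # f reduces each column of each group -> one value per column per group

print(applied)
print(aggregated)
```

Here applied is a Series with one entry per species, while aggregated is a DataFrame with one row per species and one column per numeric feature, matching the two outputs in the question.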

Do read the groupby docs; they're quite helpful. There are also plenty of tutorials around the web.

TomAugspurger answered Oct 15 '22 04:10
