I can't figure out the difference between the pandas .aggregate and .apply functions.
Take the following as an example: I load a dataset, do a groupby, define a simple function, and use either .agg or .apply.
As you can see, the print statements inside my function produce the same output with .agg and with .apply. The returned result, on the other hand, is different. Why is that?
    import pandas as pd

    iris = pd.read_csv('iris.csv')
    by_species = iris.groupby('Species')

    def f(x):
        # Show what the function receives, then return a constant.
        print(type(x))
        print(x.head(3))
        return 1
Using apply:
    by_species.apply(f)
    #<class 'pandas.core.frame.DataFrame'>
    #   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
    #0           5.1          3.5           1.4          0.2  setosa
    #1           4.9          3.0           1.4          0.2  setosa
    #2           4.7          3.2           1.3          0.2  setosa
    #<class 'pandas.core.frame.DataFrame'>
    #   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
    #0           5.1          3.5           1.4          0.2  setosa
    #1           4.9          3.0           1.4          0.2  setosa
    #2           4.7          3.2           1.3          0.2  setosa
    #<class 'pandas.core.frame.DataFrame'>
    #    Sepal.Length  Sepal.Width  Petal.Length  Petal.Width     Species
    #50           7.0          3.2           4.7          1.4  versicolor
    #51           6.4          3.2           4.5          1.5  versicolor
    #52           6.9          3.1           4.9          1.5  versicolor
    #<class 'pandas.core.frame.DataFrame'>
    #     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
    #100           6.3          3.3           6.0          2.5  virginica
    #101           5.8          2.7           5.1          1.9  virginica
    #102           7.1          3.0           5.9          2.1  virginica
    #Out[33]:
    #Species
    #setosa        1
    #versicolor    1
    #virginica     1
    #dtype: int64
Using agg:
    by_species.agg(f)
    #<class 'pandas.core.frame.DataFrame'>
    #   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
    #0           5.1          3.5           1.4          0.2  setosa
    #1           4.9          3.0           1.4          0.2  setosa
    #2           4.7          3.2           1.3          0.2  setosa
    #<class 'pandas.core.frame.DataFrame'>
    #    Sepal.Length  Sepal.Width  Petal.Length  Petal.Width     Species
    #50           7.0          3.2           4.7          1.4  versicolor
    #51           6.4          3.2           4.5          1.5  versicolor
    #52           6.9          3.1           4.9          1.5  versicolor
    #<class 'pandas.core.frame.DataFrame'>
    #     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
    #100           6.3          3.3           6.0          2.5  virginica
    #101           5.8          2.7           5.1          1.9  virginica
    #102           7.1          3.0           5.9          2.1  virginica
    #Out[34]:
    #            Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
    #Species
    #setosa                 1            1             1            1
    #versicolor             1            1             1            1
    #virginica              1            1             1            1
agg() takes a function, or a list of functions, and applies it to a Series (for a DataFrame, to each column separately). When given a list of functions, agg() returns one result per function.
agg is an alias for aggregate. Use the alias. Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See "Mutating with User Defined Function (UDF) methods" in the pandas documentation for more details.
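To make the list-of-functions behaviour concrete, here is a minimal sketch. The toy DataFrame and its column names are made up for illustration and only stand in for the iris data.

```python
import pandas as pd

# Hypothetical toy data standing in for the iris dataset.
df = pd.DataFrame({
    "Species": ["a", "a", "b", "b"],
    "Length": [1.0, 3.0, 2.0, 6.0],
})

# On a Series, a list of functions gives one result per function.
series_stats = df["Length"].agg(["min", "max"])

# On a groupby, each function becomes its own output column.
group_stats = df.groupby("Species")["Length"].agg(["min", "max"])

# agg and aggregate are two names for the same method.
same = df.groupby("Species")["Length"].aggregate(["min", "max"]).equals(group_stats)

print(series_stats)
print(group_stats)
print(same)
```

Here series_stats is indexed by the function names, while group_stats has one row per group and one column per function.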
The DataFrame apply() method applies a function along one of the axes of the DataFrame. The default is axis=0, the index (row) axis, which means the function receives each column in turn.
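A small sketch of the axis argument on a plain (ungrouped) DataFrame. The data and the value_range helper are hypothetical, chosen only to make the two directions easy to tell apart.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def value_range(s):
    # s is a Series: one column (axis=0) or one row (axis=1).
    return s.max() - s.min()

per_column = df.apply(value_range)        # default axis=0: one result per column
per_row = df.apply(value_range, axis=1)   # axis=1: one result per row

print(per_column)
print(per_row)
```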
What is the difference between the map(), applymap() and apply() methods in pandas? All of these methods apply a function to the contents of a DataFrame or Series. map() is a method of Series, applymap() is a method of DataFrame, and apply() is defined on both DataFrame and Series.
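The three methods can be compared side by side in a short sketch. The data is made up. One assumption to flag: recent pandas releases provide the element-wise DataFrame method under the name DataFrame.map and deprecate applymap, so the code below picks whichever one the installed version offers.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Series.map: element-wise over a single column.
doubled = df["a"].map(lambda v: v * 2)

# Element-wise over the whole DataFrame. Assumption: use DataFrame.map
# where it exists (newer pandas), otherwise fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
squared = elementwise(lambda v: v ** 2)

# DataFrame.apply: the function receives a whole column as a Series.
col_sums = df.apply(lambda col: col.sum())

# Series.apply: element-wise, like Series.map with a function.
plus_one = df["a"].apply(lambda v: v + 1)

print(doubled, squared, col_sums, plus_one, sep="\n")
```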
apply applies the function to each group (your Species) as a whole. Your function returns 1, so you end up with 1 value for each of the 3 groups.
agg aggregates each column (feature) for each group, so you end up with one value per column per group.
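The difference in result shape can be reproduced without the CSV file. This is a sketch on a hypothetical three-row stand-in for iris; the value columns are selected explicitly (my choice, not something from the question) so that the grouping column stays out of what the function sees.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data.
df = pd.DataFrame({
    "Species": ["setosa", "setosa", "virginica"],
    "Sepal.Length": [5.1, 4.9, 6.3],
    "Petal.Length": [1.4, 1.4, 6.0],
})
grouped = df.groupby("Species")[["Sepal.Length", "Petal.Length"]]

def n_rows(x):
    # Works whether x is a whole group (DataFrame) or one column (Series).
    return len(x)

applied = grouped.apply(n_rows)    # one value per group -> Series
aggregated = grouped.agg(n_rows)   # one value per column per group -> DataFrame

print(applied)
print(aggregated)
```

The same function gives a Series indexed by group under apply and a DataFrame with one column per feature under agg, which mirrors Out[33] and Out[34] in the question.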
Do read the groupby docs, they're quite helpful. There are also a bunch of tutorials floating around the web.