While the applymap function on DataFrame operates element-wise, the transform function seems to achieve the same thing, except that it claims to return a like-indexed DataFrame.
Questions:
The applymap() function is used to apply a function to a DataFrame element-wise. This method applies a function that accepts and returns a scalar to every element of a DataFrame; the function must be a Python function that returns a single value from a single value.
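A minimal sketch of that scalar-in, scalar-out contract, assuming pandas is installed. The tiny DataFrame and the double helper are illustrative only. Because applymap was renamed DataFrame.map in pandas 2.1, the snippet uses whichever of the two methods exists.

```python
import pandas as pd

# Illustrative data; any DataFrame would do.
df = pd.DataFrame({"A": [1, 2], "B": [3.5, 4.5]})

# DataFrame.applymap was renamed DataFrame.map in pandas 2.1; use whichever exists.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

got_series = []

def double(x):
    # Hypothetical helper: records whether it was handed a whole column (it never is).
    got_series.append(isinstance(x, pd.Series))
    return x * 2

result = elementwise(double)
print(result)
```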
transform() can take a function, a string naming a function, a list of functions, or a dict. apply(), by contrast, is documented as taking a function. apply() can work with multiple Series at a time, whereas transform() works with only a single Series at a time.
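A rough illustration of these differences at the DataFrame level, on a made-up two-column frame (the track helper is hypothetical): transform accepts a method name, a list, or a dict and hands its function one column Series at a time, while apply with axis=1 receives a whole row and can therefore combine several columns in one call.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# transform accepts a method name, a list of names, or a dict of column -> function.
by_name = df.transform("cumsum")
by_list = df.transform(["cumsum", "cummax"])       # MultiIndex columns: (column, function)
by_dict = df.transform({"A": "cumsum", "B": "diff"})

# transform passes each column to the function as a separate Series.
received = []

def track(s):
    received.append(type(s).__name__)
    return s

df.transform(track)

# apply with axis=1 passes each row (values from several columns) in a single call.
row_sums = df.apply(lambda row: row["A"] + row["B"], axis=1)
print(row_sums)
```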
apply() is used to apply a function along an axis of the DataFrame or on values of Series. applymap() is used to apply a function to a DataFrame elementwise. map() is used to substitute each value in a Series with another value.
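A small sketch of Series.map used for substitution, on a hypothetical Series of strings: with a dict, each value is looked up and values missing from the dict become NaN; with a function, map and Series.apply both call it once per value.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "owl"])

# map with a dict substitutes values; "owl" has no entry, so it becomes NaN.
renamed = s.map({"cat": "kitten", "dog": "puppy"})

# map (or Series.apply) with a function transforms each value individually.
lengths = s.map(len)
lengths_via_apply = s.apply(len)
print(renamed)
```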
Compared with aggregation, transform takes an additional step called "broadcasting": it broadcasts the result computed for each sub-DataFrame (group) back to the original full DataFrame. You can view it as left-merging the results onto the original full DataFrame.
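A minimal sketch of that broadcasting view, using a hypothetical key/val frame: groupby(...).transform("mean") yields the same numbers as aggregating per group and then left-merging the per-group result back onto the original rows.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"], "val": [1, 3, 2, 4, 6]})

# Aggregation: one row per group (a -> 2.0, b -> 4.0).
per_group = df.groupby("key")["val"].mean()

# Transform: the per-group result is broadcast back to every original row.
broadcast = df.groupby("key")["val"].transform("mean")

# The same result by hand: left-merge the aggregated values onto the original rows.
merged = df.merge(per_group.rename("val_mean").reset_index(), on="key", how="left")

print(broadcast.tolist())
print(merged["val_mean"].tolist())
```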
Different use cases. When comparing them, it is useful to bring up apply and agg as well.
Setup
import numpy as np
import pandas as pd
np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(10, size=(6, 4)), columns=list('ABCD'))
df
   A  B  C  D
0  0  2  7  3
1  8  7  0  6
2  8  6  0  2
3  0  4  9  7
4  3  2  4  3
5  3  6  7  7
pd.DataFrame.applymap
This takes a function and returns a new DataFrame in which the value of each cell is replaced by the result of applying that function to it. (In pandas 2.1 and later, applymap is deprecated in favor of the equivalent DataFrame.map.)
df.applymap(lambda x: str(x) * x)
          A        B          C        D
0                 22    7777777      333
1  88888888  7777777              666666
2  88888888   666666                  22
3               4444  999999999  7777777
4       333       22       4444      333
5       333   666666    7777777  7777777
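To make the comparison with transform from the question concrete, here is a hedged sketch on a small made-up frame (column_diff is a hypothetical helper). For a purely element-wise function, applymap and transform produce the same values; a function that needs the whole column only works with transform, because applymap hands it one scalar at a time.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 8]})

# applymap was renamed DataFrame.map in pandas 2.1; use whichever exists.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# 1) Purely element-wise function: both methods give the same values.
via_elementwise = elementwise(lambda x: x * 10)
via_transform = df.transform(lambda x: x * 10)

# 2) Column-aware function: needs the whole column (difference from the previous row).
def column_diff(col):
    return col.diff()

diffed = df.transform(column_diff)  # works: col is a Series

try:
    elementwise(column_diff)  # each call gets a single scalar, which has no .diff()
    scalar_failed = False
except (AttributeError, TypeError):
    scalar_failed = True
```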
pd.DataFrame.agg
Takes one or more functions, each of which is expected to be an aggregation function: it is applied to each column and is expected to return a single value that replaces the entire column. Examples would be 'mean' or 'max'. Both take a set of data and return a scalar.
df.agg('mean')
A    3.666667
B    4.500000
C    4.500000
D    4.666667
dtype: float64
Or
df.agg(['mean', 'std', 'min'])
             A         B         C         D
mean  3.666667  4.500000  4.500000  4.666667
std   3.614784  2.167948  3.834058  2.250926
min   0.000000  2.000000  0.000000  2.000000
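agg is not limited to named reductions. A short sketch on a made-up frame, assuming a recent pandas: a custom function collapses each column to one scalar, and a dict lets each column request its own list of aggregations, with unrequested combinations filled with NaN.

```python
import pandas as pd

# Illustrative data, not the df from the Setup section.
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]})

# A custom reducer: each column arrives as a Series and collapses to one scalar.
spread = df.agg(lambda col: col.max() - col.min())

# A dict selects aggregations per column; combinations not requested come back as NaN.
summary = df.agg({"A": ["sum", "min"], "B": ["min", "max"]})
print(summary)
```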
pd.DataFrame.transform
Takes one function that is expected to be applied to a column and return a column of equal size.
df.transform(lambda x: x / x.std())
          A         B         C         D
0  0.000000  0.922531  1.825742  1.332785
1  2.213133  3.228859  0.000000  2.665570
2  2.213133  2.767594  0.000000  0.888523
3  0.000000  1.845062  2.347382  3.109832
4  0.829925  0.922531  1.043281  1.332785
5  0.829925  2.767594  1.825742  3.109832
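The "return a column of equal size" requirement is enforced. A minimal sketch on a hypothetical frame with a non-default index: a same-length result keeps the original index (the "like-indexed" DataFrame from the question), while a reducing function is rejected with a ValueError.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0]}, index=["x", "y", "z"])

# A same-length result keeps the original index.
scaled = df.transform(lambda col: col / col.max())

# A reducing function returns one value per column, so transform refuses it.
try:
    df.transform(lambda col: col.max())
    rejected = False
except ValueError:
    rejected = True
```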
pd.DataFrame.apply
pandas attempts to figure out whether apply is reducing the dimensionality of the column it was operating on (i.e., aggregating) or transforming the column into another column of equal size. Once it has determined which, it runs the remainder of the operation as an aggregation or a transform, respectively.
df.apply('mean')
A    3.666667
B    4.500000
C    4.500000
D    4.666667
dtype: float64
Or
df.apply(lambda x: (x - x.mean()) / x.std())
          A         B         C         D
0 -1.014353 -1.153164  0.652051 -0.740436
1  1.198781  1.153164 -1.173691  0.592349
2  1.198781  0.691898 -1.173691 -1.184698
3 -1.014353 -0.230633  1.173691  1.036611
4 -0.184428 -1.153164 -0.130410 -0.740436
5 -0.184428  0.691898  0.652051  1.036611
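A small sketch of that inference on a made-up frame: a reducing function yields one value per column (a Series), a same-length function yields a DataFrame of the original shape, and, unlike transform, apply also accepts a per-column result of a different length.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 1, 2], "B": [10, 30, 20]})

# Reducing function: treated as an aggregation, one value per column.
reduced = df.apply(lambda col: col.max())

# Same-length function: treated like a transform, same shape as df.
ranked = df.apply(lambda col: col.rank())

# Different-length result: apply accepts it (transform would raise ValueError).
top_two = df.apply(lambda col: col.nlargest(2).reset_index(drop=True))
print(top_two)
```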