
What's the difference between transform vs applymap for pandas DataFrame

While applymap on a DataFrame operates element-wise, transform seems to achieve the same thing, except that its documentation says it returns a like-indexed DataFrame.

Questions:

  1. Is there any use case where one of them works and the other doesn't?
  2. Does one have better performance than the other?
  3. What is the "like-indexed DataFrame" mentioned in the documentation?
darcyy asked Sep 14 '17 04:09

1 Answer

Different use cases. When comparing them, it is useful to bring up apply and agg as well.

Setup

import numpy as np
import pandas as pd
np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(10, size=(6, 4)), columns=list('ABCD'))

df

   A  B  C  D
0  0  2  7  3
1  8  7  0  6
2  8  6  0  2
3  0  4  9  7
4  3  2  4  3
5  3  6  7  7

pd.DataFrame.applymap
Takes a function and returns a new DataFrame in which each cell holds the result of applying that function to the original cell's value.

df.applymap(lambda x: str(x) * x)

          A        B          C        D
0                 22    7777777      333
1  88888888  7777777              666666
2  88888888   666666                  22
3               4444  999999999  7777777
4       333       22       4444      333
5       333   666666    7777777  7777777
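
A note for readers on newer pandas: DataFrame.applymap is deprecated as of pandas 2.1 in favor of DataFrame.map, which does the same element-wise job. Below is a minimal sketch that works on either side of that change. The frame from Setup is typed out literally so the snippet does not depend on the random seed, and the helper name elementwise is made up for illustration.

```python
import pandas as pd

# The frame from Setup, written out literally.
df = pd.DataFrame(
    [[0, 2, 7, 3], [8, 7, 0, 6], [8, 6, 0, 2],
     [0, 4, 9, 7], [3, 2, 4, 3], [3, 6, 7, 7]],
    columns=list("ABCD"),
)


def elementwise(frame, func):
    """Hypothetical helper: prefer DataFrame.map (pandas >= 2.1),
    otherwise fall back to DataFrame.applymap."""
    method = frame.map if hasattr(pd.DataFrame, "map") else frame.applymap
    return method(func)


# The function is called once per cell and sees a single scalar each time.
repeated = elementwise(df, lambda x: str(x) * x)
print(repeated)
```

The point to take away is that the function never sees a column or the frame, only one value at a time.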

pd.DataFrame.agg
Takes one or more functions, each of which is expected to be an aggregation: it is applied to each column and returns a single value that stands in for the entire column. Examples are 'mean' and 'max'; both take a set of data and return a scalar.

df.agg('mean')

A    3.666667
B    4.500000
C    4.500000
D    4.666667
dtype: float64

Or

df.agg(['mean', 'std', 'min'])

             A         B         C         D
mean  3.666667  4.500000  4.500000  4.666667
std   3.614784  2.167948  3.834058  2.250926
min   0.000000  2.000000  0.000000  2.000000
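
agg also accepts a dict that maps column names to functions, which helps when only some columns need reducing. A small sketch on the same data, typed out literally so the snippet runs on its own:

```python
import pandas as pd

# The frame from Setup, written out literally.
df = pd.DataFrame(
    [[0, 2, 7, 3], [8, 7, 0, 6], [8, 6, 0, 2],
     [0, 4, 9, 7], [3, 2, 4, 3], [3, 6, 7, 7]],
    columns=list("ABCD"),
)

# One aggregation per selected column; each column collapses to one scalar,
# so the result is a Series indexed by the chosen column names.
summary = df.agg({"A": "mean", "B": "max"})
print(summary)
```

Either way, the row index of the original frame is gone from the result, which is what separates agg from transform.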

pd.DataFrame.transform
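Takes a function that is applied to each column and is expected to return a column of equal size. Because the output has the same row labels as the input, the result is what the documentation calls "like-indexed".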

df.transform(lambda x: x / x.std())

          A         B         C         D
0  0.000000  0.922531  1.825742  1.332785
1  2.213133  3.228859  0.000000  2.665570
2  2.213133  2.767594  0.000000  0.888523
3  0.000000  1.845062  2.347382  3.109832
4  0.829925  0.922531  1.043281  1.332785
5  0.829925  2.767594  1.825742  3.109832
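
Two consequences of that contract can be checked directly. The sketch below, on the same data typed out literally, shows that the result keeps the original index and that a function which reduces a column makes transform raise instead of returning an aggregate. That is one case where agg works and transform does not.

```python
import pandas as pd

# The frame from Setup, written out literally.
df = pd.DataFrame(
    [[0, 2, 7, 3], [8, 7, 0, 6], [8, 6, 0, 2],
     [0, 4, 9, 7], [3, 2, 4, 3], [3, 6, 7, 7]],
    columns=list("ABCD"),
)

scaled = df.transform(lambda x: x / x.std())
# Same row labels as the input: the result is "like-indexed".
print(scaled.index.equals(df.index))

# A reducing function is rejected by transform...
try:
    df.transform("mean")
    reduced_ok = True
except ValueError as err:
    reduced_ok = False
    print("transform refused:", err)

# ...but it is exactly what agg expects.
print(df.agg("mean"))
```

The exact error message differs between pandas versions, so the sketch only relies on the exception type.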

pd.DataFrame.apply
pandas tries to work out whether the function passed to apply reduces the column it operates on (an aggregation) or turns it into another column of equal size (a transform). Once it has decided, it runs the rest of the operation as the corresponding aggregation or transform.

df.apply('mean')

A    3.666667
B    4.500000
C    4.500000
D    4.666667
dtype: float64

Or

df.apply(lambda x: (x - x.mean()) / x.std())

          A         B         C         D
0 -1.014353 -1.153164  0.652051 -0.740436
1  1.198781  1.153164 -1.173691  0.592349
2  1.198781  0.691898 -1.173691 -1.184698
3 -1.014353 -0.230633  1.173691  1.036611
4 -0.184428 -1.153164 -0.130410 -0.740436
5 -0.184428  0.691898  0.652051  1.036611
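
A quick way to see which path apply took is to look at the type and shape of what comes back. A sketch on the same data, typed out literally:

```python
import pandas as pd

# The frame from Setup, written out literally.
df = pd.DataFrame(
    [[0, 2, 7, 3], [8, 7, 0, 6], [8, 6, 0, 2],
     [0, 4, 9, 7], [3, 2, 4, 3], [3, 6, 7, 7]],
    columns=list("ABCD"),
)

# Each column collapses to one scalar: apply behaves like agg.
reduced = df.apply(lambda x: x.max() - x.min())

# Each column maps to a column of equal size: apply behaves like transform.
same_size = df.apply(lambda x: x - x.min())

print(type(reduced).__name__, reduced.shape)      # Series (4,)
print(type(same_size).__name__, same_size.shape)  # DataFrame (6, 4)
```
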
piRSquared answered Oct 23 '22 15:10