What's the difference between transform vs applymap for pandas DataFrame

1 Answer

Different use cases. When comparing them, it is useful to bring up apply and agg as well.

Setup

import numpy as np
import pandas as pd
np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(10, size=(6, 4)), columns=list('ABCD'))

df

   A  B  C  D
0  0  2  7  3
1  8  7  0  6
2  8  6  0  2
3  0  4  9  7
4  3  2  4  3
5  3  6  7  7

pd.DataFrame.applymap
This takes a function and returns a new dataframe in which the value of each cell is replaced by the result of applying the function to that value. Since pandas 2.1, applymap is deprecated in favor of pd.DataFrame.map, which does the same thing.

df.applymap(lambda x: str(x) * x)

          A        B          C        D
0                 22    7777777      333
1  88888888  7777777              666666
2  88888888   666666                  22
3               4444  999999999  7777777
4       333       22       4444      333
5       333   666666    7777777  7777777
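To make the cell-by-cell behavior concrete, here is a minimal sketch on a tiny hand-made frame. The names small, repeat_digit and elementwise are made up for illustration and are not part of the answer above. It also picks DataFrame.map when the installed pandas has it, on the assumption that you may be running either an older or a newer version.

```python
import pandas as pd

# Small hypothetical frame, chosen so the result is easy to check by eye.
small = pd.DataFrame({"A": [0, 3], "B": [2, 1]})

def repeat_digit(x):
    # Receives one scalar cell at a time, never a whole column.
    return str(x) * x

# DataFrame.map exists from pandas 2.1 on; fall back to applymap before that.
elementwise = small.map if hasattr(pd.DataFrame, "map") else small.applymap
result = elementwise(repeat_digit)

print(result)
```

A cell holding 0 becomes the empty string, which is why some cells in the output above look blank.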

pd.DataFrame.agg
Takes one or more functions, each of which is expected to be an aggregation function. That means each function is applied to each column and is expected to return a single value that stands in for the entire column. Examples would be 'mean' or 'max'. Both of those take a set of data and return a scalar.

df.agg('mean')

A    3.666667
B    4.500000
C    4.500000
D    4.666667
dtype: float64

df.agg(['mean', 'std', 'min'])

             A         B         C         D
mean  3.666667  4.500000  4.500000  4.666667
std   3.614784  2.167948  3.834058  2.250926
min   0.000000  2.000000  0.000000  2.000000
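As a rough sketch of the three common ways to call agg, the example below retypes the values of the frame printed in the Setup section by hand, so it does not depend on the random seed. The variable names col_max, summary and mixed are illustrative only.

```python
import pandas as pd

# Same values as the frame printed in the Setup section, typed in by hand.
df = pd.DataFrame({
    "A": [0, 8, 8, 0, 3, 3],
    "B": [2, 7, 6, 4, 2, 6],
    "C": [7, 0, 0, 9, 4, 7],
    "D": [3, 6, 2, 7, 3, 7],
})

# One function name: one scalar per column, returned as a Series.
col_max = df.agg("max")

# A list of names: one row per function, one column per original column.
summary = df.agg(["min", "max"])

# A dict: a different aggregation for each selected column.
mixed = df.agg({"A": "max", "B": "min"})

print(col_max, summary, mixed, sep="\n\n")
```

In every case a whole column collapses to one value per function, which is what separates agg from the cell-wise and column-to-column operations.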

pd.DataFrame.transform
Takes a function that is applied to each column and is expected to return a column of equal size.

df.transform(lambda x: x / x.std())

          A         B         C         D
0  0.000000  0.922531  1.825742  1.332785
1  2.213133  3.228859  0.000000  2.665570
2  2.213133  2.767594  0.000000  0.888523
3  0.000000  1.845062  2.347382  3.109832
4  0.829925  0.922531  1.043281  1.332785
5  0.829925  2.767594  1.825742  3.109832
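The "equal size" requirement can be checked directly. The sketch below again retypes the Setup values by hand and assumes a reasonably recent pandas, where transform raises ValueError if the function reduces a column to a scalar. The names scaled and rejected are illustrative.

```python
import numpy as np
import pandas as pd

# Same values as the frame printed in the Setup section, typed in by hand.
df = pd.DataFrame({
    "A": [0, 8, 8, 0, 3, 3],
    "B": [2, 7, 6, 4, 2, 6],
    "C": [7, 0, 0, 9, 4, 7],
    "D": [3, 6, 2, 7, 3, 7],
})

# A column-to-column function: the result keeps the original shape and index.
scaled = df.transform(lambda x: x / x.std())
print(scaled.shape == df.shape)
print(np.allclose(scaled.std(), 1.0))  # each column now has unit std

# A reducing function is not a transform, so pandas refuses it.
try:
    df.transform(lambda x: x.mean())
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

If you want the reduced values, agg is the method to reach for instead.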

pd.DataFrame.apply
pandas attempts to figure out whether the function passed to apply reduces the dimensionality of the column it operates on (i.e., an aggregation) or transforms the column into another column of equal size. Once it has figured that out, it runs the remainder of the operation as if it were an aggregation or a transform procedure.

df.apply('mean')

A    3.666667
B    4.500000
C    4.500000
D    4.666667
dtype: float64

df.apply(lambda x: (x - x.mean()) / x.std())

          A         B         C         D
0 -1.014353 -1.153164  0.652051 -0.740436
1  1.198781  1.153164 -1.173691  0.592349
2  1.198781  0.691898 -1.173691 -1.184698
3 -1.014353 -0.230633  1.173691  1.036611
4 -0.184428 -1.153164 -0.130410 -0.740436
5 -0.184428  0.691898  0.652051  1.036611
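One way to see this dual behavior is to look at the type of what apply returns. This is a minimal sketch on the Setup values, retyped by hand. The names reduced and zscores are illustrative.

```python
import numpy as np
import pandas as pd

# Same values as the frame printed in the Setup section, typed in by hand.
df = pd.DataFrame({
    "A": [0, 8, 8, 0, 3, 3],
    "B": [2, 7, 6, 4, 2, 6],
    "C": [7, 0, 0, 9, 4, 7],
    "D": [3, 6, 2, 7, 3, 7],
})

# The function returns a scalar per column, so apply behaves like agg
# and hands back a Series indexed by column name.
reduced = df.apply(lambda x: x.max() - x.min())

# The function returns a column of equal size, so apply behaves like
# transform and hands back a DataFrame of the original shape.
zscores = df.apply(lambda x: (x - x.mean()) / x.std())

print(type(reduced).__name__, type(zscores).__name__)
```

Because apply accepts both kinds of function, agg or transform can be the clearer choice when you want the intent, and the shape of the result, to be explicit.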

189

answered Oct 23 '22 15:10

piRSquared


Tags:

python

pandas

dataframe

darcyy
