Difference between map, applymap and apply methods in Pandas

Tags: python, pandas, dataframe, vectorization

Can you tell me when to use these vectorization methods with basic examples?

I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!

848

asked Nov 05 '13 20:11

marillion

2 Answers

Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:

    In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

    In [117]: frame
    Out[117]:
                   b         d         e
    Utah   -0.029638  1.081563  1.280300
    Ohio    0.647747  0.831136 -1.549481
    Texas   0.513416 -0.884417  0.195343
    Oregon -0.485454 -0.477388 -0.309548

    In [118]: f = lambda x: x.max() - x.min()

    In [119]: frame.apply(f)
    Out[119]:
    b    1.133201
    d    1.965980
    e    2.829781
    dtype: float64

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:

    In [120]: format = lambda x: '%.2f' % x

    In [121]: frame.applymap(format)
    Out[121]:
                b      d      e
    Utah    -0.03   1.08   1.28
    Ohio     0.65   0.83  -1.55
    Texas    0.51  -0.88   0.20
    Oregon  -0.49  -0.48  -0.31

The reason for the name applymap is that Series has a map method for applying an element-wise function:

    In [122]: frame['e'].map(format)
    Out[122]:
    Utah       1.28
    Ohio      -1.55
    Texas      0.20
    Oregon    -0.31
    Name: e, dtype: object

Summing up, apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
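For anyone who wants to run the three calls side by side, here is a minimal sketch. It hard-codes the values printed in the book excerpt instead of drawing random numbers, so the output is reproducible. The names `fmt` and `elementwise` are my own, not from the book; `elementwise` is only there because `DataFrame.applymap` was renamed to `DataFrame.map` in pandas 2.1.

```python
import pandas as pd

# Fixed values copied from the frame printed in the excerpt.
frame = pd.DataFrame(
    [[-0.029638, 1.081563, 1.280300],
     [0.647747, 0.831136, -1.549481],
     [0.513416, -0.884417, 0.195343],
     [-0.485454, -0.477388, -0.309548]],
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)


def fmt(x):
    """Format one float as a two-decimal string."""
    return "%.2f" % x


def elementwise(df, func):
    """Apply func to every cell of a DataFrame.

    Uses DataFrame.map where it exists (pandas 2.1+) and falls back
    to DataFrame.applymap on older versions.
    """
    method = df.map if hasattr(df, "map") else df.applymap
    return method(func)


# apply: the function receives a whole column (a Series) at a time.
col_range = frame.apply(lambda col: col.max() - col.min())

# applymap / DataFrame.map: the function receives one cell at a time.
formatted = elementwise(frame, fmt)

# Series.map: the function receives one element of a single column.
formatted_e = frame["e"].map(fmt)

print(col_range)
print(formatted)
print(formatted_e)
```

Note that `col_range` has one entry per column, while `formatted` keeps the shape of `frame` and `formatted_e` keeps the shape of the `e` column.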

109

answered Oct 21 '22 09:10

jeremiahbuddha

Comparing `map`, `applymap` and `apply`: Context Matters

First major difference: DEFINITION

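map is defined on Series ONLY
applymap is defined on DataFrames ONLY
apply is defined on BOTH

Note: this describes pandas before 2.1. From pandas 2.1 on, DataFrame.applymap is deprecated and the same elementwise behaviour is available as DataFrame.map.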

Second major difference: INPUT ARGUMENT

map accepts dicts, Series, or callable
applymap and apply accept callables only
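To make the input-argument difference concrete, here is a small sketch with a made-up Series. The names `s`, `lookup`, `by_dict`, `by_series`, `by_func` and `doubled` are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="A")

# 1. dict: each element is looked up as a key.
by_dict = s.map({1: "a", 2: "b", 3: "c"})

# 2. Series: the index plays the role of the dict keys.
lookup = pd.Series(["a", "b", "c"], index=[1, 2, 3])
by_series = s.map(lookup)

# 3. callable: called once per element.
by_func = s.map(lambda x: "abc"[x - 1])

# All three spellings give the same values.
print(by_dict.tolist(), by_series.tolist(), by_func.tolist())

# apply is given a callable here, also called once per element.
doubled = s.apply(lambda x: x * 2)
print(doubled.tolist())
```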

Third major difference: BEHAVIOR

map is elementwise for Series
applymap is elementwise for DataFrames
apply also works elementwise on a Series (on a DataFrame it passes whole rows or columns to the function) but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.
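A quick way to see that the return value of apply depends on the function is to try an aggregating and a transforming function on the same frame. This is only a sketch on made-up data; `df` and the result names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Aggregating function: each column collapses to one value -> Series.
col_range = df.apply(lambda col: col.max() - col.min())

# Same function with axis=1: each row collapses to one value -> Series.
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

# Transforming function: each column keeps its length -> DataFrame.
centered = df.apply(lambda col: col - col.mean())

# Series.apply with a plain Python function is called once per element.
labels = df["A"].apply(lambda x: "odd" if x % 2 else "even")

print(type(col_range).__name__, type(centered).__name__)
print(row_range.tolist(), labels.tolist())
```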

Fourth major difference (the most important one): USE CASE

map is meant for mapping values from one domain to another, so is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize)).
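The three use cases can be tried on one small frame. A rough sketch with made-up data: `split_sentences` is a hypothetical stand-in for `nltk.sent_tokenize` so the example runs without NLTK installed, and the `hasattr` check only picks between `DataFrame.map` (pandas 2.1+) and `DataFrame.applymap` (older versions).

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["  x ", "y  ", " z"],
    "C": [" p", "q ", " r "],
    "sentences": ["Hi there. Bye.", "One sentence only.", "A. B. C."],
})

# map: translate the values of one column through a lookup table.
df["A_label"] = df["A"].map({1: "a", 2: "b", 3: "c"})

# applymap: the same scalar function on every cell of several columns.
cols = df[["B", "C"]]
strip_cells = cols.map if hasattr(cols, "map") else cols.applymap
df[["B", "C"]] = strip_cells(str.strip)


# apply: an arbitrary Python function with no vectorised equivalent.
def split_sentences(text):
    """Hypothetical stand-in for nltk.sent_tokenize: naive split on '.'."""
    return [part.strip() + "." for part in text.split(".") if part.strip()]


df["n_sentences"] = df["sentences"].apply(split_sentences).apply(len)

print(df[["A_label", "B", "C", "n_sentences"]])
```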

Also see "When should I (not) want to use pandas apply() in my code?" for a writeup I made a while back on the most appropriate scenarios for using apply (note that there aren't many, but there are a few; apply is generally slow).

Summarising

[Image: summary table comparing map, applymap and apply (https://i.stack.imgur.com/IZys3.png)]

Footnotes

1. map when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Values with no matching key will be recorded as NaN in the output.

2. applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.

3. map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.

4. Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.
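The first footnote is easy to check by hand. A minimal sketch, assuming a small made-up Series and lookup dict (`s`, `mapping`, `mapped` and `kept` are illustrative names, not from the answer):

```python
import pandas as pd

s = pd.Series([1, 2, 4])
mapping = {1: "a", 2: "b", 3: "c"}

# 4 is not a key of the mapping, so the result holds NaN at that position.
mapped = s.map(mapping)
print(mapped)

# One way to keep the original value instead of NaN: pass a callable
# that falls back to the input when the key is missing.
kept = s.map(lambda x: mapping.get(x, x))
print(kept)
```

The callable version gives up the dictionary code path mentioned in the third footnote, so it is a trade-off between convenience and speed.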

answered Oct 21 '22 08:10

cs95
