
What is the difference between Pandas Series.apply() and Series.map()? [duplicate]

Series.map():

Map values of Series using input correspondence (which can be a dict, Series, or function)

Series.apply():

Invoke function on values of Series. Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values
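To make the two documented uses of apply() concrete, here is a minimal sketch, assuming a small integer Series. The names s and add_one are illustrative and not taken from the documentation. Whether pandas hands the whole Series to the ufunc internally is a version-dependent detail, but the results should agree either way.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 9, 16])

# A NumPy ufunc can be passed to apply(); the result matches
# calling the ufunc on the Series directly.
roots_apply = s.apply(np.sqrt)
roots_direct = np.sqrt(s)

# A plain Python function that only understands single values
# (add_one is a hypothetical helper) is called once per element.
def add_one(x):
    return x + 1

plus_one = s.apply(add_one)
```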

apply() seems like it does mostly everything map() does, vectorizing scalar functions while applying vectorized operations as they are. Meanwhile map() allows for some amount of control over null value handling. Apart from historical analogy to Python's apply() and map() functions, is there a reason to prefer one over the other in general use? Why wouldn't these functions just be combined?
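The null-handling control mentioned above refers to the na_action parameter of map(). A minimal sketch, assuming a float Series with one missing value; describe is a hypothetical helper introduced only for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

def describe(x):
    # Hypothetical helper: turns any value, including NaN, into a string.
    return f"value={x}"

# By default the function is also called on the missing value.
with_nulls = s.map(describe)

# With na_action="ignore", missing values are passed through untouched
# and the function is never called on them.
skipped = s.map(describe, na_action="ignore")
```

In the first result the NaN has been turned into a string, while in the second it is still NaN.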

shadowtalker asked Jul 08 '16 23:07




1 Answer

The difference is subtle:

pandas.Series.map will substitute the values of the Series by what you pass into map.

pandas.Series.apply will apply a function (potentially with arguments) to the values of the Series.

The practical difference is in what you can pass to each method:

  • Both map and apply can receive a function:
import pandas as pd

s = pd.Series([1, 2, 3, 4])

def square(x):
    return x**2

s.map(square)

0     1
1     4
2     9
3    16
dtype: int64

s.apply(square)

0     1
1     4
2     9
3    16
dtype: int64
  • However, map cannot take extra positional arguments for the function: its second parameter is na_action, so s.map(power, 3) raises a ValueError. apply forwards extra arguments to the function:
def power(x, p):
    return x**p

s.apply(power, p=3)

0     1
1     8
2    27
3    64
dtype: int64


s.map(power, 3)
---------------------------------------------------------------------------
ValueError
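If you still want to use map with a two-parameter function, one common workaround is to bind the extra argument first. A minimal sketch, assuming the same s and power as above (redefined here so the block runs on its own); functools.partial and the lambda are illustrative choices, not something the answer prescribes:

```python
from functools import partial

import pandas as pd

s = pd.Series([1, 2, 3, 4])

def power(x, p):
    return x ** p

# Bind p=3 up front so map() only ever sees a one-argument callable.
cubed_partial = s.map(partial(power, p=3))

# A lambda does the same job.
cubed_lambda = s.map(lambda x: power(x, 3))

# apply() can forward the extra argument itself, by keyword or via args.
cubed_kwarg = s.apply(power, p=3)
cubed_args = s.apply(power, args=(3,))
```

All four results should contain the same cubed values.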

  • map can receive a dictionary (or even a pd.Series, in which case it uses the index as the key), while apply cannot (it raises a TypeError):
dic = {1: 5, 2: 4}

s.map(dic)

0    5.0
1    4.0
2    NaN
3    NaN
dtype: float64

s.apply(dic)
---------------------------------------------------------------------------
TypeError  


s.map(s)

0    2.0
1    3.0
2    4.0
3    NaN
dtype: float64


s.apply(s)

---------------------------------------------------------------------------
TypeError  
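The NaN values in the dictionary example come from keys that are missing from the mapping. The pandas documentation notes that a dict subclass defining __missing__ supplies its default instead of NaN. A minimal sketch, assuming collections.defaultdict as the mapping and -1 as an arbitrary placeholder default:

```python
from collections import defaultdict

import pandas as pd

s = pd.Series([1, 2, 3, 4])
dic = {1: 5, 2: 4}

# Plain dict: values without a matching key become NaN (dtype float64).
plain = s.map(dic)

# defaultdict defines __missing__, so unmatched values get the default.
with_default = s.map(defaultdict(lambda: -1, dic))

# Alternative: map with the plain dict, then fill the gaps afterwards.
filled = s.map(dic).fillna(-1).astype(int)
```

Both with_default and filled replace the unmatched entries with -1, so the result can stay an integer Series.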
Luis Blanche answered Sep 28 '22 01:09