
Is there a difference between `Series.replace()` and `Series.map()` in pandas? [duplicate]

Both pandas.Series.map and pandas.Series.replace seem to give the same result. Is there a reason for using one over the other? For example:

import pandas as pd
df = pd.Series(['Yes', 'No'])
df

0    Yes
1     No
dtype: object

df.replace(to_replace=['Yes', 'No'], value=[True, False])

0     True
1    False
dtype: bool

df.map({'Yes':True, 'No':False})

0     True
1    False
dtype: bool

df.replace(to_replace=['Yes', 'No'], value=[True, False]).equals(df.map({'Yes':True, 'No':False}))

True
Data2Dollars asked Dec 12 '25 06:12


1 Answer

Both of these methods are used for substituting values.

From Series.replace docs:

Replace values given in to_replace with value.

From Series.map docs:

Used for substituting each value in a Series with another value, that may be derived from a function, a dict or a Series.

They differ in the following:

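  1. replace accepts str, regex, list, dict, Series, int, float, or None. map accepts a function, a dict, or a Series.
  2. They handle missing and unmatched values differently: map turns values that have no match in the mapping into NaN and offers na_action='ignore' to skip missing inputs, while replace leaves unmatched values unchanged.
  3. With regex=True, replace performs the substitution with re.sub under the hood, so the substitution rules of re.sub apply.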
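
The three points above can be sketched in a few lines. This is a minimal illustration rather than an exhaustive comparison; the Series `answers` and the variable names are made up for the example.

```python
import pandas as pd

answers = pd.Series(['Yes', 'No'])  # hypothetical sample data

# map accepts a function and calls it on each element.
lowered = answers.map(str.lower)

# With regex=True, replace substitutes by pattern; the replacement string
# follows re.sub rules, so \1 refers to the first captured group.
initials = answers.replace(r'^([YN]).*$', r'\1', regex=True)

# na_action='ignore' makes map pass missing values through untouched
# instead of handing them to the function.
with_gap = pd.Series(['Yes', None]).map(str.lower, na_action='ignore')

print(lowered.tolist())
print(initials.tolist())
print(with_gap.isna().tolist())
```

Without na_action='ignore', the last call would pass the missing value to str.lower, which raises a TypeError.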

Take the example below:

In [124]: s = pd.Series([0, 1, 2, 3, 4])    
In [125]: s
Out[125]: 
0    0
1    1
2    2
3    3
4    4
dtype: int64

In [126]: s.replace({0: 5})
Out[126]: 
0    5
1    1
2    2
3    3
4    4
dtype: int64

In [129]: s.map({0: 'kitten', 1: 'puppy'}) 
Out[129]: 
0    kitten
1     puppy
2       NaN
3       NaN
4       NaN
dtype: object

As you can see, s.map converts values that are not keys of the dict to NaN, unless the dict supplies a default value (e.g. a defaultdict, or any dict subclass that defines __missing__).
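
A short sketch of the default-value case, assuming a defaultdict whose fallback is the made-up label 'other':

```python
from collections import defaultdict

import pandas as pd

s = pd.Series([0, 1, 2])

# A plain dict: keys that are missing produce NaN.
plain = s.map({0: 'kitten', 1: 'puppy'})

# A defaultdict defines __missing__, so map uses its default instead of NaN.
# The fallback label 'other' is only an illustration.
names = defaultdict(lambda: 'other', {0: 'kitten', 1: 'puppy'})
with_default = s.map(names)

print(plain.isna().tolist())
print(with_default.tolist())
```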

s.replace, by contrast, substitutes only the values listed in to_replace and leaves the rest unchanged.
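
If you want map but with replace-like handling of unmatched values, one common idiom (not taken from the pandas docs quoted above) is to fill the resulting NaN values back in from the original Series. A minimal sketch, reusing the Series from the example:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4])
mapping = {0: 5}

# replace: only the value 0 changes, everything else is kept.
replaced = s.replace(mapping)

# map: values that are not keys of the mapping become NaN ...
mapped = s.map(mapping)

# ... and fillna(s) restores the original values in those positions.
# The intermediate NaN values may leave the result as floats, not ints.
patched = mapped.fillna(s)

print(replaced.tolist())
print(patched.tolist())
```

The values match what replace returns, but the dtype may differ, so cast back explicitly if that matters.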

Mayank Porwal answered Dec 13 '25 18:12