Pandas weird behavior using .replace() to swap values

I stumbled upon weird and inconsistent behavior in pandas' replace function when using it to swap two values in a column. Swapping integers in a column works as expected:

import pandas as pd

df = pd.DataFrame({'A': [0, 1]})
df.A.replace({0: 1, 1: 0})

This yields the result:

0    1
1    0
Name: A, dtype: int64

However, when using the same commands for string values:

df = pd.DataFrame({'B': ['a', 'b']})
df.B.replace({'a': 'b', 'b': 'a'})

We get:

0    a
1    a
Name: B, dtype: object

Can anyone explain this difference in behavior to me, or point me to a page in the docs that deals with inconsistencies between integers and strings in pandas?

Ricardo asked Apr 11 '18 12:04


1 Answer

Yup, this is definitely a bug, so I've opened a new issue - GH20656.

It looks like pandas applies the replacements successively. It makes the first replacement, turning "a" into "b", and then the second, which turns both "b"s (the original one and the one just created) into "a".

In summary, what you see is equivalent to

df.B.replace('a', 'b').replace('b', 'a')

0    a
1    a
Name: B, dtype: object

Which is definitely not what should be happening.
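
To make the difference concrete, here is a minimal plain-Python sketch (no pandas involved; the variable names are illustrative only) contrasting successive passes with a single-pass lookup. It is meant to mirror the behavior described above, not to reproduce pandas internals.

```python
# Plain-Python illustration of the two strategies; not pandas code.
values = ['a', 'b']
mapping = {'a': 'b', 'b': 'a'}

# Successive passes: each rule is applied to the output of the previous rule.
sequential = list(values)
for old, new in mapping.items():
    sequential = [new if v == old else v for v in sequential]

# Single pass: every original value is looked up exactly once.
simultaneous = [mapping.get(v, v) for v in values]

print(sequential)    # ['a', 'a']
print(simultaneous)  # ['b', 'a']
```

The successive version loses the original "b" because, by the time the second rule runs, it can no longer be told apart from the "b" produced by the first rule.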


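There is a workaround using str.replace with a lambda callback. On recent pandas versions, where str.replace treats the pattern literally by default, regex=True must be passed explicitly for a callable replacement to be accepted.

m = {'a': 'b', 'b': 'a'}
# Match any key, then look up its replacement in a single pass.
df.B.str.replace('|'.join(m.keys()), lambda x: m[x.group()], regex=True)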

0    b
1    a
Name: B, dtype: object
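
For readers unfamiliar with callback replacements, the same idea can be sketched with the standard-library re module on plain strings. The helper name swap is hypothetical, and escaping the keys with re.escape is an extra precaution that is not part of the answer above.

```python
import re

m = {'a': 'b', 'b': 'a'}
# Build an alternation of the keys; re.escape guards against regex metacharacters.
pattern = '|'.join(re.escape(k) for k in m)

def swap(text):
    # Each match is replaced exactly once, based on the original text.
    return re.sub(pattern, lambda match: m[match.group()], text)

swapped = [swap(v) for v in ['a', 'b']]
print(swapped)  # ['b', 'a']
```

Note that this matches substrings rather than whole values, so swap('ab') gives 'ba'. That is fine for single-character cells like the ones here, but it may not be what you want for longer strings.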
cs95 answered Nov 15 '22 05:11