Pandas weird behavior using .replace() to swap values

Tags: python, string, replace, pandas, dataframe

I stumbled upon weird and inconsistent behavior in the pandas replace function when using it to swap two values in a column. Swapping integers in a column works as expected:

import pandas as pd

df = pd.DataFrame({'A': [0, 1]})
df.A.replace({0: 1, 1: 0})

This yields the result:

0    1
1    0
Name: A, dtype: int64

However, when running the same commands on string values

df = pd.DataFrame({'B': ['a', 'b']})
df.B.replace({'a': 'b', 'b': 'a'})

we get

0    a
1    a
Name: B, dtype: object

Can anyone explain this difference in behavior, or point me to a page in the docs that covers inconsistencies between integers and strings in pandas?


asked Apr 11 '18 12:04

Ricardo

1 Answer

Yup, this is definitely a bug, so I've opened a new issue - GH20656.

It looks like pandas applies the replacements successively. It makes the first replacement, turning "a" into "b", and then the second, turning both "b"s into "a".

In summary, what you see is equivalent to

df.B.replace('a', 'b').replace('b', 'a')

0    a
1    a
Name: B, dtype: object

Which is definitely not what should be happening.
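To make the difference concrete, here is a minimal pure-Python sketch of the two strategies. The helper names replace_successively and replace_simultaneously are made up for illustration and are not pandas internals; the first only mimics the chained behavior described above.

```python
def replace_successively(values, mapping):
    """Apply each (old, new) pair in turn, one full pass over the data per pair."""
    result = list(values)
    for old, new in mapping.items():
        result = [new if v == old else v for v in result]
    return result


def replace_simultaneously(values, mapping):
    """Look each original value up exactly once, so a later pair cannot undo an earlier one."""
    return [mapping.get(v, v) for v in values]


swap = {'a': 'b', 'b': 'a'}
print(replace_successively(['a', 'b'], swap))    # ['a', 'a']
print(replace_simultaneously(['a', 'b'], swap))  # ['b', 'a']
```

A swap only works with the second strategy, because every lookup is made against the original value rather than against the output of a previous pass.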

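There is a workaround using str.replace with a lambda callback. On recent pandas versions, where str.replace treats the pattern as a literal string by default, pass regex=True so that the callable is accepted.

import re

m = {'a': 'b', 'b': 'a'}
# Escape the keys so they match literally, then swap each match via the dict.
df.B.str.replace('|'.join(map(re.escape, m)), lambda x: m[x.group()], regex=True)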

0    b
1    a
Name: B, dtype: object
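If only whole cell values need to be swapped, another option (not part of the original answer, just a sketch) is Series.map with a function that looks each value up in the dict once. The extra 'c' row is added here purely to show that unmapped values pass through unchanged.

```python
import pandas as pd

df = pd.DataFrame({'B': ['a', 'b', 'c']})
m = {'a': 'b', 'b': 'a'}

# Each original value is looked up exactly once; values missing from m are kept as-is.
swapped = df.B.map(lambda v: m.get(v, v))
print(swapped.tolist())  # ['b', 'a', 'c']
```

Unlike str.replace, this matches entire values rather than substrings, so it also works for non-string columns.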

answered Nov 15 '22 05:11

cs95
