Does replace work row-wise, and can it overwrite a value from the dict twice?

Tags: python, pandas

Assume I have the following data set:

import pandas as pd

lst = ['u', 'v', 'w', 'x', 'y']
lst_rev = list(reversed(lst))
dct = dict(zip(lst, lst_rev))  # {'u': 'y', 'v': 'x', 'w': 'w', 'x': 'v', 'y': 'u'}

df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a'],
                   'B': lst},
                  dtype='category')

Now I want to replace the values of column B in df using dct.

I know I can do

df.B.map(dct).fillna(df.B)

to get the expected output, but when I tried replace (which seemed more straightforward to me), it did not give the result I expected.

The output is shown below:

df.B.replace(dct)
Out[132]: 
0    u
1    v
2    w
3    v
4    u
Name: B, dtype: object

which is different from the output of map:

df.B.map(dct).fillna(df.B)
Out[133]: 
0    y
1    x
2    w
3    v
4    u
Name: B, dtype: object

I can guess what is happening (traced below), but why does it happen?

0    u --> change to y then change to u
1    v --> change to x then change to v
2    w
3    v
4    u

Appreciate your help.

BENY asked Sep 25 '18 21:09


2 Answers

It's because replace keeps applying the dictionary, so a value that has already been replaced can be matched again by a later key:

df.B.replace({'u': 'v', 'v': 'w', 'w': 'x', 'x': 'y', 'y': 'Hello'})

0    Hello
1    Hello
2    Hello
3    Hello
4    Hello
Name: B, dtype: object

With the given dct 'u' -> 'y' then 'y' -> 'u'.
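
The chaining described above can be modelled in plain Python. This is only a sketch of the observed behavior, not pandas' actual implementation; the helper names `replace_sequentially` and `replace_in_one_pass` are made up for illustration. The first applies each key/value pair to the whole list in turn, the second looks every original value up exactly once.

```python
def replace_sequentially(values, mapping):
    """Apply each old -> new pair in turn to the whole list (illustrative model)."""
    out = list(values)
    for old, new in mapping.items():
        # Values changed by an earlier pair can be matched again here.
        out = [new if v == old else v for v in out]
    return out


def replace_in_one_pass(values, mapping):
    """Look each original value up exactly once; unmapped values pass through."""
    return [mapping.get(v, v) for v in values]


lst = ['u', 'v', 'w', 'x', 'y']
dct = dict(zip(lst, reversed(lst)))
chain = {'u': 'v', 'v': 'w', 'w': 'x', 'x': 'y', 'y': 'Hello'}

sequential = replace_sequentially(lst, dct)
one_pass = replace_in_one_pass(lst, dct)
chained = replace_sequentially(lst, chain)

print(sequential)  # ['u', 'v', 'w', 'v', 'u']  (the replace output in the question)
print(one_pass)    # ['y', 'x', 'w', 'v', 'u']  (the map/fillna output)
print(chained)     # ['Hello', 'Hello', 'Hello', 'Hello', 'Hello']
```

Under this model, 'u' becomes 'y' at the first pair and is turned back into 'u' by the last pair, which matches the trace in the question.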

piRSquared answered Oct 25 '22 08:10


This behavior is not intended, and was recognized as a bug.

This is the GitHub issue that first identified the behavior, and the fix was added to the milestone for pandas 0.24.0. I can confirm the replacement works as expected in the current version on GitHub.

Here is the PR containing the fix.
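
On a pandas version that predates the fix, one possible workaround is to look each value up exactly once, in the spirit of the map approach from the question. The sketch below is an assumption-laden example rather than a recommended recipe: it uses a plain object-dtype Series instead of the categorical column from the question, and the names `s` and `result` are illustrative.

```python
import pandas as pd

lst = ['u', 'v', 'w', 'x', 'y']
dct = dict(zip(lst, reversed(lst)))

# Assumption: a plain object-dtype Series, not the categorical column from the question.
s = pd.Series(lst, name='B')

# dct.get(v, v) returns the mapped value, or v itself when v is not a key,
# so every element is looked up once and nothing is replaced a second time.
result = s.map(lambda v: dct.get(v, v))

print(result.tolist())  # ['y', 'x', 'w', 'v', 'u']
```

Because the fallback is built into the lookup, no separate fillna step is needed for values missing from the dict.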

user3483203 answered Oct 25 '22 08:10