
replace() method not working on Pandas DataFrame

I have looked up this issue, and most questions are about more complex replacements. In my case, however, I have a very simple DataFrame as a test dummy.

The aim is to replace a string anywhere in the DataFrame with NaN, but this does not seem to work (i.e. nothing is replaced, and no error is raised). I've tried replacing with another string and that does not work either. E.g.:

import numpy as np
import pandas as pd

d = {'color' : pd.Series(['white', 'blue', 'orange']),
   'second_color': pd.Series(['white', 'black', 'blue']),
   'value' : pd.Series([1., 2., 3.])}
df = pd.DataFrame(d)
df.replace('white', np.nan)

The output is still:

      color second_color  value
  0   white        white      1
  1    blue        black      2
  2  orange         blue      3

This problem is often addressed using inplace=True, but there are caveats to that. Please also see Understanding inplace=True in pandas.

asked Jun 02 '16 13:06 by dter



3 Answers

Given that this is the top Google result when searching for "Pandas replace is not working" I'd like to also mention that:

By default, replace only matches whole cell values, unless you turn on the regex switch. With regex=True, the pattern is treated as a regular expression, so it can replace partial matches (substrings) as well.

This took me 30 minutes to find out, so hopefully I've saved the next person 30 minutes.
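
A minimal sketch of the difference, assuming a cut-down version of the question's DataFrame and a partial pattern 'whi' chosen only for illustration (it is not in the original post):

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['white', 'blue', 'orange'],
    'second_color': ['white', 'black', 'blue'],
})

# Default behaviour: 'whi' has to equal the whole cell value,
# so no cell matches and nothing is replaced.
exact = df.replace('whi', 'X')

# regex=True: 'whi' is treated as a regular expression, so the
# matching substring inside 'white' is replaced.
partial = df.replace('whi', 'X', regex=True)

print(exact)
print(partial)
```

Note that with regex=True only the matched part of the string is substituted, so to replace the whole cell the pattern would need to cover it (for example with `.*`).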

answered Oct 10 '22 16:10 by user1761806


You need to assign the result back:

df = df.replace('white', np.nan)

or pass param inplace=True:

In [50]:
d = {'color' : pd.Series(['white', 'blue', 'orange']),
   'second_color': pd.Series(['white', 'black', 'blue']),
   'value' : pd.Series([1., 2., 3.])}
df = pd.DataFrame(d)
df.replace('white', np.nan, inplace=True)
df

Out[50]:
    color second_color  value
0     NaN          NaN    1.0
1    blue        black    2.0
2  orange         blue    3.0

Most pandas operations return a copy, and many accept an inplace parameter, which usually defaults to False.
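
As a rough check of the copy behaviour described above, the following sketch (using a reduced version of the question's data; the variable names are illustrative) compares the original DataFrame with the returned one:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'color': ['white', 'blue', 'orange'],
    'value': [1., 2., 3.],
})

# replace() returns a new DataFrame; df itself is left untouched.
result = df.replace('white', np.nan)

unchanged = df.loc[0, 'color']               # still the original value
replaced = result['color'].isna().tolist()   # NaN only in the first row

print(unchanged)
print(replaced)
```

Assigning the result to a new name, as here, or back to `df` has the same effect on the data; the only difference is whether the original is kept.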

answered Oct 10 '22 18:10 by EdChum


Neither inplace=True nor regex=True worked in my case, so I found a solution using Series.str.replace instead. It can be useful if you need to replace a substring.

In [4]: df['color'] = df.color.str.replace('e', 'E!')
In [5]: df  
Out[5]: 
     color second_color  value
0   whitE!        white    1.0
1    bluE!        black    2.0
2  orangE!         blue    3.0

Or apply it only to the rows selected by a condition:

In [10]: df.loc[df.color=='blue', 'color'] = df.color.str.replace('e', 'E!')
In [11]: df  
Out[11]: 
    color second_color  value
0   white        white    1.0
1   bluE!        black    2.0
2  orange         blue    3.0
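
A self-contained sketch of both variants above, with one assumption added: `regex=False` is passed explicitly so the pattern is treated as a literal string, since the default for this parameter in `Series.str.replace` has differed between pandas versions.

```python
import pandas as pd

# Variant 1: replace a substring in every row of one column.
df = pd.DataFrame({'color': ['white', 'blue', 'orange']})
df['color'] = df['color'].str.replace('e', 'E!', regex=False)
all_rows = df['color'].tolist()

# Variant 2: replace only in the rows selected by a boolean mask.
df2 = pd.DataFrame({'color': ['white', 'blue', 'orange']})
mask = df2['color'] == 'blue'
df2.loc[mask, 'color'] = df2.loc[mask, 'color'].str.replace('e', 'E!', regex=False)
masked_rows = df2['color'].tolist()

print(all_rows)
print(masked_rows)
```

Unlike DataFrame.replace, Series.str.replace works on one string column at a time, so it would need to be applied per column to cover a whole DataFrame.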

answered Oct 10 '22 18:10 by Daniil Mashkin