
Replace a value with np.nan using regex


I have a DataFrame as below:

import numpy as np
import pandas as pd

data1 = {"first": ["alice", "bob", "carol"],
         "last": ["foo", "bar", "baz"]}
df = pd.DataFrame(data1)

For example, I want to replace every character 'o' with 'a', so I do:

df.replace({"o":"a"},regex=True)
Out[668]: 
   first last
0  alice  faa
1    bab  bar
2  caral  baz

It gives back what I need.

However, when I want to replace 'o' with np.nan, it changes the entire string to np.nan. Is there any explanation of this in the pandas documentation? I can find some information in the source code.

More information (it changes the whole string to np.nan):

df.replace({"o":np.nan},regex=True)
Out[669]: 
   first last
0  alice  NaN
1    NaN  bar
2    NaN  baz
BENY asked Oct 26 '17 01:10




1 Answer

NaN is consistently used as a placeholder for missing data, so replacing part of a string with "missing" can only mean that the entire entry is compromised. I've heard this called NaN pollution (or similar; I'll see if I can find some references): once NaN touches a value, the data is compromised.

That said, that's not always the case:

In [11]: s = pd.Series([1, 2, np.nan, 4])

In [12]: s.sum()
Out[12]: 7.0

In [13]: s.sum(skipna=False)
Out[13]: nan

In some languages you'll see the equivalent of skipna=False as the default behaviour, and some vehemently argue that NaN should always pollute all data. Pandas takes a somewhat more pragmatic approach...
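
As a rough illustration of the two conventions (a sketch, not taken from the pandas docs): NumPy's plain np.sum lets NaN propagate, which corresponds to skipna=False, while np.nansum and the pandas default skip it. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, np.nan, 4.0]

# Plain NumPy sum: a single NaN "pollutes" the whole result.
numpy_total = np.sum(values)

# NumPy's NaN-aware variant ignores the missing entry.
numpy_skipping = np.nansum(values)

# pandas skips NaN by default (skipna=True).
pandas_total = pd.Series(values).sum()

print(numpy_total, numpy_skipping, pandas_total)
```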

The real question is what do you expect it to do in the case of NaN?
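
If the expectation was to drop the matched character rather than mark the cell as missing, one possible sketch is to replace it with an empty string. If NaN is wanted only when the whole cell matches, the pattern can be anchored. This assumes the question's frame with the columns first and last; the names removed, cells and whole_match are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"first": ["alice", "bob", "carol"],
                   "last": ["foo", "bar", "baz"]})

# Replace the matched character with an empty string:
# only the 'o' characters disappear, the rest of each string stays.
removed = df.replace({"o": ""}, regex=True)
print(removed)

# Anchor the pattern so that only cells consisting of exactly "o"
# become NaN; cells that merely contain an 'o' are left alone.
cells = pd.Series(["o", "bob", "foo"])
whole_match = cells.replace({r"^o$": np.nan}, regex=True)
print(whole_match)
```

Either way, NaN still stands in for a whole missing cell; it is never spliced into the middle of a string.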

Andy Hayden answered Sep 30 '22 03:09
