Replace value with np.nan using regex

I have a DataFrame as below:

import numpy as np
import pandas as pd

data1 = {"first": ["alice", "bob", "carol"],
         "last_huge": ["foo", "bar", "baz"]}
df = pd.DataFrame(data1)

For example, I want to replace every character 'o' with 'a', so I do:

df.replace({"o": "a"}, regex=True)
Out[668]:
   first last_huge
0  alice       faa
1    bab       bar
2  caral       baz

It gives back what I need.

However, when I want to replace 'o' with np.nan, it changes the entire string to np.nan. Is there any explanation in the pandas documentation? I can find some information in the source code.

More information (it changes the whole string to np.nan):

df.replace({"o": np.nan}, regex=True)
Out[669]:
   first last_huge
0  alice       NaN
1    NaN       bar
2    NaN       baz

asked Oct 26 '17 01:10

BENY

1 Answer

NaN is consistently used as a placeholder for missing data. When you replace part of a string with "missing", it can only mean that the entire entry is compromised. I've heard this called NaN pollution (or similar; I'll see if I can find some references), in that anything NaN touches is compromised.
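
A minimal sketch of what that looks like on the question's data, assuming a reasonably recent pandas (the names `contains_o` and `masked` are only illustrative): a regex replacement with a non-string value gives the same NaN pattern as flagging the cells that contain the pattern and masking them by hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"first": ["alice", "bob", "carol"],
                   "last_huge": ["foo", "bar", "baz"]})

# What the question observed: with a non-string replacement value,
# every cell in which the pattern matches becomes NaN as a whole.
replaced = df.replace({"o": np.nan}, regex=True)

# The same NaN pattern spelled out explicitly: flag the cells that
# contain "o", then mask exactly those cells.
contains_o = df.apply(lambda col: col.str.contains("o"))
masked = df.mask(contains_o)

print(replaced.isna())
print(masked.isna())
```

Either way, one match is enough for the whole entry to become NaN.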

That said, that's not always the case:

In [11]: s = pd.Series([1, 2, np.nan, 4])

In [12]: s.sum()
Out[12]: 7.0

In [13]: s.sum(skipna=False)
Out[13]: nan

In some languages you'll see skipna=False as the default behaviour, and some vehemently argue that NaN should always pollute all data. Pandas takes a somewhat more pragmatic approach...
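
For comparison, a small sketch using NumPy, which is one library where a plain sum propagates NaN and skipping it is opt-in through a separate function (the variable names here are only illustrative):

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, np.nan, 4])

# NumPy's plain sum lets NaN propagate into the result.
propagated = np.sum(values)

# Skipping NaN has to be requested explicitly.
skipped = np.nansum(values)

# pandas skips NaN unless told otherwise.
pandas_default = pd.Series(values).sum()

print(propagated, skipped, pandas_default)
```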

The real question is: what do you expect it to do in the case of NaN?
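
Depending on the answer, the replacement can be spelled differently. A rough sketch of two possibilities, assuming the goal is either to delete the character or to get NaN only when nothing is left of the string (the Series `s` is made-up data for illustration, not from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"first": ["alice", "bob", "carol"],
                   "last_huge": ["foo", "bar", "baz"]})

# If the intent is to delete the character, replace it with an empty
# string; the rest of each entry is kept.
stripped = df.replace({"o": ""}, regex=True)

# If the intent is "NaN only when nothing is left", strip first and
# then turn the resulting empty strings into NaN.
s = pd.Series(["foo", "oo", "bar"])  # hypothetical data with an all-"o" entry
emptied = s.str.replace("o", "", regex=False).replace("", np.nan)

print(stripped)
print(emptied)
```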

answered Sep 30 '22 03:09

Andy Hayden
