Comparing logical values to NaN in pandas/numpy

Tags: python, pandas, numpy

I want to do an element-wise OR operation on two pandas Series of boolean values. The Series can also contain np.nan.

I have tried three approaches and realized that the expression "np.nan or False" can be evaluated to True, False, or np.nan depending on the approach.

These are my example Series:

series_1 = pd.Series([True, False, np.nan])
series_2 = pd.Series([False, False, False])

Approach #1

Using the | operator of pandas:

In [5]: series_1 | series_2
Out[5]: 
0     True
1    False
2    False
dtype: bool

Approach #2

Using the logical_or function from numpy:

In [6]: np.logical_or(series_1, series_2)
Out[6]: 
0     True
1    False
2      NaN
dtype: object

Approach #3

I define a vectorized version of logical_or which is supposed to be evaluated row-by-row over the arrays:

@np.vectorize
def vectorized_or(a, b):
    return np.logical_or(a, b)

I use vectorized_or on the two series and convert its output (which is a numpy array) into a pandas Series:

In [8]:  pd.Series(vectorized_or(series_1, series_2))
Out[8]: 
0     True
1    False
2     True
dtype: bool

Question

I am wondering about the reasons for these results.
This answer explains np.logical_or and says np.logical_or(np.nan, False) is True, but why does this only work when vectorized and not in Approach #2? And how can the results of Approach #1 be explained?

asked May 10 '16 07:05

afulop

1 Answer

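First difference: | is the bitwise OR operator (np.bitwise_or in NumPy), not np.logical_or. This explains the difference between #1 and #2. pandas also applies its own missing-value handling for this operator, and in the output of #1 the NaN row ends up as False.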

Second difference: since series_1.dtype is object (non-homogeneous data), operations are done row by row in the first two cases.
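To see the object-mode behaviour in isolation, here is a small sketch using plain NumPy object arrays. The names left and right are only for illustration, and the assumption is that they mirror the values stored in series_1 and series_2. On object arrays np.logical_or behaves like Python's `or`, which returns one of its operands rather than a bool, so the NaN survives:

```python
import math
import numpy as np

# Object arrays standing in for the values held by series_1 and series_2
left = np.array([True, False, np.nan], dtype=object)
right = np.array([False, False, False], dtype=object)

# Element by element, this acts like Python's `a or b`
result = np.logical_or(left, right)
print(result.dtype)  # object
print(result)

# Plain Python gives the same answer for the last pair:
# nan is truthy, so `or` returns nan itself
python_or = np.nan or False
print(python_or)  # nan
```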

When using np.vectorize (#3), the NumPy documentation says:

The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.
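As a rough illustration of the otypes remark, the sketch below runs the same function with and without an explicit otypes. The function name scalar_or and the two object arrays are made up for this example and stand in for the question's vectorized_or and Series:

```python
import numpy as np

def scalar_or(a, b):
    # Called once per pair of elements
    return np.logical_or(a, b)

a = np.array([True, False, np.nan], dtype=object)
b = np.array([False, False, False], dtype=object)

# Output dtype inferred from the first call, scalar_or(True, False)
inferred = np.vectorize(scalar_or)(a, b)
print(inferred.dtype)  # bool

# Output dtype fixed up front, so no inference from the first element
explicit = np.vectorize(scalar_or, otypes=[object])(a, b)
print(explicit.dtype)  # object
```

Note that otypes only controls the dtype of the returned array. It does not change what the function returns for each pair.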

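With np.vectorize you leave object mode: the function receives one pair of scalars at a time, and the output is cast to the dtype inferred from the first result (bool here). For a scalar nan, np.logical_or only tests truthiness, and bool(nan) is True, so the third row becomes True.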
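The scalar behaviour behind the third row of #3 can be checked directly. This is just a quick sanity check, and the variable names are only for illustration:

```python
import numpy as np

# nan is a non-zero float, so it is truthy
nan_truthy = bool(np.nan)
print(nan_truthy)  # True

# On scalars, np.logical_or returns a boolean, not one of its operands
scalar_result = np.logical_or(np.nan, False)
print(scalar_result)  # True

# For comparison, a float that really is falsy
zero_result = np.logical_or(0.0, False)
print(zero_result)  # False
```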

answered Oct 19 '22 19:10

B. M.
