Using python 2.7, scipy 1.0.0-3
Apparently I have a misunderstanding of how the numpy where function is supposed to operate or there is a known bug in its operation. I'm hoping someone can tell me which and explain a work-around to suppress the annoying warning that I am trying to avoid. I'm getting the same behavior when I use the pandas Series where().
To keep it simple, I'll use a numpy array as my example. Say I want to apply np.log() to the array, but only where the value is a valid input, i.e., myArray>0.0. Where the function should not be applied, I want the output set to a flag value of -999.9:
myArray = np.array([1.0, 0.75, 0.5, 0.25, 0.0])
np.where(myArray>0.0, np.log(myArray), -999.9)
I expected numpy.where() to not complain about the 0.0 value in the array since the condition is False there, yet it does and it appears to actually execute for that False condition:
-c:2: RuntimeWarning: divide by zero encountered in log
array([ 0.00000000e+00, -2.87682072e-01, -6.93147181e-01,
-1.38629436e+00, -9.99900000e+02])
The numpy documentation states:
If x and y are given and input arrays are 1-D, where is equivalent to: [xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
I beg to differ with this statement since
[np.log(val) if val>0.0 else -999.9 for val in myArray]
provides no warning at all:
[0.0, -0.2876820724517809, -0.69314718055994529, -1.3862943611198906, -999.9]
So, is this a known bug? I don't want to suppress the warning for my entire code.
You can have the log evaluated at the relevant places only using its optional where parameter
np.where(myArray>0.0, np.log(myArray, where=myArray>0.0), -999.9)
or more efficiently
mask = myArray > 0.0
np.where(mask, np.log(myArray, where=mask), -999.9)
or if you find the "double where" ugly
np.log(myArray, where=myArray>0.0, out=np.full(myArray.shape, -999.9))
Any one of those three should suppress the warning.
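As a quick check of that claim, here is a minimal sketch that turns every warning into an exception and then runs two of the variants above. The names `filled` and `combined` are only illustrative. If either call still emitted the RuntimeWarning, the block would fail with an exception instead of finishing.

```python
import warnings

import numpy as np

myArray = np.array([1.0, 0.75, 0.5, 0.25, 0.0])
mask = myArray > 0.0

with warnings.catch_warnings():
    # Escalate any warning (including RuntimeWarning) to an exception.
    warnings.simplefilter("error")

    # Single-call variant: positions where mask is False keep the
    # value already stored in the `out` array, i.e. the -999.9 flag.
    filled = np.log(myArray, where=mask, out=np.full(myArray.shape, -999.9))

    # "Double where" variant: without `out`, the skipped positions of the
    # log result are uninitialized, but np.where never selects them.
    combined = np.where(mask, np.log(myArray, where=mask), -999.9)

print(filled)
print(combined)
```

Note that when `where` is used without `out`, the skipped entries of the log result hold arbitrary values, so that intermediate array should not be used on its own.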
This behavior of where should be understandable given a basic understanding of Python. This is a Python expression that uses a couple of numpy functions.
What happens in this expression?
np.where(myArray>0.0, np.log(myArray), -999.9)
The interpreter first evaluates all the arguments of the function, and then passes the results to the where. Effectively then:
cond = myArray>0.0
A = np.log(myArray)
B = -999.9
np.where(cond, A, B)
The warning is produced in the 2nd line, not in the 4th.
The 4th line is equivalent to:
[xv if c else B for (c,xv) in zip(cond, A)]
or
[A[i] if c else B for i,c in enumerate(cond)]
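To see that the warning comes from evaluating `np.log(myArray)` and not from `np.where` itself, one option is to escalate the floating-point warning to an exception with `np.errstate` and note which statement fails. This is only a sketch, and the flag name `raised_in_log` is made up for illustration.

```python
import numpy as np

myArray = np.array([1.0, 0.75, 0.5, 0.25, 0.0])

raised_in_log = False
with np.errstate(divide="raise"):
    cond = myArray > 0.0          # evaluates without any problem
    try:
        A = np.log(myArray)       # log(0.0) is computed here, eagerly
    except FloatingPointError:
        # np.where has not been called yet at this point.
        raised_in_log = True

print(raised_in_log)
```

The exception is raised while the argument is being built, before `np.where` ever sees it, which matches the step-by-step breakdown above.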
np.where is most often used with one argument, where it is a synonym for np.nonzero. We don't see this three-argument form that often on SO. It isn't that useful, in part because it doesn't save on calculations.
Masked assignment is more common, especially if there are more than 2 alternatives.
In [123]: mask = myArray>0
In [124]: out = np.full(myArray.shape, np.nan)
In [125]: out[mask] = np.log(myArray[mask])
In [126]: out
Out[126]: array([ 0. , -0.28768207, -0.69314718, -1.38629436, nan])
Paul Panzer showed how to do the same with the where parameter of log. That feature isn't being used as much as it could be.
In [127]: np.log(myArray, where=mask, out=out)
Out[127]: array([ 0. , -0.28768207, -0.69314718, -1.38629436, nan])
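Since the question asks about not silencing warnings for the whole program, another possibility (not mentioned in the answers above, so treat it as a suggestion) is to compute the log everywhere but ignore the divide-by-zero warning only inside a small `np.errstate` block. The names `logged` and `result` are illustrative.

```python
import numpy as np

myArray = np.array([1.0, 0.75, 0.5, 0.25, 0.0])

# The warning is ignored only inside this block; the previous
# error-handling settings are restored when the block exits.
with np.errstate(divide="ignore"):
    logged = np.log(myArray)      # -inf where myArray == 0.0

# Replace the invalid positions with the flag value.
result = np.where(myArray > 0.0, logged, -999.9)
print(result)
```

This still does the unnecessary computation, and negative inputs would trigger a separate "invalid value" warning that would need `invalid="ignore"` as well, so the masked approaches above are probably the cleaner choice.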