I have two numpy masked arrays:
>>> x
masked_array(data = [1 2 -- 4],
mask = [False False True False],
fill_value = 999999)
>>> y
masked_array(data = [4 -- 0 4],
mask = [False True False False],
fill_value = 999999)
If I try to divide x by y, the division is not actually performed where one of the operands is masked, so I don't get a divide-by-zero error.
>>> x/y
masked_array(data = [0.25 -- -- 1.0],
mask = [False True True False],
fill_value = 1e+20)
This even works if I define my own division function, div:
>>> def div(a,b):
...     return a/b
>>> div(x, y)
masked_array(data = [0.25 -- -- 1.0],
mask = [False True True False],
fill_value = 1e+20)
However, if I wrap my function with vectorize, the function is called on masked values and I get an error:
>>> np.vectorize(div)(x, y)
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/usr/lib64/python3.4/site-packages/numpy/lib/function_base.py", line 1811, in __call__
return self._vectorize_call(func=func, args=vargs)
File "/usr/lib64/python3.4/site-packages/numpy/lib/function_base.py", line 1880, in _vectorize_call
outputs = ufunc(*inputs)
File "<input>", line 2, in div
ZeroDivisionError: division by zero
Is there a way I can call a function with array arguments, and have the function only be executed when all of the arguments are unmasked?
A masked array is the combination of a standard numpy.ndarray and a mask. A mask is either nomask, indicating that no value of the associated array is invalid, or an array of booleans that determines for each element of the associated array whether the value is valid or not.
To mask an array where invalid values occur (NaNs or infs), use the numpy.ma.masked_invalid() function. This function is a shortcut to masked_where, with condition = ~(np.
To create a boolean mask from an array, use the numpy.ma.make_mask() function. It accepts any sequence that is convertible to integers, or nomask. The contents do not have to be 0s and 1s: values of 0 are interpreted as False, everything else as True.
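As a quick illustration of the two helpers just described, here is a minimal sketch. The sample values and the variable names raw, cleaned, mask and arr are made up for the example.

```python
import numpy as np
import numpy.ma as ma

# masked_invalid masks every NaN and inf in the input.
raw = np.array([1.0, np.nan, 3.0, np.inf])
cleaned = ma.masked_invalid(raw)
print(cleaned.mask)   # True at the NaN and inf positions

# make_mask turns an integer-like sequence into a boolean mask:
# zero becomes False, every other value becomes True.
mask = ma.make_mask([0, 1, 0, 5])
print(mask)

# A mask built this way can be attached to ordinary data.
arr = ma.array([10, 20, 30, 40], mask=mask)
print(arr.count())    # number of unmasked elements
```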
Calling the function directly worked because, when you call div(x,y), div's arguments a and b become the MaskedArrays x and y, and the resulting code for a/b is x.__truediv__(y) on Python 3 (x.__div__(y) on Python 2).
Now, since x is a MaskedArray, it has the intelligence to perform the division on another MaskedArray, following its rules.
However, when you vectorize it, your div function is not going to see any MaskedArrays, just scalars: a couple of ints in this case. So, when it tries a/b on the third item, it divides 'something' by zero, and you get the error.
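To make the difference concrete, the sketch below calls div once on the whole arrays and once element by element on the raw data. The loop only imitates what the vectorized wrapper ends up doing (it is not numpy's actual code), and the values stored under the masked slots (3 and 5) are assumptions, since the question does not show them.

```python
import numpy as np
import numpy.ma as ma

# Underlying data for the masked slots is assumed for illustration.
x = ma.array([1, 2, 3, 4], mask=[False, False, True, False])
y = ma.array([4, 5, 0, 4], mask=[False, True, False, False])

def div(a, b):
    return a / b

# Whole-array call: MaskedArray handles the division and masks the bad slots.
whole = div(x, y)

# Element-wise call on plain Python ints, ignoring the masks entirely.
failed_at = None
for i, (a, b) in enumerate(zip(x.data.tolist(), y.data.tolist())):
    try:
        div(a, b)
    except ZeroDivisionError:
        failed_at = i   # index of the first element that blew up
        break

print(whole.mask, failed_at)
```

The scalar loop fails on the item where y holds an unmasked zero, even though x is masked there, because nothing in the loop ever looks at the masks.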
MaskedArray's implementation seems to be based on re-implementing much of Numpy specifically for MaskedArrays. See, for example, that you have both numpy.log and numpy.ma.log. Compare running both of them on a MaskedArray that contains negative values. Both actually return a proper MaskedArray, but the plain numpy version also emits warnings about dividing by zero and invalid values:
In [116]: x = masked_array(data = [-1, 2, 0, 4],
...: mask = [False, False, True, False],
...: fill_value = 999999)
In [117]: numpy.log(x)
/usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
#!/usr/bin/python3
/usr/bin/ipython:1: RuntimeWarning: invalid value encountered in log
#!/usr/bin/python3
Out[117]:
masked_array(data = [-- 0.6931471805599453 -- 1.3862943611198906],
mask = [ True False True False],
fill_value = 999999)
In [118]: numpy.ma.log(x)
Out[118]:
masked_array(data = [-- 0.6931471805599453 -- 1.3862943611198906],
mask = [ True False True False],
fill_value = 999999)
If you run the numpy.log version on a plain list, it will return nan and -inf for invalid values, not throw an error like the ZeroDivisionError you're getting.
In [138]: a = [1,-1,0]
In [139]: numpy.log(a)
/usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
#!/usr/bin/python3
/usr/bin/ipython:1: RuntimeWarning: invalid value encountered in log
#!/usr/bin/python3
Out[139]: array([ 0., nan, -inf])
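As a side note, whether numpy warns, stays silent, or raises on these floating-point problems is configurable with numpy.errstate. A small sketch, with made-up sample values:

```python
import numpy as np

values = np.array([1.0, 0.0])

# Silence the warning: the result still contains -inf for log(0).
with np.errstate(divide="ignore"):
    quiet = np.log(values)

# Ask numpy to raise instead of warn.
try:
    with np.errstate(divide="raise"):
        np.log(values)
    raised = False
except FloatingPointError:
    raised = True

print(quiet, raised)
```

This only affects numpy's own ufuncs. A plain Python a/b on two ints, which is what div ends up doing under vectorize, still raises ZeroDivisionError regardless of these settings.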
With that, I see two alternatives. First, for the simpler case you listed, you could replace the bad values with a no-op value, which is 1 in div's case (note that the data is slightly different from yours, as there is a zero you didn't mark as masked):
x = masked_array(data = [1, 2, 0, 4],
mask = [False, False, True, False],
fill_value = 999999)
y = masked_array(data = [4, 0, 0, 4],
mask = [False, True, True, False],
fill_value = 999999)
In [153]: numpy.vectorize(div)(x,y.filled(1))
Out[153]:
masked_array(data = [0.25 2.0 -- 1.0],
mask = [False False True False],
fill_value = 999999)
The problem with that approach is that the filled values are listed as non-masked on the result, which is probably not what you want.
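One possible way around that, sketched here under the assumption that a neutral fill value exists for every argument, is to run the vectorized function on plain filled arrays and then put the union of the input masks back on the result. The names raw, combined and result are just for this example.

```python
import numpy as np
import numpy.ma as ma

def div(a, b):
    return a / b

x = ma.array([1, 2, 0, 4], mask=[False, False, True, False])
y = ma.array([4, 0, 0, 4], mask=[False, True, True, False])

# Run the vectorized function on plain ndarrays with the masked slots filled.
raw = np.vectorize(div)(x.filled(1), y.filled(1))

# Re-apply the union of the input masks so the filled slots stay masked.
combined = ma.getmaskarray(x) | ma.getmaskarray(y)
result = ma.array(raw, mask=combined)

print(result)
```

This still calls div on every element, so it does not help when an unmasked value is itself a problem, or when the function has no harmless input.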
Now, div was probably just an example, and you probably want more complex behavior for which there is no 'no-op' argument. In that case, you can do as Numpy did for log and avoid throwing an exception, returning a specific value instead: here, numpy.ma.masked. div's implementation becomes this:
In [154]: def div(a,b):
     ...:     try:
     ...:         return a/b
     ...:     except Exception as e:
     ...:         warnings.warn(str(e))
     ...:         return numpy.ma.masked
     ...:
In [155]: numpy.vectorize(div)(x,y)
/usr/bin/ipython:5: UserWarning: division by zero
start_ipython()
/usr/lib/python3.6/site-packages/numpy/lib/function_base.py:2813: UserWarning: Warning: converting a masked element to nan.
res = array(outputs, copy=False, subok=True, dtype=otypes[0])
Out[155]:
masked_array(data = [0.25 -- -- 1.0],
mask = [False True True False],
fill_value = 999999)
But perhaps you already have the function and do not want to change it, or it is third-party. In that case, you could use a higher-order function:
In [164]: def div(a,b):
     ...:     return a/b
     ...:
In [165]: def masked_instead_of_error(f):
     ...:     def wrapper(*args, **kwargs):
     ...:         try:
     ...:             return f(*args, **kwargs)
     ...:         except Exception:
     ...:             return numpy.ma.masked
     ...:     return wrapper
     ...:
In [166]: numpy.vectorize(masked_instead_of_error(div))(x,y)
/usr/lib/python3.6/site-packages/numpy/lib/function_base.py:2813: UserWarning: Warning: converting a masked element to nan.
res = array(outputs, copy=False, subok=True, dtype=otypes[0])
Out[166]:
masked_array(data = [0.25 -- -- 1.0],
mask = [False True True False],
fill_value = 999999)
In the implementations above, using warnings might or might not be a good idea. You may also want to restrict the types of exceptions you catch before returning numpy.ma.masked.
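For instance, restricting the caught exceptions could look roughly like this. masked_on is a hypothetical name for a variant of the wrapper that takes the exception types as parameters; it is only a sketch, shown here on scalar calls.

```python
import functools
import numpy.ma as ma

def masked_on(*exc_types):
    """Decorator factory: return ma.masked when f raises one of exc_types."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except exc_types:
                # Only the listed exception types are turned into a mask.
                return ma.masked
        return wrapper
    return decorator

@masked_on(ZeroDivisionError)
def div(a, b):
    return a / b

print(div(1, 4))             # ordinary result
print(div(1, 0) is ma.masked)
```

Anything not listed, such as a TypeError from passing a string, still propagates, so genuine bugs are not silently turned into masked values.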
Note also that masked_instead_of_error is ready to be used as a decorator for your functions, so you do not need to wrap them by hand at every call site.
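If you would rather not rely on exceptions at all, another option is to skip the masked positions explicitly. The helper below, apply_unmasked, is a hypothetical name and a minimal sketch: it assumes all inputs have the same shape (no broadcasting) and that the function returns floats.

```python
import numpy as np
import numpy.ma as ma

def apply_unmasked(func, *arrays):
    """Call func element-wise, only where every input is unmasked."""
    arrays = [ma.asarray(a) for a in arrays]
    # Combined mask: True wherever at least one input is masked.
    combined = np.zeros(arrays[0].shape, dtype=bool)
    for a in arrays:
        combined |= ma.getmaskarray(a)
    # Start fully masked; assigning to an element unmasks it.
    result = ma.masked_all(arrays[0].shape, dtype=float)
    for idx in np.ndindex(combined.shape):
        if not combined[idx]:
            result[idx] = func(*(a.data[idx] for a in arrays))
    return result

x = ma.array([1, 2, 3, 4], mask=[False, False, True, False])
y = ma.array([4, 5, 0, 4], mask=[False, True, False, False])
print(apply_unmasked(lambda a, b: a / b, x, y))
```

This is a plain Python loop, so it is no faster than numpy.vectorize, but the function is never called on a position where any argument is masked.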