Calling function on valid values of masked arrays

Tags: python, numpy

I have two numpy masked arrays:

>>> x
masked_array(data = [1 2 -- 4],
             mask = [False False  True False],
       fill_value = 999999)
>>> y
masked_array(data = [4 -- 0 4],
             mask = [False  True False False],
       fill_value = 999999)

If I try to divide x by y, the division operation is not actually performed when one of the operands is masked, so I don't get a divide-by-zero error.

>>> x/y
masked_array(data = [0.25 -- -- 1.0],
             mask = [False  True  True False],
       fill_value = 1e+20)

This even works if I define my own division function, div:

>>> def div(a,b):
...     return a/b

>>> div(x, y)
masked_array(data = [0.25 -- -- 1.0],
             mask = [False  True  True False],
       fill_value = 1e+20)

However, if I wrap my function with vectorize, the function is called on masked values and I get an error:

>>> np.vectorize(div)(x, y)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/lib64/python3.4/site-packages/numpy/lib/function_base.py", line 1811, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/usr/lib64/python3.4/site-packages/numpy/lib/function_base.py", line 1880, in _vectorize_call
    outputs = ufunc(*inputs)
  File "<input>", line 2, in div
ZeroDivisionError: division by zero

Is there a way I can call a function with array arguments, and have the function only be executed when all of the arguments are unmasked?

asked Jul 27 '17 21:07

dbaston

1 Answer

The problem

Calling the function directly worked because, when you call div(x,y), div's arguments a and b are the MaskedArrays x and y themselves, so a/b is evaluated as x.__truediv__(y) (or x.__div__(y) on Python 2).

Now, since x is a MaskedArray, it knows how to perform the division with another MaskedArray, following its own masking rules.

However, when you vectorize it, your div function never sees a MaskedArray: it is called once per element with scalars (a couple of ints in this case), including the elements hidden under the mask. So, when it evaluates a/b for the third pair of items, it divides something by zero, and you get the error.
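To see this for yourself, a small probe function can record what np.vectorize actually passes in. This is only a sketch: probe is a hypothetical stand-in for div, the values stored under the mask (3 and 5) are arbitrary picks for illustration, and the exact scalar types may differ between NumPy versions.

```python
import numpy as np

# Same masks as in the question; the values hidden under the mask (3 and 5)
# are arbitrary picks for illustration.
x = np.ma.masked_array([1, 2, 3, 4], mask=[False, False, True, False])
y = np.ma.masked_array([4, 5, 0, 4], mask=[False, True, False, False])

seen = []  # every (a, b) pair the function is called with


def probe(a, b):
    """Hypothetical stand-in for div: records its arguments, never divides."""
    seen.append((a, b))
    return 0


np.vectorize(probe)(x, y)

# Each call receives plain scalars, and the masked pair (3, 0) shows up too.
for a, b in seen:
    print(type(a).__name__, type(b).__name__, a, b)
```

The pair (3, 0) reaching the function is exactly the situation that makes a real division fail.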

MaskedArray's implementation seems to be based on re-implementing much of Numpy specifically for MaskedArrays. See, for example, that you have both numpy.log and numpy.ma.log. Compare running both of them on a MaskedArray that contains non-positive values. Both return a proper MaskedArray, but the plain numpy version also emits RuntimeWarnings about dividing by zero and invalid values:

In [116]: x = masked_array(data = [-1, 2, 0, 4],
     ...:              mask = [False, False,  True, False],
     ...:        fill_value = 999999)

In [117]: numpy.log(x)
/usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/usr/bin/python3
/usr/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/usr/bin/python3
Out[117]: 
masked_array(data = [-- 0.6931471805599453 -- 1.3862943611198906],
             mask = [ True False  True False],
       fill_value = 999999)

In [118]: numpy.ma.log(x)
Out[118]: 
masked_array(data = [-- 0.6931471805599453 -- 1.3862943611198906],
             mask = [ True False  True False],
       fill_value = 999999)

If you run numpy.log on a plain list, it returns nan and -inf for the invalid values instead of raising an error like the ZeroDivisionError you're getting.

In [138]: a = [1,-1,0]

In [139]: numpy.log(a)
/usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/usr/bin/python3
/usr/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/usr/bin/python3
Out[139]: array([  0.,  nan, -inf])
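As a compact side-by-side of the two behaviours, here is a minimal sketch. It assumes you are happy to silence the RuntimeWarnings with numpy.errstate; that is my choice for the example, not something the sessions above do.

```python
import numpy as np

data = np.ma.masked_array([-1, 2, 0, 4], mask=[False, False, True, False])

# numpy.ma.log quietly masks everything outside log's domain (values <= 0).
logged = np.ma.log(data)
print(logged.mask)

# Plain numpy.log returns nan / -inf instead; errstate silences the warnings.
with np.errstate(divide="ignore", invalid="ignore"):
    plain = np.log(np.array([1.0, -1.0, 0.0]))
print(plain)
```

Neither call raises: out-of-domain inputs become masked elements in one case and nan or -inf in the other.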

Simpler solution

With that, I see a few alternatives. First, for the simple case you listed, you could replace the masked values with one that makes the operation harmless: 1 in div's case (note that the data is slightly different from yours, since I also masked the zero that you had left unmasked):

x = masked_array(data = [1, 2, 0, 4],
             mask = [False, False,  True, False],
       fill_value = 999999)
y = masked_array(data = [4, 0, 0, 4],
             mask = [False,  True, True, False],
       fill_value = 999999)
In [153]: numpy.vectorize(div)(x,y.filled(1))
Out[153]: 
masked_array(data = [0.25 2.0 -- 1.0],
             mask = [False False  True False],
       fill_value = 999999)

The problem with that approach is that the filled values appear as unmasked in the result, which is probably not what you want.
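One way around that drawback, sketched here under the assumption that every position masked in either input should stay masked, is to fill both operands, run the vectorized function on plain arrays, and then reapply the combined mask. The names combined_mask and raw are mine, chosen for illustration.

```python
import numpy as np


def div(a, b):
    return a / b


x = np.ma.masked_array([1, 2, 0, 4], mask=[False, False, True, False])
y = np.ma.masked_array([4, 0, 0, 4], mask=[False, True, True, False])

# A position is masked in the result if it is masked in either input.
combined_mask = np.ma.getmaskarray(x) | np.ma.getmaskarray(y)

# Fill with the harmless value 1 so div never sees the hidden data.
raw = np.vectorize(div)(x.filled(1), y.filled(1))

# Reapply the mask so the placeholder results do not leak out as real values.
result = np.ma.masked_array(raw, mask=combined_mask)
print(result)
```

This still requires a fill value that is safe for your function, so it only helps in the simple case.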

Better solution

Now, div was probably just an example, and you probably want more complex behavior for which there is no 'no-op' argument. In that case, you can do as Numpy did for log: avoid raising an exception and return a specific value instead, here numpy.ma.masked. div's implementation becomes this:

In [154]: import warnings
     ...: 
     ...: def div(a,b):
     ...:     try:
     ...:         return a/b
     ...:     except Exception as e:
     ...:         warnings.warn(str(e))
     ...:         return numpy.ma.masked
     ...: 

In [155]: numpy.vectorize(div)(x,y)
/usr/bin/ipython:5: UserWarning: division by zero
  start_ipython()
/usr/lib/python3.6/site-packages/numpy/lib/function_base.py:2813: UserWarning: Warning: converting a masked element to nan.
  res = array(outputs, copy=False, subok=True, dtype=otypes[0])
Out[155]: 
masked_array(data = [0.25 -- -- 1.0],
             mask = [False  True  True False],
       fill_value = 999999)

More generic solution

But perhaps you already have the function and do not want to change it, or it is third-party. In that case, you could use a higher-order function:

In [164]: def div(a,b):
     ...:     return a/b
     ...: 

In [165]: def masked_instead_of_error(f):
     ...:     def wrapper(*args, **kwargs):
     ...:         try:
     ...:             return f(*args, **kwargs)
     ...:         except Exception:  # a bare except would also swallow KeyboardInterrupt
     ...:             return numpy.ma.masked
     ...:     return wrapper
     ...: 

In [166]: numpy.vectorize(masked_instead_of_error(div))(x,y)
/usr/lib/python3.6/site-packages/numpy/lib/function_base.py:2813: UserWarning: Warning: converting a masked element to nan.
  res = array(outputs, copy=False, subok=True, dtype=otypes[0])
Out[166]: 
masked_array(data = [0.25 -- -- 1.0],
             mask = [False  True  True False],
       fill_value = 999999)

In the implementations above, emitting warnings might or might not be a good idea. You may also want to restrict the types of exceptions you catch before returning numpy.ma.masked.
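A minimal sketch of that restriction follows. masked_on is a hypothetical decorator factory of my own, not a NumPy function; it returns numpy.ma.masked only for the exception types you list and lets everything else propagate. I also silence the "converting a masked element to nan" UserWarning here, which is an assumption about what you want.

```python
import functools
import warnings

import numpy as np


def masked_on(*exc_types):
    """Hypothetical decorator factory: return numpy.ma.masked only for exc_types."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except exc_types:
                return np.ma.masked
        return wrapper
    return decorator


def div(a, b):
    return a / b


# Only ZeroDivisionError is turned into a masked value; a TypeError still raises.
safe_div = masked_on(ZeroDivisionError)(div)

x = np.ma.masked_array([1, 2, 0, 4], mask=[False, False, True, False])
y = np.ma.masked_array([4, 0, 0, 4], mask=[False, True, True, False])

with warnings.catch_warnings():
    # Hide the "converting a masked element to nan" UserWarning.
    warnings.simplefilter("ignore")
    result = np.vectorize(safe_div)(x, y)
print(result)
```

Listing the exceptions explicitly keeps genuine bugs in the wrapped function visible instead of hiding them behind masked values.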

Note also that masked_instead_of_error is ready to be used as a decorator for your functions, so you do not need to wrap them by hand every time.
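The wrappers above still invoke the function on the data stored under masked elements and only clean up afterwards. If the function must never run on masked input, which is what the question literally asks for, a plain loop over the unmasked positions is one option. The helper below, call_on_unmasked, is a hypothetical sketch of mine: it assumes all arrays share the same shape (no broadcasting) and that the result fits the given dtype.

```python
import numpy as np


def call_on_unmasked(func, *arrays, dtype=float):
    """Hypothetical helper: call func elementwise, skipping masked positions.

    Assumes all arrays share the same shape (no broadcasting).
    """
    arrays = [np.ma.asarray(a) for a in arrays]
    shape = arrays[0].shape

    # An output element is masked if any input is masked at that position.
    mask = np.zeros(shape, dtype=bool)
    for a in arrays:
        mask |= np.ma.getmaskarray(a)

    # Start fully masked and fill in only the positions that get computed.
    out = np.ma.masked_all(shape, dtype=dtype)
    for idx in np.ndindex(*shape):
        if not mask[idx]:
            # Assigning to an element also clears its mask.
            out[idx] = func(*(a.data[idx] for a in arrays))
    return out


def div(a, b):
    return a / b


x = np.ma.masked_array([1, 2, 0, 4], mask=[False, False, True, False])
y = np.ma.masked_array([4, 0, 0, 4], mask=[False, True, False, False])

result = call_on_unmasked(div, x, y)
print(result)
```

This trades speed for predictability: it is a Python-level loop, much like numpy.vectorize itself, but the function is only ever called where every argument is valid.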

answered Oct 12 '22 14:10

caxcaxcoatl
