
Overflow warnings when performing multiply on numpy masked arrays

Tags:

numpy

I have an application that is reading 32-bit floating point data from a netcdf file in which the default netcdf fill value is being used, i.e. 9.96920996839e+36. At a particular point in the app a basic scaling (multiply) operation is performed on the float32-type masked array created from the input data, thus:

x = marr * scale   # or, equivalently, x = ma.multiply(marr, scale)

This operation throws 'overflow encountered in multiply' warnings, presumably because the product of the fill value and the scale factor exceeds the maximum value representable by a 32-bit float. The other values in the masked array are known to be small. The question, then, is why numpy even computes the product for the masked elements of the input array. Surely these should simply be ignored, right?

As it happens, the warning can be silently ignored, since the corresponding values in the output array are still flagged as masked. But it would be interesting to know whether this is a bug in numpy or 'working as designed'.

The code fragment below illustrates this behaviour.

import numpy as np
import numpy.ma as ma

# float32 data containing the default netCDF fill value in two positions
arr = [9.96920996839e+36, 1.123, 2.345, 9.96920996839e+36]

# mask the fill value, then apply the scaling that triggers the warning
marr = ma.masked_values(np.array(arr, dtype='float32'), 9.96920996839e+36)
x = marr * 128.0   # RuntimeWarning: overflow encountered in multiply
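
For completeness, the warning itself can be suppressed locally without affecting the masked result. This is a minimal sketch, assuming the overflow is reported through NumPy's floating-point error state (which np.errstate controls):

import numpy as np
import numpy.ma as ma

arr = [9.96920996839e+36, 1.123, 2.345, 9.96920996839e+36]
marr = ma.masked_values(np.array(arr, dtype='float32'), 9.96920996839e+36)

# ignore overflow within this block only; masked elements stay masked in x
with np.errstate(over='ignore'):
    x = marr * 128.0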

As might be expected, the overflow warning does not appear if the masked array is of type float64 (though presumably it would if the scale factor were sufficiently large). Similarly, the warning disappears if a smaller fill value, e.g. -1.0e20, is used in the float32 case.

On the face of it, it would seem that numpy is unable to identify masked values when the larger fill value is used (a value which is very close to the maximum representable by a 32-bit floating-point number).

TIA,
Phil


1 Answer

The question then is, why is numpy even computing the product for masked elements in the input array? Surely these should simply be ignored, right?

Alas, no. In the current implementation, the operation is applied to the whole array, and the mask is then reapplied to the result.
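
You can see this directly by looking at the raw .data buffer, which still holds the fill value that the multiply operates on (a small sketch reusing the data from the question):

import numpy as np
import numpy.ma as ma

marr = ma.masked_values(np.array([9.96920996839e+36, 1.123], dtype='float32'),
                        9.96920996839e+36)

print(marr.data)    # the fill value is still present in the underlying buffer
x = marr * 128.0    # the multiply runs over that buffer, hence the overflow warning
print(x)            # [-- 143.744...]; the masked element stays masked regardless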

I know it sounds counterproductive, but it was the more robust and least inefficient of the alternatives considered. Ideally, the operation would be applied only to the appropriate (unmasked) domain, but computing that domain can get quite tricky (there were some serious problems with pow). Moreover, the extra tests would have further hurt the already poor performance.

A mechanism was recently introduced whereby numpy functions accept an optional where argument, which could help with that... But there is also talk of introducing support for missing/ignored values directly at the C level, which is probably the way to go.
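
For illustration only (this is not what numpy.ma currently does), a workaround along those lines would restrict the computation to the unmasked domain via the ufunc-level where argument and rebuild the masked array afterwards:

import numpy as np
import numpy.ma as ma

arr = [9.96920996839e+36, 1.123, 2.345, 9.96920996839e+36]
marr = ma.masked_values(np.array(arr, dtype='float32'), 9.96920996839e+36)

valid = ~ma.getmaskarray(marr)              # True where the data are unmasked
out = np.zeros_like(marr.data)              # placeholder values for masked slots
np.multiply(marr.data, np.float32(128.0), out=out, where=valid)
x = ma.masked_array(out, mask=~valid)       # same masked result, no overflow warning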

Pierre GM