I noticed that the numpy masked-array mean method returns different types when it probably should not:
import numpy as np
A = np.ma.masked_equal([1,1,0], value=0)
B = np.ma.masked_equal([1,1,1], value=0) # no masked values
type(A.mean())
#numpy.float64
type(B.mean())
#numpy.ma.core.MaskedArray
Other numpy.ma.core.MaskedArray methods seem to be consistent:
type( A.sum()) == type(B.sum())
# True
type( A.prod()) == type(B.prod())
# True
type( A.std()) == type(B.std())
# True
type( A.mean()) == type(B.mean())
# False
Can someone explain this?
UPDATE: As pointed out in the comments, an array built with an explicit full mask behaves like A:
C = np.ma.masked_array([1, 1, 1], mask=[False, False, False])
type(C.mean()) == type(A.mean())
# True
B.mean starts with:
if self._mask is nomask:
    result = super(MaskedArray, self).mean(axis=axis, dtype=dtype)
np.ma.nomask is False. This is the case for your B:
masked_array(data = [1 1 1],
mask = False,
fill_value = 0)
For A the mask is an array that matches the data in size. In B it is a scalar, False, and mean is handling that as a special case.
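A minimal sketch of how the two mask states can be told apart. To keep the mask state explicit, it builds B and C directly with np.ma.masked_array rather than masked_equal (that substitution is an assumption of this sketch), and inspects them with np.ma.getmask and np.ma.getmaskarray:

```python
import numpy as np

# No mask argument: the mask is the shared scalar np.ma.nomask.
B = np.ma.masked_array([1, 1, 1])
# Explicit mask: a boolean array with the same shape as the data.
C = np.ma.masked_array([1, 1, 1], mask=[False, False, False])

# Identity check against the nomask singleton.
b_is_nomask = np.ma.getmask(B) is np.ma.nomask
c_is_nomask = np.ma.getmask(C) is np.ma.nomask
print(b_is_nomask, c_is_nomask)

# getmaskarray always returns a full boolean array, even for nomask.
full_mask = np.ma.getmaskarray(B)
print(full_mask.shape)
```

The identity test against np.ma.nomask is the same kind of check the mean code above performs on self._mask.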
I need to dig a bit more to see what this implies.
In [127]: np.mean(B)
Out[127]:
masked_array(data = 1.0,
mask = False,
fill_value = 0)
In [141]: super(np.ma.MaskedArray,B).mean()
Out[141]:
masked_array(data = 1.0,
mask = False,
fill_value = 0)
I'm not sure that helps; there's some circular referencing between the np.ndarray methods, the np function, and the np.ma methods that makes it hard to identify exactly what code is being used. It looks like it is using the compiled mean method, but it isn't obvious how that handles the masking.
I wonder if the intent is to use:
np.mean(B.data) # or
B.data.mean()
and the super method fetch isn't the right approach.
In any case, the same array with a vector mask returns the scalar:
In [132]: C
Out[132]:
masked_array(data = [1 1 1],
mask = [False False False],
fill_value = 0)
In [133]: C.mean()
Out[133]: 1.0
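Since the vector-mask case returns a scalar, one possible workaround is to expand the mask before averaging. The sketch below is only an illustration: mean_with_full_mask is a hypothetical helper name, not part of NumPy, and the claim that both results end up the same type is based on the behaviour described above rather than on a check across NumPy releases.

```python
import numpy as np

def mean_with_full_mask(arr):
    """Hypothetical helper: give the array a full boolean mask, then average."""
    full = np.ma.masked_array(np.ma.getdata(arr),
                              mask=np.ma.getmaskarray(arr))
    return full.mean()

A = np.ma.masked_equal([1, 1, 0], value=0)  # one masked value
B = np.ma.masked_equal([1, 1, 1], value=0)  # no masked values

mean_a = mean_with_full_mask(A)
mean_b = mean_with_full_mask(B)
print(float(mean_a), float(mean_b))
print(type(mean_a) == type(mean_b))
```

This avoids the nomask branch entirely, at the cost of allocating a mask array that carries no information.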
====================
Trying this method without the nomask shortcut raises an error in the code after it:
dsum = self.sum(axis=axis, dtype=dtype)
cnt = self.count(axis=axis)
if cnt.shape == () and (cnt == 0):
    result = masked
else:
    result = dsum * 1. / cnt
self.count returns a plain Python scalar in the nomask case, but an np.int32 with regular masking. A plain scalar has no shape attribute, so the cnt.shape test chokes.
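The failure mode can be reproduced without any masked arrays. This sketch assumes, as described above, that the count comes back as a plain Python int in one case and as a NumPy scalar in the other; cnt_plain and cnt_numpy are stand-ins, not values taken from self.count:

```python
import numpy as np

cnt_plain = 3            # stand-in for a plain Python int count
cnt_numpy = np.int32(3)  # stand-in for a NumPy scalar count

# A Python int has no .shape attribute; a NumPy scalar has shape ().
plain_has_shape = hasattr(cnt_plain, "shape")
numpy_shape = cnt_numpy.shape

# np.shape accepts both kinds of scalar.
safe_shapes = (np.shape(cnt_plain), np.shape(cnt_numpy))
print(plain_has_shape, numpy_shape, safe_shapes)
```

Using np.shape(cnt) instead of cnt.shape is one way such a test could tolerate both return types.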
trace is the only other masked method that tries this super(MaskedArray...) 'shortcut'. There's clearly something kludgy about the mean code.
====================
Relevant bug issue: https://github.com/numpy/numpy/issues/5769
According to that issue, the same question was raised here last year: "Testing equivalence of means of Numpy MaskedArray instances raises attribute error".
Looks like there are a lot of masking issues, not just with mean. There may be fixes in the development master now, or in the near future.