
`numpy.mean` used with a tuple as `axis` argument: not working with a masked array

Tags: python, numpy

I have one simple 3D array a1, and its masked analog a2:

import numpy

a1 = numpy.array([[[ 0.00,  0.00,  0.00],
                   [ 0.88,  0.80,  0.78],
                   [ 0.75,  0.78,  0.77]],

                  [[ 0.00,  0.00,  0.00],
                   [ 3.29,  3.29,  3.30],
                   [ 3.27,  3.27,  3.26]],

                  [[ 0.00,  0.00,  0.00],
                   [ 0.41,  0.42,  0.40],
                   [ 0.42,  0.43,  0.41]]])


a2 = numpy.ma.masked_equal(a1, 0.)

I want to compute the mean of this array over several axes at once (passing a tuple as the axis argument of numpy.mean is a peculiar, undocumented use; see e.g. here for an example):

numpy.mean(a1, axis=(0, 1))

This works fine with a1, but I get the following error with the masked array a2:

TypeError: tuple indices must be integers, not tuple

And I get the same error with the masked version numpy.ma.mean(a2, axis=(0, 1)), or if I unmask the array through a2[a2.mask]=0.

I am using a tuple for the axis argument in numpy.mean because it is not hardcoded: this command is applied to arrays with potentially different numbers of dimensions, and the tuple is adapted accordingly.

The problem occurs with numpy versions 1.9.1 and 1.9.2.

ztl asked May 13 '15 08:05



1 Answer

For a MaskedArray argument, numpy.mean calls MaskedArray.mean, which doesn't support a tuple axis argument. You can get the correct behavior by reimplementing MaskedArray.mean in terms of operations that do support tuples for axis:

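def mean(a, axis=None):
    # Nothing is masked: fall back to the plain ndarray mean
    if a.mask is numpy.ma.nomask:
        return super(numpy.ma.MaskedArray, a).mean(axis=axis)

    # Number of unmasked entries along the reduced axes
    counts = numpy.logical_not(a.mask).sum(axis=axis)
    if counts.shape:
        # Masked entries are filled with 0 so they do not contribute to the sum
        sums = a.filled(0).sum(axis=axis)
        # Positions with no unmasked entries stay masked in the result
        mask = (counts == 0)
        return numpy.ma.MaskedArray(data=sums * 1. / counts, mask=mask, copy=False)
    elif counts:
        # Return scalar, not array
        return a.filled(0).sum(axis=axis) * 1. / counts
    else:
        # Masked scalar
        return numpy.ma.masked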

or, if you're willing to rely on MaskedArray.sum working with a tuple axis (which you likely are, given that you're using undocumented behavior of numpy.mean),

def mean(a, axis=None):
    if a.mask is numpy.ma.nomask:
        return super(numpy.ma.MaskedArray, a).mean(axis=axis)

    sums = a.sum(axis=axis)
    counts = numpy.logical_not(a.mask).sum(axis=axis)
    result = sums * 1. / counts
    return result

where we're relying on MaskedArray.sum to handle the mask.
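As a rough, self-contained sketch of the count-and-sum idea applied to the arrays from the question: the helper name masked_mean is hypothetical (it is not part of NumPy), and this variant is an assumption-laden simplification that always returns a MaskedArray, even for 0-dimensional results, rather than reproducing the scalar handling above.

```python
import numpy


def masked_mean(a, axis=None):
    """Mean of the unmasked entries of `a`; `axis` may be None, an int, or a tuple.

    `masked_mean` is a hypothetical helper name, not part of NumPy.
    """
    # getmaskarray always returns a full boolean array, even when a.mask is nomask.
    valid = numpy.logical_not(numpy.ma.getmaskarray(a))
    counts = valid.sum(axis=axis)
    # Masked entries are replaced by 0 so they contribute nothing to the sum.
    sums = numpy.ma.filled(a, 0).sum(axis=axis)
    # Where nothing is unmasked, divide by 1 instead of 0 and mask the result.
    safe_counts = numpy.where(counts == 0, 1, counts)
    return numpy.ma.masked_array(numpy.true_divide(sums, safe_counts),
                                 mask=(counts == 0))


a1 = numpy.array([[[0.00, 0.00, 0.00],
                   [0.88, 0.80, 0.78],
                   [0.75, 0.78, 0.77]],

                  [[0.00, 0.00, 0.00],
                   [3.29, 3.29, 3.30],
                   [3.27, 3.27, 3.26]],

                  [[0.00, 0.00, 0.00],
                   [0.41, 0.42, 0.40],
                   [0.42, 0.43, 0.41]]])
a2 = numpy.ma.masked_equal(a1, 0.)

# Average over the first two axes, ignoring the masked zeros.
result = masked_mean(a2, axis=(0, 1))
print(result)
```

For this input, each of the three columns averages its six nonzero values, giving roughly 1.503, 1.498 and 1.487.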

I have only lightly tested these functions; before using them, make sure they actually work, and write some tests. For example, if the output is 0-dimensional and there are no masked values, whether the output is a 0D MaskedArray or a scalar depends on whether the input mask is nomask or an array of all False. This is the same as the default MaskedArray.mean behavior, but it may not be what you want; I suspect the default behavior is a bug.
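To make the nomask versus all-False distinction concrete, here is a small sketch that builds the same data both ways. The variable names x and y are only illustrative, and the types printed on the last line may differ between NumPy versions, so check them on yours rather than relying on any particular outcome.

```python
import numpy

# No mask given: the mask attribute is the special singleton numpy.ma.nomask.
x = numpy.ma.array([1.0, 2.0])
# Explicit all-False mask: same data, but the mask is a real boolean array.
y = numpy.ma.array([1.0, 2.0], mask=[False, False])

print(x.mask is numpy.ma.nomask)  # True
print(y.mask is numpy.ma.nomask)  # False

# The values agree; the return types are what a test should pin down.
print(type(x.mean()), type(y.mean()))
```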

user2357112 supports Monica answered Sep 30 '22 18:09