
How to create the histogram of an array with masked values, in Numpy?

In Numpy 1.4.1, what is the simplest or most efficient way of calculating the histogram of a masked array? numpy.histogram and pyplot.hist do count the masked elements, by default!

The only simple solution I can think of right now involves creating a new array that contains only the non-masked values:

histogram(m_arr[~m_arr.mask])

This is not very efficient, though, as this unnecessarily creates a new array. I'd be happy to read about better ideas!
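To make the problem concrete, here is a minimal sketch of the behaviour described above and of the boolean-indexing workaround. The sample values and bin edges are made up for illustration, and `np.ma.getmaskarray` is used instead of `m_arr.mask` only so that the indexing also works when no element is masked; that substitution is an assumption of this sketch, not part of the original question.

```python
import numpy as np

# Hypothetical sample data: 100.0 is flagged as invalid via the mask.
data = np.array([1.0, 2.0, 2.5, 100.0, 3.0])
m_arr = np.ma.masked_array(data, mask=[False, False, False, True, False])

bins = [0, 2, 4, 200]

# Passing the masked array directly: per the question, the mask is ignored
# and the masked value is binned along with the valid ones.
counts_all, _ = np.histogram(m_arr, bins=bins)

# Workaround from the question: index with the inverted mask first.
# getmaskarray always returns a full boolean array, even for nomask.
valid = m_arr[~np.ma.getmaskarray(m_arr)]
counts_valid, _ = np.histogram(valid, bins=bins)

print(counts_all)
print(counts_valid)
```

With this data the second histogram has an empty last bin, since the only value above 4 is the masked one.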

Eric O Lebigot asked Aug 31 '10 14:08




1 Answer

(Undeleting this as per discussion above...)

I'm not sure whether or not the numpy developers would consider this a bug or expected behavior. I asked on the mailing list, so I guess we'll see what they say.

Either way, it's an easy fix. Patching numpy/lib/function_base.py to use numpy.asanyarray rather than numpy.asarray on the inputs to the function will allow it to properly use masked arrays (or any other subclass of an ndarray) without creating a copy.
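The difference between the two conversion functions can be seen in isolation. This is only a sketch of how `numpy.asarray` and `numpy.asanyarray` treat a masked array; it does not reproduce the actual code in numpy/lib/function_base.py, and the sample array is made up.

```python
import numpy as np

# Hypothetical masked array with the middle element masked.
m_arr = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# asarray converts to a base ndarray, so the mask is lost.
as_plain = np.asarray(m_arr)

# asanyarray lets ndarray subclasses pass through, so the mask survives.
as_any = np.asanyarray(m_arr)

print(type(as_plain).__name__)
print(type(as_any).__name__)
print(np.ma.getmaskarray(as_any))
```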

Edit: It seems like it is expected behavior. As discussed here:

If you want to ignore masked data, it's just one extra function call

histogram(m_arr.compressed())

I don't think the fact that this makes an extra copy will be relevant, because I guess full masked array handling inside histogram will be a lot more expensive.

Using asanyarray would also allow matrices in and other subtypes that might not be handled correctly by the histogram calculations.

For anything else besides dropping masked observations, it would be necessary to figure out what the masked array definition of a histogram is, as Bruce pointed out.
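The `compressed()` approach from the quoted discussion can be wrapped in a small helper. The name `masked_histogram` is hypothetical, and the sketch only covers the "drop masked observations" case; it makes no attempt to define a masked histogram in any other sense. `numpy.ma.compressed` returns the non-masked data as a 1-D array, so multidimensional input is flattened.

```python
import numpy as np

def masked_histogram(m_arr, **kwargs):
    """Histogram of the non-masked values only (hypothetical helper).

    Extra keyword arguments are forwarded to numpy.histogram.
    """
    # compressed() copies the valid entries into a plain 1-D ndarray.
    valid = np.ma.compressed(m_arr)
    return np.histogram(valid, **kwargs)

# Hypothetical 2-D example: the value 9.0 is masked.
m_arr = np.ma.masked_array([[0.5, 1.5], [9.0, 2.5]],
                           mask=[[False, False], [True, False]])

counts, edges = masked_histogram(m_arr, bins=[0, 1, 2, 3])
print(counts)
```

The copy made by `compressed()` holds only the valid entries, which is the cost the quoted reply considers negligible next to full masked-array handling inside `histogram`.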

Joe Kington answered Oct 18 '22 13:10