Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy histogram on multi-dimensional array

given an np.array of shape (n_days, n_lat, n_lon), I'd like to compute a histogram with fixed bins for each lat-lon cell (ie the distribution of daily values).

A simple solution to the problem is to loop over the cells and invoke np.histogram for each cell::

bins = np.linspace(0, 1.0, 10)
B = np.rand(n_days, n_lat, n_lon)
H = np.zeros((n_bins, n_lat, n_lon), dtype=np.int32)
for lat in range(n_lat):
    for lon in range(n_lon):
        H[:, lat, lon] = np.histogram(A[:, lat, lon], bins=bins)[0]
# note: code not tested

but this is quite slow. Is there a more efficient solution that does not involve a loop?

I looked into np.searchsorted to get the bin indices for each value in B and then use fancy indexing to update H::

bin_indices = bins.searchsorted(B)
H[bin_indices.ravel(), idx[0], idx[1]] += 1  # where idx is a index grid given by np.indices
# note: code not tested

but this does not work because the in-place add operator (+=) doesn't seem to support multiple updates of the same cell.

thx, Peter

like image 827
Peter Prettenhofer Avatar asked Sep 17 '13 13:09

Peter Prettenhofer


People also ask

What is histogram2d in NumPy?

Numpy histogram2d () function computes the two-dimensional histogram two data sample sets. The syntax of numpy histogram2d () is given as: numpy.histogram2d (x, y, bins=10, range=None, normed=None, weights=None, density=None). Where, x and y are arrays containing x and y coordinates to be histogrammed, respectively.

How is a histogram computed in Python?

The histogram is computed over the flattened array. If bins is an int, it defines the number of equal-width bins in the given range (10, by default).

How to create a NumPy array with multiple dimensions?

In general numpy arrays can have more than one dimension. One way to create such array is to start with a 1-dimensional array and use the numpy reshape () function that rearranges elements of that array into a new shape. b is a 2-dimensional array, with 2 rows and 3 columns. We can access its elements by specifying row and column indexes:

How do I make a histogram with two dimensions?

An array containing the y coordinates of the points to be histogrammed. If int, the number of bins for the two dimensions (nx=ny=bins). If array_like, the bin edges for the two dimensions (x_edges=y_edges=bins). If [int, int], the number of bins in each dimension (nx, ny = bins).


1 Answers

You can use numpy.apply_along_axis to eliminate the loop.

hist, bin_edges = apply_along_axis(lambda x: histogram(x, bins=bins), 0, B)
like image 66
Greg Whittier Avatar answered Sep 20 '22 09:09

Greg Whittier