Binning of data along one axis in numpy

I have a large two-dimensional array arr that I would like to bin along the second axis using numpy. Because np.histogram flattens its input, I'm currently using a for loop:

import numpy as np

arr = np.random.randn(100, 100)

nbins = 10
binned = np.empty((arr.shape[0], nbins))

for i in range(arr.shape[0]):
    binned[i,:] = np.histogram(arr[i,:], bins=nbins)[0]

I feel like there should be a more direct and more efficient way to do this within numpy, but I haven't found one.

obachtos asked Oct 13 '16 10:10



2 Answers

You could use np.apply_along_axis:

>>> import numpy as np
>>> x = np.array([range(20), range(1, 21), range(2, 22)])
>>> nbins = 2
>>> np.apply_along_axis(lambda a: np.histogram(a, bins=nbins)[0], 1, x)
array([[10, 10],
       [10, 10],
       [10, 10]])

The main advantage (if any) is that it's slightly shorter. I wouldn't expect much of a performance gain, since np.apply_along_axis still calls the function once per row in Python; at best it is marginally more efficient at assembling the per-row results.
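If the per-row Python calls are the bottleneck, one possible alternative is to bin all rows at once with np.searchsorted and np.bincount. The sketch below is not part of the answer above and rests on an assumption: every row shares one set of equal-width bin edges spanning the global minimum and maximum of the array, whereas np.histogram(row, bins=nbins) fits the edges to each row. The name hist_rows_shared_edges is hypothetical, and NaNs or a constant array are not handled.

```python
import numpy as np

def hist_rows_shared_edges(arr, nbins):
    """Histogram each row of a 2-D array over one common set of bins.

    Assumption: all rows use equal-width edges from arr.min() to arr.max().
    """
    arr = np.asarray(arr, dtype=float)
    n_rows = arr.shape[0]
    edges = np.linspace(arr.min(), arr.max(), nbins + 1)

    # Bin index of every element: edges[i] <= value < edges[i + 1].
    idx = np.searchsorted(edges, arr, side="right") - 1
    # A value equal to the last edge belongs in the last bin,
    # which is also how np.histogram treats the rightmost edge.
    idx = np.clip(idx, 0, nbins - 1)

    # Offset each row's indices into its own block of nbins slots,
    # so a single bincount call counts all rows at once.
    flat = idx + nbins * np.arange(n_rows)[:, None]
    counts = np.bincount(flat.ravel(), minlength=n_rows * nbins)
    return counts.reshape(n_rows, nbins), edges

# Example usage; the seed and shape are arbitrary.
rng = np.random.default_rng(0)
arr = rng.standard_normal((100, 100))
counts, edges = hist_rows_shared_edges(arr, nbins=10)
print(counts.shape)  # (100, 10)
```

With shared edges, each row of counts should match np.histogram(row, bins=edges)[0], but the result will generally differ from the loop in the question, whose bins are fitted to each row separately.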

Ami Tavory answered Oct 21 '22 00:10


I was a bit confused by the lambda in Ami's solution so I expanded it out to show what it's doing:

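def hist_1d(a):
    # Histogram of one row; [0] keeps the counts and drops the bin edges
    return np.histogram(a, bins=nbins)[0]

# Call hist_1d on every 1-D slice along axis 1, i.e. on each row of x
counts = np.apply_along_axis(hist_1d, axis=1, arr=x)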
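
As a quick sanity check, the following sketch compares the expanded version with the loop from the question on random data. The seed and array shape are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
arr = rng.standard_normal((100, 100))
nbins = 10

def hist_1d(a):
    # Counts only; the bin edges returned by np.histogram are discarded
    return np.histogram(a, bins=nbins)[0]

# Explicit loop, as in the question
looped = np.empty((arr.shape[0], nbins))
for i in range(arr.shape[0]):
    looped[i, :] = np.histogram(arr[i, :], bins=nbins)[0]

# Same computation expressed with np.apply_along_axis
applied = np.apply_along_axis(hist_1d, axis=1, arr=arr)

print(applied.shape)                    # (100, 10)
print(np.array_equal(looped, applied))  # True
```

Both versions call np.histogram once per row with per-row bin edges, so the counts agree; only the way the results are assembled differs.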

ThomasNicholas answered Oct 21 '22 01:10