I have a large two-dimensional array arr that I would like to bin along the second axis using NumPy. Because np.histogram flattens its input, I'm currently using a for loop:
import numpy as np
arr = np.random.randn(100, 100)
nbins = 10
binned = np.empty((arr.shape[0], nbins))
for i in range(arr.shape[0]):
    binned[i, :] = np.histogram(arr[i, :], bins=nbins)[0]
I feel like there should be a more direct and more efficient way to do this within NumPy, but I failed to find one.
Axis=1 Row-Wise Operation
Setting axis=1 when performing an operation on a NumPy array performs the operation row-wise, that is, across all columns for each row. For a two-row array with rows [1, 2, 3] and [4, 5, 6], a row-wise sum with axis=1 yields two values, one for each row:
Row 1: 1 + 2 + 3 = 6.
Row 2: 4 + 5 + 6 = 15.
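The row-wise sum described above can be checked directly. This is a minimal sketch; the array name data is only illustrative.

```python
import numpy as np

# Two rows, three columns, matching the worked sums above.
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

# axis=1 collapses the columns, leaving one value per row.
row_sums = data.sum(axis=1)
print(row_sums)  # [ 6 15]
```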
Smoothing by bin means: each value in a bin is replaced by the mean of that bin. Smoothing by bin medians: each value in a bin is replaced by the median of that bin.
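As a rough illustration of both smoothing methods, the sketch below assumes the data is already sorted and is split into equal-size bins of three values each. The sample values and the bin size are made up for the example, not taken from the text above.

```python
import numpy as np

# Hypothetical sorted data, split into 3 bins of 3 values each (an assumption).
values = np.array([4, 8, 15, 21, 21, 24, 25, 28, 34], dtype=float)
bin_size = 3
bins = values.reshape(-1, bin_size)  # one row per bin

# Replace every value by the mean (or median) of its bin.
by_mean = np.repeat(bins.mean(axis=1), bin_size)
by_median = np.repeat(np.median(bins, axis=1), bin_size)

print(by_mean)    # [ 9.  9.  9. 22. 22. 22. 29. 29. 29.]
print(by_median)  # [ 8.  8.  8. 21. 21. 21. 28. 28. 28.]
```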
Stacking joins a sequence of NumPy arrays along a new axis. Only arrays that all have the same shape can be stacked. The stack function can be combined with other functions, for example to assemble per-row results into a single array.
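In the context of the question, stacking could be used to assemble per-row histograms into one 2D array. This is only a sketch: it still loops over rows in Python, and the array sizes and seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((5, 50))
nbins = 4

# One 1D array of counts per row; each has shape (nbins,).
per_row = [np.histogram(row, bins=nbins)[0] for row in arr]

# Stack the equal-shape arrays along a new leading axis.
binned = np.stack(per_row, axis=0)
print(binned.shape)  # (5, 4)
```

Note that with an integer bins argument, each row gets its own bin edges derived from that row's minimum and maximum.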
The histogram is computed over the flattened array. If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.
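The two forms of the bins argument can be compared on a small input. The sample values below are made up for illustration.

```python
import numpy as np

values = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 4.0])

# bins as an int: 4 equal-width bins spanning [values.min(), values.max()].
counts_int, edges_int = np.histogram(values, bins=4)
print(edges_int)   # [0. 1. 2. 3. 4.]
print(counts_int)  # [2 2 1 1]

# bins as a sequence: explicit, possibly non-uniform edges.
# The last bin is closed on the right, so 4.0 is counted.
counts_seq, edges_seq = np.histogram(values, bins=[0, 1, 4])
print(counts_seq)  # [2 4]
```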
You could use np.apply_along_axis:
>>> x = np.array([range(20), range(1, 21), range(2, 22)])
>>> nbins = 2
>>> np.apply_along_axis(lambda a: np.histogram(a, bins=nbins)[0], 1, x)
array([[10, 10],
       [10, 10],
       [10, 10]])
The main advantage (if any) is that it's slightly shorter, but I wouldn't expect much of a performance gain. It's possibly marginally more efficient in the assembly of the per-row results.
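If the same bin edges are acceptable for every row, the Python-level loop can be avoided altogether. The sketch below is not part of the answer above and is a different technique: it assumes a shared range (lo, hi) for all rows, whereas np.histogram with an integer bins argument derives the edges from each row separately. The helper name histogram_rows is hypothetical.

```python
import numpy as np

def histogram_rows(arr, nbins, value_range):
    """Per-row counts using the same equal-width bin edges for every row."""
    lo, hi = value_range
    edges = np.linspace(lo, hi, nbins + 1)

    # Bin index of every element: edges[k] <= x < edges[k + 1] gives k.
    idx = np.searchsorted(edges, arr, side="right") - 1
    # Like np.histogram, count values equal to hi in the last bin.
    idx[arr == hi] = nbins - 1

    # Drop values outside [lo, hi].
    valid = (idx >= 0) & (idx < nbins)
    row_idx, _ = np.nonzero(valid)

    # Give each (row, bin) pair a unique flat index and count them in one pass.
    flat = row_idx * nbins + idx[valid]
    counts = np.bincount(flat, minlength=arr.shape[0] * nbins)
    return counts.reshape(arr.shape[0], nbins)

rng = np.random.default_rng(0)
arr = rng.standard_normal((100, 100))
nbins = 10
lo, hi = arr.min(), arr.max()  # shared range: an assumption of this sketch
binned = histogram_rows(arr, nbins, (lo, hi))
print(binned.shape)  # (100, 10)
```

Whether this is actually faster depends on the array size, so it is worth timing against the loop before adopting it.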
I was a bit confused by the lambda in Ami's solution so I expanded it out to show what it's doing:
def hist_1d(a):
    # Return only the counts; np.histogram also returns the bin edges.
    return np.histogram(a, bins=nbins)[0]

counts = np.apply_along_axis(hist_1d, axis=1, arr=x)