I have two-dimensional data and a set of two-dimensional bins generated with scipy.stats.binned_statistic_2d. For each data point, I want the index of the bin it occupies. This is exactly what np.digitize is for, but as far as I can tell, it only handles one-dimensional data. This Stack Exchange post seems to have an answer, but it is fully generalized to n dimensions. Is there a more straightforward solution for two dimensions?
np.digitize() returns, for each value in an array, the index of the bin that the value belongs to. Syntax: np.digitize(x, bins, right=False). Returns: an array of bin indices with the same shape as x.
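As a quick illustration of the one-dimensional case, here is a minimal sketch; the sample values and bin edges are made up for the example and are not from the question.

```python
import numpy as np

# Illustrative data and bin edges (assumed for this example).
x = np.array([0.2, 6.4, 3.0, 1.6])
bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])

# With right=False (the default), index i means bins[i-1] <= x < bins[i].
inds = np.digitize(x, bins)
print(inds)  # [1 4 3 2]
```

An index of 0 or len(bins) would mean the value lies outside the range covered by the edges.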
You can already get the bin index of each observation from the fourth return value of scipy.stats.binned_statistic_2d:
Returns:
statistic : (nx, ny) ndarray
    The values of the selected statistic in each two-dimensional bin.
xedges : (nx + 1) ndarray
    The bin edges along the first dimension.
yedges : (ny + 1) ndarray
    The bin edges along the second dimension.
binnumber : (N,) array of ints or (2,N) ndarray of ints
    This assigns to each element of sample an integer that represents the bin in which this observation falls. The representation depends on the expand_binnumbers argument. See Notes for details.
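A rough sketch of this approach follows. The sample data and the edge arrays x_edges and y_edges are assumptions made for the example, not values from the question. Passing expand_binnumbers=True should give a (2, N) array with one row of per-axis bin indices for each dimension.

```python
import numpy as np
from scipy.stats import binned_statistic_2d

# Illustrative data: 10 random points in the unit square (assumed).
rng = np.random.default_rng(0)
x = rng.random(10)
y = rng.random(10)

# Illustrative bin edges along each axis (assumed).
x_edges = np.array([0.0, 0.3, 0.5, 0.7, 1.0])
y_edges = np.array([0.0, 0.3, 0.7, 1.0])

# values can be None when the statistic is 'count'.
result = binned_statistic_2d(
    x, y, None, statistic="count",
    bins=[x_edges, y_edges],
    expand_binnumbers=True,
)

# Row 0 holds the x-bin index of each point, row 1 the y-bin index.
binnumber = result.binnumber
print(binnumber.shape)  # (2, 10)
print(binnumber.T)      # one (x_bin, y_bin) pair per point
```

The indices are 1-based for points inside the edges, so subtracting 1 gives the position in the statistic array; 0 and the number of edges on an axis are reserved for out-of-range points.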
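A simple solution using NumPy:
import numpy as np

bins = [[0.3, 0.5, 0.7], [0.3, 0.7]]  # bin edges for each dimension
values = np.random.random((10, 2))
# Digitize each column against its own set of edges.
digitized = []
for i in range(len(bins)):
    digitized.append(np.digitize(values[:, i], bins[i], right=False))
# Stack column-wise so that row k holds the (x, y) bin indices of point k.
# (Concatenating and reshaping to (10, 2) would mix up the two columns.)
digitized = np.column_stack(digitized)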