I am using the following code to digitize an array into 16 bins:
numpy.digitize(array, bins=numpy.histogram(array, bins=16)[1])
I expect that the output is in the range [1, 16], since there are 16 bins. However, one of the values in the returned array is 17. How can this be explained?
This is actually documented behaviour of numpy.digitize():
Each index i returned is such that bins[i-1] <= x < bins[i] if bins is monotonically increasing, or bins[i-1] > x >= bins[i] if bins is monotonically decreasing. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate.
So in your case, 0 and 17 are also valid return values (note that the bin array returned by numpy.histogram() has length 17). The bins returned by numpy.histogram() cover the range array.min() to array.max(). The condition given in the docs shows that array.min() belongs to the first bin, while array.max() lies outside the last bin. That's why 0 is not in the output, while 17 is.
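To make this concrete, here is a minimal sketch that reproduces the effect and shows one possible workaround. The sample data is made up for illustration, and passing only the interior edges to numpy.digitize() is just one option, not something the documentation prescribes.

```python
import numpy as np

# Hypothetical sample data; the last element is the maximum of the array.
array = np.array([0.0, 1.0, 2.5, 4.0, 7.5, 8.0])

# numpy.histogram() returns (counts, edges); 16 bins give 17 edges.
edges = np.histogram(array, bins=16)[1]

# The maximum equals the last edge, so it falls "beyond" the half-open bins
# and gets index len(edges) == 17.
indices = np.digitize(array, edges)
print(indices)        # [ 1  3  6  9 16 17]

# Possible workaround (an assumption about what you want): digitize against
# the 15 interior edges only, then shift so the result is 1-based.
# The maximum then lands in bin 16, as it does in numpy.histogram().
indices_fixed = np.digitize(array, edges[1:-1]) + 1
print(indices_fixed)  # [ 1  3  6  9 16 16]
```

With the interior edges, every value maps to an index in the range [1, 16], because nothing can fall below the first or above the last remaining edge without still being assigned to an end bin.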
numpy.histogram() produces an array of the bin edges, of which there are (number of bins)+1.
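A quick way to check this yourself (the input data here is arbitrary and only serves as an illustration):

```python
import numpy as np

# Arbitrary illustrative data: the integers 0..99.
counts, edges = np.histogram(np.arange(100), bins=16)

print(len(counts))  # 16 (one count per bin)
print(len(edges))   # 17 (one more edge than bins)
```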
In numpy version 1.8, you have an option to select whether numpy.digitize treats each interval as closed on the right or on the left, via the right argument. The following example is copied from http://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html:
import numpy as np
x = np.array([1.2, 10.0, 12.4, 15.5, 20.])
bins = np.array([0, 5, 10, 15, 20])
np.digitize(x, bins, right=True)
# array([1, 2, 3, 4, 4])
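For comparison, here is a small sketch that runs the same data with both settings of right. Note that right=True only moves the edge case: a value equal to the first edge then gets index 0, so for the original question array.min() would fall outside the bins instead of array.max().

```python
import numpy as np

x = np.array([1.2, 10.0, 12.4, 15.5, 20.])
bins = np.array([0, 5, 10, 15, 20])

# Default: bins[i-1] <= x < bins[i], so 20.0 is beyond the last bin.
left_closed = np.digitize(x, bins)
print(left_closed)   # [1 3 3 4 5]

# right=True: bins[i-1] < x <= bins[i], so 20.0 belongs to the last bin.
right_closed = np.digitize(x, bins, right=True)
print(right_closed)  # [1 2 3 4 4]

# The flip side: a value equal to the first edge now gets index 0.
at_first_edge = np.digitize(np.array([0.0]), bins, right=True)
print(at_first_edge)  # [0]
```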