
numpy.digitize returns values out of range?

I am using the following code to digitize an array into 16 bins:

numpy.digitize(array, bins=numpy.histogram(array, bins=16)[1])

I expect that the output is in the range [1, 16], since there are 16 bins. However, one of the values in the returned array is 17. How can this be explained?

sandesh247 asked Dec 04 '10 18:12


3 Answers

This is actually documented behaviour of numpy.digitize():

Each index i returned is such that bins[i-1] <= x < bins[i] if bins is monotonically increasing, or bins[i-1] > x >= bins[i] if bins is monotonically decreasing. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate.

So in your case, 0 and 17 are also valid return values (note that the bin array returned by numpy.histogram() has length 17). The bins returned by numpy.histogram() cover the range array.min() to array.max(). The condition given in the docs shows that array.min() belongs to the first bin, while array.max() lies outside the last bin -- that's why 0 is not in the output, while 17 is.
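The behaviour described above can be reproduced with a short sketch. The sample data and the seed are hypothetical, and dropping the last edge is only one possible workaround, not something the question or the docs prescribe:

```python
import numpy as np

# Hypothetical sample data; any 1-D float array behaves the same way.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

edges = np.histogram(data, bins=16)[1]  # 17 edges delimit 16 bins

idx = np.digitize(data, edges)
print(idx.min(), idx.max())  # 1 17: only data.max() lands in "bin" 17

# One possible workaround: drop the last edge so that everything at or
# above the 16th edge, including data.max(), is assigned to bin 16.
idx_fixed = np.digitize(data, edges[:-1])
print(idx_fixed.min(), idx_fixed.max())  # 1 16
```

With the last edge removed, the maximum is counted in the 16th bin, which matches how numpy.histogram() itself treats the last bin as closed on the right.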

Sven Marnach answered Nov 14 '22 17:11


numpy.histogram() produces an array of the bin edges, of which there are (number of bins)+1.
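A quick check of the edge count, using a made-up input array purely for illustration:

```python
import numpy as np

# Hypothetical input: 100 evenly spaced values.
values = np.arange(100.0)

counts, edges = np.histogram(values, bins=16)
print(counts.shape)  # (16,) one count per bin
print(edges.shape)   # (17,) one more edge than there are bins
```

Passing all 17 edges to numpy.digitize() is what makes 17 a possible return value.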

Andrew Jaffe answered Nov 14 '22 18:11


In numpy version 1.8, numpy.digitize has a right parameter that lets you choose which edge of each interval is closed: with right=True the intervals include their right edge, while the default right=False includes the left edge. The following example is copied from http://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html:

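import numpy as np

x = np.array([1.2, 10.0, 12.4, 15.5, 20.])
bins = np.array([0, 5, 10, 15, 20])

# right=True: each bin includes its right edge, so 10.0 -> 2 and 20. -> 4
np.digitize(x, bins, right=True)
# array([1, 2, 3, 4, 4])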
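To see what the right flag changes, here is a small comparison on the same x and bins as in the docs example. The last part applies it to histogram edges like those in the question, using hypothetical data, as a sketch only:

```python
import numpy as np

x = np.array([1.2, 10.0, 12.4, 15.5, 20.0])
bins = np.array([0, 5, 10, 15, 20])

left_closed = np.digitize(x, bins)               # bins[i-1] <= x < bins[i]
right_closed = np.digitize(x, bins, right=True)  # bins[i-1] < x <= bins[i]
print(left_closed)   # [1 3 3 4 5]
print(right_closed)  # [1 2 3 4 4]

# With edges taken from numpy.histogram (hypothetical data), right=True
# moves the out-of-range index from the maximum to the minimum.
data = np.array([0.5, 1.0, 2.5, 4.0, 8.0])
edges = np.histogram(data, bins=16)[1]
idx = np.digitize(data, edges, right=True)
print(idx.min(), idx.max())  # 0 16
```

Note that right=True alone does not bring the indices into [1, 16] for the question's setup: the maximum now falls in bin 16, but the minimum equals the first edge and gets index 0.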

sravani vaddi answered Nov 14 '22 18:11