Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to check which interval index a value is

Tags:

python

numpy

I have a vector like this:

intervals = [6, 7, 8, 9, 10, 11] #always regular

I want to check which interval index a value is. For example: the index of the interval where 8.5 is, is 3.

#Interval : index
6  -> 7   : 1
7  -> 8   : 2
8  -> 9   : 3
9  -> 10  : 4
10 -> 11  : 5

So I made this code:

from numpy import *
N = 8000
data = random.random(N)
step_number = 50
max_value = max(data)
min_value = min(data)
step_length = (max_value - min_value)/step_number
intervals = arange(min_value + step_length, max_value + step_length, step_length )
for x in data:
    for index in range(len(intervals)):
        if x < intervals[index]:
            print("That's the index", index)
            break

This code is working, but It's tooo slow, I think I'm wasting time in these loops. Is there a way to check this faster? Maybe using some numpy special function that check this for me...

like image 251
João Paulo Avatar asked Jan 14 '16 19:01

João Paulo


4 Answers

Using numpy.digitize:

http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.digitize.html#numpy-digitize

>>> import numpy as np
>>> intervals = [6, 7, 8, 9, 10, 11]
>>> data = [3.5, 6.3, 9.4, 11.5, 8.5]

>>> np.digitize(data, bins=interval)
array([0, 1, 4, 6, 3])

0 is underflow, len(intervals) is overflow

like image 200
MaxNoe Avatar answered Oct 07 '22 21:10

MaxNoe


Depending on how you want to handle the endpoints, there is bisect.bisect_left and bisect.bisect_right:

>>> import bisect
>>> intervals = [6, 7, 8, 9, 10, 11]
>>> for n in (6, 6.1, 6.2, 6.5, 6.8, 7):
...   print bisect.bisect_left(intervals, n)
... 
0
1
1
1
1
1
>>> for n in (6, 6.1, 6.2, 6.5, 6.8, 7):
...   print bisect.bisect_right(intervals, n)
... 
1
1
1
1
1
2

Numpy implements the same thing using the searchsorted method.

>>> import numpy as np
>>> np.searchsorted(intervals, (6, 6.1, 6.2, 6.5, 6.8, 7), side='left')
array([0, 1, 1, 1, 1, 1])
>>> np.searchsorted(intervals, (6, 6.1, 6.2, 6.5, 6.8, 7), side='right')
array([1, 1, 1, 1, 1, 2])

And, of course, if your intervals are equally spaced, you can do:

>>> for n in (6, 6.1, 6.2, 6.5, 6.8, 7):
...     iwidth = intervals[1] - intervals[0]
...     print np.ceil((n - intervals[0]) / iwidth)
... 
0.0
1.0
1.0
1.0
1.0
1.0
like image 9
mgilson Avatar answered Oct 19 '22 09:10

mgilson


As others have mentioned, if you have irregular intervals, use a bisecting search (e.g. np.searchsorted and/or np.digitize).

However, in your specific case where you've stated you'll always have regular intervals, you can also do something similar to:

import numpy as np

intervals = [6, 7, 8, 9, 10, 11]
vals = np.array([8.5, 6.2, 9.8])

dx = intervals[1] - intervals[0]
x0 = intervals[0]

i = np.ceil((vals - x0) / dx).astype(int)

Or, building on your example code:

import numpy as np

N = 8000
num_intervals = 50

data = np.random.random(N)
intervals = np.linspace(data.min(), data.max(), num_intervals)

x0 = intervals[0]
dx = intervals[1] - intervals[0]
i = np.ceil((data - x0) / dx).astype(int)

This will be much faster than a binary search for large arrays.

like image 4
Joe Kington Avatar answered Oct 19 '22 11:10

Joe Kington


As long as your list is sorted, you can use the bisect library to get the insertion index.

index = bisect.bisect_left(intervals, 8.5)
like image 1
Brendan Abel Avatar answered Oct 19 '22 10:10

Brendan Abel