I have a vector like this:
intervals = [6, 7, 8, 9, 10, 11] #always regular
I want to check which interval index a value is. For example: the index of the interval where 8.5
is, is 3
.
#Interval : index
6 -> 7 : 1
7 -> 8 : 2
8 -> 9 : 3
9 -> 10 : 4
10 -> 11 : 5
So I made this code:
from numpy import *
N = 8000
data = random.random(N)
step_number = 50
max_value = max(data)
min_value = min(data)
step_length = (max_value - min_value)/step_number
intervals = arange(min_value + step_length, max_value + step_length, step_length )
for x in data:
for index in range(len(intervals)):
if x < intervals[index]:
print("That's the index", index)
break
This code is working, but It's tooo slow, I think I'm wasting time in these loops. Is there a way to check this faster? Maybe using some numpy special function that check this for me...
Using numpy.digitize
:
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.digitize.html#numpy-digitize
>>> import numpy as np
>>> intervals = [6, 7, 8, 9, 10, 11]
>>> data = [3.5, 6.3, 9.4, 11.5, 8.5]
>>> np.digitize(data, bins=interval)
array([0, 1, 4, 6, 3])
0
is underflow, len(intervals)
is overflow
Depending on how you want to handle the endpoints, there is bisect.bisect_left
and bisect.bisect_right
:
>>> import bisect
>>> intervals = [6, 7, 8, 9, 10, 11]
>>> for n in (6, 6.1, 6.2, 6.5, 6.8, 7):
... print bisect.bisect_left(intervals, n)
...
0
1
1
1
1
1
>>> for n in (6, 6.1, 6.2, 6.5, 6.8, 7):
... print bisect.bisect_right(intervals, n)
...
1
1
1
1
1
2
Numpy implements the same thing using the searchsorted
method.
>>> import numpy as np
>>> np.searchsorted(intervals, (6, 6.1, 6.2, 6.5, 6.8, 7), side='left')
array([0, 1, 1, 1, 1, 1])
>>> np.searchsorted(intervals, (6, 6.1, 6.2, 6.5, 6.8, 7), side='right')
array([1, 1, 1, 1, 1, 2])
And, of course, if your intervals are equally spaced, you can do:
>>> for n in (6, 6.1, 6.2, 6.5, 6.8, 7):
... iwidth = intervals[1] - intervals[0]
... print np.ceil((n - intervals[0]) / iwidth)
...
0.0
1.0
1.0
1.0
1.0
1.0
As others have mentioned, if you have irregular intervals, use a bisecting search (e.g. np.searchsorted
and/or np.digitize
).
However, in your specific case where you've stated you'll always have regular intervals, you can also do something similar to:
import numpy as np
intervals = [6, 7, 8, 9, 10, 11]
vals = np.array([8.5, 6.2, 9.8])
dx = intervals[1] - intervals[0]
x0 = intervals[0]
i = np.ceil((vals - x0) / dx).astype(int)
Or, building on your example code:
import numpy as np
N = 8000
num_intervals = 50
data = np.random.random(N)
intervals = np.linspace(data.min(), data.max(), num_intervals)
x0 = intervals[0]
dx = intervals[1] - intervals[0]
i = np.ceil((data - x0) / dx).astype(int)
This will be much faster than a binary search for large arrays.
As long as your list is sorted, you can use the bisect library to get the insertion index.
index = bisect.bisect_left(intervals, 8.5)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With