numpy.searchsorted with more than one source

Tags: python, numpy

Let's say that I have two arrays in the form

a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]

As you can see, the arrays are sorted lexicographically when a and b are treated as the columns of a single two-column array: first by a, then by b.

Now, I want to do a searchsorted on this array. For instance, if I search for (3, 7) (a = 3 and b = 7), I should get 6.

Whenever there are duplicate values in a, the search should continue with values in b.

Is there a built-in numpy method for this? If not, what would be an efficient way to do it, given that my arrays have millions of entries?

I tried numpy.recarray: I created one recarray from a and b and tried searching in it, but I get the following error:

TypeError: expected a readable buffer object

Any help is much appreciated.

Senthil Babu asked Feb 20 '23 14:02


1 Answer

You're almost there. numpy.record (which I assume is what you used, given the error message you received) isn't really what you want here. Build a record array from the two columns with numpy.rec.fromarrays, and pass the search key as a one-item array with the same dtype:

>>> a_b = numpy.rec.fromarrays((a, b))
>>> a_b
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
       (4, 4), (4, 8), (5, 1), (6, 1)], 
      dtype=[('f0', '<i8'), ('f1', '<i8')])
>>> numpy.searchsorted(a_b, numpy.array((3, 7), dtype=a_b.dtype))
6
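
If you would rather keep a and b as two plain arrays, the rule from the question ("when a has duplicates, continue the search in b") can also be written as two binary searches. This is only a sketch of an alternative, not part of the record-array approach above; the helper name searchsorted_pair is made up for illustration, and it assumes both arrays are NumPy arrays already sorted by a, then by b.

```python
import numpy as np

def searchsorted_pair(a, b, key_a, key_b):
    """Hypothetical helper: insertion index of (key_a, key_b) in the
    lexicographically sorted column pair (a, b)."""
    # Locate the run of rows where a == key_a (empty if key_a is absent).
    lo = np.searchsorted(a, key_a, side="left")
    hi = np.searchsorted(a, key_a, side="right")
    # Within that run, b is sorted, so a second binary search finishes the job.
    return lo + np.searchsorted(b[lo:hi], key_b, side="left")

a = np.array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
b = np.array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])

idx = searchsorted_pair(a, b, 3, 7)
print(idx)  # 6
```

Both steps are binary searches, and b[lo:hi] is a view rather than a copy, so this should stay cheap for arrays with millions of entries.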

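It might also be useful to know that sort and argsort order record arrays lexicographically (by the first field, then by the second), and that numpy.lexsort does the same for separate arrays. Note that lexsort treats the last key in the sequence as the primary one, which is why the keys are passed as (b, a) below. An example using lexsort: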

>>> random_idx = numpy.random.permutation(range(12))
>>> a = numpy.array(a)[random_idx]
>>> b = numpy.array(b)[random_idx]
>>> sorted_idx = numpy.lexsort((b, a))
>>> a[sorted_idx]
array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
>>> b[sorted_idx]
array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])
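
The same lexsort round trip as a self-contained script, in case you want to run it outside the interpreter. This is a minimal sketch: the seed and the names perm, a_shuf, b_shuf and order are arbitrary choices for illustration, and numpy.random.default_rng needs a reasonably recent NumPy (1.17 or later).

```python
import numpy as np

a = np.array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
b = np.array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])

# Shuffle both columns with the same permutation so the (a, b) pairs stay together.
rng = np.random.default_rng(0)
perm = rng.permutation(len(a))
a_shuf, b_shuf = a[perm], b[perm]

# lexsort uses the LAST key as the primary key: sort by a, break ties with b.
order = np.lexsort((b_shuf, a_shuf))
a_sorted, b_sorted = a_shuf[order], b_shuf[order]

print(a_sorted)
print(b_sorted)
```

Because every (a, b) pair in this data is unique, the sorted columns come back exactly as the original a and b.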

Sorting record arrays:

>>> a_b = numpy.rec.fromarrays((a, b))
>>> a_b[a_b.argsort()]
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
       (4, 4), (4, 8), (5, 1), (6, 1)], 
      dtype=[('f0', '<i8'), ('f1', '<i8')])
>>> a_b.sort()
>>> a_b
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
       (4, 4), (4, 8), (5, 1), (6, 1)], 
      dtype=[('f0', '<i8'), ('f1', '<i8')])
senderle answered Feb 22 '23 04:02