Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Interleaving two numpy index arrays, one item from each array

I have two ordered numpy arrays and I want to interleave them so that I take one item from the first array, then another from the second, then back to the first - taking the next item that is larger than the one I just took from the second and so on. Those are actually arrays of indices to other arrays, and I'll be ok with operating on the original arrays as long as the operation is vectorized (but of course working on the index array as a vector operation will be awesome).

Example (ok to assume that the intersection of arrays is empty)

a = array([1,2,3,4,7,8,9,10,17])
b = array([5,6,13,14,15,19,21,23])

I would like to get [1,5,7,13,17,19]

like image 771
nickb Avatar asked Aug 06 '12 11:08

nickb


2 Answers

Vectorised solution (pedagogical style, easily understandable)

We can vectorise this by augmenting the arrays with a discriminator index, such that a is tagged 0 and b is tagged 1:

a_t = np.vstack((a, np.zeros_like(a)))
b_t = np.vstack((b, np.ones_like(b)))

Now, let's combine and sort:

c = np.hstack((a_t, b_t))[:, np.argsort(np.hstack((a, b)))]
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 13, 14, 15, 17, 19, 21, 23],
       [ 0,  0,  0,  0,  1,  1,  0,  0,  0,  0,  1,  1,  1,  0,  1,  1,  1]])

You can see that now the elements are in order but retaining their tags, so we can see which elements came from a and from b.

So, let's select the first element and each element where the tag changes from 0 (for a) to 1 (for b) and back again:

c[:, np.concatenate(([True], c[1, 1:] != c[1, :-1]))][0]
array([ 1,  5,  7, 13, 17, 19])

Efficient vectorised solution

You can do this slightly more efficiently by keeping the items and their tags in separate (but parallel) arrays:

ab = np.hstack((a, b))
s = np.argsort(ab)
t = np.hstack((np.zeros_like(a), np.ones_like(b)))[s]
ab[s][np.concatenate(([True], t[1:] != t[:-1]))]
array([ 1,  5,  7, 13, 17, 19])

This is slightly more efficient than the above solution; I get an average of 45 as opposed to 90 microseconds, although your conditions may vary.

like image 52
ecatmur Avatar answered Sep 30 '22 14:09

ecatmur


You could break the problem into two parts:

  1. Make a and b iterators which yield values only if they are bigger than some threshold.
  2. Use itertools (essentially George Sakkis' roundrobin recipe) to alternate between the two iterators.

import itertools
import numpy as np

a = np.array([1,2,3,4,7,8,9,10,17])
b = np.array([5,6,13,14,15,19,21,23])

def takelarger(iterable):
    for x in iterable:
        if x > takelarger.threshold:
            takelarger.threshold = x
            yield x
takelarger.threshold = 0

def alternate(*iterables):
    # Modified version of George Sakkis' roundrobin recipe
    # http://docs.python.org/library/itertools.html#recipes
    pending = len(iterables)
    nexts = itertools.cycle(iter(it).next for it in iterables)
    while pending:
        try:
            for n in nexts:
                yield n()
        except StopIteration:
            # Unlike roundrobin, stop when any iterable has been consumed
            break

def interleave(a, b):
    takelarger.threshold = 0
    return list(alternate(takelarger(a),takelarger(b)))

print(interleave(a,b))

yields

[1, 5, 7, 13, 17, 19]
like image 28
unutbu Avatar answered Sep 30 '22 16:09

unutbu