Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Interleaving multiple iterables randomly while preserving their order in python

Tags:

python

Inspired by this earlier stack overflow question I have been considering how to randomly interleave iterables in python while preserving the order of elements within each iterable. For example:

>>> def interleave(*iterables):
...     "Return the source iterables randomly interleaved"
...     <insert magic here>
>>> interleave(xrange(1, 5), xrange(5, 10), xrange(10, 15))
[1, 5, 10, 11, 2, 6, 3, 12, 4, 13, 7, 14, 8, 9]

The original question asked to randomly interleave two lists, a and b, and the accepted solution was:

>>> c = [x.pop(0) for x in random.sample([a]*len(a) + [b]*len(b), len(a)+len(b))]

However, this solution works for only two lists (though it can easily be extended) and relies on the fact that a and b are lists so that pop() and len() can be called on them, meaning it cannot be used with iterables. It also has the unfortunate side effect of emptying the source lists a and b.

Alternate answers given for the original question take copies of the source lists to avoid modifying them, but this strikes me as inefficient, especially if the source lists are sizeable. The alternate answers also make use of len() and therefore cannot be used on mere iterables.

I wrote my own solution that works for any number of input lists and doesn't modify them:

def interleave(*args):
    iters = [i for i, b in ((iter(a), a) for a in args) for _ in xrange(len(b))]
    random.shuffle(iters)
    return map(next, iters)

but this solution also relies on the source arguments being lists so that len() can be used on them.

So, is there an efficient way to randomly interleave iterables in python, preserving the original order of elements, which doesn't require knowledge of the length of the iterables ahead of time and doesn't take copies of the iterables?

Edit: Please note that, as with the original question, I don't need the randomisation to be fair.

like image 680
srgerg Avatar asked May 18 '12 07:05

srgerg


2 Answers

Here is one way to do it using a generator:

import random

def interleave(*args):
  iters = map(iter, args)
  while iters:
    it = random.choice(iters)
    try:
      yield next(it)
    except StopIteration:
      iters.remove(it)

print list(interleave(xrange(1, 5), xrange(5, 10), xrange(10, 15)))
like image 55
NPE Avatar answered Nov 10 '22 00:11

NPE


Not if you want fit to be "fair".

Imagine you have a list containing one million items and another containing just two items. A "fair" randomization would have the first element from the short list occurring at about index 300000 or so.

a,a,a,a,a,a,a,...,a,a,a,b,a,a,a,....
                        ^

But there's no way to know in advance until you know the length of the lists.

If you just take from each list with 50% (1/n) probability then it can be done without knowing the lengths of the lists but you'll get something more like this:

a,a,b,a,b,a,a,a,a,a,a,a,a,a,a,a,...
    ^   ^
like image 3
Mark Byers Avatar answered Nov 09 '22 23:11

Mark Byers