
Feeling stupid while trying to implement lazy partitioning in Python

I am trying to implement lazy partitioning of an iterator: it should yield slices of the iterator whenever a function applied to the elements changes value. This would mimic the behavior of Clojure's partition-by (although the semantics of the output would differ, since Python would genuinely "consume" the elements). My implementation is optimal in the number of operations it performs but not in the memory it requires. I don't see why a good implementation would need more than O(1) memory, yet mine takes O(k) memory, where k is the size of a partition. I would like to be able to handle cases where k is large. Does anyone know of a good implementation?

The correct behavior should be something like

>>> unagi = [-1, 3, 4, 7, -2, 1, -3, -5]
>>> parts = partitionby(lambda x: x < 0, unagi)
>>> print [[y for y in x] for x in parts]
[[-1], [3, 4, 7], [-2], [1], [-3, -5]]

Here is my current version

from itertools import chain

def partitionby(f, iterable):
    seq = iter(iterable)
    current = next(seq)
    justseen = next(seq)
    partition = iter([current])
    while True:
        if f(current) == f(justseen):
            # Same partition: append justseen by wrapping in another chain.
            partition = chain(partition, iter([justseen]))
            try:
                justseen = next(seq)
            except StopIteration:
                yield partition
                break
        else:
            # f changed value: emit the finished partition and start a new one.
            yield partition
            current = justseen
            partition = iter([])
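To make the O(k) concern concrete, here is a small illustrative sketch of what the code above does to a single partition: appending one element at a time wraps the partition in another chain object, so a partition of k elements carries O(k) wrapper objects.

```python
from itertools import chain

# Start with a one-element partition, then append elements one at a time,
# the same way the loop above does: each append adds one more chain layer.
wrapped = iter([1])
for i in range(2, 6):
    wrapped = chain(wrapped, iter([i]))

print(list(wrapped))  # [1, 2, 3, 4, 5]
```

The elements come out in order, but the nested chain objects are what account for the O(k) memory.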
asked Apr 29 '11 by Gabriel Mitchell


1 Answer

Why not reuse groupby? I think it is O(1) in memory.

def partitionby(f, iterable):
    return (g[1] for g in groupby(iterable, f))

The difference between groupby's implementation and yours is that each partition is a specialized iterator object, instead of a chain of a chain of a chain ...
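For completeness, a quick check (using the question's sample data) that the groupby version reproduces the expected output:

```python
from itertools import groupby

def partitionby(f, iterable):
    # groupby yields (key, group) pairs; keep only the lazy group iterators.
    return (g[1] for g in groupby(iterable, f))

unagi = [-1, 3, 4, 7, -2, 1, -3, -5]
parts = partitionby(lambda x: x < 0, unagi)
print([[y for y in x] for x in parts])
# [[-1], [3, 4, 7], [-2], [1], [-3, -5]]
```

One caveat from the groupby documentation: each group shares the underlying iterator, so a group becomes invalid once you advance to the next one; materialize it with list() if you need to keep it around.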

answered Nov 01 '22 by kennytm