itertools or hand-written generator - what is preferable?

I have a number of Python generators, which I want to combine into a new generator. I can easily do this by a hand-written generator using a bunch of yield statements.

On the other hand, the itertools module is made for things like this and to me it seems as if the pythonic way to create the generator I need is to plug together various iterators of that itertools module.

However, in the problem at hand, it soon gets quite complicated: the generator needs to maintain a sort of state (e.g. whether the first or later items are being processed), the i-th output further depends on conditions on the i-th input items, and the various input lists have to be processed differently before they are joined into the generated list.

As the composition of standard iterators that would solve my problem is --- due to the one-dimensional nature of writing down source code --- nearly incomprehensible, I wonder whether there are any advantages of using standard itertools generators versus hand-written generator functions (in basic and in more advanced cases). Actually, I think that in 90% of the cases, the hand-written versions are much easier to read --- probably due to their more imperative style compared to the functional style of chaining iterators.

EDIT

In order to illustrate my problem, here is a (toy) example: Let a and b be two iterables of the same length (the input data). The items of a consist of integers, the items of b are iterables themselves, whose individual items are strings. The output should correspond to the output of the following generator function:

from itertools import *
def generator(a, b):
    first = True
    for i, s in izip(a, b):
        if first:
            yield "First line"
            first = False
        else:
            yield "Some later line"
        if i == 0:
            yield "The parameter vanishes."
        else:
            yield "The parameter is:"
            yield i
        yield "The strings are:"
        comma = False
        for t in s:
            if comma:
                yield ','
            else:
                comma = True
            yield t

If I write down the same program in functional style using generator expressions and the itertools module, I end up with something like:

from itertools import *
def generator2(a, b):
    return (z for i, s, c in izip(a, b, count())
            for y in (("First line" if c == 0 else "Some later line",),
                      ("The parameter vanishes.",) if i == 0
                      else ("The parameter is:", i),
                      ("The strings are:",),
                      islice((x for t in s for x in (',', t)), 1, None))
            for z in y)
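
The last tuple element above relies on a small trick that is easy to miss: every string is emitted with a comma in front of it, and islice then drops the very first comma. A minimal sketch of that step in isolation follows; the helper name interleave_commas is hypothetical and not part of the question's code.

```python
from itertools import islice

def interleave_commas(strings):
    """Yield the items of `strings` with ',' between consecutive items."""
    # Emit the pair (',', item) for every item, flattened into one stream...
    with_leading_comma = (x for item in strings for x in (',', item))
    # ...then skip the first element, which is the unwanted leading comma.
    return islice(with_leading_comma, 1, None)

print(list(interleave_commas("ab")))  # ['a', ',', 'b']
print(list(interleave_commas("")))    # []
```

The hand-written version reaches the same result with the comma flag, at the cost of two extra lines but without the skip-one indirection.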

EXAMPLE

>>> a, b = (1, 0, 2), ("ab", "cd", "ef")
>>> print([x for x in generator(a, b)])
['First line', 'The parameter is:', 1, 'The strings are:', 'a', ',', 'b', 'Some later line', 'The parameter vanishes.', 'The strings are:', 'c', ',', 'd', 'Some later line', 'The parameter is:', 2, 'The strings are:', 'e', ',', 'f']
>>> print([x for x in generator2(a, b)])
['First line', 'The parameter is:', 1, 'The strings are:', 'a', ',', 'b', 'Some later line', 'The parameter vanishes.', 'The strings are:', 'c', ',', 'd', 'Some later line', 'The parameter is:', 2, 'The strings are:', 'e', ',', 'f']

This is possibly more elegant than my first solution but it looks like a write-once-do-not-understand-later piece of code. I am wondering whether this way of writing my generator has enough advantages that one should do so.

P.S.: I guess part of my problem with the functional solution is that in order to minimize the amount of keywords in Python, some keywords like "for", "if" and "else" have been recycled for use in expressions so that their placement in the expression takes getting used to (the ordering in the generator expression z for x in a for y in x for z in y looks, at least to me, less natural than the ordering in the classic for loop: for x in a: for y in x: for z in y: yield z).
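
To make the ordering point concrete, here is a small sketch that puts the two spellings side by side. The function names and the sample data are made up for illustration. The for clauses of the generator expression appear in the same left-to-right order as the nested loops; only the yielded value moves to the front.

```python
nested = [["ab", "cd"], ["ef"]]

def flatten_loop(a):
    # Classic nested loops: outermost loop first, yielded value last.
    for x in a:
        for y in x:
            for z in y:
                yield z

def flatten_expr(a):
    # Same clauses in the same order; the yielded value is written first.
    return (z for x in a for y in x for z in y)

print(list(flatten_loop(nested)))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(list(flatten_expr(nested)))  # ['a', 'b', 'c', 'd', 'e', 'f']
```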

asked Oct 03 '10 12:10 by Marc



1 Answer

I did some profiling and the regular generator function is way faster than either your second generator or my implementation.

$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator1(a, b))'
10 loops, best of 3: 169 msec per loop

$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator2(a, b))'
10 loops, best of 3: 489 msec per loop

$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator3(a, b))'
10 loops, best of 3: 385 msec per loop

It also happens to be the most readable, so I think I'd go with that. That being said, I'll still post my solution because I think it's a cleaner example of the sort of functional programming you can do with itertools. It is clearly still not optimal, though: I feel like it should be able to smoke the regular generator function, so I'll keep hacking on it.

import itertools as it

def generator3(parameters, strings):
    # replace strings with a generator of generators for the individual characters
    strings = (it.islice((char for string_char in string_ for char in (',', string_char)), 1, None)
               for string_ in strings)

    # prepend the notice to each group of strings
    strings = (it.chain(('The strings are:',), string_) for string_ in strings)

    # nest them in tuples so they're at the same level as the other generators
    separators = it.chain((('First line',),), it.cycle((('Some later line',),)))

    # replace the parameters with the appropriate tuples
    parameters = (('The parameter is:', p) if p else ('The parameter vanishes.',)
                  for p in parameters)

    # combine the separators, parameters and strings
    # (it.izip is Python 2; on Python 3 use the built-in zip instead)
    output = it.izip(separators, parameters, strings)

    # flatten it twice and return it
    output = it.chain.from_iterable(output)
    return it.chain.from_iterable(output)

For reference, the test case is:

def make_test_case():
    a = [i % 100 for i in range(10000)]
    b = [('12345'*10)[:(i%50)+1] for i in range(10000)]
    return a, b
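
The code in this thread targets Python 2 (izip). As a rough sketch, the comparison could be repeated on Python 3 along the following lines, assuming zip as the replacement for izip and a smaller, hypothetical default of n=1000 to keep the run short. The absolute numbers will not match the timings quoted above, since interpreter and machine differ, so no expected output is given.

```python
import itertools as it
import timeit

def generator1(a, b):
    """Hand-written version from the question (zip instead of izip)."""
    first = True
    for i, s in zip(a, b):
        if first:
            yield "First line"
            first = False
        else:
            yield "Some later line"
        if i == 0:
            yield "The parameter vanishes."
        else:
            yield "The parameter is:"
            yield i
        yield "The strings are:"
        comma = False
        for t in s:
            if comma:
                yield ','
            else:
                comma = True
            yield t

def generator3(parameters, strings):
    """itertools version from the answer (zip instead of it.izip)."""
    strings = (it.islice((x for c in s for x in (',', c)), 1, None)
               for s in strings)
    strings = (it.chain(('The strings are:',), s) for s in strings)
    separators = it.chain((('First line',),), it.cycle((('Some later line',),)))
    parameters = (('The parameter is:', p) if p else ('The parameter vanishes.',)
                  for p in parameters)
    rows = zip(separators, parameters, strings)
    # Flatten twice: rows -> tuples/iterators -> individual items.
    return it.chain.from_iterable(it.chain.from_iterable(rows))

def make_test_case(n=1000):
    # Same shape as the original test case; n is an assumed, smaller size.
    a = [i % 100 for i in range(n)]
    b = [('12345' * 10)[:(i % 50) + 1] for i in range(n)]
    return a, b

a, b = make_test_case()
# Timing only means something if both versions produce the same sequence.
assert list(generator1(a, b)) == list(generator3(a, b))

for gen in (generator1, generator3):
    seconds = timeit.timeit(lambda: list(gen(a, b)), number=10)
    print(f"{gen.__name__}: {seconds / 10 * 1000:.1f} ms per loop")
```

Checking equality before timing guards against comparing a fast but wrong variant with a correct one.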
answered Nov 15 '22 13:11 by aaronasterling