
Can a Python function take a generator and return generators to subsets of its generated output?

Let's say I have a generator function like this:

import random
def big_gen():
  i = 0
  group = 'a'
  while group != 'd':
    i += 1
    yield (group, i)
    if random.random() < 0.20:
      group = chr(ord(group) + 1)

Example output might be: ('a', 1), ('a', 2), ('a', 3), ('a', 4), ('a', 5), ('a', 6), ('a', 7), ('a', 8), ('b', 9), ('c', 10), ('c', 11), ('c', 12), ('c', 13)

I would like to break this into three groups: Group A, Group B, and Group C. And I would like a generator for each group. Then I'd pass the generator and the group letter into a subfunction. An example of the subfunction:

def printer(group_letter, generator):
  print "These numbers are in group %s:" % group_letter
  for num in generator:
    print "\t%s" % num

The desired output would be:

These numbers are in group a:
1
2
3
4
5
6
7
8
These numbers are in group b:
9
These numbers are in group c:
10
11
12
13

How can I do this without changing big_gen() or printer(), and without storing an entire group in memory at once? (In real life, the groups are huge.)

mike asked Aug 01 '09 00:08


1 Answer

Sure, this does what you want:

import itertools
import operator

def main():
  for let, gen in itertools.groupby(big_gen(), key=operator.itemgetter(0)):
    secgen = itertools.imap(operator.itemgetter(1), gen)
    printer(let, secgen)

groupby does the bulk of the work here -- the key= just tells it what field to group by.
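To make the grouping behaviour concrete, here is a minimal sketch written for Python 3. The `pairs` list and the `group_sizes` helper are illustrative names that are not part of the question; the list is a fixed stand-in for the random output of big_gen() so that the result is reproducible.

```python
import itertools
import operator

# Fixed stand-in for the output of big_gen() (an assumption for this sketch).
pairs = [('a', 1), ('a', 2), ('b', 3), ('c', 4), ('c', 5)]

def group_sizes(items):
    """Return (key, count) for each run of consecutive items sharing a first field."""
    result = []
    for key, group in itertools.groupby(items, key=operator.itemgetter(0)):
        # 'group' is a lazy iterator over the original tuples in this run;
        # counting it consumes the items one at a time without building a list.
        result.append((key, sum(1 for _ in group)))
    return result

print(group_sizes(pairs))  # prints [('a', 2), ('b', 1), ('c', 2)]
```

Note that groupby only merges consecutive items with equal keys, which suits big_gen() because its group letter never goes back to an earlier value.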

The resulting generator needs to be wrapped in an imap just because you've specified your printer signature to take an iterator over numbers, while, by nature, groupby returns iterators over the same items it gets as its input -- here, 2-item tuples with a letter followed by a number -- but this is not really all that germane to your question's title.
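For readers on Python 3, where itertools.imap no longer exists and the built-in map is already lazy, the same wrapping might look like the sketch below. `fixed_gen` and `collect` are hypothetical stand-ins for big_gen() and printer(): the first yields a fixed sequence, and the second returns the lines it would print, so the result can be checked.

```python
import itertools
import operator

def fixed_gen():
    """Hypothetical deterministic stand-in for big_gen()."""
    for pair in [('a', 1), ('a', 2), ('b', 3), ('c', 4), ('c', 5)]:
        yield pair

def collect(group_letter, generator):
    """Hypothetical stand-in for printer(): returns the lines instead of printing."""
    lines = ["These numbers are in group %s:" % group_letter]
    for num in generator:
        lines.append("\t%s" % num)
    return lines

def main():
    output = []
    for let, gen in itertools.groupby(fixed_gen(), key=operator.itemgetter(0)):
        # gen yields (letter, number) tuples; map lazily strips the letter,
        # playing the role that itertools.imap plays in Python 2.
        secgen = map(operator.itemgetter(1), gen)
        output.extend(collect(let, secgen))
    return output

print("\n".join(main()))
```

A generator expression such as `(num for _, num in gen)` would serve equally well in place of map.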

The answer to that title is that, yep, a Python function can perfectly well do the job you want -- itertools.groupby in fact does exactly that. I recommend studying the itertools module carefully; it's a very useful tool (and delivers splendid performance as well).
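One property worth keeping in mind when relying on groupby for huge groups: each group iterator reads from the same underlying iterator as groupby itself, which is why nothing has to be held in memory, but it also means a group should be consumed before moving on to the next one. A small Python 3 sketch of that behaviour, with illustrative variable names and a fixed input:

```python
import itertools
import operator

# Fixed input standing in for big_gen() (an assumption for this sketch).
pairs = iter([('a', 1), ('a', 2), ('b', 3), ('c', 4)])
grouped = itertools.groupby(pairs, key=operator.itemgetter(0))

first_key, first_group = next(grouped)
# Advancing groupby skips whatever is left of group 'a'.
second_key, second_group = next(grouped)

stale = list(first_group)    # the earlier group can no longer be read
fresh = list(second_group)   # the current group is still available
print(stale, fresh)  # prints [] [('b', 3)]
```

The loop in main() is safe in this respect as long as printer finishes iterating over its group before the loop advances, which it does here.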

Alex Martelli answered Nov 12 '22 23:11