
itertools.takewhile within a generator function - why is it evaluated once only?

I have a text file like this:

11
2
3
4

11

111

Using Python 2.7, I want to turn it into a list of lists of lines, where line breaks divide items in the inner list and empty lines divide items in the outer list. Like so:

[["11","2","3","4"],["11"],["111"]]

And for this purpose, I wrote a generator function that would yield the inner lists one at a time once passed an open file object:

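def readParag(fileObj):
    currentParag = []
    for line in fileObj:
        stripped = line.rstrip()
        if len(stripped) > 0:
            currentParag.append(stripped)
        elif len(currentParag) > 0:
            # A blank line closes the current paragraph
            yield currentParag
            currentParag = []
    # Yield the last paragraph if the file does not end with a blank line
    if currentParag:
        yield currentParag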

That works fine, and I can call it from within a list comprehension, producing the desired result. However, it subsequently occurred to me that I might be able to do the same thing more concisely using itertools.takewhile (with a view to rewriting the generator function as a generator expression, but we'll leave that for now). This is what I tried:

from itertools import takewhile
def readParag(fileObj):
    yield [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]

In this case, the resulting generator yields only one result (the expected first one, i.e. ["11","2","3","4"]). I had hoped that calling its next method again would cause it to evaluate takewhile(lambda line: line != "\n", fileObj) again on the remainder of the file, thus leading it to yield another list. But no: I got a StopIteration instead. So I surmised that the takewhile expression was being evaluated once only, at the time when the generator object was created, and not each time I called the resultant generator object's next method.

This supposition made me wonder what would happen if I called the generator function again. The result was that it created a new generator object that also yielded a single result (the expected second one, i.e. ["11"]) before throwing a StopIteration back at me. So in fact, writing this as a generator function effectively gives the same result as if I'd written it as an ordinary function and returned the list instead of yielding it.

I guess I could solve this problem by creating my own class to use instead of a generator (as in John Millikin's answer to this question). But the point is that I was hoping to write something more concise than my original generator function (possibly even a generator expression). Can somebody tell me what I'm doing wrong, and how to get it right?

Westcroft_to_Apse asked Aug 07 '12 19:08



1 Answer

What you're trying to do is a perfect job for groupby:

from itertools import groupby

def read_parag(filename):
    with open(filename) as f:
        for k,g in groupby((line.strip() for line in f), bool):
            if k:
                yield list(g)

which will give:

>>> list(read_parag('myfile.txt'))
[['11', '2', '3', '4'], ['11'], ['111']]
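
To see why the `if k` filter is needed, it may help to look at every group that groupby produces, blank runs included. The following is only a sketch: `SAMPLE_LINES` and `show_groups` are hypothetical names, and an in-memory list of lines stands in for the file so the example runs without anything on disk.

```python
from itertools import groupby

# Hypothetical stand-in for the lines of the text file in the question.
SAMPLE_LINES = ["11\n", "2\n", "3\n", "4\n", "\n", "11\n", "\n", "111\n"]


def show_groups(lines):
    """Return (key, items) pairs exactly as groupby produces them."""
    stripped = (line.strip() for line in lines)
    # bool("") is False and bool("11") is True, so consecutive non-empty
    # lines share the key True and consecutive blank lines share the key False.
    return [(key, list(group)) for key, group in groupby(stripped, bool)]


groups = show_groups(SAMPLE_LINES)

# Keeping only the groups whose key is True drops the blank separators.
paragraphs = [items for key, items in groups if key]
```

Here `groups` alternates between runs of non-empty lines (key True) and runs of blank lines (key False), which is why the generator above only yields when `k` is true.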

Or in one line:

[list(g) for k,g in groupby((line.strip() for line in open('myfile.txt')), bool) if k]
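
As for why the takewhile version in the question yields only once: the body of a generator function runs from one yield to the next, and that body contains a single yield with no loop around it, so the generator is finished after its first result. takewhile also consumes the blank line that ends each run, which is why calling the function again on the same file object picks up at the next paragraph. If you would still rather use takewhile, the following is a minimal sketch of one way to loop it; `read_once` and `read_parag_loop` are hypothetical names, and a list iterator stands in for the open file.

```python
from itertools import takewhile

# Hypothetical stand-in for the lines of the text file in the question.
SAMPLE_LINES = ["11\n", "2\n", "3\n", "4\n", "\n", "11\n", "\n", "111\n"]


def read_once(lines):
    # Same shape as the attempt in the question: one yield and no loop,
    # so the generator is exhausted after producing a single list.
    yield [ln.rstrip() for ln in takewhile(lambda line: line != "\n", lines)]


def read_parag_loop(lines):
    lines = iter(lines)  # make sure the loop and takewhile share one iterator
    for first in lines:  # ends cleanly when the input is exhausted
        if not first.strip():
            continue  # skip extra blank lines between paragraphs
        # takewhile reads up to the next blank line and consumes that line too.
        rest = takewhile(lambda line: line.strip(), lines)
        yield [first.rstrip()] + [ln.rstrip() for ln in rest]


single = list(read_once(iter(SAMPLE_LINES)))
looped = list(read_parag_loop(SAMPLE_LINES))
```

The outer for loop is what tells "end of file" apart from "empty paragraph", something takewhile alone cannot signal. It is not shorter than the groupby version, which is why groupby remains the better fit here.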
Rik Poggi answered Nov 15 '22 14:11