I have a text file like this:
11
2
3
4

11

111
Using Python 2.7, I want to turn it into a list of lists of lines, where line breaks divide items in the inner list and empty lines divide items in the outer list. Like so:
[["11","2","3","4"],["11"],["111"]]
And for this purpose, I wrote a generator function that would yield the inner lists one at a time once passed an open file object:
def readParag(fileObj):
    currentParag = []
    for line in fileObj:
        stripped = line.rstrip()
        if len(stripped) > 0: currentParag.append(stripped)
        elif len(currentParag) > 0:
            yield currentParag
            currentParag = []
That works fine, and I can call it from within a list comprehension, producing the desired result. However, it subsequently occurred to me that I might be able to do the same thing more concisely using itertools.takewhile (with a view to rewriting the generator function as a generator expression, but we'll leave that for now). This is what I tried:
from itertools import takewhile

def readParag(fileObj):
    yield [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]
In this case, the resulting generator yields only one result (the expected first one, i.e. ["11","2","3","4"]). I had hoped that calling its next method again would cause it to evaluate takewhile(lambda line: line != "\n", fileObj) again on the remainder of the file, thus leading it to yield another list. But no: I got a StopIteration instead. So I surmised that the takewhile expression was being evaluated once only, at the time when the generator object was created, and not each time I called the resultant generator object's next method.
This supposition made me wonder what would happen if I called the generator function again. The result was that it created a new generator object that also yielded a single result (the expected second one, i.e. ["11"]) before throwing a StopIteration back at me. So in fact, writing this as a generator function effectively gives the same result as if I'd written it as an ordinary function and returned the list instead of yielding it.
I guess I could solve this problem by creating my own class to use instead of a generator (as in John Millikin's answer to this question). But the point is that I was hoping to write something more concise than my original generator function (possibly even a generator expression). Can somebody tell me what I'm doing wrong, and how to get it right?
Yes, a generator can be consumed only once: after it is exhausted, further calls to its next method raise StopIteration.
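To illustrate the point, here is a small sketch using a throwaway generator expression (the name squares is made up for this example and has nothing to do with the question's file). Values come out one at a time, and once the generator is exhausted it stays empty:

```python
# A generator expression; nothing is computed until a value is requested.
squares = (n * n for n in range(3))

first = next(squares)   # pulls exactly one value: 0
rest = list(squares)    # drains what is left: [1, 4]
again = list(squares)   # the generator is now exhausted: []
```

Calling next(squares) at this point would raise StopIteration, which is the same exception the question ran into.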
Generator expressions have a few advantages in Python: they are a memory-efficient way to produce a sequence of values, they can make code shorter and more readable, and they are essentially a shorthand for simple generator functions.
Simply put, a generator function returns an object (an iterator) that we can iterate over one value at a time.
The iterator does not produce all of its items at once; it computes each one on demand. A generator can also be written as an expression whose syntax is similar to a list comprehension.
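As for the takewhile attempt in the question: the function body contains a single yield, so the generator produces one list and then runs off the end of the function, which is what raises StopIteration. The takewhile call is not evaluated when the generator object is created; it runs on the first call to next, and there is simply nothing telling it to run again. One possible way to keep takewhile is to wrap it in a loop. The sketch below is only an illustration, not the original poster's code: the name read_parag and the io.StringIO sample standing in for the open file are assumptions made for the example.

```python
import io
from itertools import takewhile

def read_parag(file_obj):
    """Yield one list of stripped lines per blank-line-separated paragraph."""
    lines = iter(file_obj)
    # Each pass pulls the first line of the next paragraph; the loop ends at EOF.
    for first in lines:
        if not first.strip():
            continue  # skip extra blank lines between paragraphs
        # takewhile consumes the rest of the paragraph plus the blank line ending it.
        rest = takewhile(lambda line: line.strip(), lines)
        yield [first.rstrip()] + [ln.rstrip() for ln in rest]

# Hypothetical in-memory stand-in for the text file from the question.
sample = io.StringIO(u"11\n2\n3\n4\n\n11\n\n111\n")
paragraphs = list(read_parag(sample))
# paragraphs == [["11", "2", "3", "4"], ["11"], ["111"]]
```

Pulling the first line in the for loop is what distinguishes "end of file" from "another blank line", since takewhile alone returns an empty result in both cases.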
What you're trying to do is a perfect job for groupby:
from itertools import groupby

def read_parag(filename):
    with open(filename) as f:
        for k,g in groupby((line.strip() for line in f), bool):
            if k:
                yield list(g)
which will give:
>>> list(read_parag('myfile.txt'))
[['11', '2', '3', '4'], ['11'], ['111']]
Or in one line:
[list(g) for k,g in groupby((line.strip() for line in open('myfile.txt')), bool) if k]
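To see why this works, it may help to look at what groupby produces before the filtering step. With bool as the key function, consecutive non-empty strings form a run with key True and each empty string forms a run with key False. The sketch below uses a hard-coded list of already-stripped lines (an assumption standing in for the file contents) so it can be run without any file:

```python
from itertools import groupby

# Already-stripped lines; the empty strings are the blank lines in the file.
lines = ["11", "2", "3", "4", "", "11", "", "111"]

# Each run of consecutive items with the same truthiness becomes one group.
runs = [(key, list(group)) for key, group in groupby(lines, bool)]
# runs == [(True, ["11", "2", "3", "4"]), (False, [""]),
#          (True, ["11"]), (False, [""]), (True, ["111"])]

# Keeping only the True runs leaves the paragraphs.
paragraphs = [group for key, group in runs if key]
# paragraphs == [["11", "2", "3", "4"], ["11"], ["111"]]
```

Several blank lines in a row collapse into a single False run, so they do not produce empty paragraphs.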