
Processing only non-blank lines

Tags:

python

I've got the following snippet of code

def send(self, queue, fd):
    for line in fd:
        data = line.strip()
        if data:
            queue.write(json.loads(data))

Which of course works just fine, but I wonder sometimes if there is a "better" way to write that construct where you only act on non-blank lines.

The challenge is that this should use the iterative nature of the 'fd' read and be able to handle files in the 100+ MB range.

UPDATE - In your haste to get points for this question you're ignoring an important part, which is memory usage. For instance the expression:

 non_blank_lines = (line.strip() for line in fd if line.strip())

Is going to buffer the whole file into memory, not to mention performing a strip() action twice. Which will work for small files, but fails when you've got 100+MB of data (or once in a while a 100GB).

Part of the challenge is the following works, but is soup to read:

from itertools import ifilter, imap  # Python 2 only

for line in ifilter(lambda l: l, imap(lambda l: l.strip(), fd)):
    queue.write(json.loads(line))

Looking for magic, folks!

FINAL UPDATE: PEP-289 is very useful for my own better understanding of the difference between [] and () with iterators involved.

koblas asked Dec 03 '12 17:12



1 Answer

There's nothing wrong with the code as written; it's readable and efficient.

An alternative approach would be to write it as a generator expression:

def send(self, queue, fd):
    non_blank_lines = (line.strip() for line in fd if line.strip())
    for line in non_blank_lines:
        queue.write(json.loads(line))

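This approach can be beneficial (terser) if you are applying a function that accepts the items directly, for example Python 3's print. Note that file must be an open file object rather than a filename, and that unpacking with * collects every item into an argument tuple before print is called, so this particular form is not suitable for very large files:

non_blank_lines = (line.strip() for line in fd if line.strip())
with open('foo', 'w') as out:
    print(*non_blank_lines, file=out)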

To do away with the multiple calls to strip(), chain generator expressions together:

stripped_lines = (line.strip() for line in fd)
non_blank_lines = (line for line in stripped_lines if line)
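
Put together, the chained version of send might look like the sketch below. It is written as a plain function rather than a method, and ListQueue is a hypothetical stand-in for the question's queue object; the only thing assumed about the real queue is that it has a write() method.

```python
import io
import json


class ListQueue:
    """Hypothetical stand-in for the question's queue; only write() is assumed."""

    def __init__(self):
        self.items = []

    def write(self, item):
        self.items.append(item)


def send(queue, fd):
    # Each stage is lazy: a line is read, stripped, tested and written
    # before the next line is pulled from fd.
    stripped_lines = (line.strip() for line in fd)
    non_blank_lines = (line for line in stripped_lines if line)
    for line in non_blank_lines:
        queue.write(json.loads(line))


# io.StringIO stands in for an open file here.
fd = io.StringIO('{"a": 1}\n\n   \n{"b": 2}\n')
queue = ListQueue()
send(queue, fd)
print(queue.items)  # [{'a': 1}, {'b': 2}]
```

Any iterable of strings works for fd, so the same function can be exercised with a real file handle from open().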

Note that generator expressions will not adversely affect memory, as detailed in PEP 289.

For a more in-depth look at this approach, and some performance benchmarks, take a look at this set of notes.
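
One way to convince yourself of the laziness is to count how many input lines have been pulled at each step. The following is a small sketch; counting_lines and the counter dict are illustrative helpers, not part of the original code.

```python
def counting_lines(lines, counter):
    """Yield lines one at a time, recording how many have been pulled so far."""
    for line in lines:
        counter["read"] += 1
        yield line


source = ["first\n", "\n", "second\n", "\n", "third\n"]
counter = {"read": 0}

stripped_lines = (line.strip() for line in counting_lines(source, counter))
non_blank_lines = (line for line in stripped_lines if line)

# Building the generators reads nothing from the source.
read_before = counter["read"]

# Asking for one result pulls only as many input lines as needed.
first = next(non_blank_lines)
read_after_first = counter["read"]

# The next result has to skip one blank line, so two more lines are read.
second = next(non_blank_lines)
read_after_second = counter["read"]

print(read_before, read_after_first, read_after_second)  # 0 1 3
```

At no point is more than the current line held by the pipeline, which is why the file size does not matter.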

Finally note that rstrip() will outperform strip() if you don't need the full behaviour of strip().
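
For this particular use, rstrip() should be enough: a whitespace-only line comes out empty either way, and json.loads ignores leading whitespace. The sketch below checks both points on a few made-up sample lines and then times the two methods; no timing figures are claimed here, since they vary by machine and input.

```python
import json
import timeit

lines = ['  {"a": 1}  \n', "\n", "   \n", '{"b": 2}\n']

# Both methods agree on which lines are blank.
blank_by_strip = [not line.strip() for line in lines]
blank_by_rstrip = [not line.rstrip() for line in lines]

# json.loads tolerates leading whitespace, so rstrip()'d lines parse fine.
right_stripped = (line.rstrip() for line in lines)
parsed = [json.loads(line) for line in right_stripped if line]

# Rough timing comparison of the two bound methods on one sample string.
sample = "   some text   \n"
t_strip = timeit.timeit(sample.strip, number=100_000)
t_rstrip = timeit.timeit(sample.rstrip, number=100_000)
print(f"strip: {t_strip:.4f}s  rstrip: {t_rstrip:.4f}s")
```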

cmh answered Sep 19 '22 07:09