I've got the following snippet of code
    def send(self, queue, fd):
        for line in fd:
            data = line.strip()
            if data:
                queue.write(json.loads(data))
Which of course works just fine, but I wonder sometimes if there is a "better" way to write that construct where you only act on non-blank lines.
The challenge is that this should keep the iterative nature of reading 'fd' with a for loop and be able to handle files in the 100+ MB range.
UPDATE - In your haste to get points for this question you're ignoring an important part, which is memory usage. For instance the expression:
    non_blank_lines = (line.strip() for line in fd if line.strip())
is going to buffer the whole file into memory, not to mention performing a strip() action twice. That will work for small files, but fails when you've got 100+ MB of data (or once in a while 100 GB).
Part of the challenge is that the following works, but is a soup to read:
    from itertools import ifilter, imap  # Python 2 only
    for line in ifilter(lambda l: l, imap(lambda l: l.strip(), fd)):
        queue.write(json.loads(line))
Looking for magic, folks!
FINAL UPDATE: PEP-289 is very useful for my own better understanding of the difference between [] and () with iterators involved.
In Python 2, use itertools.ifilter if you want a generator. In Python 3, the built-in filter is already lazy, so just pass the whole thing to list if you want a list.
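For Python 3, where the built-in map and filter are lazy, the same pipeline could be sketched roughly like this. Note that non_blank is a hypothetical helper name, and io.StringIO only stands in for the real file object:

```python
import io
import json

def non_blank(fd):
    """Lazily yield stripped, non-empty lines from an iterable of strings."""
    # map() strips each line on demand; filter(None, ...) drops empty strings.
    return filter(None, map(str.strip, fd))

# Assumption: a small in-memory "file" used purely for illustration.
fd = io.StringIO('{"a": 1}\n\n   \n{"b": 2}\n')
records = [json.loads(line) for line in non_blank(fd)]
print(records)
```

Nothing is read from fd until something iterates over the result, so only one line is held at a time.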
There's nothing wrong with the code as written; it's readable and efficient.
An alternative approach would be to write it as a generator expression:
    def send(self, queue, fd):
        non_blank_lines = (line.strip() for line in fd if line.strip())
        for line in non_blank_lines:
            queue.write(json.loads(line))
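This approach can be beneficial (terser) if you are handing the lines to a function that accepts an iterable or, via * unpacking, many arguments, e.g. Python 3's print:
    non_blank_lines = (line.strip() for line in fd if line.strip())
    with open('foo', 'w') as out:  # file= needs a writable file object, not a filename
        # Note: * unpacking consumes the whole generator before print runs.
        print(*non_blank_lines, file=out)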
To do away with the multiple calls to strip(), chain generator expressions together:
    stripped_lines = (line.strip() for line in fd)
    non_blank_lines = (line for line in stripped_lines if line)
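Put together, the chained version of send might look something like the sketch below. Sender and ListQueue are hypothetical stand-ins (the original class and queue type are not shown in the question), and io.StringIO plays the role of the file:

```python
import io
import json

class Sender:
    def send(self, queue, fd):
        # Stage 1: strip each line exactly once, lazily.
        stripped_lines = (line.strip() for line in fd)
        # Stage 2: drop lines that are empty after stripping.
        non_blank_lines = (line for line in stripped_lines if line)
        for line in non_blank_lines:
            queue.write(json.loads(line))

class ListQueue:
    """Hypothetical stand-in for the real queue: collects whatever is written."""
    def __init__(self):
        self.items = []

    def write(self, item):
        self.items.append(item)

queue = ListQueue()
Sender().send(queue, io.StringIO('{"id": 1}\n\n  \n{"id": 2}\n'))
print(queue.items)
```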
Note that generator expressions are evaluated lazily and will not adversely affect memory, as detailed in the PEP on generator expressions (PEP 289).
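One rough way to convince yourself of the laziness is to count how many lines the source has produced at each point. This is only an illustrative sketch; counting_lines and log are hypothetical names standing in for a real file:

```python
def counting_lines(n, log):
    """Hypothetical line source that records each line it hands out."""
    for i in range(n):
        log.append(i)
        yield "line %d\n" % i

log = []
stripped_lines = (line.strip() for line in counting_lines(1000, log))
non_blank_lines = (line for line in stripped_lines if line)

produced_before = len(log)     # building the pipeline reads nothing
first = next(non_blank_lines)  # pulls a single line through both stages
produced_after = len(log)
print(produced_before, first, produced_after)
```

Swapping the parentheses for square brackets would turn each stage into a list comprehension, which does read everything up front.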
For a more in-depth look at this approach, and some performance benchmarks, take a look at this set of notes.
Finally note that rstrip() will outperform strip() if you don't need the full behaviour of strip().
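In this particular case rstrip() is probably enough, since whitespace-only lines still collapse to an empty string and json.loads ignores leading whitespace. A minimal sketch, with parse_non_blank as a hypothetical helper name:

```python
import io
import json

def parse_non_blank(fd):
    """Hypothetical helper: yield parsed JSON for each non-blank line."""
    for line in fd:
        data = line.rstrip()  # drops the newline; whitespace-only lines become ""
        if data:
            # Leading whitespace is left in place; the JSON parser skips it.
            yield json.loads(data)

sample = io.StringIO('  {"a": 1}\n \t\n\n{"b": [1, 2]}  \n')
parsed = list(parse_non_blank(sample))
print(parsed)
```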