
line-by-line file processing, for-loop vs with

I am trying to understand the trade-offs/differences between these two ways of opening files for line-by-line processing:

with open('data.txt') as inf:
    for line in inf:
        pass  # etc

vs

for line in open('data.txt'):
    pass  # etc

I understand that using with ensures the file is closed when the "with-block" (suite?) is exited (or an exception is encountered). So I have been using with ever since I learned about it here.

Re for-loop: From searching around the net and SO, it seems that whether the file is closed when the for-loop is exited is implementation dependent? And I couldn't find anything about how this construct would deal with exceptions. Does anyone know?

If I am mistaken about anything above, I'd appreciate corrections, otherwise is there a reason to ever use the for construct over the with? (Assuming you have a choice, i.e., aren't limited by Python version)

Levon asked Jun 21 '12 01:06




2 Answers

The problem with this

for line in open('data.txt'):
    pass  # etc

is that you don't keep an explicit reference to the open file, so how do you close it? The lazy way is to wait for the garbage collector to clean it up, but that may mean that the resources aren't freed in a timely manner.

So you can say

inf = open('data.txt')
for line in inf:
    pass  # etc
inf.close()

Now what happens if there is an exception while you are inside the for loop? The file won't get closed explicitly.
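A minimal sketch of that failure mode, assuming a throwaway file created with tempfile; the sample contents and the helper name process_without_cleanup are made up for illustration:

```python
import os
import tempfile


def process_without_cleanup(path):
    """Open path, fail partway through the loop, and return the file object."""
    inf = open(path)
    try:
        for line in inf:
            raise ValueError("simulated failure while processing a line")
        inf.close()  # never reached when the loop raises
    except ValueError:
        pass  # swallowed only so the file object can be inspected afterwards
    return inf


# Hypothetical sample data written to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")

leaked = process_without_cleanup(tmp.name)
still_open = not leaked.closed  # the explicit close() was skipped
leaked.close()                  # clean up by hand
os.remove(tmp.name)
```

Here still_open records that the file was left open after the error, because the close() call sits after the loop and is simply skipped.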

Add a try/finally

inf = open('data.txt')
try:
    for line in inf:
        pass  # etc
finally:
    inf.close()

This is a lot of code to do something pretty simple, so Python added with to enable this code to be written in a more readable way. Which gets us to here

with open('data.txt') as inf:
    for line in inf:
        pass  # etc

So, that is the preferred way to open the file. If your Python is too old for the with statement, you should use the try/finally version for production code.
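As a quick check of the with version, here is a small sketch that raises inside the loop and then looks at the file object. The temporary file and its contents are assumptions for the demo, not part of the question:

```python
import os
import tempfile

# Hypothetical sample data written to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")

seen = []
try:
    with open(tmp.name) as inf:
        for line in inf:
            seen.append(line.rstrip("\n"))
            raise ValueError("simulated failure while processing a line")
except ValueError:
    pass  # the error still reaches the caller; it is caught here for the demo

# The name inf is still bound after the block, so its state can be inspected.
closed_after_error = inf.closed
os.remove(tmp.name)
```

The file is closed by the time the except clause runs, without any explicit close() or finally in the calling code.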

John La Rooy answered Nov 03 '22 10:11



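The with statement was only introduced in Python 2.5 (where it has to be enabled with from __future__ import with_statement; from 2.6 on it is always available) - only if you have backward compatibility requirements for earlier versions should you use the plain for-loop form.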

Bit more clarity

The with statement was introduced (as you're aware) to encompass the try/except/finally pattern - which isn't terrific to understand, but okay. In CPython (the Python implemented in C), reference counting will close an open file once nothing refers to it any more. The specification of the language itself doesn't say... so IronPython, Jython etc. may choose to keep files open, memory open, whatever, and not free resources until the next GC cycle (or at all, but the CPython GC is different from the .NET or Java ones...).

I think the only thing I've heard against it, is that it adds another indentation level.

So to summarise: won't work < 2.5, introduces the 'as' keyword and adds an indentation level.

Otherwise, you stay in control of handling exceptions as normal, and the finally block closes resources if something escapes.
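To illustrate that ordering, here is a rough sketch using a made-up stand-in for a file. TrackedResource is hypothetical and exists only to log when it is entered and exited:

```python
class TrackedResource:
    """Hypothetical stand-in for a file: records when it is opened and closed."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("open")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("close")
        return False  # do not suppress the exception; let it propagate


events = []
try:
    with TrackedResource(events) as res:
        events.append("work")
        raise RuntimeError("simulated failure")
except RuntimeError:
    events.append("handled by caller")
```

The cleanup step runs first and the exception then reaches the surrounding except clause, so error handling in the caller looks the same as it would without with.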

Works for me!

Jon Clements answered Nov 03 '22 08:11
