I am trying to understand the trade offs/differences between these to ways of opening files for line-by-line processing
with open('data.txt') as inf:
for line in inf:
#etc
vs
for line in open('data.txt'):
# etc
I understand that using with
ensures the file is closed when the
"with-block" (suite?) is exited (or an exception is countered). So I have been using with
ever since I learned about it here.
Re for
-loop: From searching around the net and SO, it seems that whether the file
is closed when the for
-loop is exited is implementation dependent? And
I couldn't find anything about how this construct would deal with
exceptions. Does anyone know?
If I am mistaken about anything above, I'd appreciate corrections,
otherwise is there a reason to ever use the for
construct over the
with
? (Assuming you have a choice, i.e., aren't limited by Python version)
Python File readline() Method The readline() method returns one line from the file. You can also specified how many bytes from the line to return, by using the size parameter.
use of with with is the nice and efficient pythonic way to read large files. advantages - 1) file object is automatically closed after exiting from with execution block. 2) exception handling inside the with block. 3) memory for loop iterates through the f file object line by line.
The readlines method returns the contents of the entire file as a list of strings, where each item in the list represents one line of the file. It is also possible to read the entire file into a single string with read .
The problem with this
for line in open('data.txt'):
# etc
Is that you don't keep an explicit reference to the open file, so how do you close it? The lazy way is wait for the garbage collector to clean it up, but that may mean that the resources aren't freed in a timely manner.
So you can say
inf = open('data.txt')
for line in inf:
# etc
inf.close()
Now what happens if there is an exception while you are inside the for loop? The file won't get closed explicitly.
Add a try/finally
inf = open('data.txt')
try:
for line in inf:
# etc
finally:
inf.close()
This is a lot of code to do something pretty simple, so Python added with
to enable this code to be written in a more readable way. Which gets us to here
with open('data.txt') as inf:
for line in inf:
#etc
So, that is the preferred way to open the file. If your Python is too old for the with statement, you should use the try/finally
version for production code
The with statement was only introduced in Python 2.5 - only if you have backward compatibility requirements for earlier versions should you use the latter.
Bit more clarity
The with statement was introduced (as you're aware) to encompass the try/except/finally system - which isn't terrific to understand, but okay. In Python (the Python in C), the implementation of it will close open files. The specification of the language itself, doesn't say... so IPython, JPython etc... may choose to keep files open, memory open, whatever, and not free resources until the next GC cycle (or at all, but the CPython GC is different from the .NET or Java ones...).
I think the only thing I've heard against it, is that it adds another indentation level.
So to summarise: won't work < 2.5, introduces the 'as' keyword and adds an indentation level.
Otherwise, you stay in control of handling exceptions as normal, and the finally block closes resources if something escapes.
Works for me!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With