I've encountered two versions of code that accomplish the same task, with a small difference in the code itself:
    with open("file") as f:
        for line in f:
            print line
and
    with open("file") as f:
        data = f.readlines()
        for line in data:
            print line
My question is: is the file object f a list by default, just like data? If not, why does the first chunk of code work? Which version is the better practice?
A file object is an object that exposes "a file-oriented API (with methods such as read() or write()) to an underlying resource." A file name is just a text string containing the name of the file. It is no different than any other string object.
To create a file object in Python, use a built-in function such as open() or os.popen(). An IOError exception is raised when a file object is misused or a file operation fails for an I/O-related reason, for example when you try to write to a file that was opened in read-only mode.
A file object is not a list; it is an object that conforms to the iterator interface (docs). That is, it implements an __iter__ method that returns an iterator object. That iterator object implements both the __iter__ and next methods (next is spelled __next__ in Python 3), allowing iteration over the collection.
It happens that the file object is its own iterator (docs), meaning file.__iter__() returns self.
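The two claims above (not a list, and its own iterator) can be checked directly. This is a minimal sketch written for Python 3, so it uses next(f) rather than f.next(); the temporary file and the variable names are only there to make the example self-contained and are not part of the original question.

```python
import os
import tempfile

# Hypothetical sample file, created only so the example can run on its own.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("first\nsecond\n")

with open(path) as f:
    is_list = isinstance(f, list)    # a file object is not a list
    is_own_iterator = iter(f) is f   # iter() hands back the very same object
    first_line = next(f)             # Python 3 spelling of f.next()

os.remove(path)
```

Because iter(f) returns f itself, a for loop over the file and an explicit next(f) call advance the same underlying position.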
Both for line in file and lines = file.readlines() are equivalent in that they yield the same result when used to iterate over all lines in the file. However, file.next() buffers content from the file (it reads ahead) to speed up reading, which moves the file position to or beyond the point where the last returned line ended. This means that if you use for line in file, read some lines, stop the iteration before reaching the end of the file, and then call file.readlines(), the first line returned might not be the full line following the last line the for loop iterated over.
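One way to sidestep the read-ahead problem is to avoid switching access styles partway through the file: if the first lines were taken through the iterator, keep using the iterator for the rest. The sketch below assumes that approach and uses itertools.islice to take a fixed number of lines; the sample file and the names head and rest are illustrative only.

```python
import os
import tempfile
from itertools import islice

# Hypothetical sample file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("a\nb\nc\nd\n")

with open(path) as f:
    head = list(islice(f, 2))  # take the first two lines through the iterator
    rest = list(f)             # stay on the same iterator for the remainder

os.remove(path)
```

Since both steps go through the iterator, the remainder starts exactly at the line after the last one consumed.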
When you use for x in my_it, the interpreter calls my_it.__iter__(). The next() method is then called repeatedly on the object returned by that call, and each return value is assigned to x. When next() raises StopIteration, the loop ends.
Note: A valid iterator implementation should ensure that once StopIteration is raised, it continues to be raised on all subsequent calls to next().
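To make the protocol concrete, here is a small sketch of a hand-written iterator together with a function that mimics what a for loop does. Countdown and manual_for are hypothetical names invented for this illustration; the class defines __next__ for Python 3 and aliases it to next so the same code also fits the Python 2 spelling used above.

```python
class Countdown(object):
    """Hypothetical iterator that yields start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # Like a file object, this iterator is its own iterator.
        return self

    def __next__(self):
        if self.current <= 0:
            # current never increases, so every later call raises again.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # Python 2 spelling of the same method


def manual_for(iterable):
    """Collect items the way a for loop would visit them."""
    it = iter(iterable)          # the interpreter calls __iter__() once
    seen = []
    while True:
        try:
            x = next(it)         # then calls next() for every pass
        except StopIteration:
            break                # StopIteration ends the loop
        seen.append(x)
    return seen


counter = Countdown(3)
values = manual_for(counter)

# Once exhausted, the iterator stays exhausted on every further call.
after_first = next(counter, "done")
after_second = next(counter, "done")
```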
In both cases, you are getting a file line-by-line. The method is different.
With your first version:
    with open("file") as f:
        for line in f:
            print line
While you are iterating over the file line by line, the file contents are not fully resident in memory (unless it is a one-line file).
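As an illustration of why this matters, the sketch below computes a summary over a file while holding only one line at a time. It is written for Python 3; longest_line_length is a hypothetical helper and the sample file exists only to make the example runnable.

```python
import os
import tempfile


def longest_line_length(path):
    """Return the length of the longest line, reading one line at a time."""
    longest = 0
    with open(path) as f:
        for line in f:  # only the current line is held by this loop
            longest = max(longest, len(line.rstrip("\n")))
    return longest


# Hypothetical sample file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("ab\nabcdef\nabc\n")

result = longest_line_length(path)
os.remove(path)
```

The same function written with readlines() would give the same answer but would build the full list of lines first.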
The open built-in function returns a file object, not a list. That object supports iteration; in this case each step returns a string: a group of characters from the file terminated by either a newline or the end of the file.
You can write a loop that is similar to what for line in f: print line is doing under the hood:
    with open('file') as f:
        while True:
            try:
                line = f.next()
            except StopIteration:
                break
            else:
                print line
With the second version:
    with open("file") as f:
        data = f.readlines()  # equivalent to data = list(f)
        for line in data:
            print line
You are using a method of a file object (file.readlines()) that reads the entire file contents into memory as a list of the individual lines. The code is then iterating over that list.
You can write a similar version of that as well that highlights the iterators under the hood:
    with open('file') as f:
        data = list(f)
        it = iter(data)
        while True:
            try:
                line = it.next()
            except StopIteration:
                break
            else:
                print line
In both of your examples, you are using a for loop to loop over the items of an iterable. The items are the same in each case (individual lines of the file) but the underlying iterable is different. In the first version, it is a file object; in the second version, it is a list. Use the first version if you just want to deal with each line. Use the second if you want a list of lines.
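When a list really is what you want, the difference shows up in what you can do afterwards. The sketch below (Python 3, with an illustrative sample file and variable names) shows that the list supports len(), indexing, and repeated passes, while the file object itself is used up after one pass.

```python
import os
import tempfile

# Hypothetical sample file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("one\ntwo\nthree\n")

with open(path) as f:
    data = f.readlines()      # the whole file as a list of lines
    leftover = f.readlines()  # nothing left: the file is already at its end

os.remove(path)

line_count = len(data)        # lists know their length
last_line = data[-1]          # lists support indexing
reversed_lines = data[::-1]   # and can be traversed again in any order
```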
Read Ned Batchelder's excellent overview on looping and iteration for more.
f is a file handle, not a list. It is iterable.