
Is a file object a list by default?

Tags: python

I've come across two versions of code that accomplish the same task, with only a small difference between them:

with open("file") as f:
    for line in f:
        print line

and

with open("file") as f:
    data = f.readlines()
    for line in data:
        print line

My question is: is the file object f a list by default, just like data? If not, why does the first chunk of code work? Which version is better practice?

xczzhh asked Oct 01 '13 16:10




3 Answers

A file object is not a list. It is an object that conforms to the iterator interface (docs): it implements an __iter__ method that returns an iterator object, and that iterator object implements both the __iter__ and next methods, which allow iteration over the collection.

It happens that the file object is its own iterator (docs), meaning file.__iter__() returns self.
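A minimal sketch of that point, assuming Python 3 syntax (where the next method is spelled __next__ and is normally reached through the built-in next()). The temporary file and the variable names are made up for the illustration:

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")
    path = tmp.name

with open(path) as f:
    is_list = isinstance(f, list)     # a file object is not a list
    is_own_iterator = iter(f) is f    # iter() hands back the very same object
    first_line = next(f)              # so next() can be called on f directly

os.remove(path)
print(is_list, is_own_iterator, repr(first_line))
```

Because iter(f) returns f itself, a second for loop over the same open file continues from wherever the first one stopped instead of starting over.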

Both for line in file and lines = file.readlines() are equivalent in that they yield the same result when used to iterate over all lines in the file. However, file.next() buffers the contents of the file (it reads ahead) to speed up reading, which moves the file position to, or beyond, the point where the last returned line ended. This means that if you iterate with for line in file, stop before reaching the end of the file, and then call file.readlines(), the first line returned might not be the full line following the last line seen by the for loop.

When you use for x in my_it, the interpreter calls my_it.__iter__(). The next() method is then called repeatedly on the object returned by that call, and each return value is assigned to x. When next() raises StopIteration, the loop ends.
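The protocol described above can be sketched with a hand-written iterable. This assumes Python 3, where the method is named __next__ rather than next. Countdown, CountdownIterator and manual_for are hypothetical names invented for this example:

```python
class Countdown:
    """Hypothetical iterable that hands out a separate iterator object."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator that yields start, start - 1, ..., 1."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self               # an iterator is its own iterator

    def __next__(self):           # spelled next() in Python 2
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def manual_for(iterable):
    """Roughly what `for x in iterable` does, collecting each x."""
    seen = []
    it = iter(iterable)           # calls iterable.__iter__()
    while True:
        try:
            x = next(it)          # calls it.__next__()
        except StopIteration:     # signals that the loop is over
            break
        seen.append(x)
    return seen


result = manual_for(Countdown(3))
print(result)
```

A regular for loop over Countdown(3) visits the same values in the same order.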

Note: A valid iterator implementation should ensure that once StopIteration has been raised, it continues to be raised on all subsequent calls to next().
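As a small illustration of that rule, here is a sketch using a built-in list iterator (Python 3 syntax assumed, variable names made up). Once exhausted, it keeps raising StopIteration:

```python
it = iter(["only line\n"])
first = next(it)                  # consumes the single item

results = []
for _ in range(3):
    try:
        next(it)                  # the iterator is already exhausted
    except StopIteration:
        results.append("StopIteration")

print(results)
```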

Maciej Gol answered Nov 15 '22 19:11


In both cases, you are reading the file line by line. Only the method differs.

With your first version:

with open("file") as f:
    for line in f:
        print line

While you are iterating over the file line by line, the file contents are not fully resident in memory (unless it is a one-line file).
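One way to see why this matters: a running total only ever needs the current line. The sketch below assumes Python 3 syntax and uses a throwaway 1000-line file. The names line_count and longest are invented for the example:

```python
import os
import tempfile

# Write a throwaway file with 1000 short lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %d\n" % i for i in range(1000))
    path = tmp.name

line_count = 0
longest = 0
with open(path) as f:
    for line in f:                # only the current line is held here
        line_count += 1
        longest = max(longest, len(line))

os.remove(path)
print(line_count, longest)
```

The loop never builds a list of all lines, so the same code works unchanged on files far larger than available memory.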

The open built-in function returns a file object, not a list. That object supports iteration, in this case returning individual strings, each of which is a group of characters in the file terminated by either a newline or the end of the file.

You can write a loop that is similar to what for line in f: print line is doing under the hood:

with open('file') as f:
    while True:
        try:
            line = f.next()       # ask the file object for its next line
        except StopIteration:     # raised once the file is exhausted
            break
        else:
            print line

With the second version:

with open("file") as f:
    data = f.readlines()    # equivalent to data = list(f)
    for line in data:
        print line

You are using a method of a file object (file.readlines()) that reads the entire file contents into memory as a list of the individual lines. The code is then iterating over that list.
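A short sketch of that equivalence, assuming Python 3 syntax and a throwaway three-line file (all variable names are made up for the example):

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

with open(path) as f:
    via_readlines = f.readlines() # whole file read into a list of lines
    leftover = list(f)            # nothing is left to iterate over afterwards

with open(path) as f:
    via_list = list(f)            # list() drains the file's iterator

os.remove(path)
print(via_readlines == via_list, len(via_list), repr(via_list[1]), leftover)
```

Unlike the file object, the resulting list supports len() and indexing, which is the main reason to pay the memory cost of reading everything at once.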

The second version can also be written in a way that exposes the iterator under the hood:

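with open('file') as f:
    data = list(f)                # read every line into a list
    it = iter(data)               # get an iterator over that list
    while True:
        try:
            line = it.next()
        except StopIteration:     # raised once the list is exhausted
            break
        else:
            print line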

In both of your examples, you are using a for loop to loop over the items of an iterable. The items are the same in each case (the individual lines of the file), but the underlying iterable is different: in the first version it is a file object; in the second version it is a list. Use the first version if you just want to deal with each line. Use the second if you want a list of lines.

Read Ned Batchelder's excellent overview on looping and iteration for more.

dawg answered Nov 15 '22 19:11


f is a filehandle, not a list. It is iterable.

jgritty answered Nov 15 '22 19:11