
How is file implemented?

Tags: python

I am curious how files work in Python. How is file implemented so that it can be looped over like this:

csv_file = open("filename.csv", "r")
for line in csv_file:
    pass  # do something with line

Aaron asked Dec 05 '22 21:12


1 Answer

If you're using Python 2, the details are a little murky; alexmcf's answer covers the basics, and you can look up further details from there.

If you're using Python 3, everything is documented in great detail in the io module docs, and there's a reasonably readable pure Python implementation in the stdlib (the _pyio module). It's all built on top of nothing more than a very simple "raw file" interface, which FileIO implements on top of native file descriptors (POSIX file descriptors on Unix).
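To see that layering for yourself, here's a minimal sketch that peels a text-mode file object apart. The helper name describe_layers and the temporary demo file are my own, purely for illustration; the class names in the comment are what I'd expect on CPython 3 with default buffering.

```python
import os
import tempfile


def describe_layers(path):
    """Return the class names of the three layers behind a text-mode file."""
    with open(path, "r") as text_file:
        buffered = text_file.buffer  # binary buffered layer under the text layer
        raw = buffered.raw           # raw layer that talks to the file descriptor
        return (
            type(text_file).__name__,
            type(buffered).__name__,
            type(raw).__name__,
        )


# Create a small throwaway file so the sketch is self-contained.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write("a,b\n1,2\n")

layers = describe_layers(demo_path)
print(layers)  # ('TextIOWrapper', 'BufferedReader', 'FileIO')

os.remove(demo_path)
```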

The IOBase ABC/mixin provides an __iter__ method based on the readline method. As the io docs put it:

IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding character strings). See readline() below.
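A quick way to see the binary/text difference without touching the filesystem is to iterate over in-memory streams. This is just a sketch: I'm using io.StringIO and io.BytesIO as stand-ins for real files, and the sample data is made up.

```python
import io

data = "first\nsecond\n"

# A text stream yields str lines, with the newline kept on each one.
text_lines = list(io.StringIO(data))

# A binary stream yields bytes lines from the same content.
binary_lines = list(io.BytesIO(data.encode("utf-8")))

print(text_lines)    # ['first\n', 'second\n']
print(binary_lines)  # [b'first\n', b'second\n']
```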

And if you look inside the 3.5 source, it's as simple as you'd expect it to be:

def __iter__(self):
    self._checkClosed()
    return self

def __next__(self):
    line = self.readline()
    if not line:
        raise StopIteration
    return line

Of course in CPython 3.1+, there's a C accelerator that's used in place of that Python code if possible, but it looks pretty similar:

static PyObject *
iobase_iter(PyObject *self)
{
    if (_PyIOBase_check_closed(self, Py_True) == NULL)
        return NULL;

    Py_INCREF(self);
    return self;
}

static PyObject *
iobase_iternext(PyObject *self)
{
    PyObject *line = PyObject_CallMethodObjArgs(self, _PyIO_str_readline, NULL);

    if (line == NULL)
        return NULL;

    if (PyObject_Size(line) == 0) {
        Py_DECREF(line);
        return NULL;
    }

    return line;
}
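Both versions boil down to "call readline until it returns something empty". That means a subclass of io.IOBase that defines nothing but readline should already be iterable. Here's a small sketch of that idea; CountdownStream is a hypothetical class I made up for the demonstration, not anything from the stdlib.

```python
import io


class CountdownStream(io.IOBase):
    """Hypothetical binary stream that defines readline() and nothing else."""

    def __init__(self, start):
        super().__init__()
        self._remaining = start

    def readline(self, size=-1):
        # An empty bytes object signals end of stream, which is what makes
        # the inherited __next__ raise StopIteration.
        if self._remaining == 0:
            return b""
        line = ("%d\n" % self._remaining).encode("ascii")
        self._remaining -= 1
        return line


# __iter__ and __next__ come from IOBase; only readline is ours.
lines = list(CountdownStream(3))
print(lines)  # [b'3\n', b'2\n', b'1\n']
```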

The file objects returned by open, and the ones automatically created for things like sys.stdout, are instances of TextIOWrapper (for text files), or BufferedRandom, BufferedReader, or BufferedWriter (for binary files), or a bare FileIO if you open a binary file with buffering=0. Those classes, along with most or all file classes elsewhere in the stdlib (GzipFile, etc.), inherit this behavior from IOBase. There's nothing stopping a different file class from overriding __iter__ (or registering with IOBase as an ABC instead of inheriting it), but I don't know of any that do.
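If you want to check this on your own interpreter, something like the following sketch works. The helper class_and_iobase and the temporary file are illustrative names of mine; the part that matters is that isinstance against io.IOBase holds for each object, whatever its concrete class.

```python
import gzip
import io
import os
import tempfile


def class_and_iobase(file_obj):
    """Return the object's class name and whether it counts as an IOBase."""
    return type(file_obj).__name__, isinstance(file_obj, io.IOBase)


# Throwaway file so the sketch does not depend on anything existing on disk.
fd, demo_path = tempfile.mkstemp()
os.close(fd)

results = []
with open(demo_path, "w") as f:      # text mode
    results.append(class_and_iobase(f))
with open(demo_path, "rb") as f:     # binary, read-only
    results.append(class_and_iobase(f))
with open(demo_path, "wb") as f:     # binary, write-only
    results.append(class_and_iobase(f))
with open(demo_path, "r+b") as f:    # binary, read/write
    results.append(class_and_iobase(f))
with gzip.GzipFile(fileobj=io.BytesIO(), mode="wb") as f:
    results.append(class_and_iobase(f))

os.remove(demo_path)

for name, is_iobase in results:
    print(name, is_iobase)
```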

abarnert answered Dec 20 '22 17:12